scylladb

Author	SHA1	Message	Date
Vlad Zolotarov	4b28ea216d	utils::loading_cache: cancel the timer after closing the gate The timer is armed inside the section guarded by the _timer_reads_gate therefore it has to be canceled after the gate is closed. Otherwise we may end up with the armed timer after stop() method has returned a ready future. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1501603059-32515-1-git-send-email-vladz@scylladb.com>	2017-08-01 17:21:44 +01:00
Avi Kivity	3fe6731436	Merge "Reduce the effect of the latency metrics" from Amnon "This series reduces that effect in two ways: 1. Remove the latency counters from the system keyspaces 2. Reduce the histogram size by limiting the maximum number of buckets and stop the last bucket." Fixes #2650. * 'amnon/remove_cf_latency_v2' of github.com:cloudius-systems/seastar-dev: database: remove latency from the system table estimated histogram: return a smaller histogram	2017-07-31 15:58:30 +03:00
Duarte Nunes	4e3232fc29	utils/log_histogram: Fix typo when calculating number of buckets We weren't correctly calculating the number of buckets due to returning the wrong variable. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170731094733.7746-1-duarte@scylladb.com>	2017-07-31 12:49:11 +03:00
Avi Kivity	85056f3611	log_histogram: fix constexpr-ness of log_histogram_options 1. assert() is not constexpr. 2. can't use static_assert(), because the constructor may be called in a non-constexpr environment; moved to log_histogram 3. pow2_rank() uses count_leading_zeros() which is not constexpr; split into constexpr and non-constexpr versions 4. duplicated number_of_buckets() because bucket_of() can't be constexpr due to pow2_rank Message-Id: <20170726105444.32698-1-avi@scylladb.com>	2017-07-31 09:11:40 +01:00
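The constexpr split described in point 3 can be sketched as two functions: a recursive constant-expression version usable in `static_assert`, and a runtime version that compiles to a single instruction. This is an illustration, not Scylla's code: the function names are hypothetical, and `__builtin_clzll` (a GCC/Clang builtin) stands in for Seastar's `count_leading_zeros()`.

```cpp
#include <cassert>
#include <cstdint>

// Constant-expression version: floor(log2(v)) by recursion (C++11-friendly).
// Usable at compile time, but costs one step per bit. Precondition: v != 0.
constexpr unsigned pow2_rank_constexpr(uint64_t v) {
    return v <= 1 ? 0 : 1 + pow2_rank_constexpr(v >> 1);
}

// Runtime version: one count-leading-zeros instruction.
// Precondition: v != 0 (__builtin_clzll(0) is undefined).
inline unsigned pow2_rank(uint64_t v) {
    return 63u - static_cast<unsigned>(__builtin_clzll(v));
}

// Only the constexpr version can be checked at compile time.
static_assert(pow2_rank_constexpr(1) == 0, "rank of 1 is 0");
static_assert(pow2_rank_constexpr(1024) == 10, "rank of 1024 is 10");
```

Code that needs a compile-time value (such as a bucket count used to size an array) calls the constexpr version; hot paths call the runtime one.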
Paweł Dziepak	e62403190b	Merge "Introduce perf_cache_eviction test" from Tomasz Runs appending writes to a single partition, at full speed, and a reader which selects the head of the partition, with 100ms delay between reads. Prints latency percentiles and some stats. Intended to test performance at the transition from non-evicting to evicting modes. Currently we can see that after the transition, whole partition gets evicted and reads constantly miss. Sample output: rd/s: 10, wr/s: 135947, ev/s: 0, pmerge/s: 1, miss/s: 0, cache: 708/778 [MB], LSA: 820/910 [MB], std free: 82 [MB] reads : min: 149 , 50%: 179 , 90%: 1331 , 99%: 1331 , 99.9%: 1331 , max: 6866 [us] writes: min: 3 , 50%: 4 , 90%: 4 , 99%: 5 , 99.9%: 258 , max: 51012 [us] rd/s: 7, wr/s: 93354, ev/s: 9, pmerge/s: 1, miss/s: 3, cache: 0/0 [MB], LSA: 107/128 [MB], std free: 82 [MB] reads : min: 179 , 50%: 179 , 90%: 73457 , 99%: 73457 , 99.9%: 73457 , max: 105778 [us] writes: min: 3 , 50%: 4 , 90%: 4 , 99%: 5 , 99.9%: 258 , max: 105778 [us] * tag 'tgrabiec/row-eviction-perf-test' of github.com:scylladb/seastar-dev: tests: Introduce perf_cache_eviction tests: simple_schema: Add getter for DDL statement estimated_histogram: Implement percentile() utils: estimated_histogram: Make printable	2017-07-28 09:49:22 +01:00
Tomasz Grabiec	6a3703944b	utils: Introduce serialized_action	2017-07-27 20:08:21 +02:00
Tomasz Grabiec	5602be72fa	estimated_histogram: Implement percentile()	2017-07-27 17:19:07 +02:00
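A minimal sketch of `percentile()` over an estimated histogram, assuming the same semantics as Cassandra's `EstimatedHistogram.percentile()`: walk the buckets until the cumulative count reaches `ceil(count * p)` and report that bucket's upper bound. The `histogram_sketch` type is hypothetical; Scylla's actual class and signature may differ.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical estimated histogram: buckets[i] counts values in
// (offsets[i-1], offsets[i]]; the extra final bucket counts overflow values.
struct histogram_sketch {
    std::vector<int64_t> offsets;  // upper bound of each regular bucket
    std::vector<int64_t> buckets;  // offsets.size() + 1 counters (non-empty)

    int64_t count() const {
        int64_t sum = 0;
        for (auto b : buckets) sum += b;
        return sum;
    }

    // Upper bound of the bucket in which the p-th percentile falls.
    int64_t percentile(double p) const {
        if (p < 0.0 || p > 1.0) throw std::invalid_argument("p must be in [0, 1]");
        if (buckets.back() > 0) throw std::runtime_error("histogram overflowed");
        auto target = static_cast<int64_t>(std::ceil(count() * p));
        if (target == 0) return 0;
        int64_t seen = 0;
        for (std::size_t i = 0; i + 1 < buckets.size(); ++i) {
            seen += buckets[i];
            if (seen >= target) return offsets[i];
        }
        return 0;
    }
};
```

The answer is only as precise as the bucket boundaries: it reports an upper bound, not an interpolated value.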
Tomasz Grabiec	1bc305ed7b	utils: estimated_histogram: Make printable	2017-07-27 17:19:03 +02:00
Amnon Heiman	1b05f23d12	estimated histogram: return a smaller histogram The current histogram contains 91 buckets, which is a very high resolution with a high upper limit. To reduce the traffic between Scylla and Prometheus, this patch generates a smaller histogram. It limits the number of buckets (16 by default), sets a lower limit for the lowest bucket, and uses 2 as the bucket coefficient. The highest empty buckets are not reported. Signed-off-by: Amnon Heiman <amnon@scylladb.com> estimated histogram	2017-07-27 11:41:10 +03:00
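The reduction can be sketched as re-bucketing a fine-grained histogram into at most `max_buckets` buckets whose bounds double from `lower_limit`, then trimming trailing empty buckets. This is a hypothetical illustration of the idea only; the function name, the parameters, and the choice to fold overflow values into the last bucket are assumptions, not the patch's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct coarse_bucket {
    int64_t upper_bound;
    int64_t count;
};

// Preconditions: offsets.size() == counts.size(), max_buckets >= 1.
inline std::vector<coarse_bucket> shrink_histogram(const std::vector<int64_t>& offsets,
                                                   const std::vector<int64_t>& counts,
                                                   int64_t lower_limit,
                                                   std::size_t max_buckets = 16) {
    // Bounds lower_limit, 2*lower_limit, 4*lower_limit, ... (coefficient 2).
    std::vector<coarse_bucket> out;
    int64_t bound = lower_limit;
    for (std::size_t i = 0; i < max_buckets; ++i, bound *= 2) out.push_back({bound, 0});
    for (std::size_t i = 0; i < offsets.size(); ++i) {
        // First coarse bucket whose bound covers this fine bucket; values
        // above the top bound are folded into the last bucket (assumption).
        std::size_t j = 0;
        while (j + 1 < out.size() && out[j].upper_bound < offsets[i]) ++j;
        out[j].count += counts[i];
    }
    // Do not report the highest empty buckets.
    while (!out.empty() && out.back().count == 0) out.pop_back();
    return out;
}
```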
Vlad Zolotarov	9adabd1bc4	utils::loading_cache: add stop() method loading_cache invokes a timer that may issue asynchronous operations (queries) that would end with writing into the internal fields. We have to ensure that these operations are over before we can destroy the loading_cache object. Fixes #2624 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1501096256-10949-1-git-send-email-vladz@scylladb.com>	2017-07-26 21:28:49 +02:00
Vlad Zolotarov	76ea74f3fd	utils::loading_cache: arm the timer with a period equal to min(_expire, _update) Arm the timer with a period that is not greater than either the permissions_validity_in_ms or the permissions_update_interval_in_ms in order to ensure that we are not stuck with the values older than permissions_validity_in_ms. Fixes #2590 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-07-13 10:48:59 -04:00
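The rule reduces to taking the smaller of the two intervals. A sketch with a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical helper: the cache timer fires at least as often as the
// shorter of the validity and update intervals, so expiry is checked
// no less often than the validity window.
inline std::chrono::milliseconds timer_period(std::chrono::milliseconds expiry,
                                              std::chrono::milliseconds update) {
    return std::min(expiry, update);
}
```

A period longer than the validity window would let expired entries linger until the next tick.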
Vlad Zolotarov	121e3c7b8f	utils::loading_cache: make a timer use a loading_cache_clock_type clock as a source Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-07-13 10:42:12 -04:00
Botond Dénes	b1082641f9	Make sure keyspace strategy class is stored in qualified form Even when it's provided in unqualified (short) form. Fixes #767 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4379f8864843e64c097d432fd06129ce4025f100.1499322476.git.bdenes@scylladb.com>	2017-07-06 14:50:00 +03:00
Duarte Nunes	d157e4558a	utils/log_histogram: Remove largest() function It should never have existed in the first place, as there are no legitimate callers and it can be misused. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170630095939.2429-1-duarte@scylladb.com>	2017-07-02 14:29:17 +03:00
Avi Kivity	fc966c0c4c	Merge "tombstone removal compaction" from Raphael "This feature is intended to make compaction more efficient at getting rid of droppable tombstones and expired data that waste disk space. So far, people have been dealing with it manually through major compaction. With strategies other than date tiered, large sstables will be left untouched for a long time even though their data is all expired. Date tiered suffers from it when mixing data with different TTLs because it only includes fully expired sstables in compaction. sstables keep as metadata a histogram which allows us to easily estimate the droppable data ratio from gc_before. sstables whose droppable data ratio is above 20% (the default value of the tombstone_threshold option) will be considered candidates for the operation. Like in C*, we will only do tombstone removal compaction when there's nothing to compact in the standard way. It would be interesting to also trigger it when disk usage is above a given threshold, but I decided to leave this for later. Fixes #2306." * 'tombstone_removal_compaction_v4' of github.com:raphaelsc/scylla: tests: more testing for tombstone compaction options tests: basic tombstone compaction test for date tiered compaction/dtcs: add support for tombstone compaction tests: basic test of tombstone compaction with lcs compaction/lcs: add support for tombstone compaction tests: basic tombstone compaction test for size tiered compaction/stcs: add support for tombstone compaction tests: add test for estimation of droppable tombstone ratio sstables: introduce function to estimate droppable tombstone ratio compaction_manager: periodically submit cfs for compaction streaming_histogram: fix coding style tests: add streaming_histogram_test streaming_histogram: implement sum tests: add test for sstable with bad tombstone histogram sstables: discard bad streaming histogram for future use tests: add sstable tombstone histogram test streaming_histogram: fix update streaming_histogram: move it to utils streaming_histogram: do not limit it to be used by sstables sstables: update tombstone_histogram for cells with expiration time	2017-06-29 10:19:59 +03:00
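The candidate selection described in the merge message can be sketched as a filter over per-sstable estimates. The `sstable_info` type and the function name are hypothetical, and the droppable ratio is taken as given rather than computed from the tombstone histogram; only the 20% default and the "nothing else to compact" condition come from the message.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-sstable summary.
struct sstable_info {
    std::string name;
    double droppable_ratio;  // estimated fraction of droppable data, 0..1
};

// Tombstone-compaction candidates: only when the regular strategy found
// nothing to do, and only sstables strictly above the threshold.
inline std::vector<std::string> tombstone_compaction_candidates(
        const std::vector<sstable_info>& sstables,
        bool regular_job_pending,
        double tombstone_threshold = 0.2) {
    std::vector<std::string> out;
    if (regular_job_pending) return out;
    for (const auto& s : sstables) {
        if (s.droppable_ratio > tombstone_threshold) out.push_back(s.name);
    }
    return out;
}
```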
Raphael S. Carvalho	719dbf547d	streaming_histogram: fix coding style Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:08:12 -03:00
Raphael S. Carvalho	a65b9eb8b4	streaming_histogram: implement sum This function is used to estimate number of points in interval [-inf,b]. It will be useful for estimating droppable tombstone ratio in a given sstable. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 02:08:12 -03:00
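A sketch of `sum(b)`, modeled on Cassandra's `StreamingHistogram.sum()` (the `streaming_histogram: fix update` commit notes this code was converted from Java). It interpolates the bin count at `b` between the two surrounding centroids, then adds the trapezoid area, half of the left centroid's bin, and every bin further left. The free-function form over a `std::map` is an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// bins: centroid -> number of points merged into it.
inline double histogram_sum(const std::map<double, uint64_t>& bins, double b) {
    auto next = bins.upper_bound(b);  // first centroid > b
    double sum = 0;
    if (next == bins.end()) {
        // b is beyond every centroid: count all points.
        for (const auto& kv : bins) sum += kv.second;
        return sum;
    }
    if (next == bins.begin()) return 0;  // b is before every centroid
    auto cur = std::prev(next);          // last centroid <= b
    double pi = cur->first, mi = static_cast<double>(cur->second);
    double pn = next->first, mn = static_cast<double>(next->second);
    // Estimated count at b, then the trapezoid between pi and b.
    double weight = (b - pi) / (pn - pi);
    double mb = mi + (mn - mi) * weight;
    sum += (mi + mb) * weight / 2;
    sum += mi / 2;  // half of bin pi is assumed to lie left of pi
    for (auto it = bins.begin(); it != cur; ++it) sum += it->second;
    return sum;
}
```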
Raphael S. Carvalho	f35bd66da4	streaming_histogram: fix update This bug was introduced when converting java code. Return value of map::erase() was used as if it were the value of the removed entry, but it's actually the number of removed entries. update() also relies on ordered keys, so map is used instead by histogram. In addition, histograms will be written in sorted order (like C* does) such that we can detect bad histograms, using disk_array. disk_array is also used from now on to read histograms. The conversion from array to map is fine because histograms for sstables are limited to 100 elements. Coming patch will detect bad histograms (generated only by us) and discard them, because we can't rely on their information. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-29 01:17:26 -03:00
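The `erase()` pitfall is easiest to see in a bounded update loop. In this sketch (modeled on Cassandra's `StreamingHistogram.update()`; the function name and free-function form are assumptions), the two closest adjacent centroids are merged into their weighted mean. Both counts are read before erasing, because `std::map::erase(key)` returns the number of removed elements, not the removed value as Java's `TreeMap.remove()` does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Precondition: max_bins >= 1.
inline void histogram_update(std::map<double, uint64_t>& bins, double p,
                             uint64_t m, std::size_t max_bins) {
    auto found = bins.find(p);
    if (found != bins.end()) {
        found->second += m;  // same centroid: just bump its counter
        return;
    }
    bins.emplace(p, m);
    while (bins.size() > max_bins) {
        // Adjacent pair with the smallest gap; relies on std::map ordering.
        auto q1 = bins.begin();
        double smallest = std::next(q1)->first - q1->first;
        for (auto a = bins.begin(), b = std::next(a); b != bins.end(); ++a, ++b) {
            if (b->first - a->first < smallest) {
                smallest = b->first - a->first;
                q1 = a;
            }
        }
        auto q2 = std::next(q1);
        // Read keys and counts first; erase() would only report a count.
        double p1 = q1->first, p2 = q2->first;
        uint64_t k1 = q1->second, k2 = q2->second;
        bins.erase(q1);  // map iterators stay valid when others are erased
        bins.erase(q2);
        bins.emplace((p1 * k1 + p2 * k2) / (k1 + k2), k1 + k2);
    }
}
```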
Tomasz Grabiec	3489c68a68	lsa: Fix performance regression in eviction and compact_on_idle Region comparator, used by the two, calls region_impl::min_occupancy(), which calls log_histogram::largest(). The latter is O(N) in terms of the number of segments, and is supposed to be used only in tests. We should call one_of_largest() instead, which is O(1). This caused compact_on_idle() to take more CPU as the number of segments grew (even when there was nothing to compact). Eviction would see the same kind of slow down as well. Introduced in `11b5076b3c`. Message-Id: <1498641973-20054-1-git-send-email-tgrabiec@scylladb.com>	2017-06-28 12:32:43 +03:00
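The difference between the two calls can be sketched with a log histogram that keeps a bitmask of non-empty buckets. This is a hypothetical structure, not Scylla's `log_histogram`: `one_of_largest()` finds the top bucket with one count-leading-zeros (`__builtin_clzll`, GCC/Clang) and returns its first element, while `largest()` must scan that bucket.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bucket i holds values in [2^i, 2^(i+1)).
class log_histogram_sketch {
    std::array<std::vector<std::size_t>, 64> buckets_;
    uint64_t non_empty_ = 0;  // bit i set <=> bucket i is non-empty

    static unsigned top_bit(uint64_t v) {  // precondition: v != 0
        return 63u - static_cast<unsigned>(__builtin_clzll(v));
    }
public:
    void push(std::size_t v) {  // precondition: v != 0
        unsigned b = top_bit(v);
        buckets_[b].push_back(v);
        non_empty_ |= uint64_t(1) << b;
    }
    // O(1): some element of the highest non-empty bucket. It is within a
    // factor of 2 of the true maximum. Precondition: not empty.
    std::size_t one_of_largest() const {
        return buckets_[top_bit(non_empty_)].front();
    }
    // O(size of the top bucket): the exact maximum.
    std::size_t largest() const {
        std::size_t best = 0;
        for (auto v : buckets_[top_bit(non_empty_)]) {
            if (v > best) best = v;
        }
        return best;
    }
};
```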
Raphael S. Carvalho	d90f46000d	streaming_histogram: move it to utils It's not specific to sstables. May be needed somewhere else in the future. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2017-06-28 01:07:13 -03:00
Tomasz Grabiec	4b4aef789e	utils: Add helpers for dealing with nonwrapping_range<int>	2017-06-24 18:06:11 +02:00
Vlad Zolotarov	1ae40ee91a	utils::timestamped_val: fix the touch() comment The current comment was written when the function was not yet a timestamped_val member. Let's adjust it to the current code. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1495555659-10881-1-git-send-email-vladz@scylladb.com>	2017-05-26 19:26:56 +03:00
Vlad Zolotarov	0619c2cb71	utils::serialization: remove not used deserialization_xxx() functions Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1495556124-16672-1-git-send-email-vladz@scylladb.com>	2017-05-26 19:26:20 +03:00
Paweł Dziepak	3b9c0a6ae2	Merge "loading_cache: fix the known complexity issue in the shrink() method" from Vlad Use the boost::intrusive containers in order to achieve O(1) complexity for the "LRU list" update and to minimize the memory overhead of linking a hash table item to its "LRU list" item: - Make the timestamped_val be both a bi::list and a bi::unordered_set item. - Make a bi::unordered_set be a cache backend instead of the std::unordered_map. As a result, dropping k LRU items becomes an O(k) operation instead of O(N log N), where N is the total number of cached items: - Every time a value is read - move it to the front of the "LRU list" (O(1)). - When we need to remove k LRU items: - Repeat k times: - Take an element from the back of the "LRU list". (O(1)). - Remove it from the bi::unordered_set and dispose of it. (O(1)). We use an auto-unlink configuration for bi::list, therefore disposing of an item automatically unlinks it from the list. * 'permissions_cache_move_to_intrusive-v1' of github.com:scylladb/seastar-dev: utils::loading_cache: cleanup utils/loading_cache.hh: use intrusive list to store the lru entry utils::loading_cache: implement automatic rehashing utils::loading_cache: make the underlying map to be an intrusive unordered_set	2017-05-23 16:18:16 +01:00
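The O(1) touch and O(k) shrink can be illustrated without boost::intrusive by pairing a `std::list` (LRU order) with a hash map of list iterators. This is a swapped-in, non-intrusive technique, so it does not show the memory savings that motivated the intrusive containers; the class name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

template <typename Key, typename Value>
class lru_cache_sketch {
    using entry = std::pair<Key, Value>;
    std::list<entry> lru_;  // front = most recently read
    std::unordered_map<Key, typename std::list<entry>::iterator> index_;
public:
    void put(const Key& k, Value v) {
        auto it = index_.find(k);
        if (it != index_.end()) {
            it->second->second = std::move(v);
            lru_.splice(lru_.begin(), lru_, it->second);
            return;
        }
        lru_.emplace_front(k, std::move(v));
        index_.emplace(k, lru_.begin());
    }
    // A read moves the entry to the front: O(1), iterators stay valid.
    const Value* get(const Key& k) {
        auto it = index_.find(k);
        if (it == index_.end()) return nullptr;
        lru_.splice(lru_.begin(), lru_, it->second);
        return &it->second->second;
    }
    // Drop the k least recently read entries: O(k), no sorting.
    void shrink(std::size_t k) {
        while (k-- > 0 && !lru_.empty()) {
            index_.erase(lru_.back().first);
            lru_.pop_back();
        }
    }
    std::size_t size() const { return lru_.size(); }
};
```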
Avi Kivity	fd0e1eb1e2	Merge "Fixes for mutation algebra" from Tomasz "Enforces commutativity of addition: m1 + m2 == m2 + m1 and consistency of difference and addition with equality: m1 + (m2 - m1) == m1 + m2" * tag 'tgrabiec/fix-range-tombstone-commutativity-v2' of github.com:cloudius-systems/seastar-dev: mutation: Make compare_*_for_merge() consistent with equals() tests: mutation: Improve assertion failure message tests: Use default equality in test_mutation_diff_with_random_generator mutation: Make counter cell difference consistent with apply tests: range_tombstone_list_test: Improve error message tests: range_tombstone_list: Check adjacent range merging range_tombstone_list: Merge adjacent range tombstones in apply() tests: mutation: Check commutativity of mutation addition range_tombstone_list: Avoid violating set invariant range_tombstone_list: Make tombstone merging commutative range_tombstone_list: Add erase() operation to the reverter range_tombstone_list: Make all undo operations ordered relative to each other utils: Extract to_boost_visitor() to a separate header allocating_strategy: Introduce alloc_strategy_unique_ptr<>	2017-05-23 15:20:38 +03:00
Vlad Zolotarov	2d4d198fb9	utils::loading_cache: cleanup - Remove "_" at the beginning of the type names. - s/Pred/EqualPred/ Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 23:02:18 -04:00
Vlad Zolotarov	fd59a548c0	utils/loading_cache.hh: use intrusive list to store the lru entry Fix the shrink() O(n log n) complexity issue by constantly pushing the corresponding intrusive list entry to the head of the list every time the values are read. This will keep the list ordered by the last read time from the most recently read to the least recently read entry. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 23:00:18 -04:00
Vlad Zolotarov	0c4e9efce7	utils::loading_cache: implement automatic rehashing - Start the cache with 256 buckets - the minimum number of buckets. - Limit the maximal number of buckets by 1M buckets. - Keep the load factor between 0.25 and 1.0 as long as the number of buckets is between the minimum and the maximum values mentioned above. - Grow and shrink the hash every "refresh" period if needed. - Enable bi::power_2_buckets and bi::compare_hash bi::unordered_set options. - Enable bi::unordered_set_base_hook's bi::store_hash option. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 22:57:44 -04:00
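The resize policy above can be sketched as a pure function. The name and the exact grow/shrink loops are assumptions; only the bounds (256 to 1M buckets, power-of-two sizes, load factor between 0.25 and 1.0) come from the commit message.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t min_buckets = 256;
constexpr std::size_t max_buckets = std::size_t(1) << 20;

// Given the element count and a current power-of-two bucket count, return
// the bucket count that restores 0.25 <= elements / buckets <= 1.0,
// clamped to [min_buckets, max_buckets].
inline std::size_t next_bucket_count(std::size_t elements, std::size_t buckets) {
    // Grow while the load factor exceeds 1.0.
    while (elements > buckets && buckets < max_buckets) buckets *= 2;
    // Shrink while the load factor is below 0.25.
    while (elements * 4 < buckets && buckets > min_buckets) buckets /= 2;
    return buckets;
}
```

Called once per refresh period, this keeps the table from rehashing on every insert or erase.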
Vlad Zolotarov	2be3596a4f	utils::loading_cache: make the underlying map to be an intrusive unordered_set Make the underlying map a boost::intrusive::unordered_set<timestamped_val> instead of std::unordered_map<Key, timestamped_val>. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 18:45:13 -04:00
Tomasz Grabiec	5aeb9eb70c	utils: Extract to_boost_visitor() to a separate header	2017-05-22 19:30:02 +02:00
Tomasz Grabiec	69e2eccf68	allocating_strategy: Introduce alloc_strategy_unique_ptr<>	2017-05-22 19:30:02 +02:00
Avi Kivity	ebaeefa02b	Merge seastar upstream (seastar namespace) - introduced "seastarx.hh" header, which does a "using namespace seastar"; - 'net' namespace conflicts with seastar::net, renamed to 'netw'. - 'transport' namespace conflicts with seastar::transport, renamed to cql_transport. - "logger" global variables now conflict with logger global type, renamed to xlogger. - other minor changes	2017-05-21 12:26:15 +03:00
Tomasz Grabiec	cd4d15672b	utils: estimated_histogram: Fix clear() It was a no-op. It doesn't seem currently used, but I will have a use for it soon. Message-Id: <1495198172-1969-1-git-send-email-tgrabiec@scylladb.com>	2017-05-19 14:34:34 +01:00
Vlad Zolotarov	6a63c87a9f	utils::loading_cache: avoid the reads storm when the key is not in the cache Use a mutex to serialize producers when the key is not present in the cache. Fixes #2262 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-18 07:55:48 -04:00
Vlad Zolotarov	1ef22f84c1	utils::loading_cache: cleanup - Fix a callback signature: receive a const ref. - White spaces. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-17 15:03:14 -04:00
Vlad Zolotarov	87ce0b2d47	utils::loading_cache: align the constraints in the constructor with the parameters description According to the description of permissions_validity_in_ms, the permissions_cache is enabled if this value is set to a non-zero value. Otherwise the permissions_cache is disabled. According to the permissions_update_interval_in_ms description, it must have a non-zero value if permissions_cache is enabled. The permissions_cache_max_entries description doesn't state it explicitly, but it makes no sense to allow it to be zero if permissions_cache is enabled. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-17 15:03:14 -04:00
Vlad Zolotarov	e286828472	utils::loading_cache: refresh in the background This patch changes the way a loading_cache works. Before this patch: 1) If a permissions key is not in the cache it's loaded in the foreground and the original query is blocked till the permissions are loaded. 2) Every _period the timer does the following: 1) If a value was loaded more than _expiry time ago it is removed from the cache. 2) If the cache is too big - the least recently loaded values are removed until the cache fits the requested size. After this patch: 1) If a permissions key is not in the cache it's loaded in the foreground and the original query is blocked till the permissions are loaded. 2) Every _period the timer does the following: 1) If a value in the cache was loaded or read for the last time more than _expiry time ago - it's removed from the cache. 2) If the cache is too big - the least recently read values are removed until the cache fits the requested size. 3) The values that were loaded more than _refresh time ago are re-read in the background. The new implementation minimizes the number of foreground reads for a frequently used value to a single event (when the value is loaded for the first time). It also ensures we do not reload values we no longer need. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-17 15:03:06 -04:00
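The per-entry decision made on each timer tick after this patch can be sketched as follows (the size-based LRU eviction step is omitted). All names are hypothetical, and one detail is an assumption: `last_read` is also set when a value is first loaded, while background refreshes update only `loaded`, so entries nobody reads still age out instead of being reloaded forever.

```cpp
#include <cassert>
#include <chrono>

using cache_clock = std::chrono::steady_clock;

enum class timer_action { keep, evict, refresh };

struct entry_times {
    cache_clock::time_point loaded;     // last (re)load from the backing store
    cache_clock::time_point last_read;  // last read (set on first load too)
};

inline timer_action classify(const entry_times& e, cache_clock::time_point now,
                             cache_clock::duration expiry,
                             cache_clock::duration refresh) {
    // Not read within the expiry window: drop it, do not reload it.
    if (now - e.last_read > expiry) return timer_action::evict;
    // Still in use but stale: reload in the background; readers keep
    // getting the cached value meanwhile.
    if (now - e.loaded > refresh) return timer_action::refresh;
    return timer_action::keep;
}
```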
Avi Kivity	1c6cecd9d0	utils: introduce div_ceil() Divides integers but rounds up rather than down.	2017-05-17 12:30:03 +03:00
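A possible definition, assuming a non-negative dividend and a positive divisor; the commit message does not show the signature. Writing it as quotient plus a remainder check avoids the overflow that the common `(a + b - 1) / b` form hits near the type's maximum.

```cpp
#include <cassert>
#include <type_traits>

// Integer division rounding up. Preconditions: a >= 0, b > 0.
template <typename T>
constexpr T div_ceil(T a, T b) {
    static_assert(std::is_integral<T>::value, "div_ceil requires an integral type");
    return a / b + (a % b != 0 ? 1 : 0);
}
```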
Vlad Zolotarov	494ea82a88	utils::UUID: align the UUID serialization API with the similar API of other classes in the project The standard serialization API (e.g. in data_value) includes the following methods: size_t serialized_size() const; void serialize(bytes::iterator& it) const; bytes serialize() const; Align the utils::UUID API with the pattern above. The only addition is that we are going to make an output iterator parameter of a second method above a template so that we may serialize into different output sources. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-16 15:56:03 -04:00
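The three-method pattern can be sketched on a hypothetical UUID type holding two 64-bit halves written in big-endian order. The `bytes` alias is a stand-in for Scylla's own `bytes` type. The iterator-taking overload is a template that advances the caller's iterator, so it works for fixed buffers and `std::back_inserter` alike.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <iterator>
#include <vector>

using bytes = std::vector<uint8_t>;  // stand-in, not Scylla's bytes type

struct uuid_sketch {
    uint64_t msb;
    uint64_t lsb;

    static constexpr std::size_t serialized_size() { return 16; }

    // Writes 16 big-endian bytes and advances `out`, so several values can
    // be serialized back to back into the same destination.
    template <typename OutputIterator>
    void serialize(OutputIterator& out) const {
        for (uint64_t word : {msb, lsb}) {
            for (int shift = 56; shift >= 0; shift -= 8) {
                *out = static_cast<uint8_t>((word >> shift) & 0xff);
                ++out;
            }
        }
    }

    bytes serialize() const {
        bytes b(serialized_size());
        auto it = b.begin();
        serialize(it);
        return b;
    }
};
```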
Vlad Zolotarov	7706775a63	utils: serialization: unify the variety of serialize_XXX(...) Use the same templated implementation for all different serialize_XXX(...). The chosen implementation is based on std::copy_n(char*, size, OutputIterator), which is heavily optimized and will use memcpy/memmove where possible. This patch also removes the unneeded specializations that accept signed integer values, since we were casting them to unsigned values anyway. The std::ostream-based specializations are also removed, since they are not used anywhere except in test-serialization.cc, and adapting the ostream to an iterator is a one-liner. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-16 15:56:03 -04:00
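A single template for all widths might look like this sketch. The name `serialize_int` is hypothetical, and it accepts unsigned types only, matching the commit's note that signed values were cast to unsigned anyway. The bytes are laid out big-endian in a local buffer and handed to `std::copy_n`, which standard libraries can lower to `memcpy`/`memmove` for raw-pointer outputs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <type_traits>
#include <vector>

template <typename T, typename OutputIterator>
void serialize_int(T value, OutputIterator& out) {
    static_assert(std::is_unsigned<T>::value, "serialize the unsigned representation");
    unsigned char buf[sizeof(T)];
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        // Most significant byte first (network byte order).
        buf[i] = static_cast<unsigned char>((value >> (8 * (sizeof(T) - 1 - i))) & 0xff);
    }
    out = std::copy_n(buf, sizeof(T), out);  // advance the caller's iterator
}
```

Adapting an `std::ostream` is then just a matter of passing an `std::ostreambuf_iterator<char>` as the output iterator.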
Avi Kivity	7e29dd7066	managed_bytes: improve alignment hygiene While blob_storage is marked as an unaligned type, the back references also point to an unaligned type (a pointer to blob_storage), since a back reference can live in a blob_storage. This triggers errors from zapcc/clang 4. Fix by creating a type for the reference, which is marked as unaligned. Message-Id: <20170502071404.507-1-avi@scylladb.com>	2017-05-02 10:04:13 +01:00
Avi Kivity	1d12d69881	logalloc: define segment_zone::maximum_size Without the definition, some compilers yield build errors.	2017-05-01 16:31:29 +03:00
Paweł Dziepak	f5cf86484e	lsa: introduce upper bound on zone size Attempting to create huge zones may introduce significant latency. This patch introduces the maximum allowed zone size so that the time spent trying to allocate and initialising zone is bounded. Fixes #2335. Message-Id: <20170428145916.28093-1-pdziepak@scylladb.com>	2017-04-30 10:58:11 +03:00
Duarte Nunes	d216c3dbd2	tombstone: Extract out relational operators This patch extracts out the relational operators in struct tombstone to a class capable of generating them from a tri-compare function. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:43:04 +02:00
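The pattern can be sketched as a CRTP base whose hidden-friend operators forward to a `compare()` member returning a negative, zero, or positive value. The base-class name and the `tombstone_sketch` field order (timestamp, then deletion time) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Derive from this and provide `int compare(const T&) const`; all six
// relational operators are generated from that one tri-compare function.
template <typename T>
struct with_relational_operators {
    friend bool operator==(const T& a, const T& b) { return a.compare(b) == 0; }
    friend bool operator!=(const T& a, const T& b) { return a.compare(b) != 0; }
    friend bool operator<(const T& a, const T& b) { return a.compare(b) < 0; }
    friend bool operator<=(const T& a, const T& b) { return a.compare(b) <= 0; }
    friend bool operator>(const T& a, const T& b) { return a.compare(b) > 0; }
    friend bool operator>=(const T& a, const T& b) { return a.compare(b) >= 0; }
};

struct tombstone_sketch : with_relational_operators<tombstone_sketch> {
    int64_t timestamp;
    int32_t deletion_time;

    tombstone_sketch(int64_t ts, int32_t dt) : timestamp(ts), deletion_time(dt) {}

    int compare(const tombstone_sketch& o) const {
        if (timestamp != o.timestamp) return timestamp < o.timestamp ? -1 : 1;
        if (deletion_time != o.deletion_time) return deletion_time < o.deletion_time ? -1 : 1;
        return 0;
    }
};
```

Because the operators are hidden friends of the base, they are found by argument-dependent lookup only for derived types and cannot be picked up by unrelated classes.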
Pekka Enberg	940c3f4330	Merge "Clang fixes (part 2)" from Avi "This series fixes some more errors found by clang, with the aim of enabling clang/zapcc as a supported compiler. A single issue remains, but it's probably in std::experimental::optional::swap(); not in our code." * tag 'clang/2/v1' of https://github.com/avikivity/scylla: sstable_test: avoid passing negative non-type template arguments to unsigned parameters UUID: add more comparison operators sstable_datafile_test: avoid string_view user-defined literal conversion operator mutation_source_test: avoid template function without template keyword cql_query_test: define static variable cql_query_test: add braces for single-item collection initializers storage_service: don't use typeid(temporary) logalloc: remove unused max_occupancy_for_compaction storage_proxy: drop overzealous use of __int128_t in recently-modified-no-read-repair logic storage_proxy: drop unused member access from return value storage_proxy: fix reference bound to temporary in data_read_resolver::less_compare read_repair_decision: fix operator<<(std::ostream&, ...)	2017-04-24 20:32:16 +03:00
Avi Kivity	6d9e18fd61	logalloc: reduce descriptor overhead Every lsa-allocated object is prefixed by a header that contains information needed to free or migrate it. This includes its size (for freeing) and an 8-byte migrator (for migrating). Together with some flags, the overhead is 14 bytes (16 bytes if the default alignment is used). This patch reduces the header size to 1 byte (8 bytes if the default alignment is used). It uses the following techniques: - ULEB128-like encoding (actually more like ULEB64) so a live object's header can typically be stored using 1 byte - indirection, so that migrators can be encoded in a small index pointing to a migrator table, rather than using an 8-byte pointer; this exploits the fact that only a small number of types are stored in LSA - moving the responsibility for determining an object's size to its migrator, rather than storing it in the header; this exploits the fact that the migrator stores type information, and object size is in fact information about the type The patch improves the results of memory_footprint_test as follows: Before: - in cache: 976 - in memtable: 947 After: mutation footprint: - in cache: 880 - in memtable: 858 A reduction of about 10%. Further reductions are possible by reducing the alignment of lsa objects. logalloc_test was adjusted to free more objects, since with the lower footprint, rounding errors (to full segments) are different and caused false errors to be detected. Missing: adjustments to scylla-gdb.py; will be done after we agree on the new descriptor's format.	2017-04-24 12:23:12 +02:00
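A plain ULEB128 encoder/decoder illustrates why most headers fit in one byte: each byte carries 7 payload bits plus a continuation flag, so values below 128 take a single byte. The commit's actual descriptor layout (flags, migrator index, and its "ULEB64" variant) is not described in the message; this sketch shows standard ULEB128 only, with hypothetical function names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// 7 payload bits per byte, least significant group first; the high bit is
// set on every byte except the last.
inline void uleb_encode(uint64_t v, std::vector<uint8_t>& out) {
    do {
        uint8_t byte = v & 0x7f;
        v >>= 7;
        if (v != 0) byte |= 0x80;  // more bytes follow
        out.push_back(byte);
    } while (v != 0);
}

// Decodes one value starting at `pos` and advances `pos` past it.
inline uint64_t uleb_decode(const std::vector<uint8_t>& in, std::size_t& pos) {
    uint64_t v = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = in[pos++];
        v |= uint64_t(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    return v;
}
```

For example, 300 encodes as the two bytes `0xAC 0x02`, while any value up to 127 is a single byte.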
Avi Kivity	dc6ea51ffa	UUID: add more comparison operators Clang wanted them for some unit test; not sure how gcc was able to synthesize them, but they're clearly needed.	2017-04-22 22:12:33 +03:00
Avi Kivity	9303b09a64	logalloc: remove unused max_occupancy_for_compaction Noticed by clang.	2017-04-22 21:09:41 +03:00
Tomasz Grabiec	20f4c9bf23	lsa: Reduce reclamation latency Currently eviction is performed until occupancy of the whole region drops below the 85% threshold. This may take a while if region had high occupancy and is large. We could improve the situation by only evicting until occupancy of the sparsest segment drops below the threshold, as is done by this change. I tested this using a c-s read workload in which the condition triggers in the cache region, with 1G per shard: lsa-timing - Reclamation cycle took 12.934 us. lsa-timing - Reclamation cycle took 47.771 us. lsa-timing - Reclamation cycle took 125.946 us. lsa-timing - Reclamation cycle took 144356 us. lsa-timing - Reclamation cycle took 655.765 us. lsa-timing - Reclamation cycle took 693.418 us. lsa-timing - Reclamation cycle took 509.869 us. lsa-timing - Reclamation cycle took 1139.15 us. The 144ms pause is when large eviction is necessary. Statistics for reclamation pauses for a read workload over larger-than-memory data set: Before: avg = 865.796362 stdev = 10253.498038 min = 93.891000 max = 264078.000000 sum = 574022.988000 samples = 663 After: avg = 513.685650 stdev = 275.270157 min = 212.286000 max = 1089.670000 sum = 340573.586000 samples = 663 Refs #1634. Message-Id: <1484730859-11969-1-git-send-email-tgrabiec@scylladb.com>	2017-04-21 12:52:31 +02:00
Tomasz Grabiec	4313641c03	tests: Add test for log_histogram	2017-04-21 12:52:31 +02:00

398 Commits