scylladb

Author	SHA1	Message	Date
Vlad Zolotarov	1ae40ee91a	utils::timestamped_val: fix the touch() comment The current comment was written when the function was not yet a timestamped_val member. Let's adjust it to the current code. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1495555659-10881-1-git-send-email-vladz@scylladb.com>	2017-05-26 19:26:56 +03:00
Vlad Zolotarov	0619c2cb71	utils::serialization: remove unused deserialization_xxx() functions Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1495556124-16672-1-git-send-email-vladz@scylladb.com>	2017-05-26 19:26:20 +03:00
Paweł Dziepak	3b9c0a6ae2	Merge "loading_cache: fix the known complexity issue in the shrink() method" from Vlad Use the boost::intrusive containers to achieve O(1) complexity for the "LRU list" update and to minimize the memory overhead of connecting a hash table item to its "LRU list" item: - Make the timestamped_val both a bi::list and a bi::unordered_set item. - Make a bi::unordered_set the cache backend instead of the std::unordered_map. As a result, dropping k LRU items becomes an O(k) operation instead of O(N log N), where N is the total number of cached items: - Every time a value is read, move it to the front of the "LRU list" (O(1)). - When we need to remove k LRU items: - Repeat k times: - Take an element from the back of the "LRU list" (O(1)). - Remove it from the bi::unordered_set and dispose of it (O(1)). We use an auto-unlink configuration for bi::list, so disposing of an item automatically unlinks it from the list. * 'permissions_cache_move_to_intrusive-v1' of github.com:scylladb/seastar-dev: utils::loading_cache: cleanup utils/loading_cache.hh: use intrusive list to store the lru entry utils::loading_cache: implement automatic rehashing utils::loading_cache: make the underlying map to be an intrusive unordered_set	2017-05-23 16:18:16 +01:00
Avi Kivity	fd0e1eb1e2	Merge "Fixes for mutation algebra" from Tomasz "Enforces commutativity of addition: m1 + m2 == m2 + m1 and consistency of difference and addition with equality: m1 + (m2 - m1) == m1 + m2" * tag 'tgrabiec/fix-range-tombstone-commutativity-v2' of github.com:cloudius-systems/seastar-dev: mutation: Make compare_*_for_merge() consistent with equals() tests: mutation: Improve assertion failure message tests: Use default equality in test_mutation_diff_with_random_generator mutation: Make counter cell difference consistent with apply tests: range_tombstone_list_test: Improve error message tests: range_tombstone_list: Check adjacent range merging range_tombstone_list: Merge adjacent range tombstones in apply() tests: mutation: Check commutativity of mutation addition range_tombstone_list: Avoid violating set invariant range_tombstone_list: Make tombstone merging commutative range_tombstone_list: Add erase() operation to the reverter range_tombstone_list: Make all undo operations ordered relative to each other utils: Extract to_boost_visitor() to a separate header allocating_strategy: Introduce alloc_strategy_unique_ptr<>	2017-05-23 15:20:38 +03:00
Vlad Zolotarov	2d4d198fb9	utils::loading_cache: cleanup - Remove "_" at the beginning of the type names. - s/Pred/EqualPred/ Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 23:02:18 -04:00
Vlad Zolotarov	fd59a548c0	utils/loading_cache.hh: use intrusive list to store the lru entry Fix the shrink() O(n log n) complexity issue by constantly pushing the corresponding intrusive list entry to the head of the list every time the values are read. This will keep the list ordered by the last read time from the most recently read to the least recently read entry. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 23:00:18 -04:00
Vlad Zolotarov	0c4e9efce7	utils::loading_cache: implement automatic rehashing - Start the cache with 256 buckets (the minimum number of buckets). - Limit the maximum number of buckets to 1M. - Keep the load factor between 0.25 and 1.0 as long as the number of buckets is between the minimum and maximum values mentioned above. - Grow or shrink the hash table every "refresh" period if needed. - Enable the bi::power_2_buckets and bi::compare_hash bi::unordered_set options. - Enable bi::unordered_set_base_hook's bi::store_hash option. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 22:57:44 -04:00
Vlad Zolotarov	2be3596a4f	utils::loading_cache: make the underlying map to be an intrusive unordered_set Make the underlying map a boost::intrusive::unordered_set<timestamped_val> instead of a std::unordered_map<Key, timestamped_val>. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-22 18:45:13 -04:00
Tomasz Grabiec	5aeb9eb70c	utils: Extract to_boost_visitor() to a separate header	2017-05-22 19:30:02 +02:00
Tomasz Grabiec	69e2eccf68	allocating_strategy: Introduce alloc_strategy_unique_ptr<>	2017-05-22 19:30:02 +02:00
Avi Kivity	ebaeefa02b	Merge seastar upstream (seastar namespace) - introduced a "seastarx.hh" header, which does a "using namespace seastar"; - 'net' namespace conflicts with seastar::net, renamed to 'netw'. - 'transport' namespace conflicts with seastar::transport, renamed to cql_transport. - "logger" global variables now conflict with logger global type, renamed to xlogger. - other minor changes	2017-05-21 12:26:15 +03:00
Tomasz Grabiec	cd4d15672b	utils: estimated_histogram: Fix clear() It was a no-op. It doesn't seem currently used, but I will have a use for it soon. Message-Id: <1495198172-1969-1-git-send-email-tgrabiec@scylladb.com>	2017-05-19 14:34:34 +01:00
Vlad Zolotarov	6a63c87a9f	utils::loading_cache: avoid a read storm when the key is not in the cache Use a mutex to serialize producers when the key is not present in the cache. Fixes #2262 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-18 07:55:48 -04:00
Vlad Zolotarov	1ef22f84c1	utils::loading_cache: cleanup - Fix a callback signature: receive a const ref. - White spaces. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-17 15:03:14 -04:00
Vlad Zolotarov	87ce0b2d47	utils::loading_cache: align the constraints in the constructor with the parameters' descriptions According to the description of permissions_validity_in_ms, the permissions_cache is enabled if this value is set to a non-zero value; otherwise the permissions_cache is disabled. According to the permissions_update_interval_in_ms description, it must have a non-zero value if the permissions_cache is enabled. The permissions_cache_max_entries description doesn't state it explicitly, but it makes no sense to allow it to be zero if the permissions_cache is enabled. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-17 15:03:14 -04:00
Vlad Zolotarov	e286828472	utils::loading_cache: refresh in the background This patch changes the way a loading_cache works. Before this patch: 1) If a permissions key is not in the cache, it's loaded in the foreground and the original query is blocked until the permissions are loaded. 2) Every _period the timer does the following: 1) If a value was loaded more than _expiry time ago, it is removed from the cache. 2) If the cache is too big, the least recently loaded values are removed until the cache fits the requested size. After this patch: 1) If a permissions key is not in the cache, it's loaded in the foreground and the original query is blocked until the permissions are loaded. 2) Every _period the timer does the following: 1) If a value in the cache was loaded or last read more than _expiry time ago, it's removed from the cache. 2) If the cache is too big, the least recently read values are removed until the cache fits the requested size. 3) The values that were loaded more than _refresh time ago are re-read in the background. The new implementation reduces the foreground reads for a frequently used value to a single event (when the value is loaded for the first time). It also ensures we do not reload values we no longer need. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-17 15:03:06 -04:00
Avi Kivity	1c6cecd9d0	utils: introduce div_ceil() Divides integrals but rounds up rather than down.	2017-05-17 12:30:03 +03:00
Vlad Zolotarov	494ea82a88	utils::UUID: align the UUID serialization API with the similar API of other classes in the project The standard serialization API (e.g. in data_value) includes the following methods: size_t serialized_size() const; void serialize(bytes::iterator& it) const; bytes serialize() const; Align the utils::UUID API with the pattern above. The only difference is that the output iterator parameter of the second method is made a template parameter, so that we can serialize into different kinds of output. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-16 15:56:03 -04:00
Vlad Zolotarov	7706775a63	utils: serialization: unify the variety of serialize_XXX(...) Use the same templated implementation for all the different serialize_XXX(...) functions. The chosen implementation is based on std::copy_n(char*, size, OutputIterator), which is heavily optimized and uses memcpy/memmove where possible. This patch also removes the unneeded specializations that accept signed integer values, since we were casting them to unsigned values anyway. The std::ostream-based specializations are also removed, since they are not used anywhere except in test-serialization.cc, and adapting an ostream to an iterator is a one-liner. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-05-16 15:56:03 -04:00
Avi Kivity	7e29dd7066	managed_bytes: improve alignment hygiene While blob_storage is marked as an unaligned type, the back references (pointers to blob_storage) are themselves unaligned objects too, since a back reference can live in a blob_storage. This triggers errors from zapcc/clang 4. Fix by creating a type for the reference that is marked as unaligned. Message-Id: <20170502071404.507-1-avi@scylladb.com>	2017-05-02 10:04:13 +01:00
Avi Kivity	1d12d69881	logalloc: define segment_zone::maximum_size Without this definition, some compilers report build errors.	2017-05-01 16:31:29 +03:00
Paweł Dziepak	f5cf86484e	lsa: introduce upper bound on zone size Attempting to create huge zones may introduce significant latency. This patch introduces a maximum allowed zone size so that the time spent allocating and initialising a zone is bounded. Fixes #2335. Message-Id: <20170428145916.28093-1-pdziepak@scylladb.com>	2017-04-30 10:58:11 +03:00
Duarte Nunes	d216c3dbd2	tombstone: Extract out relational operators This patch extracts out the relational operators in struct tombstone to a class capable of generating them from a tri-compare function. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-04-25 11:43:04 +02:00
Pekka Enberg	940c3f4330	Merge "Clang fixes (part 2)" from Avi "This series fixes some more errors found by clang, with the aim of enabling clang/zapcc as a supported compiler. A single issue remains, but it's probably in std::experimental::optional::swap(); not in our code." * tag 'clang/2/v1' of https://github.com/avikivity/scylla: sstable_test: avoid passing negative non-type template arguments to unsigned parameters UUID: add more comparison operators sstable_datafile_test: avoid string_view user-defined literal conversion operator mutation_source_test: avoid template function without template keyword cql_query_test: define static variable cql_query_test: add braces for single-item collection initializers storage_service: don't use typeid(temporary) logalloc: remove unused max_occupancy_for_compaction storage_proxy: drop overzealous use of __int128_t in recently-modified-no-read-repair logic storage_proxy: drop unused member access from return value storage_proxy: fix reference bound to temporary in data_read_resolver::less_compare read_repair_decision: fix operator<<(std::ostream&, ...)	2017-04-24 20:32:16 +03:00
Avi Kivity	6d9e18fd61	logalloc: reduce descriptor overhead Every lsa-allocated object is prefixed by a header that contains information needed to free or migrate it. This includes its size (for freeing) and an 8-byte migrator (for migrating). Together with some flags, the overhead is 14 bytes (16 bytes if the default alignment is used). This patch reduces the header size to 1 byte (8 bytes if the default alignment is used). It uses the following techniques: - ULEB128-like encoding (actually more like ULEB64) so a live object's header can typically be stored using 1 byte - indirection, so that migrators can be encoded in a small index pointing to a migrator table, rather than using an 8-byte pointer; this exploits the fact that only a small number of types are stored in LSA - moving the responsibility for determining an object's size to its migrator, rather than storing it in the header; this exploits the fact that the migrator stores type information, and object size is in fact information about the type The patch improves the results of memory_footprint_test as follows: Before: - in cache: 976 - in memtable: 947 After: mutation footprint: - in cache: 880 - in memtable: 858 A reduction of about 10%. Further reductions are possible by reducing the alignment of lsa objects. logalloc_test was adjusted to free more objects, since with the lower footprint, rounding errors (to full segments) are different and caused false errors to be detected. Missing: adjustments to scylla-gdb.py; will be done after we agree on the new descriptor's format.	2017-04-24 12:23:12 +02:00
Avi Kivity	dc6ea51ffa	UUID: add more comparison operators Clang wanted them for some unit test; not sure how gcc was able to synthesize them, but they're clearly needed.	2017-04-22 22:12:33 +03:00
Avi Kivity	9303b09a64	logalloc: remove unused max_occupancy_for_compaction Noticed by clang.	2017-04-22 21:09:41 +03:00
Tomasz Grabiec	20f4c9bf23	lsa: Reduce reclamation latency Currently eviction is performed until occupancy of the whole region drops below the 85% threshold. This may take a while if the region is large and had high occupancy. We could improve the situation by only evicting until occupancy of the sparsest segment drops below the threshold, as is done by this change. I tested this using a c-s read workload in which the condition triggers in the cache region, with 1G per shard: lsa-timing - Reclamation cycle took 12.934 us. lsa-timing - Reclamation cycle took 47.771 us. lsa-timing - Reclamation cycle took 125.946 us. lsa-timing - Reclamation cycle took 144356 us. lsa-timing - Reclamation cycle took 655.765 us. lsa-timing - Reclamation cycle took 693.418 us. lsa-timing - Reclamation cycle took 509.869 us. lsa-timing - Reclamation cycle took 1139.15 us. The 144 ms pause occurs when a large eviction is necessary. Statistics for reclamation pauses for a read workload over a larger-than-memory data set: Before: avg = 865.796362 stdev = 10253.498038 min = 93.891000 max = 264078.000000 sum = 574022.988000 samples = 663 After: avg = 513.685650 stdev = 275.270157 min = 212.286000 max = 1089.670000 sum = 340573.586000 samples = 663 Refs #1634. Message-Id: <1484730859-11969-1-git-send-email-tgrabiec@scylladb.com>	2017-04-21 12:52:31 +02:00
Tomasz Grabiec	4313641c03	tests: Add test for log_histogram	2017-04-21 12:52:31 +02:00
Tomasz Grabiec	c83768d6bb	log_histogram: Allow non-power-of-two minimum values We will want to reuse the min_size mechanism for the whole compaction threshold, including the occupancy threshold. That threshold is close to the segment size and we cannot pick a power of two which would be close enough to what we need. Therefore, change log_histogram to support arbitrary minimum base. bucket_of() was moved into log_histogram_options so that it can be used in number_of_buckets(), which makes for a simple and much less error-prone implementation.	2017-04-21 10:54:50 +02:00
Tomasz Grabiec	7a800c54bf	lsa: Use regular compaction threshold in on-idle compaction Idle-time compaction should not produce non-compactible segments, because that means we would have to evict a lot when we finally need to reclaim some memory, so that occupancy falls below the regular compaction threshold. This may cause latency spikes. Refs #1634.	2017-04-20 15:00:15 +02:00
Tomasz Grabiec	7aa286439f	lsa: Add getter for region's eviction function	2017-04-20 14:51:42 +02:00
Duarte Nunes	af37a3fdbf	logalloc: Fix compilation error This patch moves a function using the region_impl type after the type has been defined. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170418124551.25369-1-duarte@scylladb.com>	2017-04-18 15:56:26 +03:00
Avi Kivity	844529fe33	logalloc: avoid auto in function argument declaration 'auto' in a non-lambda function argument is not legal C++, and is hard to read besides. Replace with the right type. Since the right type is private, add some friendship.	2017-04-17 23:18:44 +03:00
Avi Kivity	a0858dda3e	date: use correct casts for years Our date implementation uses int64_t for years, but some of the code was not changed; clang complains, so use the correct casts to make it happy.	2017-04-17 23:03:15 +03:00
Paweł Dziepak	0318dccafd	lsa: avoid unnecessary segment migrations during reclaim segment_zone::migrate_all_segments() was trying to migrate all segments inside a zone to another one, hoping that the original one could be completely freed. This was an attempt to optimise for throughput. However, this may unnecessarily hurt latency if the zone is large but only a few segments are required to satisfy the reclaimer's demands. Message-Id: <20170410171912.26821-1-pdziepak@scylladb.com>	2017-04-11 08:55:29 +02:00
Tomasz Grabiec	3609665b19	lsa: Fix debug-mode compilation error By moving definitions of setters out of #ifdef	2017-03-16 18:23:05 +01:00
Tomasz Grabiec	88e7b3ff79	lsa: Ensure can_allocate_more_memory() always leaves a gap above seastar's min_free_memory() One of the goals of can_allocate_more_memory() is to avoid depleting seastar's free memory down to its minimum, leaving some headroom above that minimum so that standard allocations will not cause reclamation immediately. Currently the function doesn't take into account the actual threshold used by the seastar allocator, so there could be no gap at all, or free memory could even go below the minimum. Fix that by ensuring there's always a gap above min_free_memory(). min_gap was reduced to 1 MiB so that low memory setups are not impacted significantly by the change. Message-Id: <1489667863-15099-1-git-send-email-tgrabiec@scylladb.com>	2017-03-16 12:42:50 +00:00
Tomasz Grabiec	4ab8b255da	lsa: Allow adjusting reserves in allocating_section	2017-03-16 10:21:10 +01:00
Paweł Dziepak	60c6b9a240	Merge "Implement sstable_streamed_mutation::fast_forward_to()" from Tomasz "This replaces the use of a generic forwarding wrapper in the sstable reader with a specialized implementation. Forwarding doesn't yet utilize indexes in this series; it is only integrated with mp_row_consumer, which is a prerequisite. It's still an optimization, since mp_row_consumer will no longer try to consume past the range, as it used to. Sending early for easier consumption." * tag 'tgrabiec/forwarding-of-mp-row-consumer-v2' of github.com:scylladb/seastar-dev: sstables: Remove use of forwarding wrapper sstables: Implement sstable_streamed_mutation::fast_forward_to() sstables: Extract and use clustering_ranges_walker tests: sstables: Add test for handling of repeated tombstones sstables: Extract writer parameters into config objects tests: Move as_mutation_source() helper to header tests: Extract ensure_monotonic_positions() to streamed_mutation_assertions streamed_mutation: Add streamed_mutation_returning() helper tests: mutation_source_test: Add test case for forwarding to a full range tests: simple_schema: Add fragment factories tests: Extract simple_schema sstables: Move workaround for out-of-order range tombstones to mp_row_consumer sstables: Drop default mp_row_consumer constructor sstables: Swap order of values in "proceed" so that "no" is assigned 0 util/optimized_optional: Make printable position_in_partition: Add is_static_row() in the view range_tombstone_stream: Add reset() range_tombstone_stream: Add get_next(position_in_partition_view) sstables: streamed_mutation: Stop reading when end of slice reached sstables: Switch is_in_range() to position_in_partition	2017-03-10 13:55:46 +00:00
Tomasz Grabiec	58c29be45c	util/optimized_optional: Make printable	2017-03-10 14:42:21 +01:00
Duarte Nunes	d32c848d73	utils/logalloc: Change linkage of hist_options to external Change linkage of segment_descriptor_hist_options to external to keep good old GCC5 happy, despite C++11 allowing static linkage of non-type template arguments. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170309213206.10383-1-duarte@scylladb.com>	2017-03-10 11:02:51 +02:00
Avi Kivity	439b38f5ab	Merge "Improvements to counter implementation" from Paweł "This series adds various optimisations to counter implementation (nothing extreme, mostly just avoiding unnecessary operations) as well as some missing features such as tracing and dropping timed out queries. Performance was tested using: perf-simple-query -c4 --counters --duration 60 The following results are medians: write: 18640.41 before, 33156.81 after (+77.9%); read: 58002.32 before, 62733.93 after (+8.2%)" * tag 'pdziepak/optimise-counters/v3' of github.com:cloudius-systems/seastar-dev: (30 commits) cell_locker: add metrics for lock acquisition storage_proxy: count counter updates for which the node was a leader storage_proxy: use counter-specific timeout for writes storage_proxy: transform counter timeouts to mutation_write_timeout_exception db: avoid allocations in do_apply_counter_update() tests/counters: add test for apply reversability counters: attempt to apply in place atomic_cell: add COUNTER_IN_PLACE_REVERT flag counters: add equality operators counters: implement decrement operators for shard_iterator counters: allow using both views and mutable_views atomic_cell: introduce atomic_cell_mutable_view managed_bytes: add cast to mutable_view bytes: add bytes_mutable_view utils: introduce mutable_view db: add more tracing events for counter writes db: propagate tracing state for counter writes tests/cell_locker: add test for timing out lock acquisition counter_cell_locker: allow setting timeouts db: propagate timeout for counter writes ...	2017-03-07 11:48:13 +02:00
Duarte Nunes	ca4f5cabd4	lsa: Extract log_histogram class Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-03-04 14:47:19 +01:00
Duarte Nunes	2b6abd5a91	lsa: Make log_histogram more generic Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-03-03 17:59:17 +01:00
Duarte Nunes	3819e6d55f	lsa: log_histogram cleanups Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-03-03 17:09:07 +01:00
Paweł Dziepak	1293073019	managed_bytes: add cast to mutable_view	2017-03-02 09:05:11 +00:00
Paweł Dziepak	0ed2352ade	utils: introduce mutable_view std::basic_string_view does not allow modifying the underlying buffer. This patch introduces a mutable_view which permits that.	2017-03-02 09:05:10 +00:00
Duarte Nunes	11b5076b3c	lsa: Use log histogram for closed segments This patch replaces the current heap with a logarithmic histogram to hold the closed segment descriptors. This histogram stores elements in different buckets according to their size. Values are mapped to a sequence of power-of-two ranges that are split into N sub-buckets. Values less than a minimum value are placed in bucket 0, whereas values bigger than a maximum value are not admitted. There is some loss of precision, as segments are no longer totally ordered, and precision decreases the sparser a segment is. This reduces the cost of the computations needed when freeing from a closed segment. Performance results for perf_simple_query -c4 --duration 60: read: 43954.27 before, 45246.10 after (+2.9%); write: 48911.54 before, 52807.76 after (+7.9%) Fixes #1442 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170227235328.27937-1-duarte@scylladb.com>	2017-02-28 18:40:38 +02:00
Calle Wilund	0d87f3dd7d	utils::UUID: operator< should behave as comparison of hex strings/bytes I.e. need to be unsigned comparison. Message-Id: <1487683665-23426-1-git-send-email-calle@scylladb.com>	2017-02-22 09:19:22 +00:00

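Commits 2be3596a4f and fd59a548c0, merged in 3b9c0a6ae2, move an entry to the front of an LRU list on every read. That keeps the list ordered, so shrink() can drop the k least recently read entries in O(k). The sketch below shows the same access pattern using only the standard library, not boost::intrusive. A std::list plus a std::unordered_map of list iterators stand in for the bi::list/bi::unordered_set pair. Unlike the intrusive version, it pays for a separate map node and a copy of the key per entry. The class and member names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Standard-library analogue of an LRU-ordered cache (not Scylla's loading_cache).
class lru_cache {
    using entry = std::pair<std::string, int>;
    std::list<entry> _lru; // front: most recently read, back: least recently read
    std::unordered_map<std::string, std::list<entry>::iterator> _index;
public:
    void insert(const std::string& key, int value) {
        assert(_index.count(key) == 0);
        _lru.emplace_front(key, value);
        _index.emplace(key, _lru.begin());
    }
    // Every read moves the entry to the front of the list; splice() is O(1)
    // and keeps the stored iterator valid.
    const int* find(const std::string& key) {
        auto it = _index.find(key);
        if (it == _index.end()) {
            return nullptr;
        }
        _lru.splice(_lru.begin(), _lru, it->second);
        return &it->second->second;
    }
    // Drop the k least recently read entries: O(k) on average.
    void shrink_by(std::size_t k) {
        for (; k > 0 && !_lru.empty(); --k) {
            _index.erase(_lru.back().first);
            _lru.pop_back();
        }
    }
    std::size_t size() const { return _lru.size(); }
};
```

Sorting entries by last-read time on every shrink would cost O(N log N). Here the ordering is maintained incrementally on each read, which is the trade-off the merge describes.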
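Commit 0c4e9efce7 keeps the loading_cache load factor between 0.25 and 1.0. Bucket counts are powers of two, bounded by a 256-bucket minimum and a 1M-bucket maximum. A minimal sketch of such a resize decision follows. It assumes "1M" means 2^20, and next_bucket_count() is a hypothetical helper, not the loading_cache code.

```cpp
#include <cassert>
#include <cstddef>

// Assumed bounds: 256 buckets minimum, "1M" interpreted as 2^20 maximum.
constexpr std::size_t min_buckets = 256;
constexpr std::size_t max_buckets = std::size_t(1) << 20;

// Returns the bucket count to use for `elements` entries, keeping the load
// factor (elements / buckets) within [0.25, 1.0] unless a bound is reached.
std::size_t next_bucket_count(std::size_t elements, std::size_t buckets) {
    assert(buckets >= min_buckets && buckets <= max_buckets);
    assert((buckets & (buckets - 1)) == 0); // power of two
    while (buckets < max_buckets && elements > buckets) {
        buckets *= 2; // load factor above 1.0: grow
    }
    while (buckets > min_buckets && elements * 4 < buckets) {
        buckets /= 2; // load factor below 0.25: shrink
    }
    return buckets;
}
```

When the first loop grows the table, it stops with a load factor above 0.5, so the shrink loop cannot undo that growth in the same call.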
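Commit 1c6cecd9d0 introduces div_ceil(), an integer division that rounds up rather than down. The sketch below shows one common way to write it. It assumes a non-negative dividend, a positive divisor, and no overflow in a + b - 1; the actual Scylla signature is not shown in this log.

```cpp
#include <cassert>
#include <type_traits>

// Round-up integer division: the smallest q such that q * b >= a.
template <typename T>
constexpr T div_ceil(T a, T b) {
    static_assert(std::is_integral<T>::value, "div_ceil() requires an integral type");
    assert(b > 0); // also assumes a >= 0 and that a + b - 1 does not overflow
    return (a + b - 1) / b;
}
```

For example, div_ceil(7, 2) is 4, whereas 7 / 2 is 3.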
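Commit 7706775a63 routes all serialize_XXX() variants through one template built on std::copy_n() into an output iterator. The sketch below illustrates that shape under stated assumptions. It handles unsigned integers only, and serialize_int() is a hypothetical name. It also produces big-endian (network) byte order by shifting, which is an assumption about the wire format rather than something taken from the commit.

```cpp
#include <algorithm>
#include <cstddef>
#include <type_traits>

// Writes `value` most-significant byte first through any output iterator,
// then advances the caller's iterator past the written bytes.
template <typename T, typename OutputIterator>
void serialize_int(T value, OutputIterator& out) {
    static_assert(std::is_unsigned<T>::value, "this sketch handles unsigned integers only");
    char buf[sizeof(T)];
    // Fill the buffer from the last byte backwards, so the most significant byte ends up first.
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        buf[sizeof(T) - 1 - i] = static_cast<char>(value & 0xff);
        value >>= 8;
    }
    // std::copy_n returns the advanced iterator; store it back for the caller.
    out = std::copy_n(buf, sizeof(T), out);
}
```

Because the iterator is taken by reference and advanced, consecutive calls append one field after another. The target can be a raw char* buffer or a std::back_inserter over a std::vector<char>.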
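Commit d216c3dbd2 moves tombstone's relational operators into a class that generates them from a tri-compare function. A minimal CRTP sketch of that idea follows. The names with_relational_operators and tombstone_sketch are hypothetical. The two-field layout (timestamp first, deletion time as tie-breaker) is an assumed simplification of the real struct.

```cpp
#include <cstdint>

// Derives all six relational operators from T::compare(), which must
// return a negative value, zero, or a positive value.
template <typename T>
struct with_relational_operators {
    friend bool operator==(const T& a, const T& b) { return a.compare(b) == 0; }
    friend bool operator!=(const T& a, const T& b) { return a.compare(b) != 0; }
    friend bool operator<(const T& a, const T& b) { return a.compare(b) < 0; }
    friend bool operator<=(const T& a, const T& b) { return a.compare(b) <= 0; }
    friend bool operator>(const T& a, const T& b) { return a.compare(b) > 0; }
    friend bool operator>=(const T& a, const T& b) { return a.compare(b) >= 0; }
};

// Simplified stand-in for a tombstone (field types are assumptions).
struct tombstone_sketch : with_relational_operators<tombstone_sketch> {
    int64_t timestamp;
    int32_t deletion_time;
    tombstone_sketch(int64_t ts, int32_t dt) : timestamp(ts), deletion_time(dt) {}
    int compare(const tombstone_sketch& o) const {
        if (timestamp != o.timestamp) {
            return timestamp < o.timestamp ? -1 : 1;
        }
        if (deletion_time != o.deletion_time) {
            return deletion_time < o.deletion_time ? -1 : 1;
        }
        return 0;
    }
};
```

The operators are friends defined inside the base template. Argument-dependent lookup finds them for any derived type, because base classes are associated classes, and no namespace-scope operator template is needed that could match unrelated types.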
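Commit 6d9e18fd61 shrinks the LSA object header with a ULEB128-like encoding, so that a small value (such as an index into a migrator table) fits in one byte. The commit does not spell out the exact descriptor layout. The sketch below therefore shows plain unsigned LEB128 only, not the LSA format, and encode_uleb128() and decode_uleb128() are hypothetical helpers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Unsigned LEB128: 7 payload bits per byte, least significant group first;
// the high bit is set on every byte except the last.
void encode_uleb128(uint64_t value, std::vector<uint8_t>& out) {
    do {
        uint8_t byte = static_cast<uint8_t>(value & 0x7f);
        value >>= 7;
        if (value != 0) {
            byte |= 0x80; // more bytes follow
        }
        out.push_back(byte);
    } while (value != 0);
}

// Decodes one value starting at `pos` and advances `pos` past it.
uint64_t decode_uleb128(const std::vector<uint8_t>& in, std::size_t& pos) {
    uint64_t value = 0;
    unsigned shift = 0;
    while (true) {
        assert(pos < in.size() && shift < 64);
        uint8_t byte = in[pos++];
        value |= uint64_t(byte & 0x7f) << shift;
        if (!(byte & 0x80)) {
            return value;
        }
        shift += 7;
    }
}
```

Values up to 127 take one byte. For example, 300 encodes as the two bytes 0xAC 0x02.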
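Commit 11b5076b3c keeps closed segment descriptors in a logarithmic histogram. Values below a minimum go to bucket 0, larger values fall into power-of-two ranges that are each split into N sub-buckets, and values above a maximum are rejected. The sketch below shows one way to compute such a bucket index. The parameter values and names are assumptions. The sketch also requires a power-of-two minimum, a restriction that commit c83768d6bb lifts in the real log_histogram.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical parameters, not Scylla's actual log_histogram options.
constexpr unsigned min_size_shift = 4;   // values below 16 land in bucket 0
constexpr unsigned sub_bucket_shift = 2; // each [2^k, 2^(k+1)) range is split into 4
constexpr unsigned max_size_shift = 20;  // values of 2^20 or more are not admitted
static_assert(sub_bucket_shift <= min_size_shift, "smallest sub-bucket must be at least 1 wide");

// Index of the highest set bit; v must be non-zero.
unsigned log2_floor(uint64_t v) {
    unsigned r = 0;
    while (v >>= 1) {
        ++r;
    }
    return r;
}

constexpr std::size_t number_of_buckets() {
    return 1 + (std::size_t(max_size_shift - min_size_shift) << sub_bucket_shift);
}

std::size_t bucket_of(uint64_t v) {
    assert(v < (uint64_t(1) << max_size_shift));
    if (v < (uint64_t(1) << min_size_shift)) {
        return 0;
    }
    unsigned msb = log2_floor(v);
    std::size_t range = msb - min_size_shift; // which power-of-two range
    // The bits just below the highest set bit select the sub-bucket.
    std::size_t sub = (v >> (msb - sub_bucket_shift)) & ((std::size_t(1) << sub_bucket_shift) - 1);
    return 1 + (range << sub_bucket_shift) + sub;
}
```

Entries within one bucket are not ordered relative to each other, which is the loss of precision the commit mentions. With these parameters, 16 and 19 share bucket 1, and buckets get wider as values grow.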
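Commit 0d87f3dd7d makes utils::UUID's operator< compare unsigned values, so that the ordering matches a byte-wise or hex-string comparison. The sketch below assumes the UUID is held as two signed 64-bit halves, most significant half first; uuid_sketch is a hypothetical type.

```cpp
#include <cstdint>

// Assumed layout: two signed 64-bit halves.
struct uuid_sketch {
    int64_t most_sig_bits;
    int64_t least_sig_bits;
};

// Compare both halves as unsigned so that the order matches the hex/byte order.
bool operator<(const uuid_sketch& a, const uuid_sketch& b) {
    auto am = static_cast<uint64_t>(a.most_sig_bits);
    auto bm = static_cast<uint64_t>(b.most_sig_bits);
    if (am != bm) {
        return am < bm;
    }
    return static_cast<uint64_t>(a.least_sig_bits) < static_cast<uint64_t>(b.least_sig_bits);
}
```

A signed comparison would get this wrong. A UUID whose hex form starts with ffff (most_sig_bits == -1) would sort before one starting with 0000, the reverse of its hex order.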

377 Commits