Commit Graph

576 Commits

Author SHA1 Message Date
Tomasz Grabiec
57e25fa0f8 utils: phased_barrier: Make advance_and_await() have strong exception guarantees
Currently, when advance_and_await() fails to allocate the new gate
object, it will throw bad_alloc and leave the phased_barrier object in
an invalid state. Calling advance_and_await() again on it will result
in undefined behavior (typically SIGSEGV) because _gate will be
disengaged.

One place affected by this is table::seal_active_memtable(), which
calls _flush_barrier.advance_and_await(). If this throws, subsequent
flush attempts will SIGSEGV.

This patch rearranges the code so that advance_and_await() has strong
exception guarantees.
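The rearrangement can be sketched as follows (illustrative names only; the real phased_barrier lives in Seastar and is more involved): perform the allocation that can throw *before* mutating any member state, so a bad_alloc leaves the object fully usable.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified sketch of the strong-exception-guarantee pattern.
struct gate {
    bool closed = false;
};

class phased_barrier_sketch {
    std::unique_ptr<gate> _gate = std::make_unique<gate>();
public:
    void advance() {
        // Allocate the new gate first; if this throws, _gate stays engaged
        // and the object remains in a valid state.
        auto new_gate = std::make_unique<gate>(); // may throw bad_alloc
        // From here on, only nothrow operations.
        _gate->closed = true;         // close the old phase
        _gate = std::move(new_gate);  // commit
    }
    bool engaged() const { return _gate != nullptr; }
};
```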
Message-Id: <1542645562-20932-1-git-send-email-tgrabiec@scylladb.com>
2018-11-20 16:15:12 +00:00
Avi Kivity
be99101f36 utils: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Avi Kivity
3cf434b863 utils: estimated_histogram: convert generated format strings to fmt
Convert printf games to format games.

Note that fmt supports specifying the field width as an argument, but that
is left to a dedicated change.
2018-11-01 13:16:17 +00:00
Avi Kivity
7726ce23b7 utils: i_filter: rename "format" variable
The format variable hides the format function, which we'll soon want to use
here. Rename the format variable to unhide the function.
2018-11-01 13:16:17 +00:00
Yibo Cai (Arm Technology China)
79136e895f utils/crc: calculate crc in parallel
It achieves a 2.0x speedup on Intel E5 and 1.1x to 2.5x speedups on
various arm64 microarchitectures.

The algorithm cuts data into blocks of 1024 bytes and calculates the crc
for each block. Each block is further divided into three subblocks of 336
bytes (42 uint64s) each, plus 16 remaining bytes (2 uint64s).

In each iteration, three independent crcs are calculated, one for a uint64
from each subblock. This significantly increases IPC (instructions per cycle).
After the subblocks are done, the three crcs and the remaining two uint64s are
combined using carry-less multiplication to reach the final result
for one block of 1024 bytes.
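The loop shape can be illustrated with a simplified stand-in (this is not the real kernel: a trivial additive accumulator replaces the hardware crc32 instruction, and plain addition replaces the carry-less-multiply combine). The point is that three independent dependency chains per iteration keep the CPU pipelines busy:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Illustrative only: three independent accumulators stand in for the three
// per-subblock CRC states; the real combine step uses carry-less
// multiplication, which is omitted here.
uint64_t interleaved_sum(const std::vector<uint64_t>& words) {
    const size_t lane = words.size() / 3;
    uint64_t s0 = 0, s1 = 0, s2 = 0;
    for (size_t i = 0; i < lane; ++i) {
        // Three independent dependency chains -> higher IPC.
        s0 += words[i];
        s1 += words[lane + i];
        s2 += words[2 * lane + i];
    }
    // The remaining words (the "16 remaining bytes" in the commit).
    uint64_t tail = 0;
    for (size_t i = 3 * lane; i < words.size(); ++i) {
        tail += words[i];
    }
    return s0 + s1 + s2 + tail;
}
```

With addition as the stand-in, the split-then-combine result matches a serial pass over the same data.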

Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1541042759-24767-1-git-send-email-yibo.cai@arm.com>
2018-11-01 10:19:32 +02:00
Yibo Cai (Arm Technology China)
1c48e3fbec utils/crc: leverage arm64 crc extension
It achieves 6.7x to 11x speedup on various arm64 microarchitectures.

Signed-off-by: Yibo Cai <yibo.cai@arm.com>
Message-Id: <1540781879-15465-1-git-send-email-yibo.cai@arm.com>
2018-10-29 10:50:48 +02:00
Rafi Einstein
32525f2694 Space-Saving Top-k algorithm for handling stream summary statistics
Based on implementation [2] of the Space-Saving algorithm from [1].
[1] http://www.cse.ust.hk/~raywong/comp5331/References/EfficientComputationOfFrequentAndTop-kElementsInDataStreams.pdf
[2] https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/StreamSummary.java

The algorithm keeps a map between keys seen and their counts, bounding the
number of tracked keys. The replacement policy evicts the key with the lowest
count; the new key inherits that count, and an estimate of the resulting
error is recorded.
This error estimate can later be used to determine whether the distribution we
arrived at corresponds to the real top-K, which we can display alongside the results.
Accuracy depends on the number of tracked keys.

Introduced as part of 'nodetool toppartition' query implementation.
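The replacement policy above can be sketched as follows (illustrative names, not Scylla's actual classes):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Minimal Space-Saving sketch: track at most `capacity` keys; when full,
// evict the least-counted key, inherit its count, and record the inherited
// amount as the error bound for the newcomer.
class space_saving {
    struct entry { long count = 0; long error = 0; };
    std::map<std::string, entry> _counters;
    size_t _capacity;
public:
    explicit space_saving(size_t capacity) : _capacity(capacity) {}
    void offer(const std::string& key) {
        auto it = _counters.find(key);
        if (it != _counters.end()) { ++it->second.count; return; }
        if (_counters.size() < _capacity) {
            _counters[key] = entry{1, 0};
            return;
        }
        // Evict the minimum-count key; its count becomes the newcomer's
        // error estimate.
        auto min_it = std::min_element(_counters.begin(), _counters.end(),
            [](const auto& a, const auto& b) { return a.second.count < b.second.count; });
        long inherited = min_it->second.count;
        _counters.erase(min_it);
        _counters[key] = entry{inherited + 1, inherited};
    }
    std::pair<std::string, long> top() const {
        auto max_it = std::max_element(_counters.begin(), _counters.end(),
            [](const auto& a, const auto& b) { return a.second.count < b.second.count; });
        return {max_it->first, max_it->second.count};
    }
};
```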

Refs #2811
Message-Id: <20181027220937.58077-1-rafie@scylladb.com>
2018-10-28 10:10:28 +02:00
Tomasz Grabiec
fe0a0bdf1e utils/loading_shared_values: Add missing stat update call in one of the cases
Message-Id: <1540469591-32738-1-git-send-email-tgrabiec@scylladb.com>
2018-10-25 15:15:05 +03:00
Avi Kivity
aaab8a3f46 utils: crc32: mark power crc32 assembly as not requiring an executable stack
The linker uses an opt-in system for non-executable stack: if all object files
opt into a non-executable stack, the binary will have a non-executable stack,
which is very desirable for security. The compiler cooperates by opting into
a non-executable stack whenever possible (always for our code).

However, we also have an assembly file (for fast power crc32 computations).
Since it doesn't opt into a non-executable stack, we get a binary with
executable stack, which Gentoo's build system rightly complains about.

Fix by adding the correct incantation to the file.
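For reference, the usual incantation for GNU as is an empty .note.GNU-stack section; shown here from general toolchain knowledge, not quoted from the patch itself:

```asm
.section .note.GNU-stack,"",@progbits
```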

Fixes #3799.

Reported-by: Alexys Jacob <ultrabug@gmail.com>
Message-Id: <20181002151251.26383-1-avi@scylladb.com>
2018-10-02 18:48:23 +01:00
Paweł Dziepak
2bcaf4309e utils/reusable_buffer: do not warn about large allocations
Reusable buffers are meant to be used when protocol or third-party
library limitations force us to allocate large contiguous buffers. There
isn't much that can be done about this, so there is little point in
warning about it.

Fixes #3788.
Message-Id: <20180928085141.6469-1-pdziepak@scylladb.com>
2018-09-30 11:12:23 +03:00
Paweł Dziepak
2e5b375309 utils: drop data_output 2018-09-18 17:22:59 +01:00
Paweł Dziepak
cbe2ef9e5c utils: fragmented_temporary_buffer::view: add remove_prefix() 2018-09-18 17:22:59 +01:00
Paweł Dziepak
e464ad4f5d utils: fragmented_temporary_buffer: add empty() and size_bytes() 2018-09-18 11:29:37 +01:00
Paweł Dziepak
f4bb219a8b utils: fragmented_temporary_buffer: add get_ostream() 2018-09-18 11:29:37 +01:00
Paweł Dziepak
252cf0c681 utils: crc: accept FragmentRange 2018-09-18 11:29:36 +01:00
Tomasz Grabiec
4fb3f7e8eb managed_vector: Make external_memory_usage() ignore reserved space
This ensures that row::external_memory_usage() is invariant to
insertion order of cells.

It should be invariant so that the accounting of a clustering_row, merged
from multiple MVCC versions by the partition_snapshot_flat_reader on behalf
of a memtable flush, doesn't give a greater result than what is used
by the memtable region. Overaccounting leads to an assertion failure in
~flush_memory_accounter.

Fixes #3625 (hopefully).

Message-Id: <1535982513-19922-1-git-send-email-tgrabiec@scylladb.com>
2018-09-03 17:09:54 +03:00
Vlad Zolotarov
945d26e4ee loading_cache: make iterator work on top of lru_list iterators instead of loading_shared_values'
Reloading may keep a value alive in the underlying loading_shared_values
while the corresponding cache entry has already been deleted.

This may create weird situations like this:

<populate cache with 10 entries>
cache.remove(key1);
for (auto& e : cache) {
    std::cout << e << std::endl;
}

<all 10 entries are printed, including the one for "key1">

To avoid such situations, we make loading_cache::iterator a
transform_iterator over lru_list::iterator instead of loading_shared_values::iterator,
because lru_list contains entries only for cached items.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-08-30 20:56:44 -04:00
Vlad Zolotarov
1e56c7dd58 loading_cache: make size() return the size of lru_list instead of loading_shared_values
The reloading flow may keep items alive in the underlying loading_shared_values
after they have been removed (e.g. via the remove(key) API), so loading_shared_values.size()
doesn't represent the correct value for the loading_cache. lru_list.size(), on the other hand, does.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-08-30 15:55:30 -04:00
Duarte Nunes
f6aadd8077 Merge 'utils::loading_cache: improve reload() robustness' from Vlad
"This series introduces a few improvements related to a reload flow.

From now on the callback may assume that the "key" parameter value
is kept alive till the end of its execution in the reloading flow.

It may also safely evict as many items from the cache as needed."

Fixes #3606

* 'loading_cache_improve_reload-v1' of https://github.com/vladzcloudius/scylla:
  utils::loading_cache: hold a shared_value_ptr to the value when we reload
  utils::loading_cache::on_timer(): remove unneeded capture of "this"
  utils::loading_cache::on_timer(): use chunked_vector for storing elements we want to reload
2018-08-28 10:52:20 +01:00
Tomasz Grabiec
1e50f85288 database: Make soft-pressure memtable flusher not consider already flushed memtables
The flusher picks the memtable list which contains the largest region
according to region_impl::evictable_occupancy().total_space(), which
follows region::occupancy().total_space(). But only the latest
memtable in the list can start flushing. It can happen that the
memtable corresponding to the largest region was already flushed to an
sstable (flush permit released), but not yet fsynced or moved to
cache, so it's still in the memtable list.

The latest memtable in the winning list may be small, or empty, in
which case the soft pressure flusher will not be able to make much
progress. There could be other memtable lists with non-empty
(flushable) latest memtables. This can lead to writes unnecessarily
blocking on dirty.

I observed this for the system memtable group, where it's easy for the
memtables to overshoot small soft pressure limits. The flusher kept
trying to flush empty memtables, while the previous non-empty memtable
was still in the group.

The CPU scheduler makes this worse, because it runs memtable_to_cache
in a separate scheduling group, so it further defers in time the
removal of the flushed memtable from the memtable list.

This patch fixes the problem by making regions corresponding to
memtables which started flushing report evictable_occupancy() as 0, so
that they're picked by the flusher last.

Fixes #3716.
Message-Id: <1535040132-11153-2-git-send-email-tgrabiec@scylladb.com>
2018-08-26 11:02:34 +03:00
Tomasz Grabiec
364418b5c5 logalloc: Make evictable_occupancy() indicate no free space
Doesn't fix any bug, but it's closer to the truth that all segments
are used than that none are.

Message-Id: <1535040132-11153-1-git-send-email-tgrabiec@scylladb.com>
2018-08-26 11:02:32 +03:00
Avi Kivity
2c9b886b6d logalloc: reindent
No functional changes.
Message-Id: <20180731125116.32009-1-avi@scylladb.com>
2018-08-01 00:35:54 +01:00
Avi Kivity
0fc54aab98 logalloc: run releaser() in user-provided scheduling group
Let the user specify which scheduling group should run the
releaser, since it is running functions on the user's behalf.

Perhaps a cleaner interface is to require the user to call
a long-running function for the releaser, and so we'd just
inherit its scheduling group, but that's a much bigger change.
2018-07-31 11:57:58 +03:00
Paweł Dziepak
b485deb124 utils: uuid: don't use std::random_device()
std::random_device() is extremely slow. This patch modifies
make_rand_uuid() so that it requires only two invocations of the PRNG.
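The idea can be sketched as follows (illustrative code, not Scylla's actual make_rand_uuid(); std::random_device is used only once, to seed the PRNG, after which a UUID costs just two 64-bit draws):

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical sketch of a version-4 UUID built from two PRNG invocations.
struct uuid { uint64_t msb, lsb; };

uuid make_rand_uuid_sketch() {
    // Seed once per thread with the (slow) random_device, then reuse the
    // (fast) PRNG for every subsequent UUID.
    static thread_local std::mt19937_64 prng{std::random_device{}()};
    uint64_t msb = prng();
    uint64_t lsb = prng();
    // Set the RFC 4122 version (4) and variant (10xx) bits.
    msb = (msb & ~0xf000ULL) | 0x4000ULL;
    lsb = (lsb & ~(3ULL << 62)) | (1ULL << 63);
    return {msb, lsb};
}
```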
2018-07-26 12:02:32 +01:00
Avi Kivity
761931659a Merge "Do not linearise incoming CQL3 requests" from Paweł
"
This series changes the native CQL3 protocol layer so that it works with
fragmented buffers instead of a single temporary_buffer per request.
The main part is fragmented_temporary_buffer, which represents a
fragmented buffer consisting of multiple temporary_buffers. It provides
helpers for reading a fragmented buffer from an input_stream and for
interpreting the data in it, as well as a view that satisfies the
FragmentRange concept.

There are still situations where a fragmented buffer is linearised. That
includes decompressing client requests (this uses reusable buffers in a
similar way to the code that sends compressed responses), CQL statement
restrictions and values that are hard-coded in prepared statements
(hopefully, the values in those cases will be small), value validation
in some cases (blobs are not validated, irrelevant for many fixed-size
small types, but may be a problem for large text cells) as well as
operations on collections.

Tests: unit(release), dtests(cql_prepared_test.py, cql_tests.py, cql_additional_tests.py)
"

* tag 'fragmented-cql3-receive/v1' of https://github.com/pdziepak/scylla: (23 commits)
  types: bytes_view: override fragmented validate()
  cql3: value_view: switch to fragmented_temporary_buffer::view
  types: add validate that accepts fragmented_temporary_buffer::view
  cql3 query_options: add linearize()
  cql3: query_options: use bytes_ostream for temporaries
  cql3: operation: make make_cell accept fragmented_temporary_buffer::view
  atomic_cell: accept fragmented_temporary_buffer::view values
  cql3: avoid ambiguity in a call to update_parameters::make_cell()
  transport: switch to fragmented_temporary_buffer
  transport: extract compression buffers from response class
  tests/reusable_buffer: test fragmented_temporary_buffer support
  utils: reusable_buffer: support fragmented_temporary_buffer
  tests: add test for fragmented_temporary_buffer
  util fragment_range: add general linearisation functions
  utils: add fragmented_temporary_buffer
  tests: add basic test for transport requests and responses
  tests/random-utils: print seed
  tests/random-utils: generate sstrings
  cql3: add value_view printer and equality comparison
  transport: move response outside of cql_server class
  ...
2018-07-22 19:40:37 +03:00
Vladimir Krivopalov
79c2f0095c utils: Add overloaded_functor helper.
The overloaded_functor class template combines multiple lambdas accepting
different types into a single callable object that can be used with any of
those types.

One application is visitors for std::variant where different handling is
required for different types.
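This is the well-known C++17 idiom; a sketch (Scylla's helper may differ in detail):

```cpp
#include <cassert>
#include <string>
#include <variant>

// Inherit from each lambda and pull in all the call operators, so overload
// resolution picks the right one per alternative type.
template <typename... Ts>
struct overloaded_functor : Ts... {
    using Ts::operator()...;
};
template <typename... Ts>
overloaded_functor(Ts...) -> overloaded_functor<Ts...>; // deduction guide
```

Typical use is as a visitor: std::visit(overloaded_functor{ [](int) { ... }, [](const std::string&) { ... } }, v).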

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-07-20 13:50:17 -07:00
Paweł Dziepak
32ba47fb87 utils: reusable_buffer: support fragmented_temporary_buffer
reusable_buffer already supports bytes_ostream which is often used for
handling data sent from Scylla. This patch adds support for
fragmented_temporary_buffer which is going to be mainly used for data
received by Scylla.
2018-07-18 12:28:06 +01:00
Paweł Dziepak
b152aafd67 util fragment_range: add general linearisation functions
All FragmentRange implementations can be linearised in the same way, so
let's provide linearized() and with_linearized() functions for all of
them.
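The generic shape of such a helper can be sketched like this (illustrative: any range of fragments exposing data()/size() can be linearised by copying the fragments, in order, into one contiguous buffer):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Sketch of a generic linearisation helper; the real functions work on the
// FragmentRange concept and have a with_linearized() variant as well.
template <typename FragmentRange>
std::string linearized(const FragmentRange& fragments) {
    std::string out;
    for (const auto& frag : fragments) {
        out.append(frag.data(), frag.size());
    }
    return out;
}
```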
2018-07-18 12:28:06 +01:00
Paweł Dziepak
fc484f0819 utils: add fragmented_temporary_buffer
Seastar output_streams produce temporary_buffer<char>s.
fragmented_temporary_buffer represents a single fragmented buffer that
consists of, possibly multiple, temporary_buffer<char>s.
2018-07-18 12:28:06 +01:00
Tomasz Grabiec
612b223819 managed_bytes: Mark read_linearize() as an allocation point 2018-07-17 16:39:43 +02:00
Tomasz Grabiec
d94c7c07a3 lsa: Disable alloc failure injector inside the LSA sanitizer
Message-Id: <1531814822-30259-1-git-send-email-tgrabiec@scylladb.com>
2018-07-17 11:27:56 +01:00
Vlad Zolotarov
235520292e utils::loading_cache: hold a shared_value_ptr to the value when we reload
This allows removing the requirement to hold the key value inside the
_load callback when its value is needed in an asynchronous continuation
inside the callback in the context of a reload.

This also resolves the use-after-free issue when a _load() callback removes
the item for a given key.

See a9b72db34d.1528794135.git.bdenes@scylladb.com
for a discussion about this.

In addition, this patch makes the loading_cache more robust against existing
and potential situations in which cached entries are removed from inside the
callback. This is achieved by extending the idea Duarte implemented in
"utils/loading_cache: Avoid using invalidated iterators": capturing a timestamped_val_ptr
(essentially a lw_shared_ptr to an intrusive-set entry which holds both the key
and the cached value) instead of a naked pointer.

Tests {debug, release}:
   - Unit tests:
      - loading_cache_test
      - view_build_test
      - auth_test
      - auth_resource_test

   - dtest:
      - auth_test.py

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-07-13 11:27:58 -04:00
Vlad Zolotarov
b44ad5677a utils::loading_cache::on_timer(): remove unneeded capture of "this"
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-07-13 11:27:43 -04:00
Vlad Zolotarov
4aa0e5914b utils::loading_cache::on_timer(): use chunked_vector for storing elements we want to reload
The list of elements that need to be reloaded may be rather large.
Use chunked_vector in order to make the allocator's life easier.

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-07-13 09:53:59 -04:00
Duarte Nunes
63b63b0461 utils/loading_cache: Avoid using invalidated iterators
When periodically reloading the values in the loading_cache, we would
iterate over the list of entries and call the load() function for
those which need to be reloaded.

For some concrete caches, load() can remove the entry from the LRU set,
and can be executed inline from the parallel_for_each(). This means we
could potentially keep iterating using an invalidated iterator.

Fix this by using a temporary container to hold those entries to be
reloaded.
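The fix pattern can be sketched generically (illustrative names; the real code iterates LRU entries with parallel_for_each):

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <vector>

// A callback may erase entries from the container we are walking, so first
// copy owning pointers into a temporary vector and iterate over that; the
// shared_ptrs keep each entry alive even if the callback removes it from
// the live container.
template <typename T, typename Fn>
void for_each_entry_safely(std::list<std::shared_ptr<T>>& entries, Fn&& fn) {
    std::vector<std::shared_ptr<T>> snapshot(entries.begin(), entries.end());
    for (auto& e : snapshot) {
        fn(*e); // fn may now erase from `entries` without invalidating us
    }
}
```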

Spotted when reading the code.

Also use if constexpr and fix the comment in the function containing
the changes.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180712124143.13638-1-duarte@scylladb.com>
2018-07-12 13:59:09 +01:00
Botond Dénes
2e7bf9c6f9 loading_cache::reload(): obtain key before calling _load()
The continuation attached to _load() needs the key of the loaded entry
to check whether it was disposed of during the load. However, if _load()
invalidates the entry, the continuation's capture line will access
invalid memory while trying to obtain the key.
To avoid this, save a copy of the key before calling _load() and pass it
to both _load() and the continuation.

Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <b571b73076ca863690f907fbd3fb4ff54e597b28.1531393608.git.bdenes@scylladb.com>
2018-07-12 13:42:42 +01:00
Avi Kivity
a4a2f743a8 Merge "Avoid large allocations when reading sstable index pages" from Tomasz
"
If there are a lot of partitions in the index page, index_list may grow large
and require large contiguous blocks of memory, because it's based on
std::vector. That puts pressure on the memory allocator and, if memory is
fragmented, may not be satisfiable without a lot of eviction. Switch
to chunked_vector to avoid this.

Refs #3597
"

* 'tgrabiec/avoid-large-alloc-in-index-reader' of github.com:tgrabiec/scylla:
  sstables: Switch index_list to chunked_vector to avoid large allocations
  utils: chunked_vector: Do not require T to be default-constructible for clear()
  utils: chunked_vector: Implement front()
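The core idea of chunked_vector can be sketched as follows (a toy, not the real class): elements live in bounded fixed-size chunks, so growing never requests one large contiguous allocation for the element storage:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy chunked vector: a small vector of pointers to fixed-size chunks.
// Element storage is allocated ChunkSize elements at a time, keeping every
// contiguous allocation bounded.
template <typename T, size_t ChunkSize = 1024>
class chunked_vector_sketch {
    std::vector<std::unique_ptr<std::vector<T>>> _chunks;
    size_t _size = 0;
public:
    void push_back(T value) {
        if (_size % ChunkSize == 0) {
            auto chunk = std::make_unique<std::vector<T>>();
            chunk->reserve(ChunkSize); // bounded contiguous allocation
            _chunks.push_back(std::move(chunk));
        }
        _chunks.back()->push_back(std::move(value));
        ++_size;
    }
    T& operator[](size_t i) { return (*_chunks[i / ChunkSize])[i % ChunkSize]; }
    T& front() { return (*this)[0]; } // mirrors the front() commit above
    size_t size() const { return _size; }
};
```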
2018-07-12 15:30:18 +03:00
Duarte Nunes
1fb3b924f4 utils/loading_cache: Remove superfluous continuation
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180712122031.13424-1-duarte@scylladb.com>
2018-07-12 15:22:35 +03:00
Tomasz Grabiec
b0f5df10d2 utils: chunked_vector: Do not require T to be default-constructible for clear()
resize(), which clear() used, requires T to be default-constructible in
case the vector is expanded. That is not actually needed for clearing,
and there will be users that call clear() with a
non-default-constructible T, so implement clear() without using
resize().
2018-07-11 16:55:20 +02:00
Tomasz Grabiec
03832dab97 utils: chunked_vector: Implement front()
std::vector<> has it, so should this, for easy migration.
2018-07-11 16:55:20 +02:00
Avi Kivity
28621066e6 observable: allow an observable to disconnect() twice without penalty
Message-Id: <20180711070754.13286-1-avi@scylladb.com>
2018-07-11 10:15:01 +01:00
Avi Kivity
1895483781 observable: add comments explaining the purpose and use of the mechanism
Message-Id: <20180710133706.8791-1-avi@scylladb.com>
2018-07-11 10:15:01 +01:00
Avi Kivity
7db394ce50 observable: switch to noncopyable_function
std::function's move constructor is not noexcept, so observer's move
constructor and assignment operator also cannot be. Switch to Seastar's
noncopyable_function which provides better guarantees.

Tests: observer_tests (release)
Message-Id: <20180710073628.30702-1-avi@scylladb.com>
2018-07-10 09:42:49 +01:00
Avi Kivity
96737d140f utils: add observer/observable templates
An observable is used to decouple an information producer from a consumer
(in the same way as a callback), while allowing multiple consumers (called
observers) to coexist and to manage their lifetime separately.

Two classes are introduced:

 observable: a producer class; when an observable is invoked, all observers
        receive the information
 observer: a consumer class; receives information from an observable

Modelled after boost::signals2, with the following changes:
 - all signals return void; information is passed from the producer to
   the consumer but not back
 - thread-unsafe
 - modern C++ without preprocessor hacks
 - connection lifetime is always managed rather than leaked by default
 - renamed to avoid the funky "slot" name
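A minimal single-threaded sketch of the mechanism (simplified: std::function instead of noncopyable_function, and a map instead of intrusive linkage; note disconnect() is safe to call twice):

```cpp
#include <cassert>
#include <functional>
#include <map>

// Invoking the observable forwards the arguments to every live observer;
// an observer disconnects itself on destruction, so connection lifetime is
// always managed.
template <typename... Args>
class observable {
    std::map<int, std::function<void(Args...)>> _observers;
    int _next_id = 0;
public:
    class observer {
        observable* _parent = nullptr;
        int _id = 0;
    public:
        observer() = default;
        observer(observable* parent, int id) : _parent(parent), _id(id) {}
        observer(observer&& o) noexcept : _parent(o._parent), _id(o._id) {
            o._parent = nullptr;
        }
        ~observer() { disconnect(); }
        void disconnect() {
            // Idempotent: a second call finds _parent == nullptr.
            if (_parent) {
                _parent->_observers.erase(_id);
                _parent = nullptr;
            }
        }
    };
    observer observe(std::function<void(Args...)> f) {
        int id = _next_id++;
        _observers.emplace(id, std::move(f));
        return observer(this, id);
    }
    void operator()(Args... args) {
        for (auto& [id, f] : _observers) { f(args...); }
    }
};
```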
Message-Id: <20180709172726.5079-1-avi@scylladb.com>
2018-07-09 18:48:44 +01:00
Avi Kivity
f3da043230 Merge "Make in-memory partition version merging preemptable" from Tomasz
"
Partition snapshots go away when the last read using the snapshot is done.
Currently we will synchronously attempt to merge partition versions on this event.
If partitions are large, that may stall the reactor for a significant amount of time,
depending on the size of newer versions. Cache update on memtable flush can
create especially large versions.

The solution implemented in this series is to allow merging to be preemptable,
and continue in the background. Background merging is done by the mutation_cleaner
associated with the container (memtable, cache). There is a single merging process
per mutation_cleaner. The merging worker runs in a separate scheduling group,
introduced here, called "mem_compaction".

When the last user of a snapshot goes away, the snapshot is slid to the
oldest unreferenced version first, so that the version is no longer reachable
from partition_entry::read(). The cleaner will then keep merging preceding
(newer) versions into it, until it merges a version which is referenced. The
merging is preemptable. If the initial merging is preempted, the snapshot is
enqueued into the cleaner, the worker is woken up, and merging continues
asynchronously.

When memtable is merged with cache, its cleaner is merged with cache cleaner,
so any outstanding background merges will be continued by the cache cleaner
without disruption.

This reduces scheduling latency spikes in tests/perf_row_cache_update
for the case of large partition with many rows. For -c1 -m1G I saw
them dropping from >23ms to 1-2ms. System-level benchmark using scylla-bench
shows a similar improvement.
"

* tag 'tgrabiec/merge-snapshots-gradually-v4' of github.com:tgrabiec/scylla:
  tests: perf_row_cache_update: Test with an active reader surviving memtable flush
  memtable, cache: Run mutation_cleaner worker in its own scheduling group
  mutation_cleaner: Make merge() redirect old instance to the new one
  mvcc: Use RAII to ensure that partition versions are merged
  mvcc: Merge partition version versions gradually in the background
  mutation_partition: Make merging preemptable
  tests: mvcc: Use the standard maybe_merge_versions() to merge snapshots
2018-07-01 15:32:51 +03:00
Calle Wilund
054514a47a sstables::compress: Ensure unqualified compressor name if possible
Fixes #3546

Both older origin and Scylla write "known" compressor names (i.e. those
in the origin namespace) unqualified (i.e. LZ4Compressor).

This behaviour was not preserved in the virtualization change, but it
probably should be.

Message-Id: <20180627110930.1619-1-calle@scylladb.com>
2018-06-27 14:16:50 +03:00
Tomasz Grabiec
4d3cc2867a mutation_partition: Make merging preemptable 2018-06-27 12:48:30 +02:00
Avi Kivity
9a7ecdb3b9 Merge "Deglobalise cache_tracker" from Paweł
"
Cache tracker is a thread-local global object that indirectly depends on
the lifetimes of other objects. In particular, a member of
cache_tracker, mutation_cleaner, may extend the lifetime of a
mutation_partition until the cleaner is destroyed. The
mutation_partition itself depends on LSA migrators, which are
thread-local objects. Since there is no direct dependency between
LSA migrators and cache_tracker, it is not guaranteed that the former
won't be destroyed before the latter. The easiest solution (barring some
unit tests that repeat the same code several billion times) is to
stop using globals.

This series also improves the part of LSA sanitiser that deals with
migrators.

Fixes #3526.

Tests: unit(release)
"

* tag 'deglobalise-cache-tracker/v1-rebased' of https://github.com/pdziepak/scylla:
  mutation_cleaner: add disclaimer about mutation_partition lifetime
  lsa: enhance sanitizer for migrators
  lsa: formalise migrator id requirements
  row_cache: deglobalise row cache tracker
2018-06-26 16:38:12 +01:00
Paweł Dziepak
55bf9d78a6 lsa: enhance sanitizer for migrators
The current LSA sanitizer performs only basic checks on migrator use,
without doing any additional reporting when an error is detected. This
patch enhances it so that when a problem is detected, the relevant stack
traces are printed.
2018-06-25 09:37:43 +01:00
Paweł Dziepak
fcd9b1f821 lsa: formalise migrator id requirements
object_descriptor uses a special encoding for migrator ids which assumes
that the valid ones fall in a range smaller than uint32_t. Let's add some
static asserts that make this fact more visible.
2018-06-25 09:37:43 +01:00