scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 04:26:48 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	dbc1894bd5	lsa: Avoid unnecessary compact_and_evict_locked() When the reclaim request was satisfied from the pool there's no need to call compact_and_evict_locked(). This allows us to avoid calling boost::range::make_heap(), which is a tiny performance difference, as well as some confusing log messages. Message-Id: <1548091941-8534-1-git-send-email-tgrabiec@scylladb.com>	2019-01-21 20:19:20 +02:00
Paweł Dziepak	e212d37a8a	utils/small_vector: fix leak in copy assignment slow path Fixes #4105. Message-Id: <20190118153936.5039-1-pdziepak@scylladb.com>	2019-01-18 17:49:46 +02:00
Tomasz Grabiec	6461e085fe	managed_bytes: Fix compilation on gcc 8.2 The compilation fails on -Warray-bounds, even though the branch is never taken: inlined from ‘managed_bytes::managed_bytes(bytes_view)’ at ./utils/managed_bytes.hh:195:22, inlined from ‘managed_bytes::managed_bytes(const bytes&)’ at ./utils/managed_bytes.hh:162:77, inlined from ‘dht::token dht::bytes_to_token(bytes)’ at dht/random_partitioner.cc:68:57, inlined from ‘dht::token dht::random_partitioner::get_token(bytes)’ at dht/random_partitioner.cc:85:39: /usr/include/c++/8/bits/stl_algobase.h:368:23: error: ‘void* __builtin_memmove(void*, const void*, long unsigned int)’ offset 16 from the object at ‘<anonymous>’ is out of the bounds of referenced subobject ‘managed_bytes::small_blob::data’ with type ‘signed char [15]’ at offset 0 [-Werror=array-bounds] __builtin_memmove(__result, __first, sizeof(_Tp) * _Num); ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Work around by disabling the diagnostic locally. Message-Id: <1547205350-30225-1-git-send-email-tgrabiec@scylladb.com>	2019-01-18 13:48:05 +00:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Rafael Ávila de Espíndola	67039e942b	Remove the only use of with_alignment from scylla In c++17 there are standard ways of requesting aligned memory, so seastar doesn't need to provide one. This patch is in preparation for removing with_alignment from seastar. Tests: unit (debug) Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20190107191019.22295-1-espindola@scylladb.com>	2019-01-07 21:34:47 +02:00
Duarte Nunes	3235c13125	utils/fragmented_temporary_buffer: Correctly implement remove_suffix() The current implementation breaks the invariant that _size_bytes = reduce(_fragments, &temporary_buffer::size) In particular, this breaks algorithms that check the individual segment size. Correctly implement remove_suffix() by destroying superfluous temporary_buffer's and by trimming the last one, if needed. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190103133523.34937-1-duarte@scylladb.com>	2019-01-03 13:37:01 +00:00
Duarte Nunes	1a88cd7992	utils/fragmented_temporary_buffer: Add remove_suffix Essentially hide some bytes off the end of the buffer. Needed for subsequent commit log changes. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Duarte Nunes	8eab0a3e01	utils/fragmented_temporary_buffer: Allow skipping in the input stream Add fragmented_temporary_buffer::istream::skip(), needed for subsequent commit log changes. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Tomasz Grabiec	7747f2dde3	Merge "nodetool toppartitions" from Rafi & Avi Implementation of nodetool toppartitions query, which samples most frequent PKs in read/write operations over a period of time. Content: - data_listener classes: mechanism that interfaces with mutation readers in database and table classes, - toppartition_query and toppartition_data_listener classes to implement toppartition-specific query (this interfaces with data_listeners and the REST api), - REST api for toppartitions query. Uses Top-k structure for handling stream summary statistics (based on implementation in C, see #2811). What's still missing: - JMX interface to nodetool (interface customization may be required), - Querying #rows and #bytes (currently, only #partitions is supported). Fixes #2811 https://github.com/avikivity/scylla rafie_toppartitions_v7.1: top_k: whitespace and minor fixes top_k: map template arguments top_k: std::list -> chunked_vector top_k: support for appending top_k results nodetool toppartitions: refactor table::config constructor nodetool toppartitions: data listeners nodetool toppartitions: add data_listeners to database/table nodetool toppartitions: fully_qualified_cf_name nodetool toppartitions: Toppartitions query implementation nodetool toppartitions: Toppartitions query REST API nodetool toppartitions: nodetool-toppartitions script	2018-12-28 16:31:24 +01:00
Rafi Einstein	eda43b93c9	top_k: support for appending top_k results Allow appending results of one top_k into another. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:56 +02:00
Rafi Einstein	aeebe8e86b	top_k: std::list -> chunked_vector Replaced std::list with chunked_vector. Because chunked_vector requires a noexcept move constructor from its value type, change the bad_boy type in the unit test not to throw in the move constructor. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-28 16:45:07 +02:00
Yibo Cai (Arm Technology China)	422987ab04	utils: add fast ascii string validation Validate ascii string by ORing all bytes and check if 7-th bit is 0. Compared with original std::any_of(), which checks ascii string byte by byte, this new approach validates input in 8 bytes and two independent streams. Performance is much higher for normal cases, though slightly slower when string is very short. See table below. Speed(MB/s) of ascii string validation +---------------+-------------+---------+ \| String length \| std::any_of \| u64 x 2 \| +---------------+-------------+---------+ \| 9 bytes \| 1691 \| 1635 \| +---------------+-------------+---------+ \| 31 bytes \| 2923 \| 3181 \| +---------------+-------------+---------+ \| 129 bytes \| 3377 \| 15110 \| +---------------+-------------+---------+ \| 1039 bytes \| 3357 \| 31815 \| +---------------+-------------+---------+ \| 16385 bytes \| 3448 \| 47983 \| +---------------+-------------+---------+ \| 1048576 bytes \| 3394 \| 31391 \| +---------------+-------------+---------+ Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1544669646-31881-1-git-send-email-yibo.cai@arm.com>	2018-12-24 09:58:08 +02:00
Rafi Einstein	533e46ac72	top_k: map template arguments Added Hash and KeyEqual template arguments to enable unordered_map in top_k implementation. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-20 16:41:40 +02:00
Rafi Einstein	75f21954d4	top_k: whitespace and minor fixes Style and minor logic changes from code review. Signed-off-by: Rafi Einstein <rafie@scylladb.com>	2018-12-20 16:41:33 +02:00
Calle Wilund	66472bc52d	sequenced_set: Add "insert" method, following std::set semantics	2018-12-12 09:32:05 +00:00
Avi Kivity	475b151c97	Merge "Use utils::small_vector more in read path" from Paweł " This series optimises the read path by replacing some usages of std::vector by utils::small_vector. The motivation for this change was an observation that memory allocation functions are pointed out by the profiler as the ones where we spent most time and while they have a large number of callers storage allocation for some vectors was close to the top. The gains are not huge, since the problem is a lot of things adding up and not a single slow thing, but we need to start with something. Unfortunately, the performance of boost::container::small_vector is quite disappointing so a new implementation of a small_vector was introduced. perf_simple_query -c4 --duration 60, medians: ./perf_before ./perf_after diff read 343086.80 360720.53 5.1% Tests: unit(release, small_vector in debug) " * tag 'small_vector/v2.1' of https://github.com/pdziepak/scylla: partition_slice: use small_vector for column_ids mutation_fragment_merger: use small_vector auth: use small_vector in resource auth: avoid list-initialisation of vectors idl: serialiser: add serialiser for utils::small_vector idl: serialiser: deduplicate vector serialisers utils: introduce small_vector intrusive_set_external_comparator: make iterator nothrow move constructible mutation_fragment_merger: value-initialise iterator	2018-12-10 13:50:59 +02:00
Yibo Cai (Arm Technology China)	6717816a8d	utils/gz: optimize crc_combine for arm64 Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1544418903-26290-1-git-send-email-yibo.cai@arm.com>	2018-12-10 10:31:08 +02:00
Paweł Dziepak	23d19d21bd	utils: introduce small_vector small_vector is a variation of std::vector<> that reserves a configurable amount of storage internally, without the need for memory allocation. This can bring measurable gains if the expected number of elements is small. The drawback is that moving such small_vector is more expensive and invalidates iterators as well as references which disqualifies it in some cases.	2018-12-06 14:21:04 +00:00
Yibo Cai (Arm Technology China)	6fadba56cc	utils: optimize UTF-8 validation UTF-8 string is now validated by boost::locale::conv::utf_to_utf, it actually does string conversions which is more than necessary. As observed on Arm server, UTF-8 validation can become bottleneck under heavy loads. This patch introduces a brand new SIMD implementation supporting both NEON and SSE, as well as a naive approach to handle short strings. The naive approach is 3x faster than boost utf_to_utf, whilst SIMD method outperforms naive approach 3x ~ 5x on Arm and x86. Details at https://github.com/cyb70289/utf8/. UTF-8 unit test is added to check various corner cases. Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1543978498-12123-1-git-send-email-yibo.cai@arm.com>	2018-12-05 21:51:01 +02:00
Tomasz Grabiec	9a4c00beb7	utils/gz: Fix compilation on non-x86 archs gen_crc_combine_table is now executed on every build, so it should not fail on unsupported archs. The generated file will not contain data, but this is fine since it should not be used. Another problem is that u32 and u64 aliases were not visible in the #else branch in crc_combine.cc Message-Id: <1543864425-5650-1-git-send-email-tgrabiec@scylladb.com>	2018-12-04 18:17:27 +00:00
Tomasz Grabiec	1fb792c547	utils/gz: Add fast implementation of crc32_combine() zlib's crc32_combine() is not very efficient. It is faster to re-combine the buffer using crc32(). It's still substantial amount of work which could be avoided. This patch introduces a fast implementation of crc32_combine() which uses a different algorithm than zlib. It also utilizes intrinsics for carry-less multiplication instruction to perform the computation faster. The details of the algorithm can be found in code comments. Performance results using perf_checksum and second buffer of length 64 KiB: zlib CRC32 combine: 38'851 ns libdeflate CRC32: 4'797 ns fast_crc32_combine(): 11 ns So the new implementation is 3500x faster than zlib's, and 417x faster than re-checksumming the buffer using libdeflate. Tested on i7-5960X CPU @ 3.00GHz Performance was also evaluated using sstable writer benchmark: perf_fast_forward --populate --sstable-format=mc --data-directory /tmp/perf-mc \ --value-size=10000 --rows 1000000 --datasets small-part It yielded 9% improvement in median frag/s (129'055 vs 117'977).	2018-12-03 14:40:35 +01:00
Tomasz Grabiec	cd3d9d357b	utils/gz: Add pre-computed polynomials gen_crc_combine_table.cc will be run during build to produce tables with precomputed polynomials (4 x 256 x u32). The definitions will reside in: build/<mode>/gen/utils/gz/crc_combine_table.cc It takes 20ms to generate on my machine. The purpose of those polynomials will be explained in crc_combine.cc	2018-12-03 14:36:09 +01:00
Tomasz Grabiec	63e0da9e58	utils/gz: Import Barrett reduction implementation from libdeflate	2018-12-03 14:36:09 +01:00
Tomasz Grabiec	bb7d95d6c3	utils: Extract clmul() from crc.hh	2018-12-03 14:36:08 +01:00
Avi Kivity	c6d700279b	class_registry: introduce a non-static variant of class_registry class_registry's staticness has the usual problem of static classes (loss of dependency information) and prevents us from librarifying Scylla since all objects that define a registration must be linked in. Take a first step against this staticness by defining a nonstatic variant. The static class_registry is then redefined in terms of the nonstatic class. After all uses have been converted, the static variant can be retired. Message-Id: <20181126130935.12837-1-avi@scylladb.com>	2018-11-26 13:30:21 +00:00
Benny Halevy	dcd18e2b62	remove exec permission from top_k source files This was introduced by `32525f2694` Cc: Rafi Einstein <rafie@scylladb.com> Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20181121163352.13325-1-bhalevy@scylladb.com>	2018-11-21 18:38:50 +02:00
Tomasz Grabiec	143fd6e1c2	utils: Introduce memory_data_sink	2018-11-21 14:04:27 +01:00
Avi Kivity	775b7e41f4	Update seastar submodule * seastar d59fcef...b924495 (2): > build: Fix protobuf generation rules > Merge "Restructure files" from Jesse Includes fixup patch from Jesse: " Update Seastar `#include`s to reflect restructure All Seastar header files are now prefixed with "seastar" and the configure script reflects the new locations of files. Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com> Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com> "	2018-11-21 00:01:44 +02:00
Tomasz Grabiec	57e25fa0f8	utils: phased_barrier: Make advance_and_await() have strong exception guarantees Currently, when advance_and_await() fails to allocate the new gate object, it will throw bad_alloc and leave the phased_barrier object in an invalid state. Calling advance_and_await() again on it will result in undefined behavior (typically SIGSEGV) because _gate will be disengaged. One place affected by this is table::seal_active_memtable(), which calls _flush_barrier.advance_and_await(). If this throws, subsequent flush attempts will SIGSEGV. This patch rearranges the code so that advance_and_await() has strong exception guarantees. Message-Id: <1542645562-20932-1-git-send-email-tgrabiec@scylladb.com>	2018-11-20 16:15:12 +00:00
Avi Kivity	be99101f36	utils: convert sprint() to format() sprint() recently became more strict, throwing on sprint("%s", 5). Replace with the more modern format(). Mechanically converted with https://github.com/avikivity/unsprint.	2018-11-01 13:16:17 +00:00
Avi Kivity	3cf434b863	utils: estimated_histogram: convert generated format strings to fmt Convert printf games to format games. Note that fmt supports specifying the field width as an argument, but that is left to a dedicated change.	2018-11-01 13:16:17 +00:00
Avi Kivity	7726ce23b7	utils: i_filter: rename "format" variable The format variable hides the format function, which we'll soon want to use here. Rename the format variable to unhide the function.	2018-11-01 13:16:17 +00:00
Yibo Cai (Arm Technology China)	79136e895f	utils/crc: calculate crc in parallel It achieves 2.0x speedup on intel E5 and 1.1x to 2.5x speedup on various arm64 microarchitectures. The algorithm cuts data into blocks of 1024 bytes and calculates crc for each block, which is further divided into three subblocks of 336 bytes (42 uint64) each, and 16 remaining bytes (2 uint64). For each iteration, three independent crc are calculated for one uint64 from each subblock. This increases IPC (instructions per cycle) considerably. After subblocks are done, three crc and remaining two uint64 are combined using carry-less multiplication to reach the final result for one block of 1024 bytes. Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1541042759-24767-1-git-send-email-yibo.cai@arm.com>	2018-11-01 10:19:32 +02:00
Yibo Cai (Arm Technology China)	1c48e3fbec	utils/crc: leverage arm64 crc extension It achieves 6.7x to 11x speedup on various arm64 microarchitectures. Signed-off-by: Yibo Cai <yibo.cai@arm.com> Message-Id: <1540781879-15465-1-git-send-email-yibo.cai@arm.com>	2018-10-29 10:50:48 +02:00
Rafi Einstein	32525f2694	Space-Saving Top-k algorithm for handling stream summary statistics Based on the following implementation ([2]) for the Space-Saving algorithm from [1]. [1] http://www.cse.ust.hk/~raywong/comp5331/References/EfficientComputationOfFrequentAndTop-kElementsInDataStreams.pdf [2] https://github.com/addthis/stream-lib/blob/master/src/main/java/com/clearspring/analytics/stream/StreamSummary.java The algorithm keeps a map between keys seen and their counts, keeping a bound on the number of tracked keys. Replacement policy evicts the key with the lowest count while inheriting its count, and recording an estimation of the error which results from that. This error estimation can be later used to prove if the distribution we arrived at corresponds to the real top-K, which we can display alongside the results. Accuracy depends on the number of tracked keys. Introduced as part of 'nodetool toppartition' query implementation. Refs #2811 Message-Id: <20181027220937.58077-1-rafie@scylladb.com>	2018-10-28 10:10:28 +02:00
Tomasz Grabiec	fe0a0bdf1e	utils/loading_shared_values: Add missing stat update call in one of the cases Message-Id: <1540469591-32738-1-git-send-email-tgrabiec@scylladb.com>	2018-10-25 15:15:05 +03:00
Avi Kivity	aaab8a3f46	utils: crc32: mark power crc32 assembly as not requiring an executable stack The linker uses an opt-in system for non-executable stack: if all object files opt into a non-executable stack, the binary will have a non-executable stack, which is very desirable for security. The compiler cooperates by opting into a non-executable stack whenever possible (always for our code). However, we also have an assembly file (for fast power crc32 computations). Since it doesn't opt into a non-executable stack, we get a binary with executable stack, which Gentoo's build system rightly complains about. Fix by adding the correct incantation to the file. Fixes #3799. Reported-by: Alexys Jacob <ultrabug@gmail.com> Message-Id: <20181002151251.26383-1-avi@scylladb.com>	2018-10-02 18:48:23 +01:00
Paweł Dziepak	2bcaf4309e	utils/reusable_buffer: do not warn about large allocations Reusable buffers are meant to be used when protocol or third-party library limitations force us to allocate large contiguous buffers. There isn't much that can be done about this so there is little point in warning about that. Fixes #3788. Message-Id: <20180928085141.6469-1-pdziepak@scylladb.com>	2018-09-30 11:12:23 +03:00
Paweł Dziepak	2e5b375309	utils: drop data_output	2018-09-18 17:22:59 +01:00
Paweł Dziepak	cbe2ef9e5c	utils: fragmented_temporary_buffer::view: add remove_prefix()	2018-09-18 17:22:59 +01:00
Paweł Dziepak	e464ad4f5d	utils: fragmented_temporary_buffer: add empty() and size_bytes()	2018-09-18 11:29:37 +01:00
Paweł Dziepak	f4bb219a8b	utils: fragmented_temporary_buffer: add get_ostream()	2018-09-18 11:29:37 +01:00
Paweł Dziepak	252cf0c681	utils: crc: accept FragmentRange	2018-09-18 11:29:36 +01:00
Tomasz Grabiec	4fb3f7e8eb	managed_vector: Make external_memory_usage() ignore reserved space This ensures that row::external_memory_usage() is invariant to insertion order of cells. It should be so, so that accounting of a clustering_row, merged from multiple MVCC versions by the partition_snapshot_flat_reader on behalf of a memtable flush, doesn't give a greater result than what is used by the memtable region. Overaccounting leads to assertion failure in ~flush_memory_accounter. Fixes #3625 (hopefully). Message-Id: <1535982513-19922-1-git-send-email-tgrabiec@scylladb.com>	2018-09-03 17:09:54 +03:00
Vlad Zolotarov	945d26e4ee	loading_cache: make iterator work on top of lru_list iterators instead of loading_shared_values' Reloading may hold value in the underlying loading_shared_values while the corresponding cache values have already been deleted. This may create weird situations like this: <populate cache with 10 entries> cache.remove(key1); for (auto& e : cache) { std::cout << e << std::endl; } <all 10 entries are printed, including the one for "key1"> In order to avoid such situations we are going to make the loading_cache::iterator to be a transform_iterator of lru_list::iterator instead of loading_shared_values::iterator because lru_list contains entries only for cached items. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-08-30 20:56:44 -04:00
Vlad Zolotarov	1e56c7dd58	loading_cache: make size() return the size of lru_list instead of loading_shared_values reloading flow may hold the items in the underlying loading_shared_values after they have been removed (e.g. via remove(key) API) thereby loading_shared_values.size() doesn't represent the correct value for the loading_cache. lru_list.size() on the other hand - does. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-08-30 15:55:30 -04:00
Duarte Nunes	f6aadd8077	Merge 'utils::loading_cache: improve reload() robustness' from Vlad "This series introduces a few improvements related to a reload flow. From now on the callback may assume that the "key" parameter value is kept alive till the end of its execution in the reloading flow. It may also safely evict as many items from the cache as needed." Fixes #3606 * 'loading_cache_improve_reload-v1' of https://github.com/vladzcloudius/scylla: utils::loading_cache: hold a shared_value_ptr to the value when we reload utils::loading_cache::on_timer(): remove not needed capture of "this" utils::loading_cache::on_timer(): use chunked_vector for storing elements we want to reload	2018-08-28 10:52:20 +01:00
Tomasz Grabiec	1e50f85288	database: Make soft-pressure memtable flusher not consider already flushed memtables The flusher picks the memtable list which contains the largest region according to region_impl::evictable_occupancy().total_space(), which follows region::occupancy().total_space(). But only the latest memtable in the list can start flushing. It can happen that the memtable corresponding to the largest region was already flushed to an sstable (flush permit released), but not yet fsynced or moved to cache, so it's still in the memtable list. The latest memtable in the winning list may be small, or empty, in which case the soft pressure flusher will not be able to make much progress. There could be other memtable lists with non-empty (flushable) latest memtables. This can lead to writes unnecessarily blocking on dirty. I observed this for the system memtable group, where it's easy for the memtables to overshoot small soft pressure limits. The flusher kept trying to flush empty memtables, while the previous non-empty memtable was still in the group. The CPU scheduler makes this worse, because it runs memtable_to_cache in a separate scheduling group, so it further defers in time the removal of the flushed memtable from the memtable list. This patch fixes the problem by making regions corresponding to memtables which started flushing report evictable_occupancy() as 0, so that they're picked by the flusher last. Fixes #3716. Message-Id: <1535040132-11153-2-git-send-email-tgrabiec@scylladb.com>	2018-08-26 11:02:34 +03:00
Tomasz Grabiec	364418b5c5	logalloc: Make evictable_occupancy() indicate no free space Doesn't fix any bug, but it's closer to the truth that all segments are used rather than none is used. Message-Id: <1535040132-11153-1-git-send-email-tgrabiec@scylladb.com>	2018-08-26 11:02:32 +03:00
Avi Kivity	2c9b886b6d	logalloc: reindent No functional changes. Message-Id: <20180731125116.32009-1-avi@scylladb.com>	2018-08-01 00:35:54 +01:00
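
Commit 422987ab04 above describes validating an ASCII string by ORing all bytes together and checking the high bit, reading 8 bytes at a time in two independent streams. The following is a minimal sketch of that idea only, not the code from the commit: the name `is_ascii_sketch`, the 16-byte loop stride, and the byte-wise tail handling are assumptions made for illustration.

```cpp
#include <cassert>      // only needed for the assertions in the usage note
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Hypothetical helper (not the function added in 422987ab04).
// Returns true if every byte of `s` has its high bit clear, i.e. is 7-bit ASCII.
inline bool is_ascii_sketch(std::string_view s) {
    const char* p = s.data();
    std::size_t n = s.size();

    // Two independent accumulators so the two OR chains do not depend on each other.
    std::uint64_t acc0 = 0;
    std::uint64_t acc1 = 0;

    // Main loop: 16 bytes per iteration, 8 bytes per stream.
    while (n >= 16) {
        std::uint64_t a;
        std::uint64_t b;
        std::memcpy(&a, p, 8);       // memcpy keeps unaligned loads well-defined
        std::memcpy(&b, p + 8, 8);
        acc0 |= a;
        acc1 |= b;
        p += 16;
        n -= 16;
    }

    // Tail: fold the remaining (fewer than 16) bytes one at a time.
    std::uint8_t tail = 0;
    for (; n > 0; --n) {
        tail |= static_cast<std::uint8_t>(*p++);
    }

    // A byte is non-ASCII exactly when bit 7 is set; the mask tests all 8 lanes at once.
    // The mask is the same in every byte lane, so endianness does not matter.
    const std::uint64_t high_bits = 0x8080808080808080ULL;
    return ((acc0 | acc1) & high_bits) == 0 && (tail & 0x80) == 0;
}
```

For example, `assert(is_ascii_sketch("hello"))` holds, while a string containing the byte `0x80` is rejected. Because the check accumulates with OR and tests once at the end, it never exits early, which is consistent with the commit's note that very short strings see little or no benefit.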


604 Commits