scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 19:46:48 +00:00

Author	SHA1	Message	Date
Avi Kivity	9322c07c71	Merge "Use binary search in sstable promoted index" from Tomasz
"
The "promoted index" is what the sstable format calls the clustering key index within a given partition. Large partitions with many rows have it. It's embedded in the partition index entry.

Currently, lookups in the promoted index are done by scanning the index linearly, so the lookup is O(N). For large partitions that's inefficient. It consumes a lot of both CPU and I/O.

We could do better and use binary search in the index. This patch series switches the mc-format index reader to do that. Other formats use the old way.

The "mc" format promoted index has an extra structure at the end of the index called "offset map". It's a vector of offsets of consecutive promoted index entries. This allows us to access random entries in the index without reading the whole index.

The location of the offset entry for a given promoted index entry can be derived by knowing where the offset vector ends in the index file, so the offset map also doesn't have to be read completely into memory.

The trickiest part is caching. We need to cache blocks read from the index file to amortize the cost of binary search:

- if the promoted index fits in the 32 KiB which was read from the index when looking for the partition entry, we don't want to issue any additional I/O to search the promoted index.
- with large promoted indexes, the last few bisections will fall into the same I/O block and we want to reuse that block.
- we don't want the cache to grow too big, we don't want to cache the whole promoted index as the read progresses over the index. Scanning reads may skip multiple times.

This series implements a rather simple approach which meets all the above requirements and is not worse than the current state of affairs:

- Each index cursor has its own cache of the index file area which corresponds to the promoted index. This is managed by the cached_file class.
- Each index cursor has its own cache of parsed blocks. This allows the upper bound estimation to reuse information obtained during lower bound lookup. This estimation is used to limit read-aheads in the data file.
- Each cursor drops entries that it walked past so that memory footprint stays O(log N)
- Cached buffers are accounted to read's reader_permit.

Later, we could have a single cache shared by many readers. For that, we need to come up with an eviction policy.

Fixes #4007.

TESTING RESULTS

* Point reads, large promoted index:

Config: rows: 10000000, value size: 2000
Partition size: 20 GB
Index size: 7 MB

Notes:

- Slicing read into the middle of partition (offset=5000000, read=1) is a clear win for the binary search:

time: 1.9ms vs 22.9ms
CPU utilization: 8.9% vs 92.3%
I/O: 21 reqs / 172 KiB vs 29 reqs / 3'520 KiB

It's 12x faster, CPU utilization is 10x smaller, disk utilization is 20x smaller.

- Slicing at the front (offset=0) is a mixed bag.

time is similar: 1.8ms
CPU utilization is 6.7x smaller for bsearch: 8.5% vs 57.7%
disk bandwidth utilization is smaller for bsearch but uses more IOs: 4 reqs / 320 KiB (scan) vs 17 reqs / 188 KiB (bsearch)

bsearch uses less bandwidth because the series reduces buffer size used for index file I/O.

scan is issuing:
2 * 128 KB (index page)
2 * 32 KB (data file)

bsearch is issuing:
1 * 64 KB (index page)
15 * 4 KB (promoted index)
1 * 64 KB (data file)

The 1 * 64 KB is chosen dynamically by seastar. Sometimes it chooses 2 * 32 KB (with read-ahead). 32 KB is the minimum I/O currently.

Disk utilization could be further improved by changing the way seastar's dynamic I/O adjustments work so that it uses 1 * 4 KB when it suffices. This is left for the follow-up.

Command:

perf_fast_forward --datasets=large-part-ds1 \
  --run-tests=large-partition-slicing-clustering-keys -c1 --test-case-duration=1

Before:

offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
0 1 0.001836 172 1 545 9 563 175 4.0 4 320 2 2 0 1 1 0 0 0 57.7% 0
0 32 0.001858 502 32 17220 126 17776 11526 3.2 3 324 2 1 0 1 1 0 0 0 56.4% 0
0 256 0.002833 339 256 90374 427 91757 85931 7.0 7 776 3 1 0 1 1 0 0 0 41.1% 0
0 4096 0.017211 58 4096 237984 2011 241802 233870 66.1 66 8376 59 2 0 1 1 0 0 0 21.4% 0
5000000 1 0.022952 42 1 44 1 45 41 29.2 29 3520 22 2 0 1 1 0 0 0 92.3% 0
5000000 32 0.023052 43 32 1388 14 1414 1331 31.1 32 3588 26 2 0 1 1 0 0 0 91.7% 0
5000000 256 0.024795 41 256 10325 129 10721 9993 43.1 39 4544 29 2 0 1 1 0 0 0 86.4% 0
5000000 4096 0.038856 27 4096 105414 398 106918 103162 95.2 95 12160 78 5 0 1 1 0 0 0 61.4% 0

After (v2):

offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
0 1 0.001831 248 1 546 21 581 252 17.6 17 188 2 0 0 1 1 0 0 0 8.5% 0
0 32 0.001910 535 32 16751 626 17770 13896 17.9 19 160 3 0 0 1 1 0 0 0 8.8% 0
0 256 0.003545 266 256 72207 2333 89076 62852 26.9 24 764 7 0 0 1 1 0 0 0 9.7% 0
0 4096 0.016800 56 4096 243812 524 245430 239736 83.6 83 8700 64 0 0 1 1 0 0 0 16.6% 0
5000000 1 0.001968 351 1 508 19 538 380 21.3 21 172 2 0 0 1 1 0 0 0 8.9% 0
5000000 32 0.002273 431 32 14077 436 15503 11551 22.7 22 268 3 0 0 1 1 0 0 0 8.9% 0
5000000 256 0.003889 257 256 65824 2197 81833 57813 34.0 37 652 18 0 0 1 1 0 0 0 11.2% 0
5000000 4096 0.017115 54 4096 239324 834 241310 231993 88.3 88 8844 65 0 0 1 1 0 0 0 16.8% 0

After (v1):

offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
0 1 0.001886 259 1 530 4 545 261 18.0 18 376 2 2 0 1 1 0 0 0 9.1% 0
0 32 0.001954 513 32 16381 93 16844 15618 19.0 19 408 3 2 0 1 1 0 0 0 9.3% 0
0 256 0.003266 318 256 78393 1820 81567 61663 30.8 26 1272 7 2 0 1 1 0 0 0 10.4% 0
0 4096 0.017991 57 4096 227666 855 231915 225781 83.1 83 8888 55 5 0 1 1 0 0 0 15.5% 0
5000000 1 0.002353 232 1 425 2 432 232 23.0 23 396 2 2 0 1 1 0 0 0 8.7% 0
5000000 32 0.002573 384 32 12437 47 12571 429 25.0 25 460 4 2 0 1 1 0 0 0 8.5% 0
5000000 256 0.003994 259 256 64101 2904 67924 51427 37.0 35 1484 11 2 0 1 1 0 0 0 10.6% 0
5000000 4096 0.018567 56 4096 220609 448 227395 219029 89.8 89 9036 59 5 0 1 1 0 0 0 15.1% 0

* Point reads, small promoted index (two blocks):

Config: rows: 400, value size: 200
Partition size: 84 KiB
Index size: 65 B

Notes:

- No significant difference in time
- the same disk utilization
- similar CPU utilization

Command:

perf_fast_forward --datasets=large-part-ds1 \
  --run-tests=large-partition-slicing-clustering-keys -c1 --test-case-duration=1

Before:

offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
0 1 0.000279 470 1 3587 31 3829 478 3.0 3 68 2 1 0 1 1 0 0 0 21.1% 0
0 32 0.000276 3498 32 116038 811 122756 104033 3.0 3 68 2 1 0 1 1 0 0 0 24.0% 0
0 256 0.000412 2554 256 621044 1778 732150 559221 2.0 2 72 2 0 0 1 1 0 0 0 32.6% 0
0 4096 0.000510 1901 400 783883 4078 819058 665616 2.0 2 88 2 0 0 1 1 0 0 0 36.4% 0
200 1 0.000339 2712 1 2951 8 3001 2569 2.0 2 72 2 0 0 1 1 0 0 0 17.8% 0
200 32 0.000352 2586 32 91019 266 92427 83411 2.0 2 72 2 0 0 1 1 0 0 0 20.8% 0
200 256 0.000458 2073 200 436503 1618 453945 385501 2.0 2 88 2 0 0 1 1 0 0 0 29.4% 0
200 4096 0.000458 2097 200 436475 1676 458349 381558 2.0 2 88 2 0 0 1 1 0 0 0 29.0% 0

After (v1):

Testing slicing of large partition using clustering keys:

offset read time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
0 1 0.000278 492 1 3598 30 3831 500 3.0 3 68 2 1 0 1 1 0 0 0 19.4% 0
0 32 0.000275 3433 32 116153 753 122915 92559 3.0 3 68 2 1 0 1 1 0 0 0 22.5% 0
0 256 0.000458 2576 256 559437 2978 728075 504375 2.1 2 88 2 0 0 1 1 0 0 0 29.0% 0
0 4096 0.000506 1888 400 790064 3306 822360 623109 2.0 2 88 2 0 0 1 1 0 0 0 36.6% 0
200 1 0.000382 2493 1 2619 10 2675 2268 2.0 2 88 2 0 0 1 1 0 0 0 16.3% 0
200 32 0.000398 2393 32 80422 333 84759 22281 2.0 2 88 2 0 0 1 1 0 0 0 19.0% 0
200 256 0.000459 2096 200 435943 1608 453989 380749 2.0 2 88 2 0 0 1 1 0 0 0 30.5% 0
200 4096 0.000458 2097 200 436410 1651 455779 382485 2.0 2 88 2 0 0 1 1 0 0 0 29.2% 0

* Scan with skips, large index:

Config: rows: 10000000, value size: 2000
Partition size: 20 GB
Index size: 7 MB

Notes:

- Similar time, slightly worse for binary search: 36.1 s (scan) vs 36.4 (bsearch)
- Slightly more I/O for bsearch: 153'932 reqs / 19'703'260 KiB (scan) vs 155'651 reqs / 19'704'088 KiB (bsearch)

Binary search reads more by 828 KB and by 1719 IOs. It does more I/O to read the promoted index offset map.

- similar (low) memory footprint. The danger here is that by caching index blocks which we touch as we scan we would end up caching the whole index. But this is protected against by eviction as demonstrated by the last "mem" column.

Command:

perf_fast_forward --datasets=large-part-ds1 \
  --run-tests=large-partition-skips -c1 --test-case-duration=1

Before:

read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
1 1 36.103451 4 5000000 138491 38 138601 138453 153932.0 153932 19703260 153561 1 0 1 1 0 0 0 31.5% 502690

After (v2):

read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
1 1 37.000145 4 5000000 135135 6 135146 135128 155651.0 155651 19704088 138968 0 0 1 1 0 0 0 34.2% 0

After (v1):

read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk cpu mem
1 1 36.965520 4 5000000 135261 30 135311 135231 155628.0 155628 19704216 139133 1 0 1 1 0 0 0 33.9% 248738

Also in:

git@github.com:tgrabiec/scylla.git sstable-use-index-offset-map-v2

Tests:

- unit (all modes)
- manual using perf_fast_forward
"

* tag 'sstable-use-index-offset-map-v2' of github.com:tgrabiec/scylla:
  sstables: Add promoted index cache metrics
  position_in_partition: Introduce external_memory_usage()
  cached_file, sstables: Add tracing to index binary search and page cache
  sstables: Dynamically adjust I/O size for index reads
  sstables, tests: Allow disabling binary search in promoted index from perf tests
  sstables: mc: Use binary search over the promoted index
  utils: Introduce cached_file
  sstables: clustered_index: Relax scope of validity of entry_info
  sstables: index_entry: Introduce owning promoted_index_block_position
  compound_compat: Allow constructing composite from a view
  sstables: index_entry: Rename promoted_index_block_position to promoted_index_block_position_view
  sstables: mc: Extract parser for promoted index block
  sstables: mc: Extract parser for clustering out of the promoted index block parser
  sstables: consumer: Extract primitive_consumer
  sstables: Abstract the clustering index cursor behavior
  sstables: index_reader: Rearrange to reduce branching and optionals	2020-06-18 12:09:39 +03:00
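The offset-map lookup described in the merge message above can be illustrated with a small sketch. It assumes a much simplified model: the offset map is already available as a vector, and each promoted index block is summarized by a single integer first key. The names `find_block`, `read_first_key` and `blocks_parsed` are hypothetical and are not taken from the Scylla source.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical model: offsets[i] is the byte offset of the i-th promoted index
// block, and read_first_key(offset) parses only the block stored at that offset.
// Returns the index of the last block whose first key is <= key (0 if none).
// blocks_parsed reports how many blocks had to be parsed, which is O(log N).
std::size_t find_block(const std::vector<std::uint64_t>& offsets,
                       const std::function<std::int64_t(std::uint64_t)>& read_first_key,
                       std::int64_t key,
                       std::size_t& blocks_parsed) {
    std::size_t lo = 0;
    std::size_t hi = offsets.size();
    blocks_parsed = 0;
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        ++blocks_parsed;
        if (read_first_key(offsets[mid]) <= key) {
            lo = mid + 1;   // the wanted block is at mid or to the right
        } else {
            hi = mid;       // the wanted block is strictly to the left
        }
    }
    return lo == 0 ? 0 : lo - 1;
}
```

With 1024 blocks this parses at most 11 blocks per lookup, where a linear scan may parse all of them. The real reader additionally caches file pages and parsed blocks, which this sketch leaves out.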
Tomasz Grabiec	58532cdf11	cached_file, sstables: Add tracing to index binary search and page cache	2020-06-16 16:15:24 +02:00
Tomasz Grabiec	c95dd67d11	utils: Introduce cached_file It is a read-through cache of a file. Will be used to cache contents of the promoted index area from the index file. Currently, cached pages are evicted manually using the invalidate_*() method family, or when the object is destroyed. The cached_file represents a subset of the file. The reason for this is to satisfy two requirements. One is that we have a page-aligned caching, where pages are aligned relative to the start of the underlying file. This matches requirements of the seastar I/O engine on I/O requests. Another requirement is to have an effective way to populate the cache using an unaligned buffer which starts in the middle of the file when we know that we won't need to access bytes located before the buffer's position. See populate_front(). If we couldn't assume that, we wouldn't be able to insert an unaligned buffer into the cache.	2020-06-16 16:15:23 +02:00
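A read-through, page-aligned cache of the kind this commit describes might look roughly like the sketch below. It is an illustration only: it reads from an in-memory string instead of a file, uses a fixed 4096-byte page, and the names `page_cache_sketch`, `invalidate_before`, `io_count` and `cached_pages` are hypothetical rather than the cached_file API.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>

// Illustrative read-through cache over an in-memory "file".
// The caller must keep the file string alive for the lifetime of the cache.
class page_cache_sketch {
    static constexpr std::size_t page_size = 4096;
    const std::string& _file;
    std::map<std::size_t, std::string> _pages; // page index -> page contents
    std::size_t _io_count = 0;

    // Returns the cached page, loading it (one simulated I/O) on a miss.
    const std::string& get_page(std::size_t idx) {
        auto it = _pages.find(idx);
        if (it == _pages.end()) {
            ++_io_count;
            it = _pages.emplace(idx, _file.substr(idx * page_size, page_size)).first;
        }
        return it->second;
    }
public:
    explicit page_cache_sketch(const std::string& file) : _file(file) {}

    // Reads up to len bytes starting at pos, going through the page cache.
    std::string read(std::size_t pos, std::size_t len) {
        std::string out;
        len = std::min(len, _file.size() > pos ? _file.size() - pos : 0);
        while (len > 0) {
            const std::string& page = get_page(pos / page_size);
            std::size_t off = pos % page_size;
            std::size_t n = std::min(len, page.size() - off);
            out.append(page, off, n);
            pos += n;
            len -= n;
        }
        return out;
    }

    // Manual eviction: drops every page that lies entirely before pos.
    void invalidate_before(std::size_t pos) {
        _pages.erase(_pages.begin(), _pages.lower_bound(pos / page_size));
    }

    std::size_t io_count() const { return _io_count; }
    std::size_t cached_pages() const { return _pages.size(); }
};
```

Repeated reads that fall into an already cached page do not increase `io_count()`, which is the property the last few bisections of a binary search rely on.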
Tomasz Grabiec	1c5db178dd	Merge "logalloc: Get rid of segments migration" from Pavel But not compaction. When reclaiming segments to seastar, non-empty segments are copied as-is to some other place. Instead of doing this, the reclaimer can copy only allocated objects and leave the freed holes behind, i.e. do the regular compaction. This would be the same or better from the timing perspective, and will help to avoid yet another compaction pass over the same set of objects in the future. Current migration code checks for the free segments reserve to be above minimum to proceed with migration, and so does the code after this patch. Thus the segment compaction is called with a non-empty free segments set, and it's guaranteed not to fail the new segment allocation (if it is required at all). Plus some bikeshedding patches for the run-up. tests: unit(dev) * https://github.com/xemul/scylla/tree/br-logalloc-compact-on-reclaim-2: logalloc: Compact segments on reclaim instead of migration logallog: Introduce RAII allocation lock logalloc: Shuffle code around region::impl::compact logalloc: Do not lock reclaimer twice logalloc: Do not calculate object size twice logalloc: Do not convert obj_desc to migrator back and forth	2020-06-15 16:28:16 +02:00
Avi Kivity	d17b05e911	Merge 'Adding Optimized pseudo floating point estimated histogram' from Amnon " This series adds a pseudo-floating-point histogram implementation. The histogram is used for time_estimated_histogram, a histogram for latency tracking, which is then used in storage_proxy as a more efficient, higher-resolution histogram. A follow-up series would use the new histogram in other places in the system and add an implementation that supports lower values. Fixes #5815 Fixes #4746 " * amnonh-quicker_estimated_histogram: storage_proxy: use time_estimated_histogram for latencies test/boost/estimated_histogram_test utils/histogram_metrics_helper Adding histogram converter utils/estimated_histogram: Adding approx_exponential_histogram	2020-06-15 10:19:36 +03:00
Amnon Heiman	f30f926703	utils/histogram_metrics_helper Adding histogram converter This patch adds a helper converter function to convert from a approx_exponential_histogram histogram to a seastar::metrics::histogram Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-06-15 08:22:49 +03:00
Amnon Heiman	3319756f36	utils/estimated_histogram: Adding approx_exponential_histogram This patch adds an efficient histogram implementation. The implementation chooses efficiency over flexibility. That is why templates are used. How the approx_exponential_histogram pseudo floating point histogram works: It splits the range [MIN, MAX] into log2(MAX/MIN) ranges; it then splits each of those ranges linearly according to a given resolution. For example, using a resolution of 4 would be similar to using an exponentially growing histogram with a coefficient of 1.2. All values are uint64. To prevent handling of corner cases, it is not allowed to set MIN lower than the resolution. The approx_exponential_histogram will probably not be used directly; its first user is time_estimated_histogram, a histogram for durations. It should be compared to the estimated_histogram. Performance comparison: Comparison was done by inserting 2^20 values into time_estimated_histogram and estimated_histogram. In debug mode on a local machine, an insert operation took an average of 26.0 nanoseconds vs 342.2 nanoseconds. In release mode, an insert operation took an average of 1.90 vs 8.28 nanoseconds. Fixes #5815 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-06-15 08:22:43 +03:00
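The bucketing scheme described in this commit (power-of-two ranges, each split linearly by the resolution) can be sketched as a single function. This is an illustration under stated assumptions, not the approx_exponential_histogram code: it assumes `min` and `resolution` are both powers of two with `min >= resolution`, it does not clamp values above a maximum, and `bucket_index` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Maps a value to a bucket index.
// Assumes min and resolution are powers of two and min >= resolution,
// so every power-of-two range divides evenly into `resolution` sub-buckets.
// Values below min land in bucket 0.
std::size_t bucket_index(std::uint64_t value, std::uint64_t min, std::uint64_t resolution) {
    if (value < min) {
        return 0;
    }
    // Find the power-of-two range [lower, 2 * lower) that contains value.
    std::uint64_t lower = min;
    std::size_t range = 0;
    while (value / 2 >= lower) {
        lower *= 2;
        ++range;
    }
    // Split that range linearly into `resolution` equal steps.
    std::uint64_t step = lower / resolution;
    return range * resolution + (value - lower) / step;
}
```

For example, with `min = 128` and `resolution = 4`, the range [128, 256) is covered by buckets 0 to 3 of width 32, and [256, 512) by buckets 4 to 7 of width 64.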
Rafael Ávila de Espíndola	336d541f58	database: Use a flat_hash_map for _ks_cf_to_uuid Given that the key is a std::pair, we have to explicitly mark the hash and eq types as transparent for heterogeneous lookup to work. With that, pass std::string_view to a few functions that just check if a value is in the map. This increases the .text section by 11 KiB (0.03%). Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>	2020-06-14 08:18:39 -07:00
Pavel Emelyanov	d908646b28	logalloc: Compact segments on reclaim instead of migration When reclaiming segments to seastar, the code tries to free the segments sequentially. For this it walks the segments from left to right and frees them, but every time a non-empty segment is met it gets migrated to another segment that's allocated from the right end of the list. This is sometimes a waste of cycles. The destination segment inherits the holes from the source one, and thus it will be compacted some time in the future. Why not compact it right at the reclamation time? It will take the same time or less, but will result in better compaction. To achieve this, the segment to be reclaimed is compacted with the existing compact_segment_locked() code with some special care around it. 1. The allocation of new segments from seastar is locked 2. The reclaiming of segments with evict-and-compact is locked as well 3. The emergency pool is opened (the compaction is called with non-empty reserve to avoid a bad_alloc exception being thrown in the middle of compaction) 4. The segment is forcibly removed from the histogram and the closed_occupancy is updated just like it is with general compaction The segments-migration auxiliary code can be removed after this. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-08 14:07:35 +03:00
Pavel Emelyanov	4db6ef7b6d	logallog: Introduce RAII allocation lock The lock disables the segment_pool to call for more segments from the underlying allocator. To be used in next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-08 14:07:30 +03:00
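As a rough illustration of the RAII idea, the sketch below disables allocation for the lifetime of a guard object and restores the previous state on destruction. The types `segment_pool_sketch` and `allocation_lock_sketch` are hypothetical stand-ins; the actual logalloc classes are not shown in this log.

```cpp
// Hypothetical pool with a flag that gates requests to the underlying allocator.
struct segment_pool_sketch {
    bool allocation_enabled = true;
    bool try_allocate_segment() const { return allocation_enabled; }
};

// RAII guard: disables allocation on construction and restores the
// previous state on destruction, so nested guards compose correctly.
class allocation_lock_sketch {
    segment_pool_sketch& _pool;
    bool _previous;
public:
    explicit allocation_lock_sketch(segment_pool_sketch& pool)
        : _pool(pool), _previous(pool.allocation_enabled) {
        _pool.allocation_enabled = false;
    }
    ~allocation_lock_sketch() { _pool.allocation_enabled = _previous; }
    allocation_lock_sketch(const allocation_lock_sketch&) = delete;
    allocation_lock_sketch& operator=(const allocation_lock_sketch&) = delete;
};
```

Saving and restoring the previous value, rather than unconditionally re-enabling, keeps the outer guard in force when guards are nested.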
Pavel Emelyanov	2005aca444	logalloc: Shuffle code around region::impl::compact This includes 3 small changes to facilitate next patching: - rename region::impl::compact into compact_segment_locked - merging former compact with compact_single_segment_locked - moving log print and stats update into compact_segment_locked Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-08 14:06:45 +03:00
Pavel Emelyanov	8c81c6b7aa	logalloc: Do not lock reclaimer twice The tracker::impl::reclaim is already in reclaim-locked section, no need for yet another nested lock. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-08 13:14:33 +03:00
Pavel Emelyanov	0392c5ca77	logalloc: Do not calculate object size twice When walking objects on compaction the migrator->size() virtual fn is called twice. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-08 13:14:33 +03:00
Pavel Emelyanov	81c9c4c7b2	logalloc: Do not convert obj_desc to migrator back and forth When calling alloc_small the migrator is passed just to get the object descriptor, but during compaction the descriptor is already at hands, so no need to re-get it again. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-08 13:14:33 +03:00
Tomasz Grabiec	087fa42c1d	Merge "utils: inject errors around paxos stages" from Alejo Add Paxos error injections before/after save promise, proposal, decision, paxos_response_handler, delete decision. Adds a method to inject an error providing a lambda while avoiding to add a continuation when the error injection is disabled. For this provide error exception and enter() to allow flow control (i.e. return) on simple error injections without lambdas. Also includes Pavel's patch for CQL API for error injections, updated to current error injection API and added one_shot support. Also added some basic CQL API boost tests. For CQL API there's a limitation of the current grammar not supporting f(<terminal>) so values have to be inserted in a table until this is resolved. See #5411 * https://github.com/alecco/scylla/tree/error_injection_v11: paxos: fix indentation paxos: add error injections utils: add timeout error injection with lambda utils: error injection add enter() for control flow utils: error injections provide error exceptions failure_injector: implement CQL API for failure injector class lwt: fix disabled error injection templates	2020-06-03 15:42:10 +02:00
Alejo Sanchez	a8b14b0227	utils: add timeout error injection with lambda Even though calling then() on a ready future does not allocate a continuation, calling then() on the result of it will allocate. This error injection only adds a continuation in the dependency chain if error injections are enabled at compile time and this particular error injection is enabled. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-06-03 14:44:00 +02:00
Alejo Sanchez	0321172677	utils: error injection add enter() for control flow For control flow (i.e. return) and simplicity add enter() method. For disabled injections, this method is const returning false, therefore it has no overhead. Add boost test. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-06-03 14:42:48 +02:00
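The control-flow use of enter() can be sketched as follows. This is an assumption-laden illustration: the class `error_injection_sketch`, the function `save_promise` and the injection name "skip_save" are all hypothetical, and the real utility may differ in signatures. What the sketch does take from the commit message is that the disabled variant is a const method returning false.

```cpp
#include <set>
#include <string>

// Hypothetical injector; Enabled selects the variant at compile time.
template <bool Enabled>
class error_injection_sketch {
    std::set<std::string> _enabled;
public:
    void enable(const std::string& name) {
        if constexpr (Enabled) {
            _enabled.insert(name);
        }
    }
    // In the disabled variant this always returns false,
    // so the compiler can remove the injected branch entirely.
    bool enter(const std::string& name) const {
        if constexpr (Enabled) {
            return _enabled.count(name) > 0;
        } else {
            return false;
        }
    }
};

// Example call site: returns false when the injection short-circuits the step.
template <bool Enabled>
bool save_promise(const error_injection_sketch<Enabled>& injector) {
    if (injector.enter("skip_save")) {
        return false; // injected early return, no lambda needed
    }
    return true; // normal path
}
```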
Avi Kivity	6f394e8e90	tombstone: use comparison operator instead of ad-hoc compare() function and with_relational_operators The comparison operator (<=>) default implementation happens to exactly match tombstone::compare(), so use the compiler-generated defaults. Also default operator== and operator!= (these are not brought in by operator<=>). These become slightly faster as they perform just an equality comparison, not three-way compare. shadowable_tombstone and row_tombstone depend on tombstone::compare(), so convert them too in a similar way. with_relational_operations.hh becomes unused, so delete it. Tests: unit (dev) Message-Id: <20200602055626.2874801-1-avi@scylladb.com>	2020-06-02 09:28:52 +03:00
Avi Kivity	a4c44cab88	treewide: update concepts language from the Concepts TS to C++20 Seastar recently lost support for the experimental Concepts Technical Specification (TS) and gained support for C++20 concepts. Re-enable concepts in Scylla by updating our use of concepts to the C++20 standard. This change: - peels off uses of the GCC6_CONCEPT macro - removes inclusions of <seastar/gcc6-concepts.hh> - replaces function-style concepts (no longer supported) with equation-style concepts - semicolons added and removed as needed - deprecated std::is_pod replaced by recommended replacement - updates return type constraints to use concepts instead of type names (either std::same_as or std::convertible_to, with std::same_as chosen when possible) No attempt is made to improve the concepts; this is a specification update only. Message-Id: <20200531110254.2555854-1-avi@scylladb.com>	2020-06-02 09:12:21 +03:00
Piotr Sarna	7b5db478ed	big_decimal: migrate to string views Big decimals are, among other use cases, used as a main number type for alternator, and as such can appear on the fast path. Parsing big decimals was performed via std::regex, which is not precisely famous for its speeds, and also enforces unnecessary string copying. Therefore, the implementation is replaced with an open-coded version based on string_views. One previous iteration of this series also included a hand-coded state machine implementation, but it proved to be slower than the slightly naive string_view one. Overall, execution time is reduced by 61.6% according to microbenchmarks, which sounds like a promising improvement. Perf results: test iterations median mad min max Regex (original): big_decimal_test.from_string 88895 11.228us 25.891ns 11.202us 11.510us String view (new): big_decimal_test.from_string 232334 4.303us 21.660ns 4.282us 4.736us State machine (experimental, ditched): big_decimal_test.from_string 148318 6.723us 51.896ns 6.672us 6.877us Tests: unit(dev + release(big_decimal_test))	2020-06-01 16:11:49 +02:00
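A string_view based parser in the spirit of this commit might look like the sketch below. It is not the big_decimal implementation: it stores the unscaled value in a 64-bit integer instead of an arbitrary-precision type, does not guard against overflow, and the names `decimal_parts` and `parse_decimal` are hypothetical. The represented value is `unscaled * 10^(-scale)`.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical result type: value = unscaled * 10^(-scale).
struct decimal_parts {
    std::int64_t unscaled;
    std::int32_t scale;
};

// Parses [sign] digits [. digits] [e|E [sign] digits] without copying the input.
// Returns std::nullopt on malformed input. Overflow is not handled in this sketch.
std::optional<decimal_parts> parse_decimal(std::string_view text) {
    bool negative = false;
    if (!text.empty() && (text.front() == '+' || text.front() == '-')) {
        negative = text.front() == '-';
        text.remove_prefix(1);
    }
    std::int64_t unscaled = 0;
    std::int32_t scale = 0;
    bool seen_digit = false;
    bool seen_dot = false;
    std::size_t i = 0;
    for (; i < text.size(); ++i) {
        char c = text[i];
        if (c >= '0' && c <= '9') {
            unscaled = unscaled * 10 + (c - '0');
            seen_digit = true;
            if (seen_dot) {
                ++scale; // each fractional digit increases the scale
            }
        } else if (c == '.' && !seen_dot) {
            seen_dot = true;
        } else {
            break;
        }
    }
    if (!seen_digit) {
        return std::nullopt;
    }
    if (i < text.size()) {
        if (text[i] != 'e' && text[i] != 'E') {
            return std::nullopt;
        }
        std::string_view exp = text.substr(i + 1);
        bool exp_negative = false;
        if (!exp.empty() && (exp.front() == '+' || exp.front() == '-')) {
            exp_negative = exp.front() == '-';
            exp.remove_prefix(1);
        }
        if (exp.empty()) {
            return std::nullopt;
        }
        std::int32_t e = 0;
        for (char c : exp) {
            if (c < '0' || c > '9') {
                return std::nullopt;
            }
            e = e * 10 + (c - '0');
        }
        scale -= exp_negative ? -e : e; // a positive exponent lowers the scale
    }
    return decimal_parts{negative ? -unscaled : unscaled, scale};
}
```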
Pavel Emelyanov	ee31191e21	storage_service: Move get_generation_number to util/ This is purely utility helper routine. As a nice side effect the inclusion of storage_service.hh is removed from several unrelated places. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-06-01 09:08:40 +03:00
Botond Dénes	a9e6fe4071	utils: introduce ranges::to() Sadly, std::ranges is missing an equivalent of boost::copy_range(), so we introduce a replacement: ranges::to(). There is an existing proposal to introduce something similar to the standard library: std::ranges::to() (https://github.com/cplusplus/papers/issues/145). We name our own version similarly, so if said proposal makes it in we can just prepend std:: and be good. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200529141407.158960-2-bdenes@scylladb.com>	2020-05-31 12:58:59 +03:00
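The idea of a copy_range-like helper can be shown with a deliberately minimal sketch. It is written against plain iterator pairs so that it compiles as C++17, and it is not the ranges::to() added by this commit; the namespace `sketch` and the helper `unique_sorted` are hypothetical.

```cpp
#include <iterator>
#include <set>
#include <vector>

namespace sketch {

// Builds a Container from any range that exposes begin() and end().
template <typename Container, typename Range>
Container to(const Range& range) {
    return Container(std::begin(range), std::end(range));
}

} // namespace sketch

// Usage: collect the distinct elements of a vector into an ordered set.
inline std::set<int> unique_sorted(const std::vector<int>& values) {
    return sketch::to<std::set<int>>(values);
}
```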
Pavel Emelyanov	878f8d856a	logalloc: Report reclamation timing with rate The timer.stop() call, which reports not only the time taken but also the reclamation rate, was unintentionally dropped while expanding its scope (`c70ebc7c`). Take it back (and mark the compact_and_evict_locked as private while at it). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200528185331.10537-1-xemul@scylladb.com>	2020-05-29 14:50:43 +02:00
Pavel Emelyanov	7696ed1343	shard_tracker: Configure it in one go Instead of doing 3 smp::invoke_on_all-s and duplicating tracker::impl API for the tracker itself, introduce the tracker::configure, simplify the tracker configuration and narrow down the public tracker API. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20200528185442.10682-1-xemul@scylladb.com>	2020-05-29 14:50:43 +02:00
Alejo Sanchez	bb08b5ad5a	utils: error injections provide error exceptions Provide non-timeout error exception to facilitate control flow in injected errors. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-05-28 11:13:55 +02:00
Alejo Sanchez	2c7e01a3b6	lwt: fix disabled error injection templates Fix disabled injection templates to match enabled ones. Fix corresponding test to not be a continuation. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-05-28 11:13:55 +02:00
Avi Kivity	bdb5b11d19	treewide: stop using deprecated seastar::apply() seastar::apply() is deprecated in recent versions of seastar in favor of std::apply(), so stop including its header. Calls to unqualified apply(..., std::tuple<>) are resolved to std::apply() by argument dependent lookup, so no changes to call sites are necessary. This avoids a huge number of deprecation warnings with latest seastar. Message-Id: <20200526090552.1969633-1-avi@scylladb.com>	2020-05-27 14:07:35 +03:00
Amnon Heiman	3e5beba403	estimated_histogram: clean if0 and FIXME This patch cleans the estimated histogram implementation. It removes the FIXME that were left in the code from the migration time and the if0 commented out code. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2020-05-27 08:40:05 +03:00
Avi Kivity	076c8317c7	streaming_histogram: add missing include for uint64_t Fails dev-headers build without it. Message-Id: <20200523061555.72087-1-avi@scylladb.com>	2020-05-23 11:09:10 +03:00
Avi Kivity	e774ee06ed	Update seastar submodule * seastar e708d1df3a...92365e7b87 (11): > tests: distributed_test: convert to SEASTAR_TEST_CASE > Merge "Avoid undefined behavior on future self move assignments" from Rafael > Merge "C++20 support" from Avi > optimized_optional: don't use experimental C++ features > tests: scheduling_group_test: verify that later() doesn't modify the current group > tests: demos: coroutine_demo: add missing include for open_file_dma() > rpc: minor documentation improvements > rpc: Assert that sinks are closed > Merge "Fix most tests under valgrind" from Rafael > distributed_test: Fix it on slow machines > rpc_test: Make sure we always flush and close the sink loading_shard_values.hh: added missing include for gcc6-concepts.hh, exposed by the submodule update. Frozen toolchain updated for the new valgrind dependency.	2020-05-12 14:04:16 +03:00
Rafael Ávila de Espíndola	e6f4996e44	atomic_vetor: Don't pass references to callbacks This is more strict than it needs to be, but it avoids any bugs like the one fixed by the previous patch. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200422182304.120906-2-espindola@scylladb.com>	2020-04-23 16:06:37 +03:00
Alejo Sanchez	bd849764e0	utils: error injection sleep add support for manual_clock Requested by @tgrabiec in previous patch (already merged). Adds support for sleep using manual clock. Add test. NOTE: Removes system_clock support (and test) as sleep is not explicitly instantiated in seastar/src/core/reactor.cc Branch URL: https://github.com/alecco/scylla/tree/error_injection_5_manual_clock Tests: unit ({dev}) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200417081518.868900-1-alejo.sanchez@scylladb.com>	2020-04-17 11:45:05 +02:00
Avi Kivity	88ade3110f	treewide: replace calls to engine().some_api() with some_api() This removes the need to include reactor.hh, a source of compile time bloat. In some places, the call is qualified with seastar:: in order to resolve ambiguities with a local name. Includes are adjusted to make everything compile. We end up having 14 translation units including reactor.hh, primarily for deprecated things like reactor::at_exit(). Ref #1	2020-04-05 12:46:04 +03:00
Avi Kivity	1799cfa88a	logalloc: use namespace-scope seastar::idle_cpu_handler and related rather than reactor scope This allows us to drop a #include <reactor.hh>, reducing compile time. Several translation units that lost access to required declarations are updated with the required includes (this can be an include of reactor.hh itself, in case the translation unit that lost it got it indirectly via logalloc.hh) Ref #1.	2020-04-05 12:45:08 +03:00
Rafael Ávila de Espíndola	8da235e440	everywhere: Use futurize_invoke instead of futurize<T>::invoke No functionality change, just simpler. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200330165308.52383-1-espindola@scylladb.com>	2020-04-03 15:53:35 +02:00
Alejo Sanchez	3a4dd0a856	utils: error injection inject() returning a future Make inject() return a future. Suggested by Gleb. Botond helped on dealing with complex function/lambda overload. Refs #3295 (closed) Tests: unit ({dev}) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200331143839.1781424-7-alejo.sanchez@scylladb.com>	2020-04-01 16:22:52 +02:00
Alejo Sanchez	8bae38cef9	utils: error injection support multiple clocks Use template to support multiple clock classes for time point for deadline injection. Refs: #3295 (closed) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200331143839.1781424-6-alejo.sanchez@scylladb.com>	2020-04-01 16:22:45 +02:00
Alejo Sanchez	71f2f423bc	utils: error injection reorder args for exceptions Move exception factory to end of argument list. Refs: #3295 (closed) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200331143839.1781424-5-alejo.sanchez@scylladb.com>	2020-04-01 16:22:38 +02:00
Alejo Sanchez	fd1eb6a466	utils: error injection simplify API Split error injection C++ API to have 1. sleep duration 2. sleep to deadline (timeout) TODO: support multiple types of clocks Refs: #3295 (closed) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200331143839.1781424-4-alejo.sanchez@scylladb.com>	2020-04-01 16:22:30 +02:00
Alejo Sanchez	e5a2ba32b9	utils: error injection allocate string for remote invoke Allocate string before sending to other shards. Reported by Pavel Solodovnikov. Refs #3295 (closed) Tests: unit ({dev}) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200328204454.1326514-2-alejo.sanchez@scylladb.com>	2020-03-31 11:58:27 +02:00
Rafael Ávila de Espíndola	c5795e8199	everywhere: Replace engine().cpu_id() with this_shard_id() This is a bit simpler and might allow removing a few includes of reactor.hh. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200326194656.74041-1-espindola@scylladb.com>	2020-03-27 11:40:03 +03:00
Alejo Sanchez	febcced4f1	utils: error injection with timeout/deadline Most of Scylla code runs with a user-supplied query timeout, expressed as absolute clock (deadline). When injecting test sleeps into such code, we most often want to not sleep beyond the user supplied deadline. Extend error injection API to optionally accept a deadline, and, if it is provided, sleep no more than up to the deadline. If current time is beyond deadline, sleep injection is skipped altogether. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Message-Id: <20200326091600.1037717-2-alejo.sanchez@scylladb.com>	2020-03-26 12:41:10 +01:00
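The deadline rule described in `febcced4f1` can be sketched as a small duration calculation. This is only an illustration under stated assumptions: `capped_sleep` is a hypothetical helper name, not part of the Scylla error injection API, and it uses standard `std::chrono` clocks rather than Seastar's.

```cpp
#include <algorithm>
#include <chrono>
#include <optional>

// Hypothetical helper, not the Scylla API: computes how long an injected
// sleep should last when the caller supplies an absolute deadline.
// Returns std::nullopt when the deadline has already been reached,
// meaning the sleep injection is skipped altogether.
template <typename Clock>
std::optional<typename Clock::duration>
capped_sleep(typename Clock::time_point now,
             typename Clock::duration requested,
             typename Clock::time_point deadline) {
    if (now >= deadline) {
        return std::nullopt;                     // at or past the deadline: skip
    }
    return std::min(requested, deadline - now);  // never sleep beyond the deadline
}
```

The clock is a template parameter here, which loosely mirrors the later "support multiple clocks" commit (`8bae38cef9`); the real signature may differ.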
Rafael Ávila de Espíndola	eca0ac5772	everywhere: Update for deprecated apply functions Now apply is only for tuples, for varargs use invoke. This depends on the seastar changes adding invoke. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200324163809.93648-1-espindola@scylladb.com>	2020-03-25 08:49:53 +02:00
Avi Kivity	0d885dbb00	Merge "Make all headers standalone" from Botond " Make sure all headers compile on their own, without requiring any additional includes externally. Even though this requirement is not documented in our coding guides it is still quasi enforced and we semi-regularly get and merge patches adding missing includes to headers. This patch-set fixes all headers and adds a `{mode}-headers` target that can be used to verify each header. This target should be built by promotion to ensure no new non-conforming code sneaks in. Individual headers can be verified using the `build/dev/path/to/header.hh.o` target, that is generated for every header. The majority of the headers was just missing `seastarx.hh`. I think we should just include this via a compiler flag to remove the noise from our code (in a followup). " * 'compiling-headers/v2' of https://github.com/denesb/scylla: configure.py: add {mode}-headers phony target treewide: add missing headers and/or forward declarations test/boost/sstable_test.hh: move generic stuff to test/lib/sstable_utils.hh sstables: size_tiered_backlog_tracker: move methods out-of-line sstables: date_tiered_compaction_strategy.hh: move methods out-of-line	2020-03-23 13:09:09 +02:00
Avi Kivity	c6a441f9c2	Update seastar submodule * seastar 3c498abcab...92c488706c (14): > dpdk: restore including reactor.hh > tests: distributed_test: add missing #include <mutex> > reactor: un-static-ify make_pollfn() > merge: Reduce inclusions of reactor.hh A few #includes added to compensate for this > sharded: delete move constructor > future: Avoid a move constructor call > future: Erase types a bit more in then_wrapped > memory: Drop a never nullopt optional > semaphore: specify get_units and with_semaphore as noexcept > spinlock.hh: Add include for <cassert> header > dpdk: Avoid a variable sized array > future: Add an explicit promise member to continuation > net: remove smart pointer wrappers around pollable_fd > Merge "cleanup reactor file functions" from Benny	2020-03-23 11:59:30 +02:00
Piotr Sarna	602a771105	Merge 'utils: error injector API' from Alejo Closes #3295 The error_injection class allows injecting custom handlers into normal control flow at the pre-determined injection points. This is especially useful in various testing scenarios: * Throwing an exception at some rare and extreme corner-cases * Injecting a delay to test for timeouts to be handled correctly * More advanced uses with custom lambda as an injection handler Injection points are defined by `inject` calls. Enabling and disabling injections are done by the corresponding `enable` and `disable` calls. REST frontend APIs are provided for convenience. Branch URL: https://github.com/alecco/scylla/tree/as_error_injection Tests: unit {{dev}}, unit {{debug}} * 'as_error_injection' of github.com:alecco/scylla: api: add error injection to REST API utils: add error injection	2020-03-23 08:39:22 +01:00
Botond Dénes	e0284bb9ee	treewide: add missing headers and/or forward declarations	2020-03-23 09:29:45 +02:00
Pavel Solodovnikov	057adc8b4d	utils: add error injection Error injection class is implemented in order to allow injecting various errors (exceptions, stalls, etc.) in code for testing purposes. Error injection is enabled via compile flag SCYLLA_ENABLE_ERROR_INJECTION TODO: manage shard instances Enable error injection in debug/dev/sanitize modes. Unit tests for error injection class. Closes #3295 Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2020-03-20 19:37:48 +01:00
Rafael Ávila de Espíndola	517a01a3f6	utils: Use sstring as keys in nonstatic_class_registry Now that seastar::string::compare has been updated, it is possible to use sstring for this. This reverts commit `01fe766f1f`. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200311005219.280737-1-espindola@scylladb.com>	2020-03-16 11:01:15 +02:00
Avi Kivity	c020b4e5e2	logalloc: increase capacity of _regions vector outside reclaim lock Reclaim consults the _regions vector, so we don't want it moving around while allocating more capacity. For that we take the reclaim lock. However, that can cause a false-positive OOM during startup: 1. all memory is allocated to LSA as part of priming (`2baa16b371`) 2. the _regions vector is resized from 64k to 128k, requiring a segment to be freed (plenty are free) 3. but reclaiming_lock is taken, so we cannot reclaim anything. To fix, resize the _regions vector outside the lock. Fixes #6003. Message-Id: <20200311091217.1112081-1-avi@scylladb.com>	2020-03-11 12:29:31 +02:00
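The fix in `c020b4e5e2` follows a general pattern: grow the container before taking the lock, so the insertion done under the lock cannot allocate. The sketch below is an assumption-laden illustration, not the logalloc code: `toy_tracker`, `reclaim_guard`, and the boolean flag are hypothetical stand-ins for the tracker, `reclaiming_lock`, and the `_regions` vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the LSA tracker; illustration only.
struct toy_tracker {
    std::vector<int> regions;
    bool reclaim_locked = false;  // true while reclaim is forbidden

    // RAII guard standing in for the reclaim lock.
    struct reclaim_guard {
        bool& flag;
        explicit reclaim_guard(bool& f) : flag(f) { flag = true; }
        ~reclaim_guard() { flag = false; }
    };

    void register_region(int id) {
        // Grow capacity while reclaim is still allowed, so an allocation
        // that needs freed memory can obtain it.
        if (regions.size() == regions.capacity()) {
            std::size_t cap = regions.capacity();
            regions.reserve(cap == 0 ? 1 : cap * 2);
        }
        // Capacity is now sufficient, so push_back will not reallocate
        // while the lock is held.
        reclaim_guard guard(reclaim_locked);
        regions.push_back(id);
    }
};
```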

755 Commits