scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-22 09:30:45 +00:00

Author	SHA1	Message	Date
Avi Kivity	aa1270a00c	treewide: change assert() to SCYLLA_ASSERT() assert() is traditionally disabled in release builds, but not in scylladb. This hasn't caused problems so far, but the latest abseil release includes a commit [1] that causes a 1000 insn/op regression when NDEBUG is not defined. Clearly, we must move towards a build system where NDEBUG is defined in release builds. But we can't just define it blindly without vetting all the assert() calls, as some were written with the expectation that they are enabled in release mode. To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT() macro in utils/assert.hh. This macro is always defined and is not conditional on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release mode. [1] `66ef711d68` Closes scylladb/scylladb#20006	2024-08-05 08:23:35 +03:00
Lakshmi Narayanan Sreethar	64dadd5ec2	sstables/index_reader: stop consuming index when abort has been requested When an abort is requested, stop further reading of the index file and throw an exception from index_consume_entry_context::process_state(). Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-07-16 20:42:50 +05:30
Lakshmi Narayanan Sreethar	c2524337a2	sstables::index_consume_entry_context: store abort_source Store abort source inside sstables::index_consume_entry_context, so that the next patch can implement cancelling the index read when abort is requested. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-07-16 20:42:50 +05:30
Aleksandra Martyniuk	4530be9e5b	test: add test to check if reader is closed Add test to check if reader is closed in sstable::has_partition_key.	2024-02-22 14:53:14 +01:00
Michał Chojnowski	5a3e4a1cc0	utils: managed_bytes: optimize memory usage for small buffers managed_bytes is implemented as chain of blob_storage objects. Each blob_storage contains 24 bytes of metadata. But in the most common case -- when there is only a single element in the chain -- 16 bytes of this metadata is trivial/unused. This is regrettable waste because managed_bytes is used for every database cell in the memtables and cache. It means that every value of size >= 7 bytes (smaller ones fit in the inline storage of managed_bytes) receives 16 bytes of useless overhead. To correct that, this patch adds to managed_bytes an alternative storage layout -- used for buffers small enough to fit in one contiguous fragment -- which only stores the necessary minimum of metadata. (That is: a pointer to the parent, to facilitate moving the storage during memory defragmentation).	2024-02-09 20:56:20 +01:00
Kefu Chai	a6152cb87b	sstables: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16666	2024-01-09 11:45:44 +02:00
Kefu Chai	893f319004	sstables: add formatter for index_consume_entry_context_state before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, in order to enable the code in the header to access the formatter without being moved down after the full specialization's definition, we * move the enum definition out of the class and before the class, * rename the enum from state to index_consume_entry_context_state * define a formatter for index_consume_entry_context_state * remove its operator<<(). as fmt v10 is able to use `format_as()` as a fallback, the formatter full specialization is guarded with `#if FMT_VERSION < 10'00'00`. we will remove it after we start building with fmt v10. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16204	2023-12-08 12:45:38 +02:00
Michał Chojnowski	f00bed9429	sstables: partition_index_cache: deglobalize stats Move partition_index_cache stats from a thread_local variable to cache_tracker. After the change, partition_index_cache receives a reference to the stats via constructor, instead of referencing a global. This is needed so that cache_tracker can know the memory usage of index caches (for cache eviction purposes) without relying on globals. But it also makes sense even without that motive.	2023-09-01 22:34:41 +02:00
Michał Chojnowski	c7d9d35030	utils: cached_file: deglobalize cached_file metrics Move cached_file metrics from a thread_local variable to cache_tracker. This is needed so that cache_tracker can know the memory usage of index caches (for purposes of cache eviction) without relying on globals. But it also makes sense even without that motive.	2023-09-01 22:34:41 +02:00
Avi Kivity	0cabf4eeb9	build: disable implicit fallthrough Prevent switch case statements from falling through without annotation ([[fallthrough]]) proving that this was intended. Existing intended cases were annotated. Closes #14607	2023-07-10 19:36:06 +02:00
Pavel Emelyanov	66e43912d6	code: Switch to seastar API level 7 In that level no io_priority_class-es exist. Instead, all the IO happens in the context of current sched-group. File API no longer accepts prio class argument (and makes io_intent arg mandatory to impls). So the change consists of - removing all usage of io_priority_class - patching file_impl's inheritors to updated API - priority manager goes away altogether - IO bandwidth update is performed on respective sched group - tune-up scylla-gdb.py io_queues command The first change is huge and was made semi-automatically by: - grep io_priority_class \| default_priority_class - remove all calls, found methods' args and class' fields Patching file_impl-s is smaller, but also mechanical: - replace io_priority_class& argument with io_intent* one - pass intent to lower file (if applicable) Dropping the priority manager is: - git-rm .cc and .hh - sed out all the #include-s - fix configure.py and cmakefile The scylla-gdb.py update is a bit hairy -- it needs to use task queues list for IO classes names and shares, but to detect whether it should, it checks whether the "commitlog" group is present. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13963	2023-06-06 13:29:16 +03:00
Pavel Emelyanov	2bb024c948	index_reader: Introduce and use default arguments to constructor Most of creators of index_reader construct it with default prio class, null trace pointer and use_caching::yes. Assigning implicit defaults to constructor arguments keeps the code shorter and easier to read. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-23 11:29:04 +03:00
Pavel Emelyanov	3fd5d3cc2b	index_reader: Use _pc field in get_file_input_stream_options() directly No need to pass this-> field into this-> call Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-23 11:18:14 +03:00
Pavel Emelyanov	21d24e8ea3	index_reader: Move index_reader::get_file_input_stream_options to private: block A "while at it" cleanup. When patching the method (next patch) it turned out that there are no callers other than the local class, so it _is_ private. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-05-23 11:18:14 +03:00
Kefu Chai	0cb842797a	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:57:18 +02:00
Benny Halevy	63c2cdafe8	sstables: index_reader: close(index_bound&) reset current_list When closing _lower_bound and *_upper_bound in the final close() call, they are currently left with an engaged current_list member. If the index_reader uses a _local_index_cache, it is evicted with evict_gently which will, rightfully, see the respective pages as referenced, and they won't be evicted gently (only later when the index_reader is destroyed). Reset index_bound.current_list on close(index_bound&) to free up the reference. Ref #12271 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12370	2023-01-02 16:42:33 +01:00
Michał Chojnowski	d9269abf5b	sstables: index_reader: always evict the local cache gently Due to an oversight, the local index cache isn't evicted gently when _upper_bound existed. This is a source of reactor stalls. Fix that. Fixes #12271 Closes #12364	2022-12-20 18:23:27 +02:00
Pavel Emelyanov	5f579eb405	sstable: Introduce index_filename() Currently the sstable::filename(Index) is used in several places that get the filename as a printable or throwable string and don't treat it as a real location of any file. For those, add the index_filename() helper symmetrical to toc_filename() and (in some sense) the get_filename() one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-12-15 10:14:49 +03:00
Michał Chojnowski	0c54e7c5c7	sstables: index_reader: remove a stray vsprintf call from the hot path sstable::get_filename() constructs the filename from components, which takes some work. It happens to be called on every index_reader::index_reader() call even though it's only used for TRACE logs. That's 1700 instructions (~1% of a full query) wasted on every SSTable read. Fix that. Closes #11485	2022-09-08 14:29:23 +03:00
Botond Dénes	7501a075bd	sstables/index_reader: push down eof() check to advance_to(index_bound&, dht::ring_position_view) Commit `e8f3d7dd13` added eof() checks to public partition-level advance_to() methods, to ensure we do not attempt to re-read the last page of the index when at eof(). It was noted however that this check would be safer in advance_to(index_bound&, dht::ring_position_view) because that is the method that all these higher-level methods end up calling. Placing the check there would guarantee safety for all such operations. This patch does exactly that: it pushes down the check to said method. One change needed for this to work is to check eof on the bound that is currently advanced, instead of unconditionally checking the lower bound. Closes #10531	2022-05-11 14:46:30 +02:00
Botond Dénes	e8f3d7dd13	sstables/index_reader: short-circuit fast-forward-to when at EOF Attempting to call advance_to() on the index, after it is positioned at EOF, can result in an assert failure, because the operation results in an attempt to move backwards in the index-file (to read the last index page, which was already read). This only happens if the index cache entry belonging to the last index page is evicted, otherwise the advance operation just looks-up said entry and returns it. To prevent this, we add an early return conditioned on eof() to all the partition-level advance-to methods. A regression unit test reproducing the above described crash is also added.	2022-05-05 14:42:37 +03:00
Avi Kivity	585c0841c3	Merge 'sstables: enable read ahead for the partition index reader' from Wojciech Mitros Currently, when advancing one of `index_reader`'s bounds, we're creating a new `index_consume_entry_context` with a new underlying file `input_stream` for each new page. For either bound, the streams can be reused, because the indexes of pages that we are reading are never decreasing. This patch adds a `index_consume_entry_context` to each of `index_reader`'s bounds, so that for each new page, the same file `input_stream` is used. As a result, when reading consecutive pages, the reads that follow the first one can be satisfied by the `input_stream`'s read aheads, decreasing the number of blocking reads and increasing the throughput of the `index_reader`. Additionally, we're reusing the `index_consumer` for all pages, calling `index_consumer::prepare` when we need to increase the size of the `_entries` `chunked_managed_vector`. A big difference can be seen when we're reading the entire table, frequently skipping a few rows; which we can test using perf_fast_forward: Before: ``` running: small-partition-skips on dataset small-part Testing scanning small partitions with skips. 
Reads whole range interleaving reads with skips according to read-skip pattern: read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk allocs tasks insns/f cpu -> 1 0 0.899447 4 1000000 1111794 12284 1113248 1096537 975.5 972 124356 1 0 0 0 0 0 0 0 12032202 29103 8967 100.0% -> 1 1 1.805811 4 500000 276884 907 278214 275977 3655.8 3654 135084 2688 0 3161 4548 5935 0 0 0 7225100 140466 27010 75.6% -> 1 8 0.927339 4 111112 119818 357 120465 119461 3654.0 3654 135084 2685 0 2133 4548 6963 0 0 0 1749663 107922 57502 50.2% -> 1 16 0.790630 4 58824 74401 782 74617 73497 3654.0 3654 135084 2695 0 1975 4548 7121 0 0 0 1019189 109349 90832 42.7% -> 1 32 0.717235 4 30304 42251 243 42266 41975 3654.0 3654 135084 2689 0 1871 4548 7225 0 0 0 619876 109199 156751 37.3% -> 1 64 0.681624 4 15385 22571 244 22815 22286 3654.0 3654 135084 2685 0 1870 4548 7226 0 0 0 407671 105798 285688 34.0% -> 1 256 0.630439 4 3892 6173 24 6214 6150 3549.0 3549 135116 2581 0 1313 3927 6505 0 0 0 232541 100803 1022454 29.1% -> 1 1024 0.313303 4 976 3115 219 3126 2766 1956.0 1956 130608 986 0 0 987 1962 0 0 0 81165 41385 1724979 29.1% -> 1 4096 0.083688 4 245 2928 85 3012 2134 738.8 737 17212 492 244 0 247 491 0 0 0 30500 19406 1999263 24.6% -> 64 1 1.509011 4 984616 652491 2746 660930 649745 3673.5 3654 135084 2687 0 4507 4548 4589 0 0 0 11075882 117074 13157 68.9% -> 64 8 1.424147 4 888896 624160 4446 625675 617713 3654.0 3654 135084 2691 0 4248 4548 4848 0 0 0 10019098 117383 13700 66.5% -> 64 16 1.343276 4 800000 595559 5834 605880 589725 3654.0 3654 135084 2698 0 3989 4548 5107 0 0 0 9043830 124022 14206 64.9% -> 64 32 1.249721 4 666688 533469 5056 536638 526212 3654.0 3654 135084 2688 0 3616 4548 5480 0 0 0 7570848 123043 15377 60.9% -> 64 64 1.154549 4 500032 433097 10215 443312 415001 3654.0 3654 135084 2703 0 3161 4548 5935 0 0 0 5718758 110657 17787 53.2% -> 64 256 1.005309 4 200000 198944 1179 199338 
196989 3935.0 3935 137216 2966 0 690 4048 5592 0 0 0 2398359 110510 27855 51.3% -> 64 1024 0.441913 4 58880 133239 8094 135471 120467 2161.0 2161 131820 1190 0 0 1192 1848 0 0 0 725092 45449 33740 59.7% -> 64 4096 0.124826 4 15424 123564 5958 126814 95101 795.5 794 17400 553 240 0 312 482 0 0 0 199943 20869 46621 41.9% ``` After: ``` running: small-partition-skips on dataset small-part Testing scanning small partitions with skips. Reads whole range interleaving reads with skips according to read-skip pattern: read skip time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk allocs tasks insns/f cpu -> 1 0 0.917468 4 1000000 1089956 1422 1091378 1073112 975.5 972 124356 1 0 0 0 0 0 0 0 12032761 29721 8972 100.0% -> 1 1 1.311446 4 500000 381259 3212 384470 377238 1087.0 1083 138420 2 0 4445 4548 4651 0 0 0 7096216 55681 20869 100.0% -> 1 8 0.467975 4 111112 237432 1446 239372 235985 1121.2 1119 143124 9 0 4344 4548 4752 0 0 0 1619944 23502 28844 98.7% -> 1 16 0.337085 4 58824 174508 3410 178451 171099 1117.5 1120 143276 11 0 4319 4548 4777 0 0 0 883692 19152 37460 96.8% -> 1 32 0.262798 4 30304 115313 1222 116535 112400 1070.2 1066 135620 166 26 4354 4548 4742 0 0 0 483185 18856 54275 94.9% -> 1 64 0.283954 4 15385 54181 531 56177 53650 2022.5 2040 137036 319 19 4351 4548 4745 0 0 0 292766 32998 102276 84.9% -> 1 256 0.207020 4 3892 18800 575 19105 17520 1315.5 1334 136072 418 24 3703 3927 4115 0 0 0 118400 27427 292146 82.1% -> 1 1024 0.164396 4 976 5937 57 5993 5842 1208.2 1195 135384 568 14 932 987 1030 0 0 0 62999 27554 503559 70.0% -> 1 4096 0.085079 4 245 2880 108 2987 2714 635.8 634 26468 248 246 233 247 258 0 0 0 31264 12872 1546404 37.4% -> 64 1 1.073331 4 984616 917346 7614 923983 909314 1812.2 1824 136792 11 20 4544 4548 4552 0 0 0 10971661 54538 9919 99.6% -> 64 8 1.024389 4 888896 867733 6327 870429 845215 3027.2 3072 138212 31 0 4523 4548 4573 0 0 0 9933078 68059 10050 99.5% 
-> 64 16 0.978754 4 800000 817366 7802 827665 809564 3012.2 3008 139884 39 0 4486 4548 4610 0 0 0 8947041 64050 10302 98.1% -> 64 32 0.837266 4 666688 796267 10312 806579 785370 2275.8 2266 139672 29 0 4465 4548 4631 0 0 0 7458644 50754 10564 97.8% -> 64 64 0.645627 4 500032 774490 4713 779203 768432 1136.8 1137 145428 8 0 4438 4548 4658 0 0 0 5593168 29982 10938 98.4% -> 64 256 0.386192 4 200000 517877 22509 544067 495368 1134.8 1136 145300 109 0 2135 4048 4147 0 0 0 2270291 22840 13682 94.5% -> 64 1024 0.238617 4 58880 246755 55856 305110 190899 1176.0 1118 135324 451 13 625 1192 1223 0 0 0 701262 24418 17323 71.1% -> 64 4096 0.133340 4 15424 115674 14837 117978 99072 974.0 961 27132 366 347 99 312 383 0 0 0 209595 20657 43096 50.4% ``` For single partition reads, the index_reader is modified to behave in practically the same way, as before the change (not reading ahead past the page with the partition). For example, a single partition read from a table with 10 rows per partition performs a single 6KB read from the index file, and the same read is performed before the change (as can be seen in traces below). If we enabled read aheads in that case, we would perform 2 16KB reads. 
Relevant traces: Before: ``` ./tmp/data/ks/t2-75ebed30eb0211eb837a8f4cd3d1cf62/md-1-big-Index.db: scheduling bulk DMA read of size 6478 at offset 0 [shard 0] \| 2021-07-23 15:22:25.847362 \| 127.0.0.1 \| 148 \| 127.0.0.1 ./tmp/data/ks/t2-75ebed30eb0211eb837a8f4cd3d1cf62/md-1-big-Index.db: finished bulk DMA read of size 6478 at offset 0, successfully read 6478 bytes [shard 0] \| 2021-07-23 15:22:25.900996 \| 127.0.0.1 \| 53782 \| 127.0.0.1 ``` After: ``` ./tmp/data/ks/t2-75ebed30eb0211eb837a8f4cd3d1cf62/md-1-big-Index.db: scheduling bulk DMA read of size 6478 at offset 0 [shard 0] \| 2021-07-23 15:19:37.380033 \| 127.0.0.1 \| 149 \| 127.0.0.1 ./tmp/data/ks/t2-75ebed30eb0211eb837a8f4cd3d1cf62/md-1-big-Index.db: finished bulk DMA read of size 6478 at offset 0, successfully read 6478 bytes [shard 0] \| 2021-07-23 15:19:37.433662 \| 127.0.0.1 \| 53777 \| 127.0.0.1 ``` Tests: unit(dev) Closes #9063 * github.com:scylladb/scylla: sstables: index_reader: optimize single partition reads sstables: use read-aheads in the index reader sstables: index_reader: remove unused members from index reader context	2022-03-21 13:47:28 +02:00
Pavel Solodovnikov	95c8d65949	treewide: fix compilation issues with fmtlib 8.1.0+ Due to `fd62fba985` scoped enums are not automatically converted to integers anymore, this is the intended behavior, according to the fmtlib devs. A bit nicer solution would be to use `std::to_underlying` instead of a direct `static_cast`, but it's not available until C++23 and some compilers are still missing the support for it. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2022-03-16 12:31:50 +03:00
Wojciech Mitros	7f590a3686	sstables: index_reader: optimize single partition reads All entries from a single partition can be found in a single summary page. Because of that, in cases when we know we want to read only one partition, we can limit the underlying file input_stream to the range of the page. Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2022-02-22 02:16:52 +01:00
Wojciech Mitros	c81992c665	sstables: use read-aheads in the index reader Currently, when advancing one of index_reader's bounds, we're creating a new index_consume_entry_context with a new underlying file input_stream for each new page. For either bound, the streams can be reused, because the indexes of pages that we are reading are never decreasing. This patch adds a index_consume_entry_context to each of index_reader's bounds, so that for each new page, the same file input_stream is used. As a result, when reading consecutive pages, the reads that follow the first one can be satisfied by the input_stream's read aheads, decreasing the number of blocking reads and increasing the throughput of the index_reader. Fixes #2388 Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>	2022-02-22 01:51:33 +01:00
Wojciech Mitros	0a1500acd2	sstables: index_reader: remove unused members from index reader context The _file_name and _index_file fields in index_consume_entry_context are no longer used anywhere in the class (_file_name isn't even set, and _index_file was previously used when creating a promoted_index, which doesn't store the file object anymore)	2022-02-20 16:24:27 +01:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Botond Dénes	940874f3ff	sstables/index_reader: process_state(): add additional information to trace logging The amount of data available for parsing at the start of each entry, and the parsed key size.	2022-01-18 10:38:11 +02:00
Botond Dénes	afb14508c4	sstables/index_reader: verify_end_state(): add check for premature EOS Add a check which ensures that parsing ended in a valid state and not in the middle of a half-parsed entry.	2022-01-18 10:38:11 +02:00
Botond Dénes	36c0fe904e	sstables/index_reader: convert exception in verify_end_state() to malformed sstable exception Errors during parsing are usually reported via malformed sstable exception to signify their gravity of potentially being caused by corrupt sstables. This patch converts the exception thrown in `index_consume_entry_context::verify_end_state()`. While at it the error message is improved as well. It currently suggests that parsing was ended prematurely because data ran out, while in fact the condition under which this error is thrown is the opposite: parsing ended but there is unconsumed data left. The current state is also added to the error message.	2022-01-18 10:38:11 +02:00
Botond Dénes	7508b4fd22	sstables/index_reader: add const sstable& to index_consume_entry_context To be used by the next patches to throw malformed sstable exception.	2022-01-18 10:38:11 +02:00
Botond Dénes	9f3e5ae801	sstables/index_reader: remove unused members from index_consume_entry_context The unused members are: _s and _file_name.	2022-01-18 10:38:11 +02:00
Botond Dénes	259649c779	sstables/index_reader: improved diagnostics on missing index entry Add the summary index and the bound's address to the error message, so it can be correlated with other trace level logging when investigating a problem. Refs: #9446 Tests: unit(dev) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20211202124955.542293-2-bdenes@scylladb.com>	2021-12-02 19:43:30 +02:00
Wojciech Mitros	8385f3eb21	sstables: index_reader: add support for iterating over clustering ranges in reverse In the sstable reader, we iterate over clustering ranges using the index_reader, which normally only accepts advancing to increasing positions. In this patch we add methods for advancing the index reader in reverse. To simplify our job we restrict our attention to a single implementation of the promoted index block cursor: `bsearch_clustered_cursor`. The `index_reader` methods for advancing in reverse will thus assume that this implementation is used. The assumption is correct given that we're working only with sstables of versions >= mc, which is indeed the intended use case. We add some documentation in appropriate places to make this obvious. We extend `bsearch_clustered_cursor` with two methods: `advance_past(pos)`, which advances the cursor to the first block after `pos` (or to the end if there is no such block), and `last_block_offset()`, which returns the data file offset of the first row from the last promoted index block. To efficiently find the position in the data file of the last row of the partition (which we need when performing a reversed query) the sstable reader may need to read the span of the entire last promoted index block in the data file. To learn where the block starts it can use `index_reader::last_block_offset()`, which is implemented in terms of `bsearch_clustered_cursor::last_block_offset()`. When performing a single partition read in forward order, the reader asks the index to position its lower bound at the start of the partition and its upper bound after the end of the slice. It starts by reading the first range. After exhausting a range it jumps to the next one by asking the index to advance the lower bound. For reverse single partition reads we'll take a similar approach: the initial bound positions are as in the forward case. 
However, we start with the last range and after exhausting a range we want to jump to a previous one; we will do it by advancing the upper bound in reverse (i.e. moving it closer to the beginning of the partition). For this we introduce the `index_reader::advance_reverse` function.	2021-10-04 15:24:12 +02:00
Kamil Braun	e3f1667744	sstables: remove use_binary_search_in_promoted_index This was a global variable that was potentially modified from a performance benchmark. It would modify the behavior of `index_reader` in certain scenarios. Remove the variable so we can specify the behavior of `index_reader` functions without relying on anything other than what's passed into the constructor and the function parameters.	2021-09-19 13:59:25 +03:00
Tomasz Grabiec	21f1a7be8b	sstables: Do not populate page cache when searching in promoted index for "bypass cache" reads Reads which bypass cache will use a private temporary instance of cached_file which dies together with the index cursor. The cursor still needs a cached_file with caching layer. Binary searching needs caching for performance, some of the pages will be reused. Another reason to still use cached_file is to work with a common interface, and reusing it requires minimal changes.	2021-07-15 12:14:28 +02:00
Tomasz Grabiec	f4227c303b	sstables: Do not populate partition index cache for "bypass cache" reads Index cursor for reads which bypass cache will use a private temporary instance of the partition index cache. Promoted index scanner (ka/la format) will not go through the page cache.	2021-07-15 12:13:20 +02:00
Tomasz Grabiec	f14576f4be	sstables: Hide partition_index_cache implementation away from sstables.hh Reduces scope of the header to index_reader.hh which reduces recompilation time.	2021-07-02 19:02:14 +02:00
Tomasz Grabiec	7d34799f3f	sstables: Drop shared_index_lists alias	2021-07-02 19:02:14 +02:00
Tomasz Grabiec	8360a64f73	sstables: Drop the _use_binary_search flag from index entries It doesn't have to be set by the parser now that the cursors are created lazily, pass it to the cursor when it's created.	2021-07-02 19:02:14 +02:00
Tomasz Grabiec	06e373e272	sstables: index_reader: Keep index objects under LSA In preparation for caching index objects, manage them under LSA. Implementation notes: key_view was changed to be a view on managed_bytes_view instead of bytes, so it now can be fragmented. Old users of key_view now have to linearize it. Actual linearization should be rare since partition keys are typically small. Index parser is now not constructing the index_entry directly, but produces value objects which live in the standard allocator space: class parsed_promoted_index_entry; class parsed_partition_index_entry; This change was needed to support consumers which don't populate the partition index cache and don't use LSA, e.g. sstable::generate_summary(). It's now consumer's responsibility to allocate index_entry out of parsed_partition_index_entry.	2021-07-02 19:02:14 +02:00
Tomasz Grabiec	2b673478aa	sstables: index_reader: Do not expose index_entry references index_entry will be an LSA-managed object. Those have to be accessed with care, with the LSA region locked. This patch hides most of direct index_entry accesses inside the index_reader so that users are safe.	2021-07-02 19:02:13 +02:00
Tomasz Grabiec	a955e7971d	sstables: index_reader: Don't store schema reference inside index_entry To save space.	2021-07-02 19:02:13 +02:00
Tomasz Grabiec	9e7bf066a9	sstables: index_reader: Don't store file object inside promoted_index The file object which is currently stored there has per-request tracing wrappers (permit, trace_state) attached to it. It doesn't make sense once the entry is cached and shared. Annotate when the cursor is created instead.	2021-07-02 19:02:13 +02:00
Tomasz Grabiec	86b135056c	sstables: index_reader: Don't store front buffer inside promoted_index Index reads and promoted index reads are both using the same cached_file now, so there's no need to pass the buffers between the index parser and promoted index reader. Makes the promoted_index structure easier to move to LSA.	2021-07-02 19:02:13 +02:00
Tomasz Grabiec	078a6e422b	sstables: Cache all index file reads After this patch, there is a single index file page cache per sstable, shared by index readers. The cache survives reads, which reduces amount of I/O on subsequent reads. As part of this, cached_file needed to be adjusted in the following ways. The page cache may occupy a significant portion of memory. Keeping the pages in the standard allocator could cause memory fragmentation problems. To avoid them, the cached_file is changed to keep buffers in LSA using lsa_buffer allocation method. When a page is needed by the seastar I/O layer, it needs to be copied to a temporary_buffer which is stable, so must be allocated in the standard allocator space. We copy the page on-demand. Concurrent requests for the same page will share the temporary_buffer. When page is not used, it only lives in the LSA space. In the subsequent patches cached_file::stream will be adjusted to also support access via cached_page::ptr_type directly, to avoid materializing a temporary_buffer. While a page is used, it is not linked in the LRU so that it is not freed. This ensures that the storage which is actively consumed remains stable, either via temporary_buffer (kept alive by its deleter), or by cached_page::ptr_type directly.	2021-07-02 19:02:13 +02:00
Tomasz Grabiec	8e2118069b	sstables: cached_file: Account buffers returned by cached_file under read_permit We want buffers to be accounted only when they are used outside cached_file. Cached pages should not be accounted because they will stay around for longer than the read after subsequent commits.	2021-07-02 10:25:58 +02:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Avi Kivity	32d9ba2fbb	sstables: index_consumer: drop unused max_quantity field	2021-05-21 21:02:16 +03:00
Benny Halevy	6a82e9f4be	sstables: index_reader: mark close noexcept We'd like that to simplify the soon-to-be-introduced sstable_mutation_reader::close error handling path. close_index_list can be marked noexcept since parallel_for_each is, with that index_reader::close can be marked noexcept too. Note that since reader close can not fail both lower and upper bounds are closed (since closing lower_bound cannot fail). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:16:10 +03:00


146 Commits