scylladb

Author	SHA1	Message	Date
Tomasz Grabiec	92b5e4d63d	cache, mvcc: Preempt cache update when applying range tombstone from memtable Range tombstones are represented as entry attributes, which applies to the interval between entries. So if a range tombstone covers many rows, to apply it we have to update all covered entries. In some workloads that could be many entries, even the whole cache. Before the patch, we did this update without preemption, which can cause reactor stalls in such workloads. This scenario is already covered by mvcc_tests, e.g. test_apply_to_incomplete_respects_continuity. And I verified that the new preemption point is hit in the test. perf-row-cache-update results show no significant stalls anymore (max 2ms scheduling delay, instead of previous 1.5 s): Generated 1124195 rows Memtable fill took 4179.457520 [ms], {count: 8295, 99%: 0.654949 [ms], max: 32.817176 [ms]} Draining... took 0.000616 [ms] cache: 2506/2948 [MB], memtable: 781/1024 [MB], alloc/comp: 1051/662 [MB] (amp: 0.630) update: 2874.157471 [ms], preemption: {count: 26650, 99%: 1.131752 [ms], max: 2.068762 [ms]}, cache: 3027/3973 [MB], alloc/comp: 3951/2424 [MB] (amp: 0.614), pr/me/dr 1124195/0/0 Fixes #23479 Fixes #2578	2025-12-06 13:45:35 +01:00
Ernest Zaslavsky	d2c5765a6b	treewide: Move keys related files to a new keys directory As requested in #22102, #22103 and #22105 moved the files and fixed other includes and build system. Moved files: - clustering_bounds_comparator.hh - keys.cc - keys.hh - clustering_interval_set.hh - clustering_key_filter.hh - clustering_ranges_walker.hh - compound_compat.hh - compound.hh - full_position.hh Fixes: #22102 Fixes: #22103 Fixes: #22105 Closes scylladb/scylladb#25082	2025-07-25 10:45:32 +03:00
Ran Regev	edd56a2c1c	moved cache files to db As requested in #22097, moved the files and fixed other includes and build system. Fixes: #22097 Signed-off-by: Ran Regev <ran.regev@scylladb.com> Closes scylladb/scylladb#22495	2025-02-04 12:21:31 +03:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Michał Chojnowski	35921eb67e	mvcc_test: fix a benign failure of test_apply_to_incomplete_respects_continuity For performance reasons, mutation_partition_v2::maybe_drop(), and by extension also mutation_partition_v2::apply_monotonically(mutation_partition_v2&&) can evict empty row entries, and hence change the continuity of the merged entry. For checking that apply_to_incomplete respects continuity, test_apply_to_incomplete_respects_continuity obtains the continuity of the partition entry before and after apply_to_incomplete by calling e.squashed().get_continuity(). But squashed() uses apply_monotonically(), so in some circumstances the result of squashed() can have smaller continuity than the argument of squashed(), which messes with the thing that the test is trying to check, and causes spurious failures. This patch changes the method of calculating the continuity set, so that it matches the entry exactly, fixing the test failures. Fixes scylladb/scylladb#13757 Closes scylladb/scylladb#21459	2024-11-08 06:08:39 +01:00
Avi Kivity	aa1270a00c	treewide: change assert() to SCYLLA_ASSERT() assert() is traditionally disabled in release builds, but not in scylladb. This hasn't caused problems so far, but the latest abseil release includes a commit [1] that causes a 1000 insn/op regression when NDEBUG is not defined. Clearly, we must move towards a build system where NDEBUG is defined in release builds. But we can't just define it blindly without vetting all the assert() calls, as some were written with the expectation that they are enabled in release mode. To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT() macro in utils/assert.hh. This macro is always defined and is not conditional on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release mode. [1] `66ef711d68` Closes scylladb/scylladb#20006	2024-08-05 08:23:35 +03:00
Kefu Chai	f03f69ad4f	partition_version: move the base class in move ctor before this change, `partition_version` uses a hand-crafted move constructor. but it suffers from the warning from clang-tidy, which believe there is a use-after-move issue, as the inner instance of it's parent class is constructed using `anchorless_list_base_hook(std::move(pv))`, and its other member variables are initialized like `_partition(std::move(pv._partition))` `std::move(pv)` does not do anything, but indicates `pv` maybe moved from. and what is moved away is but the part belong to its parent class. so this issue is benign. but, it's still annoying. as we need to tell the genuine issues reported by clang-tidy from the false alarms. so we have at least two options: - stop using clang-tidy - ignore this warning - silence this warning using LINT direction in a comment - use another way to implement the move constructor in this change, we just cast the moved instance to its base class and move it instead, this should applease clang-tidy. Fixes #18354 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18359	2024-04-28 18:34:45 +02:00
Kefu Chai	e5b30ae2ad	partition_version: do not rereference moved variable in `partition_entry::apply_to_incomplete()`, we pass `dst_snp` and `std::move(dst_snp)` to build the capture variable list of a lambda, but the order of evaluation of these variables are unspecified. fortunately, we haven't run into any issues at this moment. but this is not future-proof. so, let's avoid this by storing a reference of the dereferenced smart pointer, and use it later on. this issue is identified by clang-tidy: ``` /home/kefu/dev/scylladb/mutation/partition_version.cc:500:53: warning: 'dst_snp' used after it was moved [bugprone-use-after-move] 500 \| cur = partition_snapshot_row_cursor(s, dst_snp), \| ^ /home/kefu/dev/scylladb/mutation/partition_version.cc:502:23: note: move occurred here 502 \| dst_snp = std::move(dst_snp), \| ^ /home/kefu/dev/scylladb/mutation/partition_version.cc:500:53: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated 500 \| cur = partition_snapshot_row_cursor(s, dst_snp), \| ^ /home/kefu/dev/scylladb/mutation/partition_version.cc:501:57: warning: 'src_snp' used after it was moved [bugprone-use-after-move] 501 \| src_cur = partition_snapshot_row_cursor(s, src_snp, can_move), \| ^ /home/kefu/dev/scylladb/mutation/partition_version.cc:504:23: note: move occurred here 504 \| src_snp = std::move(src_snp), \| ^ /home/kefu/dev/scylladb/mutation/partition_version.cc:501:57: note: the use and move are unsequenced, i.e. there is no guarantee about the order in which they are evaluated 501 \| src_cur = partition_snapshot_row_cursor(s, *src_snp, can_move), \| ^ ``` Fixes #18360 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18361	2024-04-25 14:57:52 +03:00
Kefu Chai	e1a9340cc1	partition_version: add fmt::formatter for partition_entry::printer before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `parition_entry::printer`, and drop its operator<< . Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17812	2024-03-15 09:52:27 +02:00
Kefu Chai	94d25e02ad	mutation: add fmt::formatter for mutation_partition_v2::printer before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for mutation_partition_v2::printer, and drop its operator<< Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-13 10:47:09 +08:00
Michał Chojnowski	bed20a2e37	row_cache: use preemption_source in update() To facilitate testing the state of cache after the update is preempted at various points, pass a preemption_source& to update() instead of calling the reactor directly. In release builds, the calls to preemption_source methods should compile to the same direct reactor calls as today. Only in dev mode they should add an extra branch. (However, the `preemption_source&` argument has to be shoveled in any mode).	2024-02-07 18:31:36 +01:00
Michał Chojnowski	9ccd4ea416	partition_version: fix violation of "older versions are evicted first" during schema upgrades A schema upgrade appends a MVCC version B after an existing version A. The last dummy in B is added to the front of LRU, so it will be evicted after the entries in A. This alone doesn't quite violate the "older versions are evicted first" rule, because the new last dummy carries no information. But apply_monotonically generally assumes that entries on the same position have the obvious eviction order, even if they carry no information. Thus, after the merge, the rule can become broken. The proposed fix is as follows: - In the case where A is merged into B, the merged last dummy inherits the link of A. - The merging of B into anything is prevented until its merge with A is finished. This is relatively hacky, because it still involves a state that goes against some natural expectations granted by the "older versions..." rule. A less hacky fix would be to ensure that the new dummy is inserted into a proper place in the eviction order to begin with. Or, better yet, we could eliminate the rule altogether. Aside from being very hard to maintain, it also prevents the introduction of any eviction algorithm other than LRU.	2023-11-16 19:01:18 +01:00
Michał Chojnowski	80c8a6d0e6	partition_version: make partition_entry::upgrade() gentle Preceding commits in this patch series have extended the MVCC mechanism to allow for versions with different schemas in the same entry/snapshot, with on-the-fly and background schema upgrades to the most recent version in the chain. Given that, we can perform gentle schema upgrades by simply adding an empty version with the target schema to the front of the entry. This patch is intended to be the first and only behaviour-changing patch in the series. Previous patches added code paths for multi-schema snapshots, but never exercised them, because before this patch two different schemas within a single MVCC chain never happened. This patch makes it happen and thus exercises all the code in the series up until now. Fixes #2577	2023-05-04 03:35:15 +02:00
Michał Chojnowski	fe576f8f29	partition_version: handle multi-schema snapshots in merge_partition_versions Each partition_version is allowed to have a different schema now. As of this patch, all versions reachable from a snapshot/entry always have the same schema, but this will change in an upcoming patch. This commit prepares merge_partition_versions() for that. See code comments added in this patch for a detailed description. The design chosen in this patch requires adding a bit of information to partition_version. Due to alignment, it results in a regrettable waste of 8 bytes per partition. If we want, we can recover that in the future by squeezing the bit into some free bit in other fields, for example the highest or lowest bits of one of the pointers in partition_version. After this patch, MVCC should be prepared for replacing the atomic schema upgrade() of cache/memtable entries with a gentle upgrade().	2023-05-04 03:35:15 +02:00
Michał Chojnowski	0273101890	partition_version: remove the unused "from" argument in partition_entry::upgrade() partition_entry now contains a reference to its schema, so it doesn't have to be supplied by the caller anymore.	2023-05-04 02:37:30 +02:00
Michał Chojnowski	db6a35e3a8	partition_version: handle multi-schema entries in partition_entry::squashed An upcoming patch will enable multiple schemas within a single entry, after the entry is upgraded. partition_entry::squashed isn't prepared for that yet. This patch prepares it.	2023-05-04 02:37:30 +02:00
Michał Chojnowski	f4e853b32d	partiton_version: prepare partition_snapshot::squashed() for multi-schema snapshots When in upcoming patches we allow multiple schema versions within a single snapshot, reads will have to upgrade rows on the fly. This also applies to squashed()	2023-05-04 02:37:29 +02:00
Michał Chojnowski	a2e3cf7463	partition_version: prepare partition_snapshot::static_row() for multi-schema snapshots When in upcoming patches we allow multiple schema versions within a single snapshot, reads will have to upgrade rows on the fly. This also applies to the static row.	2023-05-04 02:37:29 +02:00
Michał Chojnowski	94e4dc3d8d	partition_version: add a logalloc::region argument to partition_entry::upgrade() The argument is currently unused, but will be further propagated to add_version() in an upcoming patch.	2023-05-04 02:37:29 +02:00
Michał Chojnowski	caaf0bd6bf	partition_version: remove _schema from partition_entry::operator<< operator<< accepts a schema& and a partition_entry&. But since the latter now contains a reference to its schema inside, the former is redundant. Remove it.	2023-05-04 02:37:29 +02:00
Michał Chojnowski	f6e11c95e2	partition_version: remove the schema argument from partition_entry::read() partition_entry now contains a reference to its schema, so it no longer needs to be supplied by the caller.	2023-05-04 02:37:29 +02:00
Michał Chojnowski	d7d6449a8f	partition_version: remove the _schema field from partition_snapshot After adding a _schema field to each partition version, the field in partition_snapshot is redundant. It can be always recovered from the latest version. Remove it.	2023-05-04 02:37:29 +02:00
Michał Chojnowski	1d01a4a168	partition_version: add a _schema field to partition_version Currently, partition_version does not reference its schema. All partition_version reachable from a entry/snapshot have the same schema, which is referenced in memtable_entry/cache_entry/partition_snapshot. To enable gentle schema upgrades, we want to use the existing background version merging mechanism. To achieve that, we will move the schema reference into partition_version, and we will allow neighbouring MVCC versions to have different schemas, and we will merge them on-the-fly during reads and persistently during background version merges. This way, an upgrade will boil down to adding a new empty version with the new schema. This patch adds the _schema field to partition_version and propagates the schema pointer to it from the version's containers (entry/snapshot). Subsequent patches will remove the schema references from the containers, because they are now redundant.	2023-05-04 02:37:29 +02:00
Michał Chojnowski	781514acfe	mutation_partition_v2: change schema_ptr to schema& in mutation_partition_v2 constructor We don't have a convention for when to pass `schema_ptr` and and when to pass `const schema&` around. In general, IMHO the natural convention for such a situation is to pass the shared pointer if the callee might extend the lifetime of shared_ptr, and pass a reference otherwise. But we convert between them willy-nilly through shared_from_this(). While passing a reference to a function which actually expects a shared_ptr can make sense (e.g. due to the fact that smart pointers can't be passed in registers), the other way around is rather pointless. This patch takes one occurence of that and modifies the parameter to a reference. Since enable_shared_from_this makes shared pointer parameters and reference parameters interchangeable, this is a purely cosmetic change.	2023-05-04 02:37:29 +02:00
Michał Chojnowski	49a02b08de	mutation_partition_v2: clean up variants of apply() Most variants of apply() and apply_monotonically() in mutation_partition_v2 are leftovers from mutation_partition, and are unused. Thus they only add confusion and maintenance burden. Since we will be modifying apply_monotonically() in upcoming patches, let's clean them up, lest the variants become stale. This patch removes all unused variants of apply() and apply_monotonically() and "manually inlines" the variants which aren't used often enough to carry their own weight. In the end, we are left with a single apply_monotonically() and two convenience apply() helpers. The single apply_monotonically() accepts two schema arguments. This facility is unimplemented and unused as of this patch - the two arguments are always the same - but it will be implemented and used in later parts of the series.	2023-05-04 02:37:29 +02:00
Kefu Chai	0cb842797a	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:57:18 +02:00
Avi Kivity	c5e4bf51bd	Introduce mutation/ module Move mutation-related files to a new mutation/ directory. The names are kept in the global namespace to reduce churn; the names are unambiguous in any case. mutation_reader remains in the readers/ module. mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this patch. This is a step forward towards librarization or modularization of the source base. Closes #12788	2023-02-14 11:19:03 +02:00

27 Commits