scylladb

Author	SHA1	Message	Date
Tomasz Grabiec	3bb147ae95	db: mutation_cleaner: Enqueue new snapshots at the back This fixes a quadratic behavior in case lots of snapshots with range tombstones are queued for merging. Before the change, new snapshots were inserted at the front, which is also where the worker looks. Merging a version has a linear component in its complexity function which depends on the number of range tombstones. If we merge snapshots starting from the latest to the oldest, then the whole process becomes quadratic because the version which is merged accumulates an increasing amount of tombstones, ones which were already merged before. We should instead merge starting from the oldest snapshots; this way each tombstone is applied exactly once during merge. This bug got worse after `4bd4aa2e88`, which makes merging tombstones more expensive. Closes #10916	2022-06-28 18:29:29 +03:00
Tomasz Grabiec	53026f3ba6	memtable: Subtract from flushed memory when cleaning This patch prevents virtual dirty from going negative during memtable flush in case partition version merging erases data previously accounted by the flush reader. There is an assert in ~flush_memory_accounter which guards for this. This will start happening after tombstones are compacted with rows on partition version merging. This problem is prevented by the patch by having the cleaner notify the memtable layer via callback about the amount of dirty memory released during merging, so that the memtable layer can adjust its accounting.	2022-06-15 11:30:25 +02:00
Avi Kivity	5129280f45	Revert "Merge 'memtable, cache: Eagerly compact data with tombstones' from Tomasz Grabiec" This reverts commit `e0670f0bb5`, reversing changes made to `605ee74c39`. It causes failures in debug mode in database_test.test_database_with_data_in_sstables_is_a_mutation_source_plain, though with low probability. Fixes #10780 Reopens #652.	2022-06-14 18:06:22 +03:00
Tomasz Grabiec	9135d1fd1f	memtable: Subtract from flushed memory when cleaning This patch prevents virtual dirty from going negative during memtable flush in case partition version merging erases data previously accounted by the flush reader. There is an assert in ~flush_memory_accounter which guards for this. This will start happening after tombstones are compacted with rows on partition version merging. This problem is prevented by the patch by having the cleaner notify the memtable layer via callback about the amount of dirty memory released during merging, so that the memtable layer can adjust its accounting.	2022-06-06 19:25:41 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Piotr Dulikowski	59fbbb993f	memtables: add partition/row hit/miss counters Adds per-table metrics for counting partition and row reuse in memtables. New metrics are as follows: - memtable_partition_writes - number of write operations performed on partitions in memtables, - memtable_partition_hits - number of write operations performed on partitions that previously existed in a memtable, - memtable_row_writes - number of row write operations performed in memtables, - memtable_row_hits - number of row write operations that overwrote rows previously present in a memtable. Tests: unit(release)	2019-11-12 13:35:41 +01:00
Tomasz Grabiec	ac49b1def0	mutation_cleaner: Migrate partition_snapshots when queueing for background cleanup partition_snapshots created in the memtable will keep a reference to the memtable (as region*) and to memtable::_cleaner. As long as the reader is alive the memtable will be kept alive by partition_snapshot_flat_reader::_container_guard. But after that, nothing prevents it from being destroyed. The snapshot can outlive the read if mutation_cleaner::merge_and_destroy() defers its destruction for later. When the read ends after the memtable was flushed, the snapshot will be queued in the cache's cleaner, but internally will reference the memtable's region and cleaner. This will result in a use-after-free when the snapshot resumes destruction. The fix is to update the snapshot's region and cleaner references at the time of queueing to point to the cache's region and cleaner. When the memtable is destroyed without being moved to cache there is no problem, because the snapshot would be queued into the memtable's cleaner, which will be drained on destruction from all snapshots. Introduced in `f3da043`. Fixes #4030.	2018-12-27 18:08:50 +01:00
Tomasz Grabiec	67f9afbd1a	mutation_cleaner: impl: Store a back-reference to the owning mutation_cleaner	2018-12-27 18:08:50 +01:00
Tomasz Grabiec	074be4d4e8	memtable, cache: Run mutation_cleaner worker in its own scheduling group The worker is responsible for merging MVCC snapshots, which is similar to merging sstables, but in memory. The new scheduling group will be therefore called "memory compaction". We should run it in a separate scheduling group instead of main/memtables, so that it doesn't disrupt writes and other system activities. It's also nice for monitoring how much CPU time we spend on this.	2018-06-27 21:51:04 +02:00
Tomasz Grabiec	6c6ffaee71	mutation_cleaner: Make merge() redirect old instance to the new one If a memtable snapshot goes away after the memtable started merging to cache, it would enqueue the snapshots for cleaning on the memtable's cleaner, which will have to clean without deferring when the memtable is destroyed. That may stall the reactor. To avoid this, make merge() cause the old instance of the cleaner to redirect to the new instance (owned by cache), like we do for regions. This way the snapshots mentioned earlier can be cleaned after the memtable is destroyed, gracefully.	2018-06-27 21:51:04 +02:00
Tomasz Grabiec	c26a304fbb	mvcc: Merge partition version versions gradually in the background When snapshots go away, typically when the last reader is destroyed, we used to merge adjacent versions atomically. This could induce reactor stalls if partitions were large. This is especially true for versions created on cache update from memtables. The solution is to allow this process to be preempted and move to the background. mutation_cleaner keeps a linked list of such unmerged snapshots and has a worker fiber which merges them incrementally and asynchronously with respect to reads. This reduces scheduling latency spikes in tests/perf_row_cache_update for the case of a large partition with many rows. For -c1 -m1G I saw them dropping from 23ms to 2ms.	2018-06-27 12:48:30 +02:00
Paweł Dziepak	bdc299cc38	mutation_cleaner: add disclaimer about mutation_partition lifetime mutation_cleaner has already caused problems by extending the lifetime of mutation_partition past the lifetime of the LSA migrators that it uses (due to the fact that both the cleaner and the migrators were thread-local globals). Since the long-term goal is to make the mutation_partition internal representation depend more and more on schema, that lifetime extension may again cause problems in the future, so let's add a disclaimer that will hopefully help avoid them.	2018-06-25 09:37:43 +01:00
Tomasz Grabiec	e0803ff71e	Introduce mutation_cleaner Used for collecting unused partition_version objects and freeing them incrementally. Will be used for both cache and memtables.	2018-05-30 14:41:39 +02:00

14 Commits
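
The quadratic behavior described in `3bb147ae95` can be illustrated with a toy cost model. This is only a sketch, not ScyllaDB code: the function names are hypothetical, each version is reduced to a count of range tombstones, and merging is assumed to cost one unit per tombstone held by the version being merged. The deque is ordered from oldest (front) to newest (back).

```cpp
#include <cstddef>
#include <deque>

// Toy model (assumption): each element is the number of range tombstones in
// one version, ordered oldest (front) to newest (back). Merging a version
// costs one unit per tombstone it holds, and moves them into its neighbour.

// Merge starting from the newest version: the merged version keeps
// accumulating tombstones that were already merged in earlier steps.
std::size_t merge_cost_newest_first(std::deque<std::size_t> versions) {
    std::size_t cost = 0;
    while (versions.size() > 1) {
        std::size_t merged = versions.back();  // newest remaining version
        versions.pop_back();
        cost += merged;                        // pay for every tombstone it carries
        versions.back() += merged;             // next victim now carries them too
    }
    return cost;
}

// Merge starting from the oldest versions: each step folds the second-oldest
// version into the oldest one, so every tombstone is paid for exactly once.
std::size_t merge_cost_oldest_first(std::deque<std::size_t> versions) {
    std::size_t cost = 0;
    while (versions.size() > 1) {
        std::size_t merged = versions[1];      // second-oldest version
        versions.erase(versions.begin() + 1);
        cost += merged;                        // pay only for its own tombstones
        versions.front() += merged;            // the oldest version absorbs them
    }
    return cost;
}
```

Under this model, n versions with t tombstones each cost t·n(n−1)/2 units when merged newest-first and t·(n−1) units when merged oldest-first, which matches the commit's observation that merging from the oldest snapshot applies each tombstone once.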