mirror of
https://github.com/scylladb/scylladb.git
synced 2026-04-22 01:20:39 +00:00
Currently, the sstable_set in a table is copied before every change so that existing sstable readers can keep accessing the unchanged version. This patch changes the sstable_set into a structure that can be copied without actually copying all the sstables in the set, while providing the same methods (and some extra ones) without a major loss of speed.

This is achieved by associating all copies with sstable_set versions, which hold the changes that were performed in them and references to the versions that were copied, a.k.a. their parents. The set represented by a version is the result of combining all changes of its ancestors. As a consequence, most methods of a version have a time complexity that depends on the number of its ancestors. To limit this number, versions that represent copies that have already been deleted are merged with their descendants.

The strategy for deciding when, and with which of its children, a version should be merged depends heavily on the use case of sstable_sets: there is a main copy of the set in a table class which undergoes many insertions and deletions, and there are copies of it in compaction or mutation readers which are copied or edited further only a few times, or not at all. It's worth mentioning that once a copy is made, the copied set should not be modified anymore, because that would also modify the results given by the copy. To still allow modifying the copied set, whenever a change is to be performed on it, the version associated with this set is replaced with a new version that depends on the previous one.

As we can see, in our use case there is a main chain of versions (with changes from the table), and smaller branches of versions that start from a version in this chain but are deleted soon after. In such a case we can merge a version when it has exactly one descendant, as this limits the number of concurrent ancestors of a version to the number of copies of its ancestors that are concurrently in use.
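The copy-on-write scheme described above can be illustrated with a minimal sketch. It is not the code from this patch: the names set_version, versioned_set and sstable_id are hypothetical, sstables are stood in for by strings, each version has a single parent, and merging of versions is left out entirely. It only shows how a copy shares a version and how the first modification after a copy replaces that version with a child.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for a shared_sstable.
using sstable_id = std::string;

// One version: the changes made in it and a reference to its parent.
struct set_version {
    std::shared_ptr<const set_version> parent;
    std::unordered_set<sstable_id> added;
    std::unordered_set<sstable_id> erased;

    // Walk towards the root; the nearest recorded change decides.
    bool contains(const sstable_id& id) const {
        for (const set_version* v = this; v; v = v->parent.get()) {
            if (v->added.count(id)) return true;
            if (v->erased.count(id)) return false;
        }
        return false;
    }

    // Number of ancestors, i.e. the length of the chain above this version.
    std::size_t ancestors() const {
        std::size_t n = 0;
        for (const set_version* v = parent.get(); v; v = v->parent.get()) ++n;
        return n;
    }
};

// Copying is O(1): the copy simply shares the current version.
class versioned_set {
    std::shared_ptr<set_version> _version = std::make_shared<set_version>();

    // If the current version is referenced elsewhere (by a copy or by a
    // child version), continue in a fresh child so that it stays unchanged.
    void detach() {
        if (_version.use_count() > 1) {
            auto child = std::make_shared<set_version>();
            child->parent = _version;
            _version = std::move(child);
        }
    }

public:
    bool contains(const sstable_id& id) const { return _version->contains(id); }
    std::size_t ancestors() const { return _version->ancestors(); }

    // Changes are recorded only when they have an actual effect.
    void insert(const sstable_id& id) {
        if (contains(id)) return;
        detach();
        // Undo an erase made in this version, otherwise record an add.
        if (_version->erased.erase(id) == 0) _version->added.insert(id);
    }

    void erase(const sstable_id& id) {
        if (!contains(id)) return;
        detach();
        // Undo an add made in this version, otherwise record an erase.
        if (_version->added.erase(id) == 0) _version->erased.insert(id);
    }
};
```

In this sketch a reader that copies the set keeps seeing the contents as of the copy, while the main set accumulates one extra ancestor per copy that was taken before a modification. Bounding that chain is what the merging strategy in this patch is for.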
During each such merge, the parent version is removed and the child version is modified so that all operations on it give the same results.

In order to preserve the same interface, the sstable_set still contains a lw_shared_ptr<sstable_list>, but sstable_list (previously an alias for unordered_set<shared_sstable>) is now a new structure. Each sstable_set contains an sstable_list, but not every sstable_list has to be contained by an sstable_set, and we also want to allow fast copying of sstable_lists. The reference to the sstable_set_version is therefore kept by the sstable_list, and the sstable_set accesses the sstable_set_version it's associated with through its sstable_list.

Methods that access the sstables of a particular sstable_set copy (select, select_sstable_runs and the sstable_list's iterator) get their results from containers that hold all sstables from all versions (stored in a single, shared "versioned_sstable_set_data" structure), and then filter out the sstables that aren't present in the version in question.

This version of the sstable_set allows adding and erasing the same sstable repeatedly. Inserting into and erasing from the set modifies the containers in a version only when it has an actual effect: if an sstable has been added in the parent version and hasn't been erased in the child version, adding it again will have no effect. This ensures that when versions are merged, they have disjoint sets of added sstables and disjoint sets of erased sstables (an sstable can still be added in one and erased in the other). It's worth noting that if an sstable has been added in one of the merged versions and erased in the other, the version that remains after merging doesn't need to keep any information about the sstable's inclusion in the set: it can be inferred from the changes in previous versions (and it doesn't matter whether the sstable was erased before or after being added).

To release pointers to sstables as soon as possible (i.e. when all references to versions that contain them die), if an sstable is added/erased in all child versions that are based on a version which has no external references, this change gets removed from these versions and added to the parent version. If an sstable's insertion gets overwritten as a result, we might be able to remove the sstable from the set completely. We know how many times this needs to happen by counting, for each sstable, in how many different versions it has been added. When a change that adds an sstable gets merged with a change that removes it, or when such a change simply gets deleted alongside its associated version, this count is reduced, and when an sstable gets added to a version that doesn't already contain it, this count is increased.

The methods that modify the set's contents provide the strong exception guarantee by trying to insert new sstables into its containers and erasing them again if an exception is caught.

Fixes #2622

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>