scylladb

Author	SHA1	Message	Date
Raphael S. Carvalho	2164aa8d5b	move compaction manager from /utils to /sstables Compaction manager was initially created at utils because it was more generic, and wasn't only intended for compaction. It was more like a task handler based on futures, but now it's only intended to manage compaction tasks, and thus should be moved elsewhere. /sstables is where compaction code is located. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-21 15:23:05 -02:00
Raphael S. Carvalho	3bd240d9e8	compaction: add ability to stop an ongoing compaction That's needed for nodetool stop, which is called to stop all ongoing compaction. The implementation is about informing an ongoing compaction that it was asked to stop, so the compaction itself will trigger an exception. Compaction manager will catch this exception and re-schedule the compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-19 23:15:18 -02:00
Raphael S. Carvalho	ec4c73d451	compaction: rename compaction_stats to compaction_info compaction_info makes more sense because this structure doesn't only store stats about ongoing compaction. Soon, we will add information to it about whether or not a user asked to stop the respective ongoing compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-19 23:15:18 -02:00
Tomasz Grabiec	237819c31f	logalloc: Excluded zones' free segments in lsa/bytes-non_lsa_used_space Historically the purpose of the metric is to show how much memory is in standard allocations. After zones were introduced, this would also include free space in lsa zones, which is almost all memory, and thus the metric lost its original meaning. This change brings it back to its original meaning. Message-Id: <1452865125-4033-1-git-send-email-tgrabiec@scylladb.com>	2016-01-18 10:48:14 +02:00
Raphael S. Carvalho	a5c90194f5	db: add support to clean up a column family Cleanup is a procedure that will discard irrelevant keys from all sstables of a column family, thus saving disk space. Scylla will clean up a sstable by using compaction code, in which this sstable will be the only input used. Compaction manager was changed to become aware of cleanup, such that it will be able to schedule cleanup requests and also know how to handle them properly. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-12 03:53:04 -02:00
Raphael S. Carvalho	d44a5d1e94	compaction: filter out compacting sstables The implementation is about storing generation of compacting sstables in an unordered set per column family, so before strategy is called, compaction manager will filter out compacting sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-12 01:18:29 -02:00
Raphael S. Carvalho	9c13c1c738	compaction: move compaction execution from strategy to manager Currently, compaction strategy is responsible for both getting the sstables selected for compaction and running compaction. Moving the code that runs compaction from strategy to manager is a big improvement, which will also make it possible for the compaction manager to keep track of which sstables are being compacted at a given moment. This change will also be needed for cleanup and concurrent compaction on the same column family. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-12 00:04:27 -02:00
Tomasz Grabiec	eb1b21eb4b	Introduce hashing helpers	2016-01-08 21:10:25 +01:00
Avi Kivity	c8b09a69a9	lsa: disable constant_time_size in binomial_heap implementation Corrupts heap on boost < 1.60, and not needed. Fixes #698.	2015-12-29 12:59:00 +01:00
Vlad Zolotarov	33552829b2	core: use steady_clock where monotonic clock is required Use steady_clock instead of high_resolution_clock where monotonic clock is required. high_resolution_clock is essentially a system_clock (Wall Clock) and therefore may not be assumed monotonic, since Wall Clock may move backwards due to time/date adjustments. Fixes issue #638 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-12-27 18:07:53 +02:00
Calle Wilund	803b58620f	data_output: specialize serialized_size for bool to ensure sync with write	2015-12-21 14:19:45 +00:00
Paweł Dziepak	442bc90505	compaction_manager: check whether the manager is already stopped Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-17 14:06:41 +01:00
Tomasz Grabiec	157af1036b	data_output: Introduce write_view() which matches data_input::read_view()	2015-12-16 18:06:54 +01:00
Raphael S. Carvalho	e74dcc86bd	compaction_manager: introduce list of compaction_stats This list will store compaction_stats for each ongoing compaction. That's why register and deregister methods are provided. This change is important for compaction stats API that needs data of each ongoing compaction, such as progress, ks, cf, etc. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2015-12-15 09:50:28 -02:00
Lucas Meneghel Rodrigues	2167173251	utils/logalloc.cc - Declare member minimum_size from segment_zone struct This fixes compile error: In function `logalloc::segment_zone::segment_zone()': /home/lmr/Code/scylla/utils/logalloc.cc:412: undefined reference to `logalloc::segment_zone::minimum_size' collect2: error: ld returned 1 exit status ninja: build stopped: subcommand failed. Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>	2015-12-10 12:54:34 +02:00
Paweł Dziepak	ec453c5037	managed_bytes: fix potentially unaligned accesses blob_storage is defined with attribute packed, which makes its alignment requirement equal 1. This means that its members may be unaligned. GCC is obviously aware of that and will generate appropriate code (and not generate ubsan checks). However, there are a few places where members of blob_storage are accessed via pointers; these have to be wrapped by unaligned_cast<> to let the compiler know that the location pointed to may not be aligned properly. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-10 11:59:54 +02:00
Avi Kivity	204610ac61	Merge "Make LSA more large-allocation-friendly" from Paweł "This series attempts to make LSA more friendly for large (i.e. bigger than LSA segment) allocations. It is achieved by introducing segment zones – large, contiguous areas of segments and using them to allocate segments instead of calling malloc() directly. Zones can be shrunk when needed to reclaim memory and segments can be migrated either to reduce the number of zones or to defragment one in order to be able to shrink it. LSA tries to keep all segments at the lower addresses and reclaims memory starting from the zones in the highest parts of the address space."	2015-12-09 10:49:23 +02:00
Paweł Dziepak	8ba66bb75d	managed_bytes: fix copy size in move constructor Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-09 10:38:28 +02:00
Paweł Dziepak	0d66300d43	lsa: add more counters Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	83b004b2fb	lsa: avoid fragmenting memory Originally, lsa allocated each segment independently, which could result in high memory fragmentation. As a result many compaction and eviction passes may be needed to release a sufficiently big contiguous memory block. These problems are solved by introduction of segment zones, contiguous groups of segments. All segments are allocated from zones and the algorithm tries to keep the number of zones to a minimum. Moreover, segments can be migrated between zones or inside a zone in order to deal with fragmentation inside a zone. Segment zones can be shrunk but cannot grow. Segment pool keeps a tree containing all zones ordered by their base addresses. This tree is used only by the memory reclaimer. There is also a list of zones that have at least one free segment that is used during allocation. Segment allocation doesn't have any preferences which segment (and zone) to choose. Each zone contains a free list of unused segments. If there are no zones with free segments a new one is created. Segment reclamation migrates segments from the zones higher in memory to the ones at lower addresses. The remaining zones are shrunk until the requested number of segments is reclaimed. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	2fb14a10b6	utils: add dynamic_bitset A dynamic bitset implementation that provides functions to search for both set and cleared bits in both directions. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	40dda261f2	lsa: maintain segment to region mapping Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	2e94086a2c	lsa: use bi::list to implement segment_stack Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Tomasz Grabiec	6ead7a0ec5	Merge tag 'large-blobs/v3' from git@github.com:avikivity/scylla.git Scattering of blobs from Avi: This patchset converts the stack to scatter managed_bytes in lsa memory, allowing large blobs (and collections) to be stored in memtable and cache. Outside memtable/cache, they are still stored sequentially, but it is assumed that the number of transient objects is bounded. The approach taken here is to scatter managed_bytes data in multiple blob_storage objects, but to linearize them back when accessing (for example, to merge cells). This allows simple access through the normal bytes_view. It causes an extra two copies, but copying a megabyte twice is cheap compared to accessing a megabyte's worth of small cells, so per-byte throughput is increased. Testing shows that lsa large object space is kept at zero, but throughput is bad because Scylla easily overwhelms the disk with large blobs; we'll need Glauber's throttling patches or a really fast disk to see good throughput with this.	2015-12-08 19:15:13 +01:00
Avi Kivity	0c2fba7e0b	lsa: advertize our preferred maximum allocation size Let managed_bytes know that allocating below a tenth of the segment size is the right thing to do.	2015-12-08 15:17:09 +02:00
Avi Kivity	13324607e6	managed_bytes: conform to allocation_strategy's max_preferred_allocation_size Instead of allocating a single blob_storage, chain multiple blob_storage objects in a list, each limited not to exceed the allocation_strategy's max_preferred_allocation_size. This allows lsa to allocate each blob_storage object as an lsa managed object that can be migrated in memory. Also provide linearize()/scatter() methods that can be used to temporarily consolidate the storage into a single blob_storage. This makes the data contiguous, so we can use a regular bytes_view to examine it.	2015-12-08 15:17:08 +02:00
Tomasz Grabiec	657841922a	Mark move constructors noexcept when possible	2015-12-07 09:50:27 +01:00
Avi Kivity	2437fc956c	allocation_strategy: expose preferred allocation size limit Our premier allocation_strategy, lsa, prefers to limit allocations below a tenth of the segment size so they can be moved around; larger allocations are pinned and can cause memory fragmentation. Provide an API so that objects can query for this preferred size limit. For now, lsa is not updated to expose its own limit; this will be done after the full stack is updated to make use of the limit, or intermediate steps will not work correctly.	2015-12-06 16:23:42 +02:00
Amnon Heiman	61abc85eb3	histogram: Add started counter This patch adds a started counter that is used to mark the number of operations that were started. This counter serves two purposes: it is a better indication for when to sample the data, and it is used to indicate how many pending operations there are. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-12-01 15:28:06 +02:00
Amnon Heiman	88dcf2e935	latency: Switch to steady_clock The system clock is less suitable for time difference than steady_clock. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-12-01 15:28:06 +02:00
Tomasz Grabiec	a3e3add28a	utils: Introduce phased_barrier Utility for waiting on a group of async actions started before certain point in time.	2015-11-29 16:25:21 +01:00
Pekka Enberg	cf7541020f	Merge "Enable more config options" from Asias	2015-11-25 16:09:22 +02:00
Paweł Dziepak	89f7f746cb	lsa: fix printing object_descriptor::_alignment object_descriptor::_alignment is of type uint8_t which is actually an unsigned char. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-11-24 20:13:29 +01:00
Paweł Dziepak	65875124b7	lsa: guarantee that segment_heap doesn't throw boost::heap::binomial_heap allocates helper object in push() and, therefore, may throw an exception. This shouldn't happen during compaction. The solution is to reserve space for this helper object in segment_descriptor and use a custom allocator with boost::heap::binomial_heap. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-11-24 19:51:22 +01:00
Paweł Dziepak	273b8daeeb	lsa: add no-op default constructor for segment Zero initialization of segment::data when segment is value initialized is undesirable. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-11-24 16:37:37 +01:00
Paweł Dziepak	e6cf3e915f	lsa: add counters for memory used by large objects Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-11-24 16:36:27 +01:00
Paweł Dziepak	6b113a9a7a	lsa: fix eviction of large blobs LSA memory reclaimer logic assumes that the amount of memory used by LSA equals: segments_in_use * segment_size. However, LSA is also responsible for eviction of large objects which do not affect the used segment count, e.g. a region with no used segments may still use a lot of memory for large objects. The solution is to switch from measuring memory in used segments to a used bytes count that also includes large objects. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-11-24 16:29:09 +01:00
Asias He	33ef58c5c9	utils: Add get_broadcast_rpc_address and set_broadcast_rpc_address helper	2015-11-24 10:07:31 +08:00
Avi Kivity	ba859acb3b	big_decimal: add default constructor Arithmetic types should have a default constructor, and anyway the following patch wants it.	2015-11-18 10:36:03 +02:00
Paweł Dziepak	c37afcfdee	lsa: account for size of objects too big for LSA While the objects above max_manage_object_size aren't stored in the LSA segments they are still considered to be belonging to the LSA region and are evictable using that region evictor. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-11-16 12:22:12 +01:00
Tomasz Grabiec	7e0f99cc3b	Merge tag 'native-preparatory/v1' from https://github.com/avikivity/scylla.git Assorted patches that pave the way for native storage (while not committing us in any way).	2015-11-16 10:01:38 +01:00
Avi Kivity	1c425d6b50	logalloc: allow allocating_section code blocks to return references	2015-11-15 19:10:24 +02:00
Avi Kivity	36994a5d08	managed_bytes: add a constructor from std::initializer_list<> Not actually used in the patchset now, but nice.	2015-11-13 17:13:07 +02:00
Avi Kivity	f3afe3e876	allocation_strategy: constify migrate_fn Since abstract_type will be providing our migrate_fn, they must be const, and indeed a migration does not change the migration function.	2015-11-13 17:13:07 +02:00
Calle Wilund	0fa543800a	data_output: Template "blob" writers (bytes*) to allow for varying "size" type	2015-11-10 13:12:33 +01:00
Calle Wilund	9ee8204993	data_input: Fix missing bounds check	2015-11-10 13:12:33 +01:00
Paweł Dziepak	64f1c2866c	lsa: free segment in trim_emergency_reserve_to_max() _emergency_reserve is an intrusive container and it doesn't care about segment lifetime. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-11-05 18:04:38 +02:00
Raphael S. Carvalho	c2a98807c7	compaction_manager: fix remove remove() is the function used to remove every reference to a cf from the compaction manager. This function works by removing cf from the queue, and waiting for possible ongoing compaction on cf. However, a cf may be re-queued by compaction manager task if there is pending compaction by the end of compaction. If cf is still referenced by the time remove() returns, we could end up with a use-after-free. To fix that, a task shouldn't re-queue a cf if it was asked to stop. The stat pending_tasks was also not being updated when a cf was removed from the task queue. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2015-10-28 17:35:26 +02:00
Vlad Zolotarov	5613979a85	utils::fb_utilities: add the ability to set a broadcast address Add utils::fb_utilities::set_broadcast_address(). Set it to either broadcast_address or listen_address configuration value if appropriate values are set. If none of the two values above are set - abort the application. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com> New in v2: - Simplify the utils::fb_utilities::get_broadcast() logic.	2015-10-26 14:10:39 +02:00
Tomasz Grabiec	764d913d84	Merge branch 'pdziepak/row-cache-range-query/v4' from seastar-dev.git From Pawel: This series enables row cache to serve range queries. In order to achieve that row cache needs to know whether there are some other partitions in the specified range that are not cached and need to be read from the sstables. That information is provided by key_readers, which work very similarly to mutation_readers, but return only the decorated key of partitions in range. In case of sstables key_readers is implemented to use partition index. Approach like this has the disadvantage of needing to access the disk even if all partitions in the range are cached. There are (at least) two ways of dealing with that problem: - cache partition index - that will also help in all other places where it is needed - add a flag to cache_entry which, when set, indicates that the immediate successor of the partition is also in the cache. Such flag would be set by mutation reader and cleared during eviction. It will also allow newly created mutations from memtable to be moved to cache provided that both their successors and predecessors are already there. The key_reader part of this patchset adds a lot of new code that probably won't be used in any other place, but the alternative would be to always interleave reads from cache with reads from sstables and that would be more heavy on partition index, which isn't cached. Fixes #185.	2015-10-21 15:26:45 +02:00
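
Note on the two steady_clock entries above (33552829b2 and 88dcf2e935): both rest on the fact that std::chrono::steady_clock is guaranteed monotonic, while a wall clock can be adjusted backwards. The following is a minimal sketch of interval measurement under that guarantee. It is not code from the Scylla tree; the class name latency_timer and its methods are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper, not part of Scylla: measures elapsed time with a
// monotonic clock so wall-clock adjustments cannot produce negative intervals.
class latency_timer {
    using clock = std::chrono::steady_clock;
    clock::time_point _start = clock::now();
public:
    // Begin a new measurement from the current instant.
    void restart() { _start = clock::now(); }

    // Time elapsed since construction or the last restart().
    std::chrono::nanoseconds elapsed() const {
        auto d = std::chrono::duration_cast<std::chrono::nanoseconds>(
            clock::now() - _start);
        // steady_clock never goes backwards, so the difference is non-negative.
        assert(d.count() >= 0);
        return d;
    }
};

// The C++ standard requires steady_clock to be steady; this check documents
// the property the measurement relies on.
static_assert(std::chrono::steady_clock::is_steady,
              "steady_clock must be monotonic");
```

The same subtraction done with std::chrono::system_clock could yield a negative duration if the system time were adjusted between the two samples, which is the failure mode those commit messages describe.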

218 Commits