scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-20 06:42:16 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	a0cba3c86f	logalloc: Introduce tracker::occupancy() Returns occupancy information for all memory allocated by LSA, including segment pools / zones.	2016-03-22 16:28:10 +01:00
Tomasz Grabiec	529c8b8858	logalloc: Rename tracker::occupancy() to region_occupancy()	2016-03-22 14:56:44 +01:00
Tomasz Grabiec	ca08db504b	managed_bytes: Make operator[] work for large blobs as well Fixes assertion in mutation_test: mutation_test: ./utils/managed_bytes.hh:349: blob_storage::char_type* managed_bytes::data(): Assertion `!_u.ptr->next' Introduced in `ea7c2dd085` Message-Id: <1458648786-9127-1-git-send-email-tgrabiec@scylladb.com>	2016-03-22 14:43:52 +02:00
Tomasz Grabiec	184e2831e7	managed_bytes: Mark move-assignment noexcept	2016-03-21 18:41:27 +01:00
Tomasz Grabiec	92d4cfc3ab	managed_bytes: Make copy assignment exception-safe	2016-03-21 18:41:27 +01:00
Tomasz Grabiec	22d193ba9f	managed_bytes: Make linearization_context::forget() noexcept It is needed for noexcept destruction, which we need for exception safety in higher layers. According to [1], erase() only throws if key comparison throws, and in our case it doesn't. [1] http://en.cppreference.com/w/cpp/container/unordered_map/erase	2016-03-21 18:41:27 +01:00
Benoît Canet	1fb9a48ac5	exception: Optionally shutdown communication on I/O errors. I/O errors cannot be fixed by Scylla; the only solution is to shut down the database communications. Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>	2016-03-17 15:02:52 +02:00
Paweł Dziepak	338fd34770	lsa: update _closed_occupancy after freeing all segments _closed_occupancy will be used when a region is removed from its region group; make sure that it is accurate. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-17 11:12:05 +00:00
Paweł Dziepak	99b61d3944	lsa: set _active to nullptr in region destructor In the region destructor, after the active segment is freed, the pointer to it is left unchanged. This confuses the remaining parts of the destructor logic (namely, removal from the region group), which may rely on the information in region_impl::_active. In this particular case, the code removing the region from its region group called region_impl::occupancy(), which dereferences _active if it is not null. Fixes #993. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1457341670-18266-1-git-send-email-pdziepak@scylladb.com>	2016-03-07 10:15:28 +01:00
Calle Wilund	e79ca557ed	managed_bytes: Change init of small object to silence error on gcc5 Fixes #865 (Some) gcc 5 (5.3.0 for me) on Ubuntu will generate errors when compiling this code (compiling logalloc_test). The memcpy to inline storage seems to confuse the compiler. Simply change to std::copy, which shuts the compiler up. Any decent STL should convert a primitive std::copy to memcpy anyway, and since this is the inline (small) storage, it should not matter which is used. Message-Id: <1456931988-5876-4-git-send-email-calle@scylladb.com>	2016-03-02 18:21:51 +02:00
Calle Wilund	43ea1f5945	utils::jointpoint: Helper type to generate a singular value for all shards Lets operations working on all shards "join" and acquire the same value of something, with that value being based on when all shards reach the join. Obvious use case: time stamp after one set of per-shard ops, but before final ones. The generation of the value is guaranteed to happen on the shards that created the join point. Based on the join-ops in CF::snapshot, but abstracted and made caller responsibility. Primary use case is to help deal with the join-problem of truncation. Message-Id: <1456332856-23395-1-git-send-email-calle@scylladb.com>	2016-02-24 18:59:25 +02:00
Paweł Dziepak	d5c794d5e4	data_output: add reserve() Allows mixing data_output with other output streams like seastar::simple_output_stream, which is useful when switching to the new IDL-based serializers. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-02-19 23:11:59 +00:00
Amnon Heiman	1e4d227b20	managed_bytes: don't return auto from non-member function gcc 4.9 does not allow non-static data member declared auto. This patch replaces the auto declaration with std::result_of_t. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1455652166-16860-1-git-send-email-amnon@scylladb.com>	2016-02-16 21:50:55 +02:00
Avi Kivity	13144ea9eb	managed_bytes: get rid of explicit linearize/scatter Now that everything is in a linearization context, we don't need to explicitly gather data.	2016-02-16 14:37:46 +02:00
Avi Kivity	af8ef54d5a	managed_bytes: introduce with_linearized_managed_bytes() A large managed_bytes blob can be scattered in lsa memory. Usually this is fine, but sometimes we want to examine it in place without copying it out, while still using contiguous iterators for efficiency. For this use case, introduce with_linearized_managed_bytes(Func), which runs a function in a "linearization context". Within the linearization context, reads of managed_bytes objects will see temporarily linearized copies instead of scattered data.	2016-02-09 19:55:13 +02:00
Avi Kivity	e5b72aedf1	managed_bytes: don't copy data during hashing	2016-02-08 12:43:05 +02:00
Avi Kivity	5d958db869	managed_bytes: fix operator== for fragmented blobs Must compare fragment by fragment.	2016-02-08 12:43:05 +02:00
Erich Keane	49842aacd9	managed_vector: maybe_constructed ctor to non-constexpr Clang enforces that a union's constexpr CTOR must initialize one of the members. The spec is seemingly silent as to what the rule on this is; however, making this non-constexpr results in clang accepting the constructor. Signed-off-by: Erich Keane <erich.keane@verizon.net> Message-Id: <1454604300-1673-1-git-send-email-erich.keane@verizon.net>	2016-02-07 10:30:45 +02:00
Gleb Natapov	4e440ebf8e	Remove old inet_address and uuid serializers	2016-02-02 12:15:50 +02:00
Raphael S. Carvalho	2164aa8d5b	move compaction manager from /utils to /sstables Compaction manager was initially created in utils because it was more generic and wasn't only intended for compaction. It was more like a task handler based on futures, but now it's only intended to manage compaction tasks, and thus should be moved elsewhere. /sstables is where compaction code is located. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-21 15:23:05 -02:00
Raphael S. Carvalho	3bd240d9e8	compaction: add ability to stop an ongoing compaction That's needed for nodetool stop, which is called to stop all ongoing compaction. The implementation is about informing an ongoing compaction that it was asked to stop, so the compaction itself will trigger an exception. Compaction manager will catch this exception and re-schedule the compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-19 23:15:18 -02:00
Raphael S. Carvalho	ec4c73d451	compaction: rename compaction_stats to compaction_info compaction_info makes more sense because this structure doesn't only store stats about ongoing compaction. Soon, we will add information to it about whether or not a user asked to stop the respective ongoing compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-19 23:15:18 -02:00
Tomasz Grabiec	237819c31f	logalloc: Exclude zones' free segments from lsa/bytes-non_lsa_used_space Historically, the purpose of the metric was to show how much memory is in standard allocations. After zones were introduced, this would also include free space in lsa zones, which is almost all memory, and thus the metric lost its original meaning. This change brings it back to its original meaning. Message-Id: <1452865125-4033-1-git-send-email-tgrabiec@scylladb.com>	2016-01-18 10:48:14 +02:00
Raphael S. Carvalho	a5c90194f5	db: add support to clean up a column family Cleanup is a procedure that will discard irrelevant keys from all sstables of a column family, thus saving disk space. Scylla will clean up a sstable by using compaction code, in which this sstable will be the only input used. Compaction manager was changed to become aware of cleanup, such that it will be able to schedule cleanup requests and also know how to handle them properly. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-12 03:53:04 -02:00
Raphael S. Carvalho	d44a5d1e94	compaction: filter out compacting sstables The implementation is about storing generation of compacting sstables in an unordered set per column family, so before strategy is called, compaction manager will filter out compacting sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-12 01:18:29 -02:00
Raphael S. Carvalho	9c13c1c738	compaction: move compaction execution from strategy to manager Currently, compaction strategy is responsible for both getting the sstables selected for compaction and running compaction. Moving the code that runs compaction from strategy to manager is a big improvement, which will also make it possible for the compaction manager to keep track of which sstables are being compacted at a given moment. This change will also be needed for cleanup and concurrent compaction on the same column family. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2016-01-12 00:04:27 -02:00
Tomasz Grabiec	eb1b21eb4b	Introduce hashing helpers	2016-01-08 21:10:25 +01:00
Avi Kivity	c8b09a69a9	lsa: disable constant_time_size in binomial_heap implementation Corrupts heap on boost < 1.60, and not needed. Fixes #698.	2015-12-29 12:59:00 +01:00
Vlad Zolotarov	33552829b2	core: use steady_clock where monotonic clock is required Use steady_clock instead of high_resolution_clock where a monotonic clock is required. In libstdc++, high_resolution_clock is essentially system_clock (the wall clock) and therefore cannot be assumed to be monotonic, since the wall clock may move backwards due to time/date adjustments. Fixes issue #638 Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-12-27 18:07:53 +02:00
Calle Wilund	803b58620f	data_output: specialize serialized_size for bool to ensure sync with write	2015-12-21 14:19:45 +00:00
Paweł Dziepak	442bc90505	compaction_manager: check whether the manager is already stopped Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-17 14:06:41 +01:00
Tomasz Grabiec	157af1036b	data_output: Introduce write_view() which matches data_input::read_view()	2015-12-16 18:06:54 +01:00
Raphael S. Carvalho	e74dcc86bd	compaction_manager: introduce list of compaction_stats This list will store compaction_stats for each ongoing compaction. That's why register and deregister methods are provided. This change is important for compaction stats API that needs data of each ongoing compaction, such as progress, ks, cf, etc. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2015-12-15 09:50:28 -02:00
Lucas Meneghel Rodrigues	2167173251	utils/logalloc.cc - Declare member minimum_size from segment_zone struct This fixes compile error: In function `logalloc::segment_zone::segment_zone()': /home/lmr/Code/scylla/utils/logalloc.cc:412: undefined reference to `logalloc::segment_zone::minimum_size' collect2: error: ld returned 1 exit status ninja: build stopped: subcommand failed. Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>	2015-12-10 12:54:34 +02:00
Paweł Dziepak	ec453c5037	managed_bytes: fix potentially unaligned accesses blob_storage is defined with attribute packed, which makes its alignment requirement equal to 1. This means that its members may be unaligned. GCC is aware of that and will generate appropriate code (and not generate ubsan checks). However, there are a few places where members of blob_storage are accessed via pointers; these have to be wrapped in unaligned_cast<> to let the compiler know that the location pointed to may not be aligned properly. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-10 11:59:54 +02:00
Avi Kivity	204610ac61	Merge "Make LSA more large-allocation-friendly" from Paweł "This series attempts to make LSA more friendly for large (i.e. bigger than LSA segment) allocations. It is achieved by introducing segment zones (large, contiguous areas of segments) and using them to allocate segments instead of calling malloc() directly. Zones can be shrunk when needed to reclaim memory, and segments can be migrated either to reduce the number of zones or to defragment one in order to be able to shrink it. LSA tries to keep all segments at the lower addresses and reclaims memory starting from the zones in the highest parts of the address space."	2015-12-09 10:49:23 +02:00
Paweł Dziepak	8ba66bb75d	managed_bytes: fix copy size in move constructor Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-09 10:38:28 +02:00
Paweł Dziepak	0d66300d43	lsa: add more counters Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	83b004b2fb	lsa: avoid fragmenting memory Originally, lsa allocated each segment independently, which could result in high memory fragmentation. As a result, many compaction and eviction passes may be needed to release a sufficiently big contiguous memory block. These problems are solved by the introduction of segment zones, contiguous groups of segments. All segments are allocated from zones and the algorithm tries to keep the number of zones to a minimum. Moreover, segments can be migrated between zones or inside a zone in order to deal with fragmentation inside a zone. Segment zones can be shrunk but cannot grow. Segment pool keeps a tree containing all zones ordered by their base addresses. This tree is used only by the memory reclaimer. There is also a list of zones that have at least one free segment, which is used during allocation. Segment allocation doesn't have any preference as to which segment (and zone) to choose. Each zone contains a free list of unused segments. If there are no zones with free segments, a new one is created. Segment reclamation migrates segments from the zones higher in memory to the ones at lower addresses. The remaining zones are shrunk until the requested number of segments is reclaimed. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	2fb14a10b6	utils: add dynamic_bitset A dynamic bitset implementation that provides functions to search for both set and cleared bits in both directions. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	40dda261f2	lsa: maintain segment to region mapping Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Paweł Dziepak	2e94086a2c	lsa: use bi::list to implement segment_stack Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2015-12-08 19:31:40 +01:00
Tomasz Grabiec	6ead7a0ec5	Merge tag 'large-blobs/v3' from git@github.com:avikivity/scylla.git Scattering of blobs from Avi: This patchset converts the stack to scatter managed_bytes in lsa memory, allowing large blobs (and collections) to be stored in memtable and cache. Outside memtable/cache, they are still stored sequentially, but it is assumed that the number of transient objects is bounded. The approach taken here is to scatter managed_bytes data in multiple blob_storage objects, but to linearize them back when accessing (for example, to merge cells). This allows simple access through the normal bytes_view. It causes an extra two copies, but copying a megabyte twice is cheap compared to accessing a megabyte's worth of small cells, so per-byte throughput is increased. Testing shows that lsa large object space is kept at zero, but throughput is bad because Scylla easily overwhelms the disk with large blobs; we'll need Glauber's throttling patches or a really fast disk to see good throughput with this.	2015-12-08 19:15:13 +01:00
Avi Kivity	0c2fba7e0b	lsa: advertize our preferred maximum allocation size Let managed_bytes know that allocating below a tenth of the segment size is the right thing to do.	2015-12-08 15:17:09 +02:00
Avi Kivity	13324607e6	managed_bytes: conform to allocation_strategy's max_preferred_allocation_size Instead of allocating a single blob_storage, chain multiple blob_storage objects in a list, each limited not to exceed the allocation_strategy's max_preferred_allocation_size. This allows lsa to allocate each blob_storage object as an lsa managed object that can be migrated in memory. Also provide linearize()/scatter() methods that can be used to temporarily consolidate the storage into a single blob_storage. This makes the data contiguous, so we can use a regular bytes_view to examine it.	2015-12-08 15:17:08 +02:00
Tomasz Grabiec	657841922a	Mark move constructors noexcept when possible	2015-12-07 09:50:27 +01:00
Avi Kivity	2437fc956c	allocation_strategy: expose preferred allocation size limit Our premier allocation_strategy, lsa, prefers to limit allocations below a tenth of the segment size so they can be moved around; larger allocations are pinned and can cause memory fragmentation. Provide an API so that objects can query for this preferred size limit. For now, lsa is not updated to expose its own limit; this will be done after the full stack is updated to make use of the limit, or intermediate steps will not work correctly.	2015-12-06 16:23:42 +02:00
Amnon Heiman	61abc85eb3	histogram: Add started counter This patch adds a started counter that is used to mark the number of operations that were started. This counter serves two purposes: it is a better indication of when to sample the data, and it is used to indicate how many operations are pending. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-12-01 15:28:06 +02:00
Amnon Heiman	88dcf2e935	latency: Switch to steady_clock The system clock is less suitable than steady_clock for measuring time differences. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2015-12-01 15:28:06 +02:00
Tomasz Grabiec	a3e3add28a	utils: Introduce phased_barrier Utility for waiting on a group of async actions started before a certain point in time.	2015-11-29 16:25:21 +01:00
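The large-blob commits above (13324607e6, 5d958db869, af8ef54d5a) describe three ideas. First, a managed_bytes value is split into a chain of fragments capped at the allocator's preferred allocation size. Second, such values are compared fragment by fragment. Third, a value can be linearized on demand. The following is a minimal, self-contained sketch of those ideas under stated assumptions; it is not Scylla's implementation. `fragmented_blob` and `max_fragment_size` are hypothetical names, and plain `std::string` fragments stand in for LSA-allocated `blob_storage` objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a scattered managed_bytes value: the payload is
// split into fragments no larger than max_fragment_size (in Scylla the cap
// comes from the allocation strategy's preferred maximum allocation size).
class fragmented_blob {
    std::vector<std::string> _fragments;
    std::size_t _size = 0;
public:
    fragmented_blob(std::string_view data, std::size_t max_fragment_size)
        : _size(data.size()) {
        assert(max_fragment_size > 0);
        for (std::size_t pos = 0; pos < data.size(); pos += max_fragment_size) {
            // substr() clamps the last fragment to the remaining bytes.
            _fragments.emplace_back(data.substr(pos, max_fragment_size));
        }
    }

    std::size_t size() const { return _size; }
    std::size_t fragment_count() const { return _fragments.size(); }

    // Copies every fragment into one contiguous buffer, similar in spirit to
    // reads seeing a linearized copy inside a linearization context.
    std::string linearize() const {
        std::string out;
        out.reserve(_size);
        for (const auto& f : _fragments) {
            out += f;
        }
        return out;
    }

    // Compares two payloads without linearizing them. In this sketch the two
    // blobs may be split at different offsets, so the walk advances by the
    // shorter remaining piece of the current pair of fragments.
    friend bool operator==(const fragmented_blob& a, const fragmented_blob& b) {
        if (a._size != b._size) {
            return false;
        }
        std::size_t ai = 0, aoff = 0, bi = 0, boff = 0;
        for (std::size_t remaining = a._size; remaining > 0;) {
            const std::string& fa = a._fragments[ai];
            const std::string& fb = b._fragments[bi];
            std::size_t n = std::min(fa.size() - aoff, fb.size() - boff);
            if (fa.compare(aoff, n, fb, boff, n) != 0) {
                return false;
            }
            aoff += n;
            boff += n;
            remaining -= n;
            if (aoff == fa.size()) { ++ai; aoff = 0; }
            if (boff == fb.size()) { ++bi; boff = 0; }
        }
        return true;
    }
};
```

For example, `fragmented_blob("hello, world", 5)` holds three fragments and `fragmented_blob("hello, world", 3)` holds four, yet the two compare equal. Equality checks never allocate. Only `linearize()` pays for the extra copy that the merge commit mentions.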

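Commit 2fb14a10b6 introduces a dynamic bitset that can search for both set and cleared bits in either direction. A minimal sketch of such an interface follows. The class name `simple_bitset` and the methods `find_first`/`find_last` are hypothetical and are not the API of Scylla's `utils::dynamic_bitset`. The sketch skips whole 64-bit words that cannot contain a match before scanning individual bits.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical fixed-size bitset (not Scylla's utils::dynamic_bitset) that
// can search for set or cleared bits from either end.
class simple_bitset {
    static constexpr std::size_t bits_per_word = 64;
    std::vector<std::uint64_t> _words;  // value-initialized to zero
    std::size_t _bits;

    static std::uint64_t bit_mask(std::size_t i) {
        return std::uint64_t(1) << (i % bits_per_word);
    }
public:
    explicit simple_bitset(std::size_t bits)
        : _words((bits + bits_per_word - 1) / bits_per_word), _bits(bits) {}

    std::size_t size() const { return _bits; }
    bool test(std::size_t i) const {
        assert(i < _bits);
        return (_words[i / bits_per_word] & bit_mask(i)) != 0;
    }
    void set(std::size_t i) { assert(i < _bits); _words[i / bits_per_word] |= bit_mask(i); }
    void clear(std::size_t i) { assert(i < _bits); _words[i / bits_per_word] &= ~bit_mask(i); }

    // Lowest index whose bit equals `value`, or nullopt if there is none.
    std::optional<std::size_t> find_first(bool value) const {
        // A word equal to `skip` cannot contain a matching bit.
        const std::uint64_t skip = value ? 0 : ~std::uint64_t(0);
        for (std::size_t w = 0; w < _words.size(); ++w) {
            if (_words[w] == skip) {
                continue;
            }
            for (std::size_t b = 0; b < bits_per_word; ++b) {
                std::size_t i = w * bits_per_word + b;
                if (i >= _bits) {
                    return std::nullopt;  // only padding bits remain
                }
                if (test(i) == value) {
                    return i;
                }
            }
        }
        return std::nullopt;
    }

    // Highest index whose bit equals `value`, or nullopt if there is none.
    std::optional<std::size_t> find_last(bool value) const {
        const std::uint64_t skip = value ? 0 : ~std::uint64_t(0);
        for (std::size_t w = _words.size(); w-- > 0;) {
            if (_words[w] == skip) {
                continue;
            }
            for (std::size_t b = bits_per_word; b-- > 0;) {
                std::size_t i = w * bits_per_word + b;
                // Padding bits past the end are ignored.
                if (i < _bits && test(i) == value) {
                    return i;
                }
            }
        }
        return std::nullopt;
    }
};
```

Skipping whole words keeps searches cheap when the bitset is mostly empty or mostly full. A production version would likely replace the per-bit inner loop with count-trailing/leading-zero operations.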

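Commits 92d4cfc3ab and 184e2831e7 make managed_bytes copy assignment exception-safe and mark its move assignment noexcept. The copy-and-swap idiom is one standard way to get that combination. The hypothetical `blob` type below illustrates the idiom under that assumption; it is not how managed_bytes itself is written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical owning byte buffer (not managed_bytes) showing copy-and-swap.
class blob {
    std::unique_ptr<char[]> _data;
    std::size_t _size = 0;
public:
    blob() = default;
    blob(const char* p, std::size_t n)
        : _data(n ? new char[n] : nullptr), _size(n) {
        if (n) {
            std::memcpy(_data.get(), p, n);
        }
    }
    // May throw std::bad_alloc.
    blob(const blob& o) : blob(o._data.get(), o._size) {}
    // Moving only transfers ownership of the buffer, so it cannot throw.
    blob(blob&& o) noexcept
        : _data(std::move(o._data)), _size(std::exchange(o._size, 0)) {}

    // Strong exception guarantee: the copy that may throw is made before
    // *this is touched, so if it throws, *this is left unchanged.
    blob& operator=(const blob& o) {
        blob tmp(o);
        swap(tmp);
        return *this;
    }
    blob& operator=(blob&& o) noexcept {
        blob tmp(std::move(o));
        swap(tmp);
        return *this;
    }
    void swap(blob& o) noexcept {
        std::swap(_data, o._data);
        std::swap(_size, o._size);
    }

    std::size_t size() const { return _size; }
    const char* data() const { return _data.get(); }
};

static_assert(std::is_nothrow_move_constructible<blob>::value, "move construction must not throw");
static_assert(std::is_nothrow_move_assignable<blob>::value, "move assignment must not throw");
```

The move constructor and `swap()` are noexcept. As a result, containers such as `std::vector` can move `blob` values during reallocation instead of falling back to copies.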
237 Commits