Commit Graph

237 Commits

Author SHA1 Message Date
Tomasz Grabiec
a0cba3c86f logalloc: Introduce tracker::occupancy()
Returns occupancy information for all memory allocated by LSA, including
segment pools / zones.
2016-03-22 16:28:10 +01:00
Tomasz Grabiec
529c8b8858 logalloc: Rename tracker::occupancy() to region_occupancy() 2016-03-22 14:56:44 +01:00
Tomasz Grabiec
ca08db504b managed_bytes: Make operator[] work for large blobs as well
Fixes assertion in mutation_test:

mutation_test: ./utils/managed_bytes.hh:349: blob_storage::char_type* managed_bytes::data(): Assertion `!_u.ptr->next'

Introduced in ea7c2dd085

Message-Id: <1458648786-9127-1-git-send-email-tgrabiec@scylladb.com>
2016-03-22 14:43:52 +02:00
Tomasz Grabiec
184e2831e7 managed_bytes: Mark move-assignment noexcept 2016-03-21 18:41:27 +01:00
Tomasz Grabiec
92d4cfc3ab managed_bytes: Make copy assignment exception-safe 2016-03-21 18:41:27 +01:00
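The two managed_bytes commits above make move-assignment noexcept and copy assignment exception-safe. A common way to get both is to build the copy first and then commit it with a non-throwing move, so a failed allocation leaves the target untouched. The sketch below uses a hypothetical `blob` class with a plain heap buffer; it is not the actual managed_bytes code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <utility>

// Hypothetical owning byte buffer (not the real managed_bytes).
class blob {
    char* _data = nullptr;
    std::size_t _size = 0;
public:
    blob() = default;
    explicit blob(const char* s)
        : _data(new char[std::strlen(s)]), _size(std::strlen(s)) {
        std::copy(s, s + _size, _data);
    }
    blob(const blob& o) : _data(new char[o._size]), _size(o._size) {
        std::copy(o._data, o._data + _size, _data);
    }
    // Moving only steals pointers, so it cannot throw.
    blob(blob&& o) noexcept
        : _data(std::exchange(o._data, nullptr)), _size(std::exchange(o._size, 0)) {}
    blob& operator=(blob&& o) noexcept {
        if (this != &o) {
            delete[] _data;
            _data = std::exchange(o._data, nullptr);
            _size = std::exchange(o._size, 0);
        }
        return *this;
    }
    // Exception-safe copy assignment: the copy (which may throw) happens
    // before *this is modified; the commit step is the noexcept move.
    blob& operator=(const blob& o) {
        blob tmp(o);
        *this = std::move(tmp);
        return *this;
    }
    ~blob() { delete[] _data; }

    std::size_t size() const noexcept { return _size; }
    bool equals(const char* s) const {
        return _size == std::strlen(s) && std::equal(_data, _data + _size, s);
    }
};

static_assert(std::is_nothrow_move_assignable<blob>::value, "move-assignment must be noexcept");
static_assert(std::is_nothrow_move_constructible<blob>::value, "move constructor must be noexcept");
```

The noexcept guarantee also matters to standard containers: `std::vector` only moves elements during reallocation when the move constructor cannot throw.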
Tomasz Grabiec
22d193ba9f managed_bytes: Make linearization_context::forget() noexcept
It is needed for noexcept destruction, which we need for exception
safety in higher layers.

According to [1], erase() only throws if key comparison throws, and in
our case it doesn't.

[1] http://en.cppreference.com/w/cpp/container/unordered_map/erase
2016-03-21 18:41:27 +01:00
Benoît Canet
1fb9a48ac5 exception: Optionally shutdown communication on I/O errors.
I/O errors cannot be fixed by Scylla; the only solution
is to shut down the database communications.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1458154098-9977-1-git-send-email-benoit@scylladb.com>
2016-03-17 15:02:52 +02:00
Paweł Dziepak
338fd34770 lsa: update _closed_occupancy after freeing all segments
_closed_occupancy will be used when a region is removed from its region
group; make sure that it is accurate.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-17 11:12:05 +00:00
Paweł Dziepak
99b61d3944 lsa: set _active to nullptr in region destructor
In the region destructor, after the active segment is freed, the pointer
to it is left unchanged. This confuses the remaining parts of the
destructor logic (namely, removal from the region group), which may rely
on the information in region_impl::_active.

In this particular case, the problem was that the code removing the
region from the region group called region_impl::occupancy(), which
dereferenced _active if it was not null.

Fixes #993.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1457341670-18266-1-git-send-email-pdziepak@scylladb.com>
2016-03-07 10:15:28 +01:00
Calle Wilund
e79ca557ed managed_bytes: Change init of small object to silence error on gcc5
Fixes #865

(Some) gcc 5 versions (5.3.0 for me) on Ubuntu generate errors when
compiling this code (logalloc_test). The memcpy to inline storage
seems to confuse the compiler.
Simply change it to std::copy, which silences the compiler.
Any decent STL should convert a primitive std::copy to memcpy
anyway, and since this is the inline (small) storage, it should
not matter either way.

Message-Id: <1456931988-5876-4-git-send-email-calle@scylladb.com>
2016-03-02 18:21:51 +02:00
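A minimal sketch of the change above, assuming a hypothetical `small_blob` type whose short values live in an inline array (the real managed_bytes small-object layout differs): the inline copy is done with std::copy instead of memcpy.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical small-object representation: short blobs are stored inline.
struct small_blob {
    static constexpr std::size_t max_inline = 15;
    std::int8_t size = 0;
    char data[max_inline] = {};

    static small_blob make(const char* src, std::size_t len) {
        assert(len <= max_inline);
        small_blob b;
        b.size = static_cast<std::int8_t>(len);
        // std::copy over a char range; standard libraries typically lower
        // this to memmove/memcpy for trivially copyable types anyway.
        std::copy(src, src + len, b.data);
        return b;
    }
};
```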
Calle Wilund
43ea1f5945 utils::joinpoint: Helper type to generate a singular value for all shards
Lets operations working on all shards "join" and acquire
the same value of something, with that value being based on
when all shards reach the join.

Obvious use case: time stamp after one set of per-shard ops, but
before final ones.

The generation of the value is guaranteed to happen on the shard
that created the join point.

Based on the join-ops in CF::snapshot, but abstracted and made the
caller's responsibility. The primary use case is to help deal with
the join problem of truncation.

Message-Id: <1456332856-23395-1-git-send-email-calle@scylladb.com>
2016-02-24 18:59:25 +02:00
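Seastar shards coordinate through futures rather than blocking threads, so the following is only a thread-based analogy of the join point described above: N participants call `join()`, the value is produced exactly once when all of them have arrived, and every participant receives the same value. The names `join_point` and `run_join_demo` are hypothetical. One difference from the commit: there, value generation is tied to the side that created the join point, while here it simply happens on whichever thread arrives last.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread-based analogue of a join point.
template <typename T>
class join_point {
    std::mutex _m;
    std::condition_variable _cv;
    std::size_t _remaining;
    std::function<T()> _make_value;
    std::optional<T> _value;
public:
    join_point(std::size_t count, std::function<T()> make_value)
        : _remaining(count), _make_value(std::move(make_value)) {}

    // Blocks until all participants arrive; everyone gets the same value.
    T join() {
        std::unique_lock<std::mutex> lk(_m);
        if (--_remaining == 0) {
            _value = _make_value();  // computed once, by the last arrival
            _cv.notify_all();
        } else {
            _cv.wait(lk, [this] { return _value.has_value(); });
        }
        return *_value;
    }
};

// Demo: n threads join; returns what each one observed.
inline std::vector<int> run_join_demo(std::size_t n, int& generations) {
    generations = 0;
    join_point<int> jp(n, [&generations] { return ++generations * 100; });
    std::vector<int> seen(n);
    std::vector<std::thread> threads;
    for (std::size_t i = 0; i < n; ++i) {
        threads.emplace_back([&jp, &seen, i] { seen[i] = jp.join(); });
    }
    for (auto& t : threads) {
        t.join();
    }
    return seen;
}
```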
Paweł Dziepak
d5c794d5e4 data_output: add reserve()
Allows mixing data_output with other output streams like
seastar::simple_output_stream, which is useful when switching to the new
IDL-based serializers.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-02-19 23:11:59 +00:00
Amnon Heiman
1e4d227b20 managed_bytes: don't return auto from non-member function
gcc 4.9 does not allow a non-static data member to be declared auto.

This patch replaces the auto declaration with std::result_of_t.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1455652166-16860-1-git-send-email-amnon@scylladb.com>
2016-02-16 21:50:55 +02:00
Avi Kivity
13144ea9eb managed_bytes: get rid of explicit linearize/scatter
Now that everything is in a linearization context, we don't need to explicitly
gather data.
2016-02-16 14:37:46 +02:00
Avi Kivity
af8ef54d5a managed_bytes: introduce with_linearized_managed_bytes()
A large managed_bytes blob can be scattered in lsa memory.  Usually this is
fine, but sometimes we want to examine it in place, without copying it out,
while still using contiguous iterators for efficiency.

For this use case, introduce with_linearized_managed_bytes(Func),
which runs a function in a "linearization context".  Within the linearization
context, reads of managed_bytes objects will see temporarily linearized copies
instead of scattered data.
2016-02-09 19:55:13 +02:00
Avi Kivity
e5b72aedf1 managed_bytes: don't copy data during hashing 2016-02-08 12:43:05 +02:00
Avi Kivity
5d958db869 managed_bytes: fix operator== for fragmented blobs
Must compare fragment by fragment.
2016-02-08 12:43:05 +02:00
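When two blobs are stored as chains of fragments, their fragment boundaries need not line up, so equality has to walk both chains in step, comparing the overlapping part of the current fragments each time. A minimal sketch, using `std::vector<std::string>` as a hypothetical stand-in for the chain of blob_storage fragments:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for a fragmented blob: an ordered list of fragments.
using fragmented = std::vector<std::string>;

inline std::size_t total_size(const fragmented& f) {
    std::size_t n = 0;
    for (const auto& frag : f) n += frag.size();
    return n;
}

// Compare without linearizing. Fragment boundaries may differ, so each step
// compares the smaller of the two remaining pieces and advances both sides.
inline bool fragmented_equal(const fragmented& a, const fragmented& b) {
    if (total_size(a) != total_size(b)) return false;
    std::size_t ia = 0, ib = 0;  // current fragment index in a / b
    std::size_t oa = 0, ob = 0;  // offset inside the current fragment
    while (ia < a.size() && ib < b.size()) {
        std::size_t n = std::min(a[ia].size() - oa, b[ib].size() - ob);
        if (a[ia].compare(oa, n, b[ib], ob, n) != 0) return false;
        oa += n;
        ob += n;
        if (oa == a[ia].size()) { ++ia; oa = 0; }
        if (ob == b[ib].size()) { ++ib; ob = 0; }
    }
    return true;  // equal sizes and every compared byte matched
}
```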
Erich Keane
49842aacd9 managed_vector: maybe_constructed ctor to non-constexpr
Clang enforces that a union's constexpr constructor must initialize
one of the members.  The spec is seemingly silent on this rule;
however, making the constructor non-constexpr results in clang
accepting it.

Signed-off-by: Erich Keane <erich.keane@verizon.net>
Message-Id: <1454604300-1673-1-git-send-email-erich.keane@verizon.net>
2016-02-07 10:30:45 +02:00
Gleb Natapov
4e440ebf8e Remove old inet_address and uuid serializers 2016-02-02 12:15:50 +02:00
Raphael S. Carvalho
2164aa8d5b move compaction manager from /utils to /sstables
The compaction manager was initially created in utils because it was
more generic and wasn't intended only for compaction.
It was more like a future-based task handler, but now it's
only intended to manage compaction tasks, and thus it should be
moved elsewhere. /sstables is where the compaction code is located.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:23:05 -02:00
Raphael S. Carvalho
3bd240d9e8 compaction: add ability to stop an ongoing compaction
That's needed for nodetool stop, which is called to stop all ongoing
compactions. The implementation works by informing an ongoing compaction
that it was asked to stop, so that the compaction itself throws an
exception. The compaction manager catches this exception and re-schedules
the compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
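The stop mechanism described above can be sketched as a flag that the compaction loop polls between units of work, plus an exception that the manager catches. Everything here (`compaction_stopped_exception`, the `stop_requested` field, `run_compaction`, `manager_run`) is hypothetical and synchronous; the real code is future-based.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical exception thrown by a compaction that saw a stop request.
struct compaction_stopped_exception : std::runtime_error {
    compaction_stopped_exception() : std::runtime_error("compaction stopped") {}
};

// Hypothetical per-compaction state; the stop request is just a flag.
struct compaction_info {
    std::atomic<bool> stop_requested{false};
    std::size_t partitions_written = 0;
};

// The compaction checks the flag between partitions and throws if set.
inline void run_compaction(compaction_info& info, std::size_t partitions) {
    for (std::size_t i = 0; i < partitions; ++i) {
        if (info.stop_requested) {
            throw compaction_stopped_exception();
        }
        ++info.partitions_written;  // stand-in for merging one partition
    }
}

// The manager catches the exception; false means "stopped, re-schedule".
inline bool manager_run(compaction_info& info, std::size_t partitions) {
    try {
        run_compaction(info, partitions);
        return true;
    } catch (const compaction_stopped_exception&) {
        return false;
    }
}
```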
Raphael S. Carvalho
ec4c73d451 compaction: rename compaction_stats to compaction_info
compaction_info makes more sense because this structure doesn't
only store stats about an ongoing compaction. Soon, we will add
information to it about whether or not a user asked to stop the
respective ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Tomasz Grabiec
237819c31f logalloc: Exclude zones' free segments from lsa/bytes-non_lsa_used_space
Historically, the purpose of the metric was to show how much memory is
in standard allocations. After zones were introduced, it also
included free space in lsa zones, which is almost all memory, so
the metric lost its original meaning. This change restores its
original meaning.

Message-Id: <1452865125-4033-1-git-send-email-tgrabiec@scylladb.com>
2016-01-18 10:48:14 +02:00
Raphael S. Carvalho
a5c90194f5 db: add support to clean up a column family
Cleanup is a procedure that discards irrelevant keys from
all sstables of a column family, thus saving disk space.
Scylla cleans up an sstable using the compaction code, with
that sstable as the only input.
The compaction manager was changed to be aware of cleanup, so
that it can schedule cleanup requests and also knows
how to handle them properly.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 03:53:04 -02:00
Raphael S. Carvalho
d44a5d1e94 compaction: filter out compacting sstables
The implementation stores the generations of compacting sstables
in an unordered set per column family, so that before the strategy is
called, the compaction manager can filter out compacting sstables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 01:18:29 -02:00
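A minimal sketch of the filtering step described above, assuming a hypothetical `sstable_desc` type that carries only a generation number and a per-column-family `std::unordered_set` of generations currently being compacted:

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical minimal sstable descriptor; only the generation matters here.
struct sstable_desc {
    std::int64_t generation;
};

// Generations currently being compacted, kept per column family.
using compacting_set = std::unordered_set<std::int64_t>;

// Drop candidates that are already being compacted before the remaining
// sstables are handed to the compaction strategy.
inline std::vector<sstable_desc> filter_compacting(const std::vector<sstable_desc>& all,
                                                   const compacting_set& compacting) {
    std::vector<sstable_desc> out;
    for (const auto& sst : all) {
        if (compacting.count(sst.generation) == 0) {
            out.push_back(sst);
        }
    }
    return out;
}
```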
Raphael S. Carvalho
9c13c1c738 compaction: move compaction execution from strategy to manager
Currently, the compaction strategy is responsible for both selecting the
sstables for compaction and running compaction.
Moving the code that runs compaction from the strategy to the manager is
a big improvement, which will also make it possible for the compaction
manager to keep track of which sstables are being compacted at any moment.
This change will also be needed for cleanup and concurrent compaction
on the same column family.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 00:04:27 -02:00
Tomasz Grabiec
eb1b21eb4b Introduce hashing helpers 2016-01-08 21:10:25 +01:00
Avi Kivity
c8b09a69a9 lsa: disable constant_time_size in binomial_heap implementation
Corrupts heap on boost < 1.60, and not needed.

Fixes #698.
2015-12-29 12:59:00 +01:00
Vlad Zolotarov
33552829b2 core: use steady_clock where monotonic clock is required
Use steady_clock instead of high_resolution_clock where a monotonic
clock is required. high_resolution_clock is essentially a
system_clock (wall clock) and therefore may not be assumed monotonic,
since the wall clock may move backwards due to time/date adjustments.

Fixes issue #638

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-27 18:07:53 +02:00
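A minimal sketch of measuring an interval with a monotonic clock, as the commit above recommends. `time_it` is a hypothetical helper; `std::chrono::steady_clock::is_steady` is a standard member guaranteed to be true, so the static_assert documents the assumption rather than checking anything platform-specific.

```cpp
#include <cassert>
#include <chrono>

// Time a callable with steady_clock. Because steady_clock never goes
// backwards, the result is non-negative even if the wall clock is adjusted
// while f() runs; high_resolution_clock gives no such guarantee.
template <typename Func>
std::chrono::steady_clock::duration time_it(Func&& f) {
    static_assert(std::chrono::steady_clock::is_steady, "steady_clock is monotonic");
    auto start = std::chrono::steady_clock::now();
    f();
    return std::chrono::steady_clock::now() - start;
}
```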
Calle Wilund
803b58620f data_output: specialize serialized_size for bool to ensure sync with write 2015-12-21 14:19:45 +00:00
Paweł Dziepak
442bc90505 compaction_manager: check whether the manager is already stopped
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-17 14:06:41 +01:00
Tomasz Grabiec
157af1036b data_output: Introduce write_view() which matches data_input::read_view() 2015-12-16 18:06:54 +01:00
Raphael S. Carvalho
e74dcc86bd compaction_manager: introduce list of compaction_stats
This list will store compaction_stats for each ongoing compaction,
which is why register and deregister methods are provided.
This change is important for the compaction stats API, which needs data
about each ongoing compaction, such as progress, keyspace, column family, etc.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:28 -02:00
Lucas Meneghel Rodrigues
2167173251 utils/logalloc.cc - Declare member minimum_size from segment_zone struct
This fixes compile error:

In function `logalloc::segment_zone::segment_zone()':
/home/lmr/Code/scylla/utils/logalloc.cc:412: undefined reference to `logalloc::segment_zone::minimum_size'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.

Signed-off-by: Lucas Meneghel Rodrigues <lmr@scylladb.com>
2015-12-10 12:54:34 +02:00
Paweł Dziepak
ec453c5037 managed_bytes: fix potentially unaligned accesses
blob_storage is defined with the packed attribute, which makes its alignment
requirement equal to 1. This means that its members may be unaligned.
GCC is aware of that and will generate appropriate code
(and not generate ubsan checks). However, there are a few places where
members of blob_storage are accessed via pointers; these have to be
wrapped in unaligned_cast<> to let the compiler know that the location
pointed to may not be properly aligned.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-10 11:59:54 +02:00
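A sketch of the problem above, assuming a hypothetical `packed_header` struct in place of blob_storage. Direct member access such as `h.size` is safe because the compiler knows the member may be unaligned; the danger is taking its address and dereferencing a plain `uint32_t*`. The commit uses Scylla's own unaligned_cast<> for that; this sketch swaps in `std::memcpy`, which has no alignment requirement. `__attribute__((packed))` is a GCC/Clang extension.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical packed layout: `size` follows a 1-byte tag, so in an array
// of these structs it can land on an unaligned address.
struct __attribute__((packed)) packed_header {
    char tag;
    std::uint32_t size;
};

static_assert(alignof(packed_header) == 1, "packed struct has alignment 1");
static_assert(sizeof(packed_header) == 5, "no padding in a packed struct");

// Read the member through raw bytes instead of a uint32_t*; memcpy is
// valid at any alignment.
inline std::uint32_t read_size(const packed_header* h) {
    std::uint32_t v;
    std::memcpy(&v, reinterpret_cast<const char*>(h) + offsetof(packed_header, size), sizeof(v));
    return v;
}

inline void write_size(packed_header* h, std::uint32_t v) {
    std::memcpy(reinterpret_cast<char*>(h) + offsetof(packed_header, size), &v, sizeof(v));
}
```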
Avi Kivity
204610ac61 Merge "Make LSA more large-allocation-friendly" from Paweł
"This series attempts to make LSA friendlier to large (i.e. bigger
than an LSA segment) allocations. It does so by introducing segment
zones (large, contiguous areas of segments) and using them to allocate
segments instead of calling malloc() directly.
Zones can be shrunk when needed to reclaim memory, and segments can be
migrated either to reduce the number of zones or to defragment a zone so
that it can be shrunk. LSA tries to keep all segments at the lower
addresses and reclaims memory starting from the zones in the highest
parts of the address space."
2015-12-09 10:49:23 +02:00
Paweł Dziepak
8ba66bb75d managed_bytes: fix copy size in move constructor
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-09 10:38:28 +02:00
Paweł Dziepak
0d66300d43 lsa: add more counters
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
83b004b2fb lsa: avoid fragmenting memory
Originally, lsa allocated each segment independently, which could result
in high memory fragmentation. As a result, many compaction and eviction
passes might be needed to release a sufficiently big contiguous memory
block.

These problems are solved by introducing segment zones, contiguous
groups of segments. All segments are allocated from zones, and the
algorithm tries to keep the number of zones to a minimum. Moreover,
segments can be migrated between zones or inside a zone in order to deal
with fragmentation inside a zone.

Segment zones can be shrunk but cannot grow. The segment pool keeps a tree
containing all zones ordered by their base addresses. This tree is used
only by the memory reclaimer. There is also a list of zones that have
at least one free segment, which is used during allocation.

Segment allocation has no preference for which segment (and zone)
to choose. Each zone contains a free list of unused segments. If there
are no zones with free segments, a new one is created.

Segment reclamation migrates segments from the zones higher in memory
to the ones at lower addresses. The remaining zones are shrunk until the
requested number of segments is reclaimed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
2fb14a10b6 utils: add dynamic_bitset
A dynamic bitset implementation that provides functions to search for
both set and cleared bits in both directions.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
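A minimal sketch of a bitset with the four searches described above (first/last set bit, first/last cleared bit). This is not the Scylla implementation; it stores 64-bit words in a `std::vector` and uses the GCC/Clang builtins `__builtin_ctzll` and `__builtin_clzll` for bit scanning. The searches for cleared bits mask off the unused tail of the last word so they never report out-of-range positions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class dynamic_bitset {
    static constexpr std::size_t bits_per_word = 64;
    std::vector<std::uint64_t> _words;
    std::size_t _bits;

    // Mask of the bits in word w that lie inside [0, _bits).
    std::uint64_t valid_mask(std::size_t w) const {
        std::size_t valid = _bits - w * bits_per_word;
        return valid >= bits_per_word ? ~std::uint64_t(0) : (std::uint64_t(1) << valid) - 1;
    }
    static std::size_t lowest(std::uint64_t x) { return __builtin_ctzll(x); }
    static std::size_t highest(std::uint64_t x) { return bits_per_word - 1 - __builtin_clzll(x); }
public:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    explicit dynamic_bitset(std::size_t bits)
        : _words((bits + bits_per_word - 1) / bits_per_word, 0), _bits(bits) {}

    void set(std::size_t i) { _words[i / bits_per_word] |= std::uint64_t(1) << (i % bits_per_word); }
    void clear(std::size_t i) { _words[i / bits_per_word] &= ~(std::uint64_t(1) << (i % bits_per_word)); }

    std::size_t find_first_set() const {
        for (std::size_t w = 0; w < _words.size(); ++w)
            if (_words[w]) return w * bits_per_word + lowest(_words[w]);
        return npos;
    }
    std::size_t find_last_set() const {
        for (std::size_t w = _words.size(); w-- > 0;)
            if (_words[w]) return w * bits_per_word + highest(_words[w]);
        return npos;
    }
    std::size_t find_first_clear() const {
        for (std::size_t w = 0; w < _words.size(); ++w) {
            std::uint64_t inv = ~_words[w] & valid_mask(w);
            if (inv) return w * bits_per_word + lowest(inv);
        }
        return npos;
    }
    std::size_t find_last_clear() const {
        for (std::size_t w = _words.size(); w-- > 0;) {
            std::uint64_t inv = ~_words[w] & valid_mask(w);
            if (inv) return w * bits_per_word + highest(inv);
        }
        return npos;
    }
};
```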
Paweł Dziepak
40dda261f2 lsa: maintain segment to region mapping
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Paweł Dziepak
2e94086a2c lsa: use bi::list to implement segment_stack
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Tomasz Grabiec
6ead7a0ec5 Merge tag 'large-blobs/v3' from git@github.com:avikivity/scylla.git
Scattering of blobs from Avi:

This patchset converts the stack to scatter managed_bytes in lsa memory,
allowing large blobs (and collections) to be stored in memtable and cache.
Outside memtable/cache, they are still stored sequentially, but it is assumed
that the number of transient objects is bounded.

The approach taken here is to scatter managed_bytes data in multiple
blob_storage objects, but to linearize them back when accessing (for
example, to merge cells).  This allows simple access through the normal
bytes_view.  It causes an extra two copies, but copying a megabyte twice
is cheap compared to accessing a megabyte's worth of small cells, so
per-byte throughput is increased.

Testing shows that lsa large object space is kept at zero, but throughput
is bad because Scylla easily overwhelms the disk with large blobs; we'll
need Glauber's throttling patches or a really fast disk to see good
throughput with this.
2015-12-08 19:15:13 +01:00
Avi Kivity
0c2fba7e0b lsa: advertize our preferred maximum allocation size
Let managed_bytes know that allocating below a tenth of the segment size is
the right thing to do.
2015-12-08 15:17:09 +02:00
Avi Kivity
13324607e6 managed_bytes: conform to allocation_strategy's max_preferred_allocation_size
Instead of allocating a single blob_storage, chain multiple blob_storage
objects in a list, each limited not to exceed the allocation_strategy's
max_preferred_allocation_size.  This allows lsa to allocate each blob_storage
object as an lsa managed object that can be migrated in memory.

Also provide linearize()/scatter() methods that can be used to temporarily
consolidate the storage into a single blob_storage.  This makes the data
contiguous, so we can use a regular bytes_view to examine it.
2015-12-08 15:17:08 +02:00
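The chaining and linearization described above can be sketched with a hypothetical singly linked `fragment` type, where each fragment holds at most `max_fragment` bytes (standing in for allocation_strategy's max_preferred_allocation_size). The real blob_storage layout and the scatter()/linearize() methods in the commit differ; this only illustrates the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical chained storage: each fragment is capped at max_fragment bytes.
struct fragment {
    std::string bytes;
    std::unique_ptr<fragment> next;
};

// Split data into a chain of fragments of at most max_fragment bytes each.
inline std::unique_ptr<fragment> scatter(const std::string& data, std::size_t max_fragment) {
    assert(max_fragment > 0);
    std::unique_ptr<fragment> head;
    fragment* tail = nullptr;
    for (std::size_t off = 0; off < data.size(); off += max_fragment) {
        auto f = std::make_unique<fragment>();
        f->bytes = data.substr(off, max_fragment);
        fragment* raw = f.get();
        if (tail) {
            tail->next = std::move(f);
        } else {
            head = std::move(f);
        }
        tail = raw;
    }
    return head;
}

// Copy the chain back into one contiguous buffer so it can be examined
// through a single view (the role bytes_view plays in the commit).
inline std::string linearize(const fragment* f) {
    std::string out;
    for (; f; f = f->next.get()) out += f->bytes;
    return out;
}

inline std::size_t count_fragments(const fragment* f) {
    std::size_t n = 0;
    for (; f; f = f->next.get()) ++n;
    return n;
}
```

With 1000 bytes and a 128-byte cap, the chain has eight fragments: seven full ones and a final 104-byte fragment.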
Tomasz Grabiec
657841922a Mark move constructors noexcept when possible 2015-12-07 09:50:27 +01:00
Avi Kivity
2437fc956c allocation_strategy: expose preferred allocation size limit
Our premier allocation_strategy, lsa, prefers to limit allocations below
a tenth of the segment size so they can be moved around; larger allocations
are pinned and can cause memory fragmentation.

Provide an API so that objects can query for this preferred size limit.

For now, lsa is not updated to expose its own limit; this will be done
after the full stack is updated to make use of the limit, or intermediate
steps will not work correctly.
2015-12-06 16:23:42 +02:00
Amnon Heiman
61abc85eb3 histogram: Add started counter
This patch adds a started counter, which is used to record the number of
operations that were started.

This counter serves two purposes: it is a better indication of when to
sample the data, and it is used to indicate how many operations are
pending.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-01 15:28:06 +02:00
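A minimal sketch of how a started counter yields the pending count mentioned above, assuming a hypothetical `op_histogram` with a `started` counter and a completion `count` (the real histogram type also tracks latency samples):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical counters: `started` is bumped when an operation begins and
// `count` when it completes; their difference is the number in flight.
struct op_histogram {
    std::uint64_t started = 0;
    std::uint64_t count = 0;

    void mark_start() { ++started; }
    void mark_done() { ++count; }
    std::uint64_t pending() const { return started - count; }
};
```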
Amnon Heiman
88dcf2e935 latency: Switch to steady_clock
The system clock is less suitable for measuring time differences than
steady_clock.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-01 15:28:06 +02:00
Tomasz Grabiec
a3e3add28a utils: Introduce phased_barrier
Utility for waiting on a group of async actions started before a certain
point in time.
2015-11-29 16:25:21 +01:00
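A thread-based sketch of the phased-barrier idea above. The real utility is future-based; here `phased_barrier`, `start()`, `finish()`, `pending()` and `advance_and_await()` are hypothetical names. Each operation registers in the current phase; `advance_and_await()` opens a new phase and blocks until every operation started in earlier phases has finished, without waiting for operations started afterwards.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <map>
#include <mutex>

class phased_barrier {
    std::mutex _m;
    std::condition_variable _cv;
    std::size_t _phase = 0;
    std::map<std::size_t, std::size_t> _in_flight;  // phase -> running ops
public:
    // Register an operation in the current phase; returns that phase.
    std::size_t start() {
        std::lock_guard<std::mutex> lk(_m);
        ++_in_flight[_phase];
        return _phase;
    }
    // Mark an operation from `phase` as finished.
    void finish(std::size_t phase) {
        std::lock_guard<std::mutex> lk(_m);
        auto it = _in_flight.find(phase);
        assert(it != _in_flight.end());
        if (--it->second == 0) _in_flight.erase(it);
        _cv.notify_all();
    }
    std::size_t pending() {
        std::lock_guard<std::mutex> lk(_m);
        std::size_t n = 0;
        for (const auto& p : _in_flight) n += p.second;
        return n;
    }
    // Start a new phase, then wait for all operations from older phases.
    void advance_and_await() {
        std::unique_lock<std::mutex> lk(_m);
        std::size_t old = _phase++;
        _cv.wait(lk, [&] { return _in_flight.empty() || _in_flight.begin()->first > old; });
    }
};
```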