While the migration function should have enough information to obtain
the object size itself, the LSA logic needs to compute it as well.
IMR is going to make calculating object sizes more expensive, so by
providing the infromation to the migrator we can avoid some needless
operations.
It is non-trivial to get the size of an IMR object. However, the
standard allocator doesn't really need it and LSA can compute it itself
by asking the migrator.
The alignment of packed structs can be 1. The system¹ posix_memalign
function will return EINVAL when passed this alignment. This fix
forces the alignment to be at least sizeof(void*).
¹ The seastar implementation of posix_memalign does not appear to
have this limitation currently.
Every lsa-allocated object is prefixed by a header that contains information
needed to free or migrate it. This includes its size (for freeing) and
an 8-byte migrator (for migrating). Together with some flags, the overhead
is 14 bytes (16 bytes if the default alignment is used).
This patch reduces the header size to 1 byte (8 bytes if the default alignment
is used). It uses the following techniques:
- ULEB128-like encoding (actually more like ULEB64) so a live object's header
can typically be stored using 1 byte
- indirection, so that migrators can be encoded in a small index pointing
to a migrator table, rather than using an 8-byte pointer; this exploits
the fact that only a small number of types are stored in LSA
- moving the responsibility for determining an object's size to its
migrator, rather than storing it in the header; this exploits the fact
that the migrator stores type information, and object size is in fact
information about the type
The patch improves the results of memory_footprint_test as following:
Before:
- in cache: 976
- in memtable: 947
After:
mutation footprint:
- in cache: 880
- in memtable: 858
A reduction of about 10%. Further reductions are possible by reducing the
alignment of lsa objects.
logalloc_test was adjusted to free more objects, since with the lower
footprint, rounding errors (to full segments) are different and caused
false errors to be detected.
Missing: adjustments to scylla-gdb.py; will be done after we agree on the
new descriptor's format.
We allocate objects of a certain size, but we use a bit more memory to hold
them. To get a clerer picture about how much memory will an object cost us, we
need help from the allocator. This patch exports an interface that allow users
to query into a specific allocator to get that information.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Our premier allocation_strategy, lsa, prefers to limit allocations below
a tenth of the segment size so they can be moved around; larger allocations
are pinned and can cause memory fragmentation.
Provide an API so that objects can query for this preferred size limit.
For now, lsa is not updated to expose its own limit; this will be done
after the full stack is updated to make use of the limit, or intermediate
steps will not work correctly.
The migrator tells lsa how to move an object when it is compacted.
Currently it is a function pointer, which means we must know how to move
the object at compile time. Making it an object allows us to build the
migration function at runtime, making it suitable for runtime-defined types
(such as tuples and user-defined types).
In the future, we may also store the size there for fixed-size types,
reducing lsa overhead.
C++ variable templates would have made this patch smaller, but unfortunately
they are only supported on gcc 5+.
The goal is to make allocation less likely to fail. With async
reclaimer there is an implicit bound on the amount of memory that can
be allocated between deferring points. This bound is difficult to
enforce though. Sync reclaimer lifts this limitation off.
Also, allocations which could not be satisfied before because of
fragmentation now will have higher chances of succeeding, although
depending on how much memory is fragmented, that could involve
evicting a lot of segments from cache, so we should still avoid them.
Downside of sync reclaiming is that now references into regions may be
invalidated not only across deferring points but at any allocation
site. compaction_lock can be used to pin data, preferably just
temporarily.
This heavily used function shows up in many places in the profile (as part
of other functions), so it's worth optimizing by eliminating the special
case for the standard allocator. Use a statically allocated object instead.
(a non-thread-local object is fine since it has no data members).
Some code may attempt to use it during finalization after "instance"
was destroyed.
Reported by Pekka:
/usr/include/c++/4.9.2/bits/unique_ptr.h:291:14: runtime error:
reference binding to null pointer of type 'struct
standard_allocation_strategy'
./utils/allocation_strategy.hh:105:13: runtime error: reference
binding to null pointer of type 'struct standard_allocation_strategy'
./utils/allocation_strategy.hh:118:35: runtime error: reference
binding to null pointer of type 'struct allocation_strategy'
./utils/managed_bytes.hh:59:45: runtime error: member call on null
pointer of type 'struct allocation_strategy'
./utils/allocation_strategy.hh:82:9: runtime error: member access
within null pointer of type 'struct allocation_strategy'