Commit Graph

945 Commits

Pavel Emelyanov
fdfcda97d7 allocation_strategy: Mark size_for_allocation_strategy noexcept
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-05-19 09:23:49 +03:00
Tomasz Grabiec
f8d7374400 Merge 'Add additional sstable stats' from Michael Livshin
Refs #251.

Closes #8630

* github.com:scylladb/scylla:
  statistics: add global bloom filter memory gauge
  statistics: add some sstable management metrics
  sstables: make the `_open` field more useful
  sstables: stats: noexcept all accessors
2021-05-12 14:35:13 +02:00
Michael Livshin
357ab759ee statistics: add global bloom filter memory gauge
Refs #251.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2021-05-12 03:48:07 +03:00
Benny Halevy
c0dafa75d9 utils: phased_barrier: advance_and_await: make noexcept
Since advance_and_await() returns a future, simplify
its interface by handling any exceptions and
returning an exceptional future instead of
propagating the exception.
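
A minimal sketch of the pattern, assuming Seastar's current_exception_as_future() helper (the function body here is hypothetical):

```cpp
#include <seastar/core/future.hh>

using namespace seastar;

future<> advance_and_await() noexcept {
    try {
        // ... start a new phase and obtain a future for the previous one ...
        return make_ready_future<>();
    } catch (...) {
        // turn a synchronous throw into an exceptional future, so callers'
        // .finally() clauses are never short-circuited
        return current_exception_as_future<>();
    }
}
```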

In this specific case, throwing from advance_and_await()
will propagate through table::await_pending_* calls
short-circuiting a .finally clause in table::stop().

Also, mark as noexcept the methods of class table that call
advance_and_await and table::await_pending_ops, which depend on them.

Fixes #8636

A followup patch will convert advance_and_await to a coroutine.
This is done separately to facilitate backporting of this patch.

Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210511161407.218402-1-bhalevy@scylladb.com>
2021-05-12 01:36:11 +02:00
Nadav Har'El
c7a814fd5c utils/enum_option.hh: make it easier to compare the value
The operator== of enum_option<> (which we use to hold multi-valued
Scylla options) makes it easy to compare against another enum_option
wrapper, but ugly to compare against the actual value held. So this
patch adds a nicer way to compare the held value.
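
A hedged sketch of the idea (not the actual utils/enum_option.hh code; the Mapper shape is assumed):

```cpp
template <typename Mapper>
struct enum_option {
    using enum_type = typename Mapper::type;
    enum_type _value;

    // easy already: compare two wrappers
    bool operator==(const enum_option& other) const {
        return _value == other._value;
    }
    // the addition: compare directly against the held value's type,
    // avoiding the ugly wrapping of the right-hand side
    bool operator==(const enum_type& v) const {
        return _value == v;
    }
};
```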

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210511120222.1167686-1-nyh@scylladb.com>
2021-05-11 18:39:10 +03:00
Benny Halevy
9ba960a388 utils: phased_barrier::operation do not leak gate entry when reassigned
utils::phased_barrier holds a `lw_shared_ptr<gate>` that is
typically `enter()`ed in `phased_barrier::start()`,
and left when the operation is destroyed in `~operation`.

Currently, the operation move-assign implementation is the
default one that just moves the lw_shared gate ptr from the
other operation into this one, without calling `_gate->leave()` first.

This change makes move-assignment first destroy *this (unless
self-assigning), so that _gate->leave() is called if the gate is
engaged, before adopting the other operation::_gate, as sketched below.
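
A hedged, simplified sketch of the fix (the real gate is an lw_shared_ptr<seastar::gate>):

```cpp
#include <new>      // placement new
#include <utility>

struct gate { void leave() {} };   // stand-in for seastar::gate

class operation {
    gate* _gate = nullptr;         // simplified from lw_shared_ptr<gate>
public:
    operation() = default;
    operation(operation&& o) noexcept : _gate(std::exchange(o._gate, nullptr)) {}
    ~operation() { if (_gate) { _gate->leave(); } }

    // the fix: destroy *this first so the held gate entry is released,
    // then adopt the other operation's gate
    operation& operator=(operation&& other) noexcept {
        if (this != &other) {
            this->~operation();
            new (this) operation(std::move(other));
        }
        return *this;
    }
};
```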

A unit test that reproduces the issue before this change
and passes with the fix was added to serialized_action_test.

Fixes #8613

Test: unit(dev), serialized_action_test(debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210510120703.1520328-1-bhalevy@scylladb.com>
2021-05-11 18:39:10 +03:00
Benny Halevy
2a168c3224 atomic_cell: get rid of is_value_fragments
It isn't used. Along with it, also get rid of
managed_bytes::is_fragmented and
managed_bytes_basic_view::is_fragmented

Test: unit(dev)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210506174115.171048-1-bhalevy@scylladb.com>
2021-05-09 11:08:53 +03:00
Avi Kivity
3114f09d76 utils: small_vector: add print operator for std::ostream
In order for utils::small_vector to replace std::vector, it needs to
support this feature too.
2021-05-05 12:10:59 +03:00
Pavel Solodovnikov
a7bd7dd122 utils: make basic UUID constructors constexpr
Mark the default and `UUID(most_sig_bits, least_sig_bits)` ctors
as constexpr.

This allows constructing constexpr constants of the UUID type.
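
A hedged, simplified sketch of what this enables (utils::UUID itself has more to it):

```cpp
#include <cstdint>

// Simplified stand-in for utils::UUID; the real class has more members.
class UUID {
    int64_t _most_sig_bits = 0;
    int64_t _least_sig_bits = 0;
public:
    constexpr UUID() noexcept = default;
    constexpr UUID(int64_t most, int64_t least) noexcept
        : _most_sig_bits(most), _least_sig_bits(least) {}
};

// Now compile-time constants of UUID type are possible:
constexpr UUID null_uuid{};
constexpr UUID some_marker{0x1234, 0x5678};
```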

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20210429170630.533596-2-pa.solodovnikov@scylladb.com>
2021-05-02 16:39:52 +03:00
Avi Kivity
ae660eeec4 logalloc: reduce minimum lsa reserve in allocating_section to 1
Many workloads have fairly constant and small request sizes, so we
don't need large reserves for them. These workloads suffer needlessly
from the current large reserve of 10 segments (1.2MB) when they really
need a few hundred bytes. Reduce the reserve to a minimum of 1 segment.

Note that due to #8542 this can make a large difference. Consider a
workload that has a 1000-byte footprint in cache. If we've just
consumed some free memory and reduced the reserve to zero, then
we'll evict about 50,000 objects before proceeding to compact. With
the reserve reduced to 1, we'll evict 128 objects.  All this
for 1000 bytes of memory.

Of course, #8542 should be fixed, but reducing the reserve provides
some quick relief and makes sense even with the larger fix. The
reserve will quickly grow for workloads that handle bigger requests,
so they won't see an impact from the reduction.

Closes #8572
2021-05-02 15:22:04 +02:00
Avi Kivity
5801c93715 utils: rjson: convert enable_if to concept
Simpler and easier to understand. Vague comment about enable_if
removed.

Closes #8405
2021-04-25 21:53:46 +03:00
Avi Kivity
c36549b22e Merge 'rjson: Add throwing allocator' from Piotr Sarna
This series adds a wrapper for the default rjson allocator which throws on allocation/reallocation failures. It's done to work around several rapidjson (the underlying JSON parsing library) bugs - in a few cases, the malloc/realloc return value is not checked, which results in dereferencing a null pointer (or an arbitrary pointer computed as 0 + `size`, with the `size` parameter being provided by the user). The new allocator will throw an `rjson::error` if it fails to allocate or reallocate memory.
This series comes with unit tests which check the new allocator's behavior and also validate that an internal rapidjson structure which we indirectly rely upon (Stack) is not left in an invalid state after throwing. The last part is verified by the fact that its destructor runs without errors.

Fixes #8521
Refs #8515

Tests:
 * unit(release)
 * YCSB: inserted data similar to that mentioned in #8515 (1.5MB objects, clustered in partitions of 30k objects). Nothing crashed during various YCSB workloads, but nothing crashed for me locally before this patch either, so this is not a 100% robust reproducer.
 relevant YCSB workload config for using 1.5MB objects:
```yaml
fieldcount=150
fieldlength=10000
```

Closes #8529

* github.com:scylladb/scylla:
  test: add a test for rjson allocation
  test: rename alternator_base64_test to alternator_unit_test
  rjson: add a throwing allocator
2021-04-22 17:12:02 +03:00
Avi Kivity
350f79c8ce Merge 'sstables: remove large allocations when parsing cells' from Wojciech Mitros
sstable cells are parsed into temporary_buffers, which causes large contiguous allocations for some cells.
This is fixed by storing fragments of the cell value in a fragmented_temporary_buffer instead.
To achieve this, the patch also adds new methods to fragmented_temporary_buffer (size(), an ostream operator<<()) and new methods to the underlying parser (primitive_consumer) for parsing byte strings into fragmented buffers.

Fixes #7457
Fixes #6376

Closes #8182

* github.com:scylladb/scylla:
  primitive_consumer: keep fragments of parsed buffer in a small_vector
  sstables: add parsing of cell values into fragmented buffers
  sstables: add non-contiguous parsing of byte strings to the primitive_consumer
  utils: add ostream operator<<() for fragmented_temporary_buffer::view
  compound_type: extend serialize_value for all FragmentedView types
2021-04-22 15:38:10 +02:00
Piotr Sarna
45d7144529 rjson: add a throwing allocator
The default rapidjson allocator returns nullptr from
a failed allocation or reallocation. It's not a bug by itself,
but rapidjson internals usually don't check for these return values
and happily use nullptr as a valid pointer, which leads to segmentation
faults and memory corruptions.
In order to prevent these bugs, the default allocator is wrapped
with a class which simply throws once it fails to allocate or reallocate
memory, thus preventing the use of nullptr in the code.
One exception is Malloc/Realloc with size 0, which is expected
to return nullptr by rapidjson code.
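
A hedged sketch of the wrapper idea (Scylla's version throws rjson::error; std::bad_alloc stands in here, and rapidjson's Allocator concept supplies the member names):

```cpp
#include <rapidjson/allocators.h>
#include <new>   // std::bad_alloc

class throwing_allocator {
    rapidjson::CrtAllocator _alloc;
public:
    static const bool kNeedFree = true;
    void* Malloc(size_t size) {
        void* p = _alloc.Malloc(size);
        if (!p && size > 0) {     // size == 0 is expected to yield nullptr
            throw std::bad_alloc();
        }
        return p;
    }
    void* Realloc(void* orig, size_t orig_size, size_t new_size) {
        void* p = _alloc.Realloc(orig, orig_size, new_size);
        if (!p && new_size > 0) {
            throw std::bad_alloc();
        }
        return p;
    }
    static void Free(void* p) { rapidjson::CrtAllocator::Free(p); }
};
```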
2021-04-21 14:26:38 +02:00
Piotr Sarna
2ad09d0bf8 Merge 'treewide: remove inclusions of storage_proxy.hh from headers' from Avi Kivity
Reduce rebuilds and build time by removing unnecessary includes. Along the way,
improve header sanity.

Ref #1.

Test: dev-headers, unit(dev).

Closes #8524

* github.com:scylladb/scylla:
  treewide: remove inclusions of storage_proxy.hh from headers
  storage_proxy: unnest coordinator_query_result
  treewide: make headers self-sufficient
  utils: intrusive_btree: add missing #pragma once
2021-04-21 08:22:52 +02:00
Avi Kivity
14a4173f50 treewide: make headers self-sufficient
In preparation for some large header changes, fix up any headers
that aren't self-sufficient by adding needed includes or forward
declarations.
2021-04-20 21:23:00 +03:00
Avi Kivity
6db1a71775 utils: intrusive_btree: add missing #pragma once
Interferes with making headers self-sufficient, so add it now.
2021-04-20 21:23:00 +03:00
Piotr Sarna
ec750e5f49 rjson: make the max nested level configurable
Back when rjson was only part of alternator, there was a hardcoded
limit of nested levels - 78. The number was calculated as:
 - taking the DynamoDB limit (32)
 - adding 7 to it to make alternator support more cases
 - doubling it because rjson internals bump the level twice
   for each alternator object (because the alternator object
   is represented as a 2-level JSON object).

Since rjson is no longer specific to alternator, this limit
is now configurable, and the original default value is explained
in a comment.
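
A hedged sketch of how the default is derived from the description above (constant names hypothetical):

```cpp
// DynamoDB's documented nesting limit, plus alternator's slack,
// doubled because rjson bumps the level twice per alternator object.
constexpr unsigned dynamodb_max_nesting = 32;
constexpr unsigned alternator_extra = 7;
constexpr unsigned default_max_nested_level =
        (dynamodb_max_nesting + alternator_extra) * 2;  // == 78
```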

Message-Id: <51952951a7cd17f2f06ab36211f74086e1b60d2d.1618916299.git.sarna@scylladb.com>
2021-04-20 14:05:03 +03:00
Avi Kivity
ec3db140cb utils: data_input: replace enable_if with tightened concept
std::is_fundamental isn't a good constraint since it includes nullptr_t
and void. Replace it with std::integral, which is sufficient. Use a
concept instead of enable_if to simplify the code.
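
A hedged before/after sketch of the conversion (the reader function itself is hypothetical):

```cpp
#include <concepts>
#include <cstring>

// before:
//   template <typename T,
//             typename = std::enable_if_t<std::is_fundamental_v<T>>>
// after:
template <std::integral T>
T read_raw(const char*& in) {
    T v;
    std::memcpy(&v, in, sizeof(T));  // avoids unaligned/aliasing issues
    in += sizeof(T);
    return v;
}
```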

Closes #8450
2021-04-11 18:56:21 +03:00
Pavel Emelyanov
26e27e27e8 btree: Add operator bool()
The btree's iterators allow for simple checking of the '== tree.end()'
condition. For this check, neither the tree itself nor the end
iterator is required. One just needs to check whether the _idx value
is npos.

One additional change to make it work is required -- when removing
an entry from the inline node the _idx should be set to npos.

This change is, well, a bugfix. An iterator left with 0 in _idx is
treated as a valid one. However, the bug is non-triggerable: if such
an "invalid" iterator is compared against tree.end(), the check would
return true, because the tree pointers would coincide.

So this patch adds an operator bool() to btree iterator to facilitate
simpler checking if it reached the end of the collection or not.
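
A hedged, simplified sketch of the addition; it enables `if (it) { ... }` in place of `it != tree.end()`:

```cpp
#include <cstddef>

struct iterator {
    static constexpr size_t npos = size_t(-1);
    size_t _idx = npos;   // npos marks "past the end"

    // true while the iterator has not reached the end of the collection
    explicit operator bool() const noexcept { return _idx != npos; }
};
```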

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-04-09 10:05:47 +03:00
Konstantin Osipov
c83cf1f965 uuid: switch the API to use std::chrono
A follow-up to the patch for #7611. This change was requested
during review and moved out of #7611 to reduce its scope.

The patch switches the UUID_gen API from plain integers holding
time units to units from std::chrono.

For one, we plan to switch the entire code base to std::chrono units,
to ensure type safety. Secondly, using std::chrono units allows us to
increase code reuse with template metaprogramming and to remove a few
UUID_gen functions that became redundant as a result (a sketch of the
resulting API shape follows the list below).

* switch get_time_UUID(), unix_timestamp(), get_time_UUID_raw(),
  min_time_UUID(), max_time_UUID(), create_time_safe() to
  std::chrono
* remove unused variant of from_unix_timestamp()
* remove unused get_time_UUID_bytes(), create_time_unsafe(),
  redundant get_adjusted_timestamp()
* inline get_raw_UUID_bytes()
* collapse two similar implementations of get_time_UUID()
* switch internal constants to std::chrono
* remove unnecessary unique_ptr from UUID_gen::_instance
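
A hedged sketch of the shape of the switch (signatures approximate):

```cpp
#include <chrono>

struct UUID {};  // stand-in for utils::UUID

// before: the unit was implicit in a plain integer
// UUID get_time_UUID(int64_t when_in_millis);

// after: the unit is part of the type
UUID get_time_UUID(std::chrono::system_clock::time_point when);
UUID min_time_UUID(std::chrono::milliseconds when);
UUID max_time_UUID(std::chrono::milliseconds when);
```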
Message-Id: <20210406130152.3237914-2-kostja@scylladb.com>
2021-04-06 17:12:54 +03:00
Michał Chojnowski
f23a47e365 utils: fragment_range: fix FragmentedView utils for views with empty fragments
The copying and comparing utilities for FragmentedView are not prepared to
deal with empty fragments in non-empty views, and will fall into an infinite
loop in such case.
But data coming in result_row_view can contain such fragments, so we need to
fix that.
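
A hedged sketch of the failure mode (method names approximate a FragmentedView):

```cpp
// A loop that advances by the current fragment's size never makes
// progress past a zero-length fragment; always dropping the current
// fragment, empty or not, does.
template <typename View>
size_t count_bytes(View v) {
    size_t n = 0;
    while (!v.empty()) {          // "empty" means no bytes left
        n += v.current_fragment().size();
        v.remove_current();       // drop the fragment even if its size is 0
    }
    return n;
}
```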

Fixes #8398.

Closes #8397
2021-04-04 15:31:51 +03:00
Avi Kivity
4739df2cb1 Merge 'cql3: remove linearizations in the write path' from Michał Chojnowski
As part of the effort to remove big, contiguous buffers from the codebase,
cql3::raw_value should be made fragmented. Unfortunately a straightforward
rewrite to a fragmented buffer type is not possible, because we want
cql3::raw_value to be compatible with cql3::raw_value_view, and we want that
view to be based on fragmented_temporary_buffer::view, so that it can be
used to view data coming directly from seastar without copying.

This patch makes cql3::raw_value fragmented by making cql3::raw_value_view
a `variant` of managed_bytes_view and fragmented_temporary_buffer::view.

Code that depended on `cql3::raw_value` being `bytes`,
and on `cql3::raw_value_view` being `fragmented_temporary_buffer::view` underneath,
was adjusted to the new, dual representation, mainly through the
`cql3::raw_value_view::with_value` visitor and the deserialization/validation
helpers added to `cql3::raw_value_view`.

The second part of this series gets rid of linearizations occurring when processing
compound types in the CQL layer. This is achieved by storing their elements in
`managed_bytes` instead of `bytes` in the partially deserialized form (`lists::value`,
`tuples::value`, etc.), outputting `managed_bytes` instead of `bytes` in functions
which go from the partially deserialized form to the atomic cell format (for frozen
types), and avoiding calling deserialize/serialize on individual elements when
it's not necessary. (It's only necessary for CQLv2, because since CQLv3 the format
on the wire is the same as our internal one.)

The above also forces some changes to `expression.cc` and `restrictions`, mainly because
`IN` clauses store their arguments as `lists` and `tuples`, and the code which handled
this clause expected `bytes`.

After this series, the path from prepared CQL statements to `atomic_cell_or_collection`
is almost completely linearization-free. The last remaining place is `collection_mutation_description`,
where map keys are linearized to `bytes`.
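
A hedged sketch of the dual representation described above (the view types are stand-ins):

```cpp
#include <utility>
#include <variant>

struct managed_bytes_view {};                  // stand-ins for the real
struct fragmented_temporary_buffer_view {};    // fragmented view types

// raw_value_view holds either view type, and callers go through a
// visitor instead of assuming one contiguous bytes buffer underneath.
struct raw_value_view {
    std::variant<managed_bytes_view, fragmented_temporary_buffer_view> _v;

    template <typename Func>
    decltype(auto) with_value(Func&& f) const {
        return std::visit(std::forward<Func>(f), _v);
    }
};
```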

Closes #8160

* github.com:scylladb/scylla:
  cql3: update_parameters: remove unused version of make_cell for bytes_view
  types: collection: remove an unused version of pack_fragmented
  cql3: optimize the deserialization of collections
  cql3: maps, sets: switch the element type from bytes to managed_bytes
  cql3: expression: use managed_bytes instead of bytes where possible
  cql3: expr: expression: make the argument of to_range a forwarding reference
  cql3: don't linearize elements of lists, tuples, and user types
  cql3: values: add const managed_bytes& constructor to raw_value_view
  cql3: output managed_bytes instead of bytes in get_with_protocol_version
  types: collection: add versions of pack for fragmented buffers
  types: add write_collection_{value,size} for managed_bytes_mutable_view
  cql3: tuples, user_types: avoid linearization in from_serialized() and get()
  types: tuple: add build_value_fragmented
  cql3: update_parameters: add make_cell version for managed_bytes_view
  cql3: remove operation::make_*cell
  cql3: values: make raw_value fragmented
  cql3: values: remove raw_value_view::operator==
  cql3: switch users of cql3::raw_value_view to internals-independent API
  cql3: values: add an internals-independent API to raw_value_view
  utils: managed_bytes: add a managed_bytes constructor from FragmentedView
  utils: managed_bytes: add operator<< and to_hex for managed_bytes
  utils: fragment_range: add to_hex
  configure: remove unused link dependencies from UUID_test
2021-04-01 15:21:32 +03:00
Pavel Emelyanov
8bbe2eae5e btree: Convert comparator to <=>
It turned out that all the users of btree can already be converted
to use the safer std::strong_ordering. The only meaningful change here
is in the btree code itself -- no more ints there.
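
A hedged sketch of the comparator change; a std::strong_ordering result cannot be mistaken for a bool-returning less-than comparator:

```cpp
#include <compare>

// before: int compare(a, b) returning -1 / 0 / 1
std::strong_ordering compare(int a, int b) {
    return a <=> b;
}
```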

tests: unit(dev)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210330153648.27049-1-xemul@scylladb.com>
2021-04-01 12:56:08 +03:00
Michał Chojnowski
45e0ef26d3 utils: managed_bytes: add a managed_bytes constructor from FragmentedView
Just for convenience. We will use it in an upcoming patch where we switch
the inner representation of cql3::raw_value from bytes to managed_bytes, and we
will want to construct managed_bytes from fragmented_temporary_buffer::view.
2021-04-01 10:39:42 +02:00
Michał Chojnowski
4715268e30 utils: managed_bytes: add operator<< and to_hex for managed_bytes
We will need them to replace bytes with managed_bytes in some places in an
upcoming patch.

The change to configure.py is necessary because operator<< links to to_hex
in bytes.cc.
2021-04-01 10:39:42 +02:00
Michał Chojnowski
14c4639994 utils: fragment_range: add to_hex 2021-04-01 10:39:42 +02:00
Wojciech Mitros
3f529b2860 utils: add ostream operator<<() for fragmented_temporary_buffer::view
We are going to store sstable cells' values in fragmented_temporary_buffers.
This patch will allow checking these values with loggers.

Signed-off-by: Wojciech Mitros <wojciech.mitros@scylladb.com>
2021-03-31 12:09:52 +02:00
Avi Kivity
7c953f33d5 utils: disk-error-handler: replace enable_if with concepts
Simpler, cleaner. We also replace the deprecated std::result_of_t
with std::invoke_result_t.

Closes #8305
2021-03-30 09:29:46 +02:00
Avi Kivity
3c292e31af utils: utf8: fix validate_partial() on non-SIMD-optimized architectures
validate_partial() is declared in the internal namespace, but defined
outside it. This causes calls to validate_partial() to be ambiguous
on architectures that haven't been SIMD-optimized yet (e.g. s390x).

Fix by defining it in the internal namespace.

Closes #8268
2021-03-23 09:21:14 +02:00
Avi Kivity
4dae434f69 utils: crc: fix build with big-endian architectures and 1-byte objects
crc has some code to reverse endianness on big-endian machines, but does
not handle the case of a 1-byte object (which doesn't need any adjustment).
This causes clang to complain that the switch statement doesn't handle that
case.

Fix by adding a no-op case.
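
A hedged sketch of the shape of the fix (in if-constexpr form rather than the original switch):

```cpp
#include <cstdint>

// Handle the 1-byte case explicitly as a no-op so the endianness
// adjustment covers every object size.
template <typename T>
void byte_reverse(T& v) {
    if constexpr (sizeof(T) == 1) {
        // single byte: no adjustment needed (the missing case)
    } else if constexpr (sizeof(T) == 2) {
        v = __builtin_bswap16(v);
    } else if constexpr (sizeof(T) == 4) {
        v = __builtin_bswap32(v);
    } else if constexpr (sizeof(T) == 8) {
        v = __builtin_bswap64(v);
    }
}
```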

Closes #8269
2021-03-23 09:16:20 +02:00
Avi Kivity
29a5047982 utils: error_injection: convert enable_if to concepts
Constrain inject() with a requires clause rather than enable_if,
simplifying the code and compiler diagnostics.

Note that the second instance could not have been called, since
the template argument does not appear in the function parameter
list and thus could not be deduced. This is corrected here.

Closes #8322
2021-03-21 09:28:23 +02:00
Piotr Sarna
2509b7dbde Merge 'dht: convert ring_position and decorated_key to std::strong_ordering' from Avi Kivity
As #1449 notes, trichotomic comparators returning int are dangerous, as they
can be mistaken for less-than comparators. This series converts dht::ring_position
and dht::decorated_key, as well as a few closely related downstream types, to
return std::strong_ordering.

Closes #8225

* github.com:scylladb/scylla:
  dht: ring_position, decorated_key: convert tri_comparators to std::strong_ordering
  pager: rephrase misleading comparison check
  test: total_order_checks: prepare for std::strong_ordering
  test: mutation_test: prepare merge_container for std::strong_ordering
  intrusive_array: prepare for std::strong_ordering
  utils: collection-concepts: prepare for std::strong_ordering
2021-03-18 11:51:54 +01:00
Avi Kivity
fe0f983dfb intrusive_array: prepare for std::strong_ordering
Newer comparators can return std::strong_ordering, so don't
expect an int.
2021-03-18 12:40:05 +02:00
Avi Kivity
9fbe4850c9 utils: collection-concepts: prepare for std::strong_ordering
collection-concepts includes a Comparable concept for a trichotomic
comparator function, used in intrusive btree and double_decker. Prepare
for std::strong_ordering by also allowing std::strong_ordering as a
return type. Once we've cleaned the code base, we can tighten it to
only allow std::strong_ordering.
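
A hedged sketch of widening the concept for the transition (the real Comparable may differ in detail):

```cpp
#include <compare>
#include <concepts>

// Accept either return type while the code base is being converted.
template <typename Func, typename T>
concept Comparable =
    requires (Func f, const T& a, const T& b) {
        { f(a, b) } -> std::same_as<int>;
    } ||
    requires (Func f, const T& a, const T& b) {
        { f(a, b) } -> std::same_as<std::strong_ordering>;
    };
```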
2021-03-18 12:40:03 +02:00
Michał Chojnowski
5c3385730b treewide: get rid of unaligned_cast
unaligned_cast violates strict aliasing rules. Replace it with
safe equivalents.
2021-03-17 17:00:41 +01:00
Michał Chojnowski
4e35befcf2 treewide: get rid of incorrect reinterpret casts
In some places we use the `*reinterpret_cast<const net::packed<T>*>(&x)`
pattern to reinterpret memory. This is a violation of C++'s aliasing rules,
which invokes undefined behaviour.

The blessed way to correctly reinterpret memory is to copy it into a new
object. Let's do that.

Note: the reinterpret_cast way has no performance advantage. Compilers
recognize the memory copy pattern and optimize it away.
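
The blessed pattern, sketched:

```cpp
#include <cstdint>
#include <cstring>

// Copy the bytes into a fresh object instead of reinterpreting them in
// place. Compilers recognize the memcpy and optimize the copy away.
uint32_t read_u32(const void* p) {
    uint32_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}
```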
2021-03-17 17:00:38 +01:00
Avi Kivity
290897ddbc logalloc: background reclaim: use default scheduling group for adjusting shares
If the shares are currently low, we might not get enough CPU time to
adjust the shares in time.

This is currently a no-op, since Seastar runs the callback outside
scheduling groups (and only uses the scheduling group for inherited
continuations); but better be insulated against such details.
2021-03-15 13:54:49 +02:00
Avi Kivity
a87f6498c3 logalloc: background reclaim: log shares adjustment under trace level
Useful when debugging, but too noisy at any other time.
2021-03-15 13:54:49 +02:00
Avi Kivity
ce1b1d6ec4 logalloc: background reclaim: fix shares not updated by periodic timer
adjust_shares() thinks it needs to do nothing if the main loop
is running, but in reality it can only avoid waking the main loop;
it still needs to adjust the shares unconditionally. Otherwise,
the background reclaim shares can get locked into a low value.

Fix by splitting the conditional into two.
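
A hedged sketch of the split (names hypothetical): the share update is now unconditional; only the wake-up of the main loop is conditional.

```cpp
struct reclaimer {
    float _shares = 0;
    bool _main_loop_running = false;

    void wake_main_loop() { /* signal the reclaim fiber */ }

    void adjust_shares(float shares) {
        _shares = shares;              // always perform the actual update
        if (!_main_loop_running) {
            wake_main_loop();          // waking is the only skippable part
        }
    }
};
```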
2021-03-15 13:54:37 +02:00
Nadav Har'El
f41dac2a3a alternator: avoid large contiguous allocation for request body
Alternator request sizes can be up to 16 MB, but the current implementation
had the Seastar HTTP server read the entire request as a contiguous string,
and then processed it. We can't avoid reading the entire request up-front -
we want to verify its integrity before doing any additional processing on it.
But there is no reason why the entire request needs to be stored in one big
*contiguous* allocation. That is always a bad idea. We should use a non-
contiguous buffer, and that's the goal of this patch.

We use a new Seastar HTTPD feature where we can ask for an input stream,
instead of a string, for the request's body. We then begin the request
handling by reading the content of this stream into a
vector<temporary_buffer<char>> (which we alias "chunked_content"). We then
use this non-contiguous buffer to verify the request's signature and
if successful - parse the request JSON and finally execute it.

Beyond avoiding contiguous allocations, another benefit of this patch is
that while parsing a long request composed of chunks, we free each chunk
as soon as its parsing has completed. This reduces the peak amount of memory
used by the query - we no longer need to store both unparsed and parsed
versions of the request at the same time.

Although we already had tests with requests of different lengths, most
of them were short enough to only have one chunk, and only a few had
2 or 3 chunks. So we also add a test which makes a much longer request
(a BatchWriteItem with large items), which in my experiment had 17 chunks.
The goal of this test is to verify that the new signature and JSON parsing
code, which needs to cross chunk boundaries, works as expected.
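
A hedged sketch of accumulating a body as "chunked_content" without one big contiguous allocation (the alias matches the message; details of the real handler vary):

```cpp
#include <seastar/core/coroutine.hh>
#include <seastar/core/iostream.hh>
#include <seastar/core/temporary_buffer.hh>
#include <utility>
#include <vector>

using chunked_content = std::vector<seastar::temporary_buffer<char>>;

seastar::future<chunked_content> read_body(seastar::input_stream<char>& in) {
    chunked_content chunks;
    for (;;) {
        auto buf = co_await in.read();  // one fragment at a time
        if (buf.empty()) {              // empty buffer marks end of stream
            co_return chunks;
        }
        chunks.push_back(std::move(buf));
    }
}
```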

Fixes #7213.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210309222525.1628234-1-nyh@scylladb.com>
2021-03-10 09:22:34 +01:00
Avi Kivity
9038a81317 treewide: drop SEASTAR_CONCEPT
Since Scylla requires C++20, there is no need to protect
concept definitions or usages with SEASTAR_CONCEPT; it just
clutters the code. This patch therefore removes all uses.
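
A hedged before/after sketch of a typical use (the constrained function is hypothetical):

```cpp
#include <concepts>

// before: the constraint hid behind the macro so pre-concepts
// compilers could still parse the header
// template <typename T>
// SEASTAR_CONCEPT(requires std::integral<T>)
// void store(T v);

// after: C++20 is guaranteed, so write the requires-clause directly
template <typename T>
requires std::integral<T>
void store(T v);
```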

Closes #8236
2021-03-08 16:04:20 +01:00
Tomasz Grabiec
ecb6c56a2a Merge 'lsa: background reclaim' from Avi Kivity
This series adds background reclaim to lsa, with the goal
that most large allocations can be satisfied from available
free memory, and reclaim work can be done from a preemptible
context.

If the workload has free cpu, then background reclaim will
utilize that free cpu, reducing latency for the main workload.
Otherwise, background reclaim will compete with the main
workload, but since that work needs to happen anyway,
throughput will not be reduced.

A unit test is added to verify it works.

Fixes #1634.

Closes #8044

* github.com:scylladb/scylla:
  test: logalloc_test: test background reclaim
  logalloc: reduce gap between std min_free and logalloc min_free
  logalloc: background reclaim
  logalloc: preemptible reclaim
2021-02-24 13:23:30 +01:00
Avi Kivity
78d1afeabd Merge "Use radix tree to store cells on a row" from Pavel E
"
Current storage of cells in a row is a union of vector and set. The
vector holds 5 cell_and_hash's inline, up to 32 in external
storage, and then it's switched to std::set. Once switched, the whole
union becomes a waste of space, as its size is

   sizeof(vector head) + 5 * sizeof(cell and hash) = 90+ bytes

and only 3 pointers from it are used (the std::set header). Also, the
overhead of keeping cell_and_hash as a set entry is more than the size
of the structure itself.

Column ids are 32-bit integers that most likely come sequentially.
For this kind of search key, a radix tree (with some care for
non-sequential cases) can be beneficial.

This set introduces a compact radix tree that uses 7-bit sub-values
from the search key to index into each node, and compacts the nodes
themselves for better memory usage. Then row::_storage is replaced
with the new tree (a sketch of the indexing follows).
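
A hedged sketch of that 7-bit indexing scheme (names hypothetical):

```cpp
#include <cstdint>

// A 32-bit column id is consumed 7 bits at a time, so the tree is at
// most ceil(32 / 7) = 5 levels deep.
constexpr unsigned radix_bits = 7;
constexpr uint32_t radix_mask = (1u << radix_bits) - 1;  // 0x7f

constexpr unsigned node_index(uint32_t column_id, unsigned level) {
    return (column_id >> (level * radix_bits)) & radix_mask;
}
```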

The most notable result is the memory footprint decrease: for wide
rows, it drops by up to 2x. The performance of micro-benchmarks is a bit
lower for small rows and (!) higher for longer rows (8+ cells). The
numbers are in patch #12 (spoiler: they are better than for v2).

v3:
- trimmed size of radix down to 7 bits
- simplified the nodes layouts, now there are 2 of them (was 4)
- enhanced perf_mutation to test N-cells schema
- added AVX intra-nodes search for medium-sized nodes
- added .clone_from() method that helped to improve perf_mutation
- minor
  - changed functions not to return values via reference arguments
  - fixed nested classes to properly use language constructors
  - renamed index_to to key_t to distinguish from node_index_t
  - improved recursive variadic templates not to use a sentinel argument
  - use standard concepts

v2:
- fixed potential mis-compilation due to strict-aliasing violation
- added oracle test (radix tree is compared with std::map)
- added radix to perf_collection
- cosmetic changes (concepts, comments, names)

A note on item 1 from the v2 changelog: the nodes are no longer packed
perfectly, each has grown by 3 bytes. But it turned out that when used
as the cells container, most of this growth drowned in lsa alignments.

next todo:
- aarch64 version of 16-keys node search

tests: unit(dev), unit(debug for radix*), pref(dev)
"

* 'br-radix-tree-for-cells-3' of https://github.com/xemul/scylla:
  test/memory_footprint: Print radix tree node sizes
  row: Remove old storages
  row: Prepare row::equal for switch
  row: Prepare row::difference for switch
  row: Introduce radix tree storage type
  row-equal: Re-declare the cells_equal lambda
  test: Add tests for radix tree
  utils: Compact radix tree
  array-search: Add helpers to search for a byte in array
  test/perf_collection: Add callback to check the speed of clone
  test/perf_mutation: Add option to run with more than 1 columns
  test/perf_mutation: Prepare to have several regular columns
  test/perf_mutation: Use builder to build schema
2021-02-18 21:19:14 +02:00
Botond Dénes
ba7a9d2ac3 imr: switch back to open-coded description of structures
Commit aab6b0ee27 introduced the
controversial new IMR format, which relied on a very template-heavy
infrastructure to generate serialization and deserialization code via
template meta-programming. The promise was that this new format, beyond
solving the problems the previous open-coded representation had (working
on linearized buffers), will speed up migrating other components to this
IMR format, as the IMR infrastructure reduces code bloat, makes the code
more readable via declarative type descriptions as well as safer.
However, the results were almost the opposite. The template
meta-programming used by the IMR infrastructure proved very hard to
understand. Developers don't want to read or modify it. Maintainers
don't want to see it being used anywhere else. In short, nobody wants to
touch it.

This commit does a conceptual revert of
aab6b0ee27. A verbatim revert is not
possible because related code evolved a lot since the merge. Also, going
back to the previous code would mean we regress as we'd revert the move
to fragmented buffers. So this revert is only conceptual: it changes the
underlying infrastructure back to the previous open-coded one, but keeps
the fragmented buffers, as well as the interface of the related
components (to the extent possible).

Fixes: #5578
2021-02-16 23:43:07 +01:00
Michał Chojnowski
25a9569cc4 utils: managed_bytes: add a few trivial helper methods
We will use them in the upcoming IMR removal patch.
2021-02-16 23:43:07 +01:00
Michał Chojnowski
3f248ca7cc utils: fragment_range: move FragmentedView helpers to fragment_range.hh
In the upcoming IMR removal patch we will need read_simple() and similar helpers
for FragmentedView outside of types.hh. For now, let's move them to
fragment_range.hh, where FragmentedView is defined. Since it's a widely included
header, we should consider moving them to a more specialized header later.
2021-02-16 21:35:15 +01:00
Michał Chojnowski
8a06a576aa utils: fragment_range: add single_fragmented_mutable_view
We will use it later in the upcoming IMR removal patch.
2021-02-16 21:35:15 +01:00
Michał Chojnowski
7b662b9315 utils: fragment_range: implement FragmentRange for fragment_range
This will allow us to pass FragmentedView instances to places where
FragmentRange is expected.
2021-02-16 21:35:15 +01:00
Michał Chojnowski
f972f90193 utils: mutable_view: add front()
We will use it in the upcoming patches.
2021-02-16 21:35:14 +01:00