Commit Graph

56 Commits

Author SHA1 Message Date
Calle Wilund
43f7eecf9e compress: move compress.cc/hh to sstables/compressor
Fixes #22106

Moves the shared compress components to sstables, and rename to
match class type.

Adjust includes, removing redundant/unneeded ones where possible.

Closes scylladb/scylladb#25103
2025-07-31 13:10:41 +03:00
Ernest Zaslavsky
8d49bb8af2 sstables: Start using make_data_or_index_source in sstable
Convert all necessary methods to be awaitable. Start using `make_data_or_index_source`
when creating data_source for data and index components.

For proper working of compressed/checksummed input streams, start passing
stream creator functors to `make_(checksummed/compressed)_file_(k_l/m)_format_input_stream`.
2025-07-15 10:10:23 +03:00
Michał Chojnowski
cee504f66f sstables/compress: discard hidden compression options after the decompressor is created
Dictionary contents are kept in the list of "compression options" in the
header of `CompressionInfo.db`, and they are loaded from disk into
memory when the `sstable::compression` object is populated.

After the decompressor for the SSTable is created based on those
dict contents, they are not needed in RAM anymore. And since
they take up a sizeable amount of memory, we would like to free them.

In this patch, we discard all "hidden compression options"
(currently: only the dictionary contents) from the
`sstable::compression` object right after the decompressor is created.
(Those options are not supposed to be used for anything else anyway).
2025-04-01 00:07:30 +02:00
Michał Chojnowski
10fa4abde7 compress: change compressor_ptr from shared_ptr to unique_ptr
Cleanup patch. After we moved the ownership of compressors
to sstables, compressor objects never have shared lifetime.
`unique_ptr` is more appropriate for them than `shared_ptr` now.
(And besides expressing the intent better, using `unique_ptr`
prevents an accidental cross-shard `shared_ptr` copy).
2025-04-01 00:07:29 +02:00
Michał Chojnowski
006c631642 sstables/compress: remove get_sstable_compressor()
Following up on the previous commit, we avoid constructing
a compressor in the `sstable_info` API call, and we instead
read the compression options from the `sstable::compression`.
2025-04-01 00:07:28 +02:00
Michał Chojnowski
8e611536b0 sstables/compress: move ownership of compressor to sstable::compression
SSTable readers and writers use `compressor` objects to compress and
decompress chunks of SSTable data files.

`compressor` objects are read-only, so only one of them is needed
for each SSTable. Before this commit, each reader and writer has
its own `compressor` object. This isn't necessary, but it's okay.

But later in this series it will stop being okay, because the creation
of a `compressor` will become an expensive cross-shard
operation (because it might require sharing a compression dictionary
from another shard). So we have to adjust the code so that there is
only once `compressor` per sstable, not one per reader/writer.

We stuff the ownership of this compressor into `sstable::compression`.

To make the ownership clear, we remove `compression_ptr` shared
pointers from readers and writers, and make them access the
compressor via the `sstable::compression` instead.
2025-04-01 00:07:27 +02:00
Michał Chojnowski
cfe69e057f sstables/compress: break the dependency of compression_parameters on compressor
Note: this commit is meant to be a code refactoring only and is not intended
to change the observable behaviour.

Today `schema` contains a `compression_parameters`.
`compression_parameters` contains an instance of
`compressor`, and SSTable writers just share that instance.

This is fine because `compressor` is a stateless object,
functionally dependent on the schema.

But in later parts of the series, we will break this functional
dependency by adding dictionaries to compressors. Two writers
for the same schema might have different dictionaries, so they won't
be able to just share a single instance contained in the schema.

And when that happens, having a `compressor` instance
in the `schema`/`compression_parameters` will become awkward,
since it won't be actually used. It will be only a container for options.

In addition, for performance reasons, we will want to share some pieces
of compressors across shards, which will require -- in the general case --
a construction of a compressor to be asynchronous, and therefore not
possible inside the constructor of `compression_parameters`.

This commit modifies `compression_parameters` so that it doesn't hold or
construct instances of `compressor`.

Before this patch, the `compressor` instance constructed in
`compression_parameters` has an additional role of validating and
holding compressor-specific options.
(Today the only such option is the zstd compression level).

This means that the pieces of logic responsible for compressor-specific
options have to be rewritten. That ends up being the bulk of this commit.
2025-04-01 00:07:27 +02:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Nikos Dragazis
c893f06409 sstables: Add digest check in compressed data source
Following the addition of digest check in the checksummed data source,
add the same feature to the compressed data source as well. This ensures
consistent behavior across any type of SSTable.

This is added as an optional feature so that we can preserve the current
behavior, that is verify only the per-chunk checksums during normal user
reads. To ensure zero cost at runtime when disabled, we introduce the
on/off switch as a template parameter.

The digest calculation for compressed SSTables depends on the SSTable
format, hence the new template argument for the checksum mode. This is
consistent with the compressed data sink.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2024-10-03 18:09:01 +03:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Kefu Chai
9318d21a22 sstables: change const_iterator::value_type to uint64_t
in general, the value_type of a `const_iterator` is `T` instead of
`const T`, what has the const specifier is `reference`. because,
when dereferencing an iterator, the value type does not matter any
more, as it always a copy.

and GCC-14 points this out:

```
/home/kefu/dev/scylladb/sstables/compress.hh:224:13: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]
  224 |             value_type operator*() const {
      |             ^~~~~~~~~~
/home/kefu/dev/scylladb/sstables/compress.hh:228:13: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers]
  228 |             value_type operator[](ssize_t i) const {
      |             ^~~~~~~~~~
```

so, in this change, let's change the value_type to `uint64_t`.
please note, it's not typical to return `value_type` from `operator*`
or `operator[]` of an iterator. but due to the design of
segmented_offsets, we cannot return a reference, so let's keep it
this way.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19186
2024-06-09 19:21:16 +03:00
Botond Dénes
37fd568139 sstables/compress.hh: remove unused forward declaration
struct compress if forward declared right before its definition. At some
point in the past there was probably some code there using it, but now
its gone so remove it.

Closes scylladb/scylladb#19168
2024-06-09 17:52:05 +03:00
Lakshmi Narayanan Sreethar
de6570e1ec serializer_impl, sstables: fix build failure due to missing includes
When building scylla with cmake, it fails due to missing includes in
serializer_impl.hh and sstables/compress.hh files. Fix that by adding
the appropriate include files.

Fixes #18343

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#18344
2024-04-23 12:03:51 +03:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Kefu Chai
f5b05cf981 treewide: use defaulted operator!=() and operator==()
in C++20, compiler generate operator!=() if the corresponding
operator==() is already defined, the language now understands
that the comparison is symmetric in the new standard.

fortunately, our operator!=() is always equivalent to
`! operator==()`, this matches the behavior of the default
generated operator!=(). so, in this change, all `operator!=`
are removed.

in addition to the defaulted operator!=, C++20 also brings to us
the defaulted operator==() -- it is able to generated the
operator==() if the member-wise lexicographical comparison.
under some circumstances, this is exactly what we need. so,
in this change, if the operator==() is also implemented as
a lexicographical comparison of all memeber variables of the
class/struct in question, it is implemented using the default
generated one by removing its body and mark the function as
`default`. moreover, if the class happen to have other comparison
operators which are implemented using lexicographical comparison,
the default generated `operator<=>` is used in place of
the defaulted `operator==`.

sometimes, we fail to mark the operator== with the `const`
specifier, in this change, to fulfil the need of C++ standard,
and to be more correct, the `const` specifier is added.

also, to generate the defaulted operator==, the operand should
be `const class_name&`, but it is not always the case, in the
class of `version`, we use `version` as the parameter type, to
fulfill the need of the C++ standard, the parameter type is
changed to `const version&` instead. this does not change
the semantic of the comparison operator. and is a more idiomatic
way to pass non-trivial struct as function parameters.

please note, because in C++20, both operator= and operator<=> are
symmetric, some of the operators in `multiprecision` are removed.
they are the symmetric form of the another variant. if they were
not removed, compiler would, for instance, find ambiguous
overloaded operator '=='.

this change is a cleanup to modernize the code base with C++20
features.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13687
2023-04-27 10:24:46 +03:00
Kefu Chai
df63e2ba27 types: move types.{cc,hh} into types
they are part of the CQL type system, and are "closer" to types.
let's move them into "types" directory.

the building systems are updated accordingly.

the source files referencing `types.hh` were updated using following
command:

```
find . -name "*.{cc,hh}" -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} +
```

the source files under sstables include "types.hh", which is
indeed the one located under "sstables", so include "sstables/types.hh"
instea, so it's more explicit.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12926
2023-02-19 21:05:45 +02:00
Botond Dénes
c4688563e3 sstables: track decompressed buffers
Convert decompressed temporary buffers into tracked buffers just before
returning them to the upper layer. This ensures these buffers are known
to the reader concurrency semaphore and it has an accurate view of the
actual memory consumption of reads.

Fixes: #12448

Closes #12454
2023-01-08 15:34:28 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Piotr Jastrzebski
10228b35c5 compress: Remove unused make_compressed_file_k_l_format_output_stream
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2021-06-27 15:12:31 +02:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Piotr Jastrzebski
bacda100ec sstables: remove std::iterator from const_iterator
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Rafael Ávila de Espíndola
13282b3d4c sstables: Pass an output_stream to make_compressed_file_.*_format_output_stream
This is a bit simpler as we don't have to pass in the options and
moves the calls to make_file_output_stream to places where we can
handle futures.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
2020-06-03 10:32:46 -07:00
Avi Kivity
88ade3110f treewide: replace calls to engine().some_api() with some_api()
This removes the need to include reactor.hh, a source of compile
time bloat.

In some places, the call is qualified with seastar:: in order
to resolve ambiguities with a local name.

Includes are adjusted to make everything compile. We end up
having 14 translation units including reactor.hh, primarily for
deprecated things like reactor::at_exit().

Ref #1
2020-04-05 12:46:04 +03:00
Avi Kivity
0d0ee20f76 Merge "Implement sstable_info API command (info on sstables)" from Calle
"
Refs #4726

Implement the api portion of a "describe sstables" command.

Adds rest types for collecting both fixed and dynamic attributes, some grouped. Allows extensions to add attributes as well. (Hint hint)
"

* 'sstabledesc' of https://github.com/elcallio/scylla:
  api/storage_service: Add "sstable_info" command
  sstables/compress: Make compressor pointer accessible from compression info
  sstables.hh: Add attribute description API to file extension
  sstables.hh: Add compression component accessor
  sstables.hh: Make "has_component" public
2019-08-12 21:16:08 +03:00
Calle Wilund
95a8ff12e7 sstables/compress: Make compressor pointer accessible from compression info 2019-08-06 07:07:44 +00:00
Kamil Braun
f14e6e73bb Add ZStandard compression
This adds the option to compress sstables using the Zstandard algorithm
(https://facebook.github.io/zstd/).
To use, pass 'sstable_compression': 'org.apache.cassandra.io.compress.ZstdCompressor'
to the 'compression' argument when creating a table.
You can also specify a 'compression_level'. See Zstd documentation for the available
compression levels.
Resolves #2613.

Signed-off-by: Kamil Braun <kbraun@scylladb.com>
2019-08-05 14:55:53 +02:00
Benny Halevy
0e3f9c25e4 sstables: compress.hh: add missing include
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2019-03-26 16:05:08 +02:00
Vladimir Krivopalov
cc62ad3b69 sstables: Make compressed streams customizable on checksumming.
Use either Adler32 or CRC32 while writing to or reading from a
compressed stream.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-19 20:52:07 +03:00
Vladimir Krivopalov
5183294676 sstables: Move checksum calculation logic to compressed_output_stream.
Previously, compressed_output_stream used to calculate checksum of the
supplied chunk and pass it to the 'compression' object to combine with
the full checksum calculated on prior writes.
Now, all the checksum calculation happens inside
compressed_output_stream and 'compression' only stores the result.

This is done to loosen ties between two classes and simplify
compressed_output_stream customisation with various checksum algorithms.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-19 20:52:07 +03:00
Vladimir Krivopalov
adb43959d1 sstables: Move adler32 routines under the scope of a class.
This is a step towards making digest algorithm customizable at compile
time.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-13 12:38:25 -07:00
Vladimir Krivopalov
4e4030676f sstables: Move checksum utils into separate header.
Checksummed writer doesn't need to include all compression stuff.

Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com>
2018-05-13 12:38:25 -07:00
Calle Wilund
74758c87cd sstables::compress/compress: Make compression a virtual object
Make a "compressor" an actual class, that can be implemented and
registered via class registry. 

For "common" compressors, the objects will be shared, but complex
implementors can be semi-stateful. 

sstable compression is split into two parts: The "static" config
which is shared across shards, and a "local" one, which holds 
a compressor pointer. The latter is encapsulated, along with 
actual compressed data writers, in sstables/compress.cc.

For compression (write), compression writer is instansiated 
with the settings active in table metadata. 

For decompression (read), compression reader is instansiated
with the settings stored in sstable metadata, which can 
differ from the currently active table metadata. 

v2:
* Structured patch sets differently (dependencies)
* Added more comments/api descs
* Added patch to move all sstable compression into compress.cc,
  effectively separating top-level virtual compressor object
  from sstable io knowledge
v3:
* Rebased
v4: 
* Moved all sstable compression logic/knowledge into  
  compress.cc (local compression). Merged the two patches 
  (separation just confuses reader).
2018-02-07 10:11:45 +00:00
Raphael S. Carvalho
09f4ee808f sstables/compress: Fix race condition in segmented offset reading of shared sstable
Race condition was introduced by commit 028c7a0888, which introduces chunk offset
compression, because a reading state is kept in the compress structure which is
supposed to be immutable and can be shared among shards owning the same sstable.

So it may happen that shard A updates state while shard B relies on information
previously set which leads to incorrect decompression, which in turn leads to
read misbehaving.

We could serialize access to at() which would only lead to contention issues for
shared sstables, but that can be avoided by moving state out of compress structure
which is expected to be immutable after sstable is loaded and feeded to shards that
own it. Sequential accessor (wraps state and reference to segmented_offset) is
added to prevent at() and push_back() interfaces from being polluted.

Tests: release mode.

Fixes #3148.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20180205192432.23405-1-raphaelsc@scylladb.com>
2018-02-06 12:10:10 +02:00
Botond Dénes
d1209c548a Fix -Wreturn-type warnings
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <99f7a006daaa78eb87720ac51c394093398bc868.1504013915.git.bdenes@scylladb.com>
2017-08-29 16:41:09 +03:00
Avi Kivity
fa8d0fe4d0 Revert "Revert "Revert "Revert "Merge "Compress in-memory compression-info" from Botond""""
This reverts commit 238877a0c6.  A fix was found and will be committed
shortly.
2017-08-28 16:14:13 +03:00
Avi Kivity
238877a0c6 Revert "Revert "Revert "Merge "Compress in-memory compression-info" from Botond"""
This reverts commit 9d27455744. It's still broken.

To reproduce:

  ./tools/bin/cassandra-stress write -schema compression=LZ4Compressor

(on a clean database)

.0  0x00007ffff32aa69b in raise () from /lib64/libc.so.6
.1  0x00007ffff32ac4a0 in abort () from /lib64/libc.so.6
.2  0x000000000054a0e8 in seastar::memory::abort_on_underflow (size=<optimized out>) at core/memory.cc:1189
.3  seastar::memory::allocate_large (size=<optimized out>) at core/memory.cc:1194
.4  0x000000000054b305 in seastar::memory::allocate (size=size@entry=18446744073702885265) at core/memory.cc:1227
.5  0x000000000054b45e in malloc (n=n@entry=18446744073702885265) at core/memory.cc:1452
.6  0x00000000006013e4 in seastar::temporary_buffer<char>::temporary_buffer (this=0x6010195fc800, size=18446744073702885265) at /home/avi/urchin/seastar/core/temporary_buffer.hh:72
.7  0x0000000000a3908b in seastar::input_stream<char>::read_exactly (this=0x6010053d0248, n=18446744073702885265) at /home/avi/urchin/seastar/core/iostream-impl.hh:189
.8  0x0000000000a9c77f in compressed_file_data_source_impl::get (this=0x6010053d0240) at sstables/compress.cc:499
.9  0x0000000000aa1b01 in seastar::data_source::get (this=<optimized out>) at /home/avi/urchin/seastar/core/iostream.hh:63
.10 seastar::future<> seastar::input_stream<char>::consume<sstables::data_consume_rows_context>(sstables::data_consume_rows_context&)::{lambda()#1}::operator()() const (__closure=__closure@entry=0x6010195fcab0) at /home/avi/urchin/seastar/core/iostream-impl.hh:204
.11 0x0000000000aa22f0 in seastar::futurize<seastar::future<seastar::bool_class<seastar::stop_iteration_tag> > >::apply<seastar::future<> seastar::input_stream<char>::consume<sstables::data_consume_rows_context>(sstables::data_consume_rows_context&)::{lambda()#1}&>(sstables::data_consume_rows_context&&) (func=...) at /home/avi/urchin/seastar/core/future.hh:1312
.12 seastar::repeat<seastar::future<> seastar::input_stream<char>::consume<sstables::data_consume_rows_context>(sstables::data_consume_rows_context&)::{lambda()#1}>(sstables::data_consume_rows_context&&) (action=...) at /home/avi/urchin/seastar/core/future-util.hh:203
.13 0x0000000000a9e730 in seastar::input_stream<char>::consume<sstables::data_consume_rows_context> (consumer=..., this=<optimized out>) at /home/avi/urchin/seastar/core/iostream-impl.hh:237
.14 data_consumer::continuous_data_consumer<sstables::data_consume_rows_context>::consume_input<sstables::data_consume_rows_context> (c=..., this=<optimized out>) at sstables/consumer.hh:226
.15 sstables::data_consume_context::impl::read (this=<optimized out>) at sstables/row.cc:411
.16 sstables::data_consume_context::read (this=<optimized out>) at sstables/row.cc:437
.17 0x0000000000aafbae in sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const (__closure=<optimized out>) at sstables/partition.cc:843
.18 seastar::apply_helper<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}, std::tuple<>&&, std::integer_sequence<unsigned long> >::apply({lambda()#2}&&, std::tuple) (args=..., func=...) at ./seastar/core/apply.hh:36
.19 seastar::apply<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) (args=..., func=...)
    at ./seastar/core/apply.hh:44
.20 seastar::futurize<seastar::future<> >::apply<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&, std::tuple<>&&) (args=...,
    func=...) at ./seastar/core/future.hh:1302
.21 seastar::future<>::then<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}, seastar::future<> >(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const::{lambda()#1}&&) (
    this=this@entry=0x6010195fcbb0, func=...) at ./seastar/core/future.hh:890
.22 0x0000000000ac273f in sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}::operator()() const (__closure=0x6010195fcc28) at sstables/partition.cc:843
.23 seastar::do_until_continued<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}&&, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}&&, seastar::promise<>) (stop_cond=..., action=..., p=...) at /home/avi/urchin/seastar/core/future-util.hh:155
.24 0x0000000000ac29c3 in seastar::do_until<sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}>(sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#1}&&, sstables::sstable_streamed_mutation::fill_buffer()::{lambda()#2}&&) (action=..., stop_cond=..., this=<optimized out>) at /home/avi/urchin/seastar/core/future-util.hh:330
.25 sstables::sstable_streamed_mutation::fill_buffer (this=<optimized out>) at sstables/partition.cc:844
.26 0x0000000000ad3d2b in streamed_mutation::fill_buffer (this=0x6010195fcd10) at ./streamed_mutation.hh:489
.27 consume_flattened_in_thread<stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >, std::function<bool (streamed_mutation const&)> >(mutation_reader&, stable_flattened_mutations_consumer<compact_for_compaction<sstables::compacting_sstable_writer> >&, std::function<bool (streamed_mutation const&)>&&) (

(gdb) p addr
$1 = {
  chunk_start = 13330037,
  chunk_len = 18446744073702885265,
  offset = 0
}
2017-08-27 13:32:37 +03:00
Avi Kivity
9d27455744 Revert "Revert "Merge "Compress in-memory compression-info" from Botond""
This reverts commit 9656fd79a0. A fix is now
available.
2017-08-24 13:37:35 +03:00
Tomasz Grabiec
9656fd79a0 Revert "Merge "Compress in-memory compression-info" from Botond"
This reverts commit ef85cf1cb3, reversing
changes made to de011ece52.

Vlad reports that this causes SIGSEGV on cluster restarts.

seastar::backtrace_buffer::append_backtrace() at /home/vladz/work/urchin/seastar/core/reactor.cc:274
 (inlined by) print_with_backtrace at /home/vladz/work/urchin/seastar/core/reactor.cc:289
seastar::print_with_backtrace(char const*) at /home/vladz/work/urchin/seastar/core/reactor.cc:296
sigsegv_action at /home/vladz/work/urchin/seastar/core/reactor.cc:3512
 (inlined by) operator() at /home/vladz/work/urchin/seastar/core/reactor.cc:3498
 (inlined by) _FUN at /home/vladz/work/urchin/seastar/core/reactor.cc:3494
?? ??:0
operator()<seastar::temporary_buffer<char> > at /home/vladz/work/urchin/sstables/sstables.cc:870
 (inlined by) apply at /home/vladz/work/urchin/seastar/core/apply.hh:36
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)>, seastar::temporary_buffer<char> > at /home/vladz/work/urchin/seastar/core/apply.hh:44
 (inlined by) do_void_futurize_apply_tuple<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)>, seastar::temporary_buffer<char> > at /home/vladz/work/urchin/seastar/core/future.hh:1270
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)>, seastar::temporary_buffer<char> > at /home/vladz/work/urchin/seastar/core/future.hh:1290
 (inlined by) then<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>::<lambda(auto:104)> > at /home/vladz/work/urchin/seastar/core/future.hh:890
 (inlined by) operator() at /home/vladz/work/urchin/sstables/sstables.cc:873
 (inlined by) do_until_continued<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>, sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>&> at /home/vladz/work/urchin/seastar/core/future-util.hh:155
do_until<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>, sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()>::<lambda()>&> at /home/vladz/work/urchin/seastar/core/future-util.hh:330
 (inlined by) operator() at /home/vladz/work/urchin/sstables/sstables.cc:874
 (inlined by) apply at /home/vladz/work/urchin/seastar/core/apply.hh:36
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()> > at /home/vladz/work/urchin/seastar/core/apply.hh:44
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()> > at /home/vladz/work/urchin/seastar/core/future.hh:1302
then<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()>::<lambda()> > at /home/vladz/work/urchin/seastar/core/future.hh:890
 (inlined by) operator() at /home/vladz/work/urchin/sstables/sstables.cc:875
 (inlined by) apply at /home/vladz/work/urchin/seastar/core/apply.hh:36
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()> > at /home/vladz/work/urchin/seastar/core/apply.hh:44
 (inlined by) apply<sstables::parse(sstables::random_access_reader&, sstables::compression&)::<lambda()> > at /home/vladz/work/urchin/seastar/core/future.hh:1302
operator()<seastar::future_state<> > at /home/vladz/work/urchin/seastar/core/future.hh:900
 (inlined by) run at /home/vladz/work/urchin/seastar/core/future.hh:395
seastar::reactor::run_tasks(seastar::circular_buffer<std::unique_ptr<seastar::task, std::default_delete<seastar::task> >, std::allocator<std::unique_ptr<seastar::task, std::default_delete<seastar::task> > > >&) at /home/vladz/work/urchin/seastar/core/reactor.cc:2317
seastar::reactor::run() at /home/vladz/work/urchin/seastar/core/reactor.cc:2775
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at /home/vladz/work/urchin/seastar/core/app-template.cc:142
2017-08-24 11:44:14 +02:00
Botond Dénes
028c7a0888 Optimise the storage of compression chunk offsets
To reduce the memory footprint of compression-info, n offsets are
grouped together into segments, where each segment stores a base
absolute offset into the file, the other offsets in the segments being
relative offsets (and thus of reduced size). Also offsets are
allocated only just enough bits to store their maximum value. The
offsets are thus packed in a buffer like so:
     arrrarrrarrr...
where n is 4, a is an absolute offset and r are offsets relative to a.

The optimal value of n can be calculated for a given file_size (f) and
chunk_size (c), by finding the minima of the following function:

f(n) = (f/c)/n * (log2(f) + (n - 1)*log2((n-1)*(c + 64)))

This is done in an empirical way, using a script (see below).

Furthermore segments are stored in buckets, where each bucket has its
own base offset. Each bucket therefore can address an equal chunk of the
file and furthermore each segment in a bucket can address an equal
sub-chunk of this area.
The value of a given offset i is thus:
    bucket_base_offset_for(i) + segment_base_offset_for(i) + offset(i)

To account for the bucketed storage we calculate a local_f, which is
optimized so that a bucketful of segmented offsets can address the
largest possible chunk of f. As value of this local_f only depends on
the bucket_size (b) and c the value of n can be made independent of f
and therefore only depend on one dynamic value, c. This makes life much
simpler as we don't need to know the size of the file up-front, we can
just append buckets to the storage on demand, while the required storage
is still less than a third [1] of the original storage requirements
(std::deque<uint64>).

The table with the minima(f(n)) for different f and c values is
pre-computed by gen_segmented_compress_params.py and
stored in sstables/segmented_compress_params.hh. This script also
creates a table with the best values of local_f for the given
bucket_size. At runtime we only select the best params based on c.

[1] This was calculated for c=4K and b=4K
2017-08-21 17:06:12 +03:00
Raphael S. Carvalho
15246f31f7 sstables: fix incorrect sstable size when compression is enabled
Size of uncompressed sstable was being unconditionally used to determine
when to stop writing a table. When compression is enabled, compressed
size should be used instead. Problem affected Scylla when compression
and leveled strategy were used.

Fixes #1177.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <d9bf26def41fb33ca297f4127ce042b7f67adf96.1460484529.git.raphaelsc@scylladb.com>
2016-04-13 09:01:01 +03:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Nadav Har'El
2f56577794 sstables: more efficient read of compressed data file
Before this patch, reading large ranges from a compressed data file involved
two inefficiencies:

 1.  The compressed data file was read one compressed chunk at a time.
     Such a chunk is around 30 KB in size, well below our desired sstable
     read-ahead size (sstable_buffer_size = 128 KB).

 2.  Because the compressed chunks have variable length (the uncompressed
     chunk has a fixed length) they are not aligned to disk blocks, so
     consecutive chunks have overlapping blocks which were unnecessarily
     read twice.

The fix for both issues is to build the compressed_file_input_stream on
an existing file_input_stream, instead of using direct file IO to read the
individual chunks. file_input_stream takes care of doing the appropriate
amount of read-ahead, and the compressed_file_input_stream layer does the
decompression of the data read from the underlying layer.

Fixes #992.

Historical note: Implementing compressed_file_input_stream on top of
file_input_stream was already tried in the past, and rejected. The problem
at that time was that compressed_file_input_stream's constructor did not
specify the *end* of the range to read, so that when we wanted to read
only a small range we got too much read-ahead beyond the exactly one
compressed chunk that we needed to read.  Following the fix to issue #964,
we now know on every streaming read also the intended *end* of the stream,
so we can now use this to stop reading at the end of the last required
chunk, even when we use a read-ahead buffer much larger than a chunk.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1457304335-8507-1-git-send-email-nyh@scylladb.com>
2016-03-09 10:14:15 +02:00
Glauber Costa
8e4bf025ae sstables: wire priority for read path
All the SSTable read path can now take an io_priority. The public functions will
take a default parameter which is Seastar's default priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Avi Kivity
d5cf0fb2b1 Add license notices 2015-09-20 10:43:39 +03:00
Nadav Har'El
4edf7fe206 clean up uses of lw_shared_ptr<file>
recently, "file" started to use a shared_ptr internally, and is already
copy-able and reference counted, and there is no reason to use
lw_shared_ptr<file>. This patch cleans up a few remaining places where
lw_shared_ptr<file> was used.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-07-22 11:51:40 +03:00
Raphael S. Carvalho
113d3b1001 sstables: update compression ratio stats
If compression is used, we should provide both uncompressed and
compressed length to metadata collector, so as for the ratio to
be computed. Stats metadata stores compression ratio.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-21 08:14:07 +03:00
Raphael S. Carvalho
f17f3b197a sstables: add initial support to compression
lz4 is the unique compressor algorithm supported so far.
missing deflate and snappy algorithms.
Adding them should be relatively easy though.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-16 12:42:00 -03:00
Raphael S. Carvalho
3bfb86f541 sstables: add compress_max_size to compression
used to return maximum size which compressor may output.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-06-16 09:48:00 -03:00
Raphael S. Carvalho
d1ed0744f0 schema: add sstable compressor property
The field compressor is about saying which compressor algorithm
must be used in compression of sstable data file.
This is a small step towards compressed sstable data file.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-09 11:18:56 +03:00
Glauber Costa
2dbd2b408a sstables: change describe_type's return type to auto
We always return a future, but with the threaded writer, we can get rid of
that. So while reads will still return a future, the writer will be able to
return void.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-08 15:25:35 +03:00