18 Commits

Author SHA1 Message Date
Avi Kivity
8dab93a853 sstables: fix low disk utilization with compression and small chunk lengths
As Nadav notes we use the chunk length as the buffer size for the compressed
stream too.

Fix by using it only for the outer (uncompressed) stream; the inner
(compressed) stream uses the sstable buffer size, 128 kiB.

Fixes #1402.
Message-Id: <1467910556-5759-1-git-send-email-avi@scylladb.com>
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2016-07-07 18:13:30 +01:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Glauber Costa
56c11a8109 sstables: wire priority for write path
All variants of write_component now take an io_priority. The public
interfaces are by default set to Seastar's default priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Raphael S. Carvalho
f001bb0f53 sstables: fix make_checksummed_file_output_stream
The buffer_size and true arguments were accidentally swapped.
GCC did not complain because implicit conversion from bool to
int, and vice versa, is valid.
That conversion is unsafe, though, because it lets swapped
parameters go unnoticed.

This should fix the last problem with sstable_test.
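
To illustrate the class of bug described above, here is a minimal sketch. The names stream_config, make_stream_unsafe, make_stream_safe and checksum_mode are hypothetical and are not taken from the sstables code; the sketch only assumes a function that takes a size followed by a flag. A scoped enum is one possible way to make swapped arguments a compile error.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical result type, used only to observe what the callee received.
struct stream_config {
    std::size_t buffer_size;
    bool checksummed;
};

// Unsafe: bool and integer types convert into each other implicitly,
// so a call with the arguments swapped still compiles.
inline stream_config make_stream_unsafe(std::size_t buffer_size, bool checksummed) {
    return {buffer_size, checksummed};
}

// Safer: a scoped enum does not convert to or from integers implicitly,
// so make_stream_safe(checksum_mode::on, 4096) would be rejected.
enum class checksum_mode { off, on };

inline stream_config make_stream_safe(std::size_t buffer_size, checksum_mode mode) {
    return {buffer_size, mode == checksum_mode::on};
}
```

Calling make_stream_unsafe(true, 4096) compiles and silently yields a one-byte buffer, which is the kind of mistake this commit fixes.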

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <9478cd266006fdf8a7bd806f1c612ec9d1297c1f.1453301866.git.raphaelsc@scylladb.com>
2016-01-20 16:01:38 +01:00
Glauber Costa
63967db8bf sstables: always use a file_*_stream_options in our readers and writers
Instead of using the APIs that explicitly pass things like buffer_size,
always use the options instance instead.

This will make it easier to pass extra options in the future.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <5b04e60ab469c319a17a522694e5bedf806702fe.1453219530.git.glauber@scylladb.com>
2016-01-19 18:26:37 +02:00
Avi Kivity
d5cf0fb2b1 Add license notices
2015-09-20 10:43:39 +03:00
Raphael S. Carvalho
6fe853fe7b sstables: fix possible use-after-free
buf is a stack variable, so it may be destroyed by the time it's
used by output_stream::write().

Spotted while auditing the code.
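
The lifetime problem described above can be sketched without Seastar. In the sketch below, deferred_stream is a hypothetical stand-in for an asynchronous output stream whose write() only queues work; it is not the real output_stream API. The safe pattern shown is to move the buffer into the queued task so the task owns it after the caller's stack frame is gone.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an asynchronous stream: write() queues a task,
// and the data is only read later, when run() is called.
class deferred_stream {
    std::vector<std::function<void()>> _queue;
    std::string _out;
public:
    // The buffer is taken by value and moved into the task, so the task
    // owns its own copy and does not refer to the caller's stack.
    void write(std::string buf) {
        _queue.push_back([this, b = std::move(buf)] { _out += b; });
    }
    void run() {
        for (auto& task : _queue) {
            task();
        }
        _queue.clear();
    }
    const std::string& output() const { return _out; }
};

// buf is a stack variable that is gone by the time run() executes.
// Capturing it by reference in the task would be a use-after-free;
// handing ownership to write() avoids that.
inline void enqueue_from_stack(deferred_stream& s) {
    std::string buf = "abcd";
    s.write(std::move(buf));
}
```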

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-09-02 09:26:06 +03:00
Glauber Costa
0dd57fbca8 checksummed file writer: some cleanups
- no need to mark us as a friend of file_writer
- fields should be initialized directly instead of in the constructor's body.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-29 11:44:48 +03:00
Glauber Costa
66cc546781 sstable writer: compute checksum at larger chunks
Currently we compute the checksum at every write() operation, possibly over a
very small number of bytes, like 2 or 4, since we write those a lot as sizes.

While adler32 allows those computations and makes them very easy, that doesn't mean
they are efficient. It is a lot more efficient to compute the checksum on larger
buffers.

We can do that by doing it at put() time in a data_sink_impl, instead of
keeping that in the file abstraction. The code for the checksum itself now also
becomes remarkably simpler - since there is no need anymore to keep state:
we'll always be presented with full buffers.

The data sink implementation and the file_writer share the full_checksum and
the checksum struct variables. With that in place, the file writer can
still expose the final results of the computation in the same way it does at
present.
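
As a rough illustration of why the two approaches are interchangeable in terms of results, the sketch below uses a textbook Adler-32 update step, not the project's implementation. The helper names adler32_update, checksum_per_write and checksum_buffered are hypothetical. Both strategies produce the same checksum; the buffered one simply makes far fewer calls.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Textbook Adler-32 update. `adler` is the running value; start from 1.
inline std::uint32_t adler32_update(std::uint32_t adler, const char* data, std::size_t len) {
    constexpr std::uint32_t mod = 65521;
    std::uint32_t a = adler & 0xffff;
    std::uint32_t b = (adler >> 16) & 0xffff;
    for (std::size_t i = 0; i < len; ++i) {
        a = (a + static_cast<unsigned char>(data[i])) % mod;
        b = (b + a) % mod;
    }
    return (b << 16) | a;
}

// Per-write strategy: one checksum call for every small write.
// write_size must be greater than zero.
inline std::uint32_t checksum_per_write(const std::string& payload, std::size_t write_size) {
    std::uint32_t sum = 1;
    for (std::size_t off = 0; off < payload.size(); off += write_size) {
        std::size_t n = std::min(write_size, payload.size() - off);
        sum = adler32_update(sum, payload.data() + off, n);
    }
    return sum;
}

// Buffered strategy: a single call over the whole buffer.
inline std::uint32_t checksum_buffered(const std::string& payload) {
    return adler32_update(1, payload.data(), payload.size());
}
```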

Benchmarked with:
perf_sstable_g  --smp 1 --iterations 30 --parallelism 1 --mode write --num_columns 5 --partitions 500000

Before:
178829.07 +- 141.28 partitions / sec (30 runs, 1 concurrent ops)
After:
199744.71 +- 201.64 partitions / sec (30 runs, 1 concurrent ops)

gain: 11.70 %

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-29 11:44:47 +03:00
Nadav Har'El
4edf7fe206 clean up uses of lw_shared_ptr<file>
Recently, "file" started to use a shared_ptr internally, so it is already
copyable and reference counted, and there is no reason to use
lw_shared_ptr<file>. This patch cleans up a few remaining places where
lw_shared_ptr<file> was used.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-07-22 11:51:40 +03:00
Avi Kivity
4a95f1589c Merge seastar upstream
Adjust make_file_*_stream() callers for updated seastar API.
2015-07-20 17:02:46 +03:00
Raphael S. Carvalho
6dcf136702 sstables: enable trim_to_size option of compressed_file_output_stream
Following Nadav's discovery of the problem with large writes to output stream,
it turns out that compressed_file_output_stream also needs the option
trim_to_size enabled. Otherwise, a write to compressed_file_output_stream larger
than _size would result in a buffer larger than chunk size being flushed, which
is definitely wrong.
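
The behaviour the option is meant to guarantee can be sketched as follows. chunking_sink is a hypothetical class and does not reproduce the Seastar output_stream implementation; it only shows a large write being split so that no flushed buffer exceeds the chunk size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical buffered sink that records every buffer it would flush.
class chunking_sink {
    std::size_t _size;                  // chunk size, must be greater than zero
    std::string _pending;               // bytes not yet flushed
    std::vector<std::string> _flushed;  // buffers handed downstream
public:
    explicit chunking_sink(std::size_t size) : _size(size) {}

    // Append data, then flush in pieces of exactly _size bytes, so a
    // write larger than _size never produces an oversized buffer.
    void write(const std::string& data) {
        _pending += data;
        while (_pending.size() >= _size) {
            _flushed.push_back(_pending.substr(0, _size));
            _pending.erase(0, _size);
        }
    }

    // Flush whatever is left, which is always shorter than _size.
    void close() {
        if (!_pending.empty()) {
            _flushed.push_back(std::move(_pending));
            _pending.clear();
        }
    }

    const std::vector<std::string>& flushed() const { return _flushed; }
};
```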

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-06-28 19:57:07 +03:00
Nadav Har'El
9c7f1744b3 sstables: add missing virtual destructor
A base class with virtual functions should also have a virtual destructor,
so if someone deletes it by the base class pointer, the concrete class's
destructor will be called.

I thought this missing virtual destructor was to blame for a bug I was
hunting. It is not, but it's still worth adding this missing definition.

The silly "default" definition of the move constructor is also necessary,
because when you declare the destructor explicitly, the compiler no longer
defines the move constructor implicitly for you.
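
A small sketch of both points, using hypothetical types that are unrelated to the actual sstables classes. Strictly speaking, it is the implicit move constructor and move assignment operator that a user-declared destructor suppresses; the sketch uses a move-only member to make that visible.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Declaring the destructor suppresses the implicit move constructor.
// With a move-only member the copy constructor is deleted too, so the
// type can no longer be moved at all.
struct without_move {
    std::unique_ptr<int> p;
    virtual ~without_move() = default;
};

// Defaulting the move constructor explicitly brings it back.
struct with_move {
    std::unique_ptr<int> p;
    with_move() = default;
    with_move(with_move&&) = default;
    virtual ~with_move() = default;
};

static_assert(!std::is_move_constructible<without_move>::value, "move suppressed");
static_assert(std::is_move_constructible<with_move>::value, "move restored");

// The virtual destructor ensures that deleting through a base pointer
// runs the derived destructor.
struct tracked : with_move {
    bool* destroyed;
    explicit tracked(bool* flag) : destroyed(flag) {}
    ~tracked() override { *destroyed = true; }
};
```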

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-06-28 13:28:12 +03:00
Raphael S. Carvalho
113d3b1001 sstables: update compression ratio stats
If compression is used, we should provide both the uncompressed and
compressed lengths to the metadata collector, so that the ratio can
be computed. Stats metadata stores the compression ratio.
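
As a worked example of the arithmetic, the sketch below accumulates both lengths and derives a ratio. The compression_stats type is hypothetical, and the definition of the ratio as compressed length divided by uncompressed length is an assumption, not something the commit message states.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical accumulator for the two lengths mentioned above.
struct compression_stats {
    std::uint64_t uncompressed_len = 0;
    std::uint64_t compressed_len = 0;

    void update(std::uint64_t uncompressed, std::uint64_t compressed) {
        uncompressed_len += uncompressed;
        compressed_len += compressed;
    }

    // Assumed definition: compressed / uncompressed.
    // Returns 1.0 when nothing has been written yet.
    double ratio() const {
        if (uncompressed_len == 0) {
            return 1.0;
        }
        return static_cast<double>(compressed_len) / static_cast<double>(uncompressed_len);
    }
};
```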

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-21 08:14:07 +03:00
Raphael S. Carvalho
f17f3b197a sstables: add initial support to compression
lz4 is the only compression algorithm supported so far;
the deflate and snappy algorithms are still missing.
Adding them should be relatively easy though.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-16 12:42:00 -03:00
Glauber Costa
0f0721af1f sstables: remove circular reference
writer.hh includes sstables.hh which includes writer.hh
We can remove the reference if we include core/fstream.hh into writer.hh instead

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-07 22:50:07 +03:00
Pekka Enberg
31e9381be3 sstables: Defer full checksum calculation
Optimize checksum calculation by deferring "full checksum" update until
we've computed a full per-chunk checksum.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-06-04 15:48:37 +03:00
Raphael S. Carvalho
83d4b962ff sstables: move file writer to a new header
That was needed because new methods of sstable class will have a
file writer as a parameter, and thus the definition of the file
writer must be available from sstables header.
2015-06-02 17:30:50 -03:00