670 Commits

Author SHA1 Message Date
Nadav Har'El
c647d917e0 sstables: move to_bytes_view to header file
Move the to_bytes_view(temporary_buffer<char>) function from the source file
to the header file, where it can be used in more places.

This saves one use of reinterpret_cast (which we are now re-evaluating),
and moreover, we want to use this function also in the promoted index
code (to return a bytes_view from the promoted index which was saved as a
temporary_buffer).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1468761437-27046-1-git-send-email-nyh@scylladb.com>
2016-07-17 16:29:26 +03:00
Paweł Dziepak
93cc4454a6 streamed_mutation: emit range_tombstones directly
Originally, streamed_mutations guaranteed that emitted tombstones are
disjoint. In order to achieve that two separate objects were produced
for each range tombstone: range_tombstone_begin and range_tombstone_end.

Unfortunately, this forced sstable writer to accumulate all clustering
rows between range_tombstone_begin and range_tombstone_end.

However, since there is no need to write disjoint tombstones to sstables
(see #1153 "Write range tombstones to sstables like Cassandra does") it
is also not necessary for streamed_mutations to produce disjoint range
tombstones.

This patch changes that by making streamed_mutation produce
range_tombstone objects directly.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-13 09:51:18 +01:00
Nadav Har'El
aec90a22da sstable parsing: assert we do not lose clustering rows
The sstable parsing code calls mp_row_consumer::flush() after every
clustering row has been read, and this puts the now complete row in a single
field "_ready". The assumption is that at this point parsing will stop, the
consumer will move out this _ready (mp_row_consumer::get_mutation_fragment())
and when flush() is later called again, _ready will be empty again.

This assumption is correct in our code, but is based on an intricate
combination of esoteric parts of the code, such as:

 1. In data_consume_row_context we stop parsing after reading the partition's
    header, before reading any clustering rows, giving the caller the chance
    to call sstable_streamed_mutation::read_next() to be prepared for the
    incoming mutations.

 2. In mp_row_consumer::flush_if_needed(), we stop the parser after each
    individual clustering row.

It is easy to break this assumption, and I did this in one of my code changes,
and the result was silent loss of clustering rows, as "_ready" got silently
overwritten before the reader had a chance to move it out.

What this patch does is to add an assertion: If a clustering row is silently
lost before being transferred to the mutation fragment reader, we croak.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <1468389955-24600-1-git-send-email-nyh@scylladb.com>
2016-07-13 09:42:48 +01:00
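The hand-off that aec90a22da protects can be sketched roughly as follows. This is a minimal illustration, not the actual mp_row_consumer: the class name row_consumer_sketch, the add_cell() helper, and the use of std::string for a row are all assumptions made for the example. Only the idea comes from the commit message: flush() must find _ready empty, otherwise a row would be silently overwritten.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for mp_row_consumer; only the _ready hand-off is modelled.
class row_consumer_sketch {
    std::optional<std::string> _ready;  // completed row waiting to be picked up
    std::string _in_progress;           // row currently being parsed
public:
    // Hypothetical helper: appends parsed data to the row in progress.
    void add_cell(const std::string& cell) { _in_progress += cell; }

    // Called after every clustering row has been read.
    void flush() {
        // If the previous row was never moved out, overwriting it here
        // would lose it silently, so fail loudly instead.
        assert(!_ready && "clustering row lost before the reader consumed it");
        _ready = std::move(_in_progress);
        _in_progress.clear();
    }

    // Moves the completed row out, leaving _ready empty for the next flush().
    std::optional<std::string> get_mutation_fragment() {
        return std::exchange(_ready, std::nullopt);
    }
};
```

Calling flush() twice without an intervening get_mutation_fragment() trips the assertion in a debug build, which is the failure mode the commit wants to make visible.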
Duarte Nunes
4eca7632ec sstables: Replace composite fields with raw bytes
This patch fixes a regression introduced in
f81329be60, which made keys compound by
default when using a particular ctor, in turn leading to mismatches
when comparing the same key built with functions that properly
consider compoundness.

As a temporary fix, the sstable::key and sstable::key_view classes
store raw bytes instead of a composite.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468339295-3924-1-git-send-email-duarte@scylladb.com>
2016-07-12 18:08:04 +02:00
Duarte Nunes
f81329be60 sstables: sstables::key delegates to composite
The sstables::key class now delegates much of its functionality
to the composite class. All existing behavior is preserved.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-11 23:37:33 +02:00
Duarte Nunes
ad8ff1df7e sstables: Replace composite class
This patch replaces the sstables::composite class with the one in
compound_compat.hh.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-07-11 16:55:11 +02:00
Avi Kivity
24e3026e32 Merge "compaction manager refactoring" from Raphael 2016-07-10 17:16:23 +03:00
Tomasz Grabiec
8c4b5e4283 db: Avoiding checking bloom filters during compaction
Checking bloom filters of sstables to compute max purgeable timestamp
for compaction is expensive in terms of CPU time. We can avoid
calculating it if we're not about to GC any tombstone.

This patch changes compacting functions to accept a function instead
of a ready value for max_purgeable.

I verified that bloom filter operations no longer appear on flame
graphs during compaction-heavy workload (without tombstones).

Refs #1322.
2016-07-10 09:54:20 +02:00
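The idea in 8c4b5e4283 of passing a function rather than a ready value can be sketched as below. This is a simplified illustration under stated assumptions: cell_sketch, compact_sketch, and the purge rule (drop a tombstone whose timestamp is below the max purgeable timestamp) are hypothetical and omit details such as the GC grace period. The callback stands in for the expensive bloom-filter-based computation.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical, simplified cell: a timestamp plus a tombstone flag.
struct cell_sketch {
    int64_t timestamp;
    bool is_tombstone;
};

// Copies the input, dropping tombstones older than the max purgeable timestamp.
// get_max_purgeable is invoked at most once, and only if a tombstone is seen.
std::vector<cell_sketch> compact_sketch(
        const std::vector<cell_sketch>& input,
        const std::function<int64_t()>& get_max_purgeable) {
    assert(get_max_purgeable);  // the callback must be callable
    std::vector<cell_sketch> out;
    bool computed = false;
    int64_t max_purgeable = 0;
    for (const auto& c : input) {
        if (c.is_tombstone) {
            if (!computed) {
                max_purgeable = get_max_purgeable();  // expensive step, deferred
                computed = true;
            }
            if (c.timestamp < max_purgeable) {
                continue;  // tombstone can be garbage collected
            }
        }
        out.push_back(c);
    }
    return out;
}
```

With an input that contains no tombstones, the callback is never invoked, which matches the observation in the commit message that bloom filter operations disappear from the profile for such workloads.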
Raphael S. Carvalho
ed5e7e6842 compaction: refactor compaction manager
Previously, the same function was used to handle both regular compaction
and cleanup requests. That's bad because a lot of conditions were
added for both compaction types to live in the same function.
Now, cleanup and regular compaction live in different functions.
They share a lot of code, so helper functions were introduced.
This change is also important for user-initiated compaction, which
will go through the compaction manager in the future.
The code is also a lot easier to read now.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-08 16:37:53 -03:00
Raphael S. Carvalho
da6a2b429d compaction: add functions to register and deregister compacting sstables
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-08 16:00:51 -03:00
Raphael S. Carvalho
4d6dce8ec9 compaction: add helper function to get candidates for strategy
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-08 15:06:14 -03:00
Raphael S. Carvalho
bfc5376548 compaction: remove gate from compaction manager task
There is no longer a need to use a gate for regular termination of the
fiber that runs compaction. Now, we only set task->stopping to
true, ask for compaction termination, and wait for its future to
resolve. The code is simplified a lot with this change.

Reviewed-by: Nadav Har'El <nyh@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-08 15:05:10 -03:00
Avi Kivity
8dab93a853 sstables: fix low disk utilization with compression and small chunk lengths
As Nadav notes, we use the chunk length as the buffer size for the compressed
stream too.

Fix by using it only for the outer (uncompressed) stream; the inner
(compressed) stream uses the sstable buffer size, 128 KiB.

Fixes #1402.
Message-Id: <1467910556-5759-1-git-send-email-avi@scylladb.com>
Reviewed-by: Nadav Har'El <nyh@scylladb.com>
2016-07-07 18:13:30 +01:00
Paweł Dziepak
5bc51821fe sstables: allow writing unsealed sstables
The purpose of this patch is to split the actions of writing sstable and
sealing it. As long as the sstable is unsealed it is considered
incomplete and is going to be removed on reboot.

Such functionality is needed in order to defer visibility of sstables
created during streaming until the streaming is complete.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
a7b6c1110f sstables: do not require seal_sstable() to be run in thread
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Raphael S. Carvalho
0772d20c60 fix compilation in debug mode
build/debug/sstables/compaction_strategy.o: In function
`date_tiered_manifest::date_tiered_manifest(std::map<basic_sstring<char, unsigned int, 15u>,
basic_sstring<char, unsigned int, 15u>, std::less<basic_sstring<char, unsigned int, 15u> >,
std::allocator<std::pair<basic_sstring<char, unsigned int, 15u> const, basic_sstring<char,
unsigned int, 15u> > > > const&)':
/home/centos/scylla/sstables/date_tiered_compaction_strategy.hh:67: undefined reference to
`date_tiered_manifest::DEFAULT_BASE_TIME_SECONDS'

That's fixed by moving the definition of the static constexpr member outside the class.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20c16ad71f64900aa5591018bc4e976406cfebb3.1467870383.git.raphaelsc@scylladb.com>
2016-07-07 11:52:37 +03:00
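The link error fixed by 0772d20c60 is the usual pre-C++17 rule for static constexpr data members: the in-class initializer is only a declaration, so any odr-use (for example binding the member to a const reference) needs a separate out-of-class definition. A minimal sketch follows; manifest_sketch, the value 60, and the use of std::max are illustrative assumptions, not the actual date_tiered_manifest code.

```cpp
#include <algorithm>
#include <cstdint>

struct manifest_sketch {
    // Declaration with an initializer; before C++17 this alone is not a definition.
    static constexpr int64_t DEFAULT_BASE_TIME_SECONDS = 60;  // illustrative value

    int64_t base_time;

    explicit manifest_sketch(int64_t requested)
        // std::max takes its arguments by const reference, which odr-uses the
        // constant and therefore requires a definition at link time.
        : base_time(std::max(requested, DEFAULT_BASE_TIME_SECONDS)) {}
};

// Out-of-class definition. Required before C++17; since C++17 the member is
// implicitly inline and this line is redundant, though still accepted.
constexpr int64_t manifest_sketch::DEFAULT_BASE_TIME_SECONDS;
```

An optimized build may hide the problem by folding the constant, which is consistent with the failure showing up only in debug mode. In a multi-file project built as pre-C++17, the out-of-class definition would go in exactly one source file rather than in the header.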
Avi Kivity
02530faeb2 compaction: fix tombstones not being garbage collected during compaction
2a46410f4a changed sstable_list from a map to a set, so it is no longer
sorted by generation.  The code for finding the list of sstables not being
compacted relied on this sort order and is now broken, returning a longer
list than needed (including some of the sstables being compacted).  As a
result, the compaction code preserved the tombstones, incorrectly thinking
there was still live data they referenced.

Fix by sorting the set explicitly.

Fixes #1429.
Message-Id: <1467793026-6571-1-git-send-email-avi@scylladb.com>
2016-07-06 10:22:31 +02:00
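The dependence on sort order described in 02530faeb2 can be illustrated with std::set_difference, which requires both input ranges to be sorted by the same comparator. Whether the original code used this exact algorithm is an assumption; sstable_sketch and not_compacting are hypothetical names for the example.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical, simplified sstable identified only by its generation.
struct sstable_sketch {
    int64_t generation;
};

// Returns the sstables in `all` that are not in `compacting`.
// Both inputs are sorted explicitly, because std::set_difference gives
// meaningless results on unsorted ranges.
std::vector<sstable_sketch> not_compacting(std::vector<sstable_sketch> all,
                                           std::vector<sstable_sketch> compacting) {
    auto by_generation = [](const sstable_sketch& a, const sstable_sketch& b) {
        return a.generation < b.generation;
    };
    std::sort(all.begin(), all.end(), by_generation);
    std::sort(compacting.begin(), compacting.end(), by_generation);

    std::vector<sstable_sketch> result;
    std::set_difference(all.begin(), all.end(),
                        compacting.begin(), compacting.end(),
                        std::back_inserter(result), by_generation);
    return result;
}
```

Without the two std::sort calls, an input ordered by pointer or hash rather than by generation could yield a result that still contains sstables being compacted, which is the symptom the commit describes.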
Raphael S. Carvalho
b699ef2de3 compaction: wire up date tiered compaction strategy
After this commit, date tiered compaction strategy is supported
on Scylla.

To understand how it works, take a look at our wiki page:
https://github.com/scylladb/scylla/wiki/SSTable-compaction#date-tiered-compaction

Fixes #511.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 02:11:47 -03:00
Raphael S. Carvalho
e5cc0cc6c4 compaction: implement date tiered compaction strategy
This commit is basically about converting Java to C++.
Date tiered compaction strategy isn't wired yet.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 02:11:47 -03:00
Raphael S. Carvalho
e9076f39be compaction: implement function to get fully expired sstables
Strongly based on org.apache.cassandra.db.compaction.
CompactionController.getFullyExpiredSSTables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 02:11:47 -03:00
Raphael S. Carvalho
92848efc42 sstables: make overlapping functions static
That's needed for a function that will get overlapping sstables to
get fully expired ones.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 01:34:34 -03:00
Raphael S. Carvalho
8d38fa49d4 sstables: move code to get uncompacting sstables to a function
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 01:33:55 -03:00
Raphael S. Carvalho
cc6c383249 sstables: properly keep track of max local deletion time
We weren't updating max local deletion time for cells that contain
ttl, or for tombstone cells.
If there is a live cell with no ttl, then max local deletion time
is supposed to store maximum value, which means that the sstable
will not be fully expired later on.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 01:13:24 -03:00
Raphael S. Carvalho
1ecd9bdefc sstables: fix type of max_local_deletion_time
max_local_deletion_time was incorrectly using an unsigned type
instead of a signed one.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 01:13:13 -03:00
Raphael S. Carvalho
f9ab94d266 compaction: import DateTieredCompactionStrategy.java
File can be found at the following C* directory:
src/java/org/apache/cassandra/db/compaction

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-07-06 01:12:49 -03:00
Avi Kivity
cb59e724ee Merge "Fix enabling sstable read ahead" from Paweł
"This series contains remaining changes necessary to safely enable read
ahead of sstables. Basically, it makes sure that input_streams are
always properly closed (even in case of exception during read)."
2016-07-05 19:04:19 +03:00
Raphael S. Carvalho
43926026c3 compaction: introduce compaction strategy method to estimate pending compaction
At the moment, it's not possible to know how many compactions are needed for
the compaction strategy to be satisfied. The exact number of pending
compactions cannot be known, but the strategy can provide an estimate.

For size tiered, it's based on the number of sstables in each bucket. Dividing
the bucket size by the max threshold gives the number of compactions needed to
compact that single bucket.

For leveled, it's about the number of sstables that exceed the limit in
each level.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <e209e52f6159ee274a8358b69961a7c0ce357f7d.1467667054.git.raphaelsc@scylladb.com>
2016-07-05 19:03:11 +03:00
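The size-tiered estimate described in 43926026c3 can be sketched as follows. The commit message only says that the bucket size is divided by the max threshold; rounding up and skipping buckets below a min threshold are assumptions added here, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// bucket_sizes[i] is the number of sstables in bucket i.
// Returns a rough estimate of the compactions still needed.
std::size_t estimate_pending_size_tiered(const std::vector<std::size_t>& bucket_sizes,
                                         std::size_t min_threshold,
                                         std::size_t max_threshold) {
    assert(max_threshold > 0);
    std::size_t pending = 0;
    for (std::size_t size : bucket_sizes) {
        // Assumption: buckets smaller than min_threshold are not compacted yet.
        if (size >= min_threshold) {
            // Ceiling division: each compaction handles at most max_threshold sstables.
            pending += (size + max_threshold - 1) / max_threshold;
        }
    }
    return pending;
}
```

For buckets of 4, 40, and 2 sstables with thresholds 4 and 32, the buckets contribute 1, 2, and 0 compactions respectively.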
Paweł Dziepak
4acf77d755 sstables: drop unused data_stream_at()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-04 18:17:43 +01:00
Paweł Dziepak
2cdf498bbd sstables: close input stream in sstable::data_read()
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-04 18:17:42 +01:00
Paweł Dziepak
8931b939a1 sstables: use finally() to close input streams
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-04 18:17:42 +01:00
Avi Kivity
e22517bafc Merge "Optimize reads from leveled sstables"
In a leveled column family, there can be many thousands of sstables, since
each sstable is limited to a relatively small size (160M by default).
With the current approach of reading from all sstables in parallel, cpu
quickly becomes a bottleneck as we need to check the bloom filter for each
of these sstables.

This patch addresses the problem by introducing a
compaction-strategy-specific data structure for holding sstables.  This
data structure has a method to obtain the sstables used for a read.

For leveled compaction strategy, this data structure is an interval map,
which can be efficiently used to select the right sstables.
2016-07-04 16:00:35 +03:00
Avi Kivity
c8237fc262 compaction_strategy: introduce make_sstable_set()
Allow compaction_strategy to create a container for sstables that is
optimized for the strategy.

Most compaction_strategies return bag_sstable_set; leveled compaction
returns the specialized partitioned_sstable_set.
2016-07-03 10:27:01 +03:00
Avi Kivity
168696c558 Introduce partitioned_sstable_set
partitioned_sstable_set assumes that sstables are mostly partitioned along
the token range: only a few sstables will be needed to access a particular
token.  It is implemented as an interval_map.
2016-07-03 10:27:00 +03:00
Avi Kivity
64e4357461 Introduce bag_sstable_set
bag_sstable_set is a generic sstable_set implementation: it assumes nothing
about the sstables.  It is implemented as a vector, and any select will
return the entire sstable set.
2016-07-03 10:27:00 +03:00
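The difference in select() behaviour between the two containers introduced in 168696c558 and 64e4357461 can be sketched as below. This only illustrates the selection semantics: the class and member names are hypothetical, tokens are modelled as plain integers, and the partitioned variant uses a linear scan where the real implementation uses an interval_map for efficient lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified sstable: an id plus the inclusive token range it covers.
struct sstable_range_sketch {
    int id;
    int64_t first_token;
    int64_t last_token;
};

// Assumes nothing about the sstables: every select returns the whole set.
class bag_set_sketch {
    std::vector<sstable_range_sketch> _all;
public:
    void insert(const sstable_range_sketch& s) { _all.push_back(s); }
    std::vector<int> select(int64_t /*token*/) const {
        std::vector<int> ids;
        for (const auto& s : _all) {
            ids.push_back(s.id);
        }
        return ids;
    }
};

// Returns only the sstables whose token range contains the requested token.
class partitioned_set_sketch {
    std::vector<sstable_range_sketch> _all;
public:
    void insert(const sstable_range_sketch& s) {
        assert(s.first_token <= s.last_token);
        _all.push_back(s);
    }
    std::vector<int> select(int64_t token) const {
        std::vector<int> ids;
        for (const auto& s : _all) {
            if (s.first_token <= token && token <= s.last_token) {
                ids.push_back(s.id);
            }
        }
        return ids;
    }
};
```

When sstables cover mostly disjoint token ranges, as in a leveled column family, the partitioned variant returns only a handful of candidates per read, so far fewer bloom filters need to be checked.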
Avi Kivity
85e9cf4616 Introduce sstable_set
sstable_set abstracts the notion of a container of sstables, allowing
different compaction strategies to supply their own implementation.  The
intended user is leveled compaction strategy; since it partitions sstables,
it can quickly restrict the number of sstables that participate in a query
by looking at the min/max partition key.

sstable_set also maintains an internal lw_shared_ptr<sstable_list>,
in parallel with the abstract container.  This is to support
column_family::get_sstable(), which returns a lw_shared_ptr<sstable_list>
which must be anchored somewhere if it is not saved at the caller side,
as it isn't in most current callers.
2016-07-03 10:27:00 +03:00
Avi Kivity
2a46410f4a Change sstable_list from a map to a set
sstable_list is now a map<generation, sstable>; change it to a set
in preparation for replacing it with sstable_set.  The change simplifies
a lot of code; the only casualty is the code that computes the highest
generation number.
2016-07-03 10:26:57 +03:00
Paweł Dziepak
b150720361 sstable: enable read ahead
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 13:18:24 +01:00
Paweł Dziepak
4513f8b52c sstables: add compressed_file_data_source_impl::close()
compressed_file_data_source_impl should close the underlying data source
properly when asked to.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 13:07:07 +01:00
Paweł Dziepak
55a6911d7a sstables: close input_stream<> properly
If read ahead is going to be enabled it is important to close
input_stream<> properly (and wait for completion) before destroying it.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:39:01 +01:00
Paweł Dziepak
e44e12c74a sstables: drop no longer needed code
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:39:01 +01:00
Paweł Dziepak
c2f0ee9b5f sstables: add consumer-style sstable compactor
This patch moves compaction logic to a consumer that can be used with
consume_flattened_in_thread(). Internally, sstable_writer is used to
write individual sstables.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:39:01 +01:00
Paweł Dziepak
18a9ee105f sstables: add consumer-style sstable writer
sstable_writer encapsulates all logic related to writing sstable.
Previously introduced component_writer is used to write actual
mutations. sstable_writer is intended to be used with
consume_flattened_in_thread(). Its purpose is to be used by higher-level
consumer that needs to write possibly more than one sstable (sstable
compaction is an example of such consumer).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:39:01 +01:00
Paweł Dziepak
0e8b8463ba sstables: introduce consumer-style components writer
This patch rewrites do_write_components() so that it can use
consume_flattened_in_thread(). All components-writing code is moved to a
new consumer: component_writer.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:39:01 +01:00
Paweł Dziepak
599ed7f1ed sstables: restore indentation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:37:54 +01:00
Paweł Dziepak
e7ff20b3bb sstables: run compaction code inside a thread
Currently, each sstable write has its separate thread. However, the goal
is to have compaction use consume_flattened() with a consumer that
creates and writes the sstables. consume_flattened() needs to be executed
inside a thread, since sstable writer may defer.

This patch is a first step in the preparations; it just makes the whole
compaction logic run inside a thread. That makes little sense now, since
all sstable writes spawn their own threads, but that's going to change
in the following patches.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-30 11:37:54 +01:00
Paweł Dziepak
2ee69860d2 sstables: make sstable reader produce streamed_mutations
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
b6f78a8e2f sstable: make sstable reads return streamed_mutation
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
9e8db53c46 sstables: allow row consumer to stop at any point
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
71088b4f4a sstables: fix partition slicing for row markers and collections
Row markers and collections weren't filtered out even if they belonged
to a clustering row that shouldn't be in the result. The check of whether
to include a cell was done only for live and dead atomic cells.

This patch adds appropriate checks for collections and row markers.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00
Paweł Dziepak
575daea897 sstables: make deletion_time to tombstone cast safer
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:50 +01:00