Commit Graph

521 Commits

Author SHA1 Message Date
Raphael S. Carvalho
ee84f310d9 move deletion of sstables generated by interrupted compaction
This deletion should be handled by sstables::compact_sstables, which
is the responsible for creation of new sstables.
It also simplifies the code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <541206be2e910ab4edb1500b098eb5ebf29c6509.1454110234.git.raphaelsc@scylladb.com>
2016-01-31 12:39:20 +02:00
Glauber Costa
7214649b8a sstables: const where const is due
Some SSTable methods are not marked as const. But they should be.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <72cd3ef0157eb38e7fd48d0c989f2342cbc42f3c.1454103008.git.glauber@scylladb.com>
2016-01-31 12:36:36 +02:00
Raphael S. Carvalho
ba4260ea8f api: print proper compaction type
There are several compaction types, and we should print the correct
one when listing ongoing compaction. Currently, we only support
compaction types: COMPACTION and CLEANUP.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <c96b1508a8216bf5405b1a0b0f8489d5cc4be844.1453851299.git.raphaelsc@scylladb.com>
2016-01-28 13:47:00 +02:00
Raphael S. Carvalho
45c446d6eb compaction: pass dht::token by reference
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-27 13:25:41 -02:00
Raphael S. Carvalho
fc541e2f08 compaction: remove code to sort local ranges
storage_service::get_local_ranges returns sorted ranges, which are
not overlapping nor wrap-around. As a result, there is no need for
the consumer to do anything.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-27 13:15:36 -02:00
Glauber Costa
3f94070d4e use auto&& instead of auto& for priority classes.
By Avi's request, who reminds us that auto& is more suited for situations
in which we are assigning to the variable in question.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <87c76520f4df8b8c152e60cac3b5fba5034f0b50.1453820373.git.glauber@scylladb.com>
2016-01-26 17:00:20 +02:00
Glauber Costa
b63611e148 mark I/O operations with priority classes
After this patch, our I/O operations will be tagged into a specific priority class.

The available classes are 5, and were defined in the previous patch:

 1) memtable flush
 2) commitlog writes
 3) streaming mutation
 4) SSTable compaction
 5) CQL query

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
8e4bf025ae sstables: wire priority for read path
All the SSTable read path can now take an io_priority. The public functions will
take a default parameter which is Seastar's default priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
56c11a8109 sstables: wire priority for write path
All variants of write_component now take an io_priority. The public
interfaces are by default set to Seastar's default priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
03d5a89b90 sstables: mandate a buffer size parameter for data_stream_at
The only user for the default size is data_read, sitting at row.cc.
That reader wants to read and process a chunk all at once. So there's
really no reason to use the default buffer size - except that this code
is old.

We should do as we do in other single-key / single-range readers and
try to read all at once if possible, by looking at the size we received
as a parameter. Cleaning up the data_stream_at interface then comes as
a nice side effect.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Pekka Enberg
81996bd10b Merge "Improvements to compaction manager" from Raphael 2016-01-21 20:54:49 +02:00
Raphael S. Carvalho
bb909798bc compaction_manager: introduce can_submit
Purpose is to reuse code and also make it easier to read.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:42:23 -02:00
Raphael S. Carvalho
653a07d75d compaction_manager: introduce signal_less_busy_task
Purpose is to reuse code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:31:44 -02:00
Raphael S. Carvalho
2164aa8d5b move compaction manager from /utils to /sstables
Compaction manager was initially created at utils because it was
more generic, and wasn't only intended for compaction.
It was more like a task handler based on futures, but now it's
only intended to manage compaction tasks, and thus should be
moved elsewhere. /sstables is where compaction code is located.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:23:05 -02:00
Pekka Enberg
b5833e8002 Merge "Enable incremental backups option" from Vlad
"This series moves the "backup" logic into the sstable::write_components()
methods, adds a support for enabling backup for sstables flushed in the
compaction flow (in addition to a regular flushing flow which had this support
already) and enables the "incremental_backups" configuration option."

I fixed up a merge conflict with commit 5e953b5 ("Merge "Add support to
stop ongoing compaction" from Raphael").
2016-01-21 18:52:07 +02:00
Pekka Enberg
5e953b5e47 Merge "Add support to stop ongoing compaction" from Raphael
"stop compaction is about temporarily interrupting all ongoing compaction
 of a given type.
 That will also be needed for 'nodetool stop <compaction_type>'.

 The test was about starting scylla, stressing it, stopping compaction using
 the API and checking that scylla was able to recover.

 Scylla will print a message as follow for each compaction that was stopped:
 ERROR [shard 0] compaction_manager - compaction failed: read exception:
 std::runtime_error (Compaction for keyspace1/standard1 was deliberately stopped.)
 INFO  [shard 0] compaction_manager - compaction task handler sleeping for 20 seconds"
2016-01-21 18:34:10 +02:00
Vlad Zolotarov
c2ab54e9c7 sstables flushing: enable incremental backup (if requested)
Enable incremental backup when sstables are flushed if
incremental backup has been requested.

It has been enabled in the regular flushing flow before but
wasn't in the compaction flow.

This patch enables it in both places and does it using a
backup capability of sstable::write_components() method(s).

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:13:20 +02:00
Vlad Zolotarov
cb5c66f264 sstable::write_components(): add a 'backup' parameter
When 'backup' parameter is TRUE - create backup hard
links for a newly written sstables in <sstable dir>/backups/
subdirectory.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:04:45 +02:00
Raphael S. Carvalho
f001bb0f53 sstables: fix make_checksummed_file_output_stream
Arguments buffer_size and true were accidently inverted.
GCC wasn't complaning because implicit conversion of bool to
int, and vice-versa, is valid.
However, this conversion is not very safe because we could
accidentaly invert parameters.

This should fix the last problem with sstable_test.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <9478cd266006fdf8a7bd806f1c612ec9d1297c1f.1453301866.git.raphaelsc@scylladb.com>
2016-01-20 16:01:38 +01:00
Paweł Dziepak
33892943d9 sstables: do not drop row marker when reading mutation
Since 581271a243 "sstables: ignore data
belonging to dropped columns" we silently drop cells if there is no
column in the current schema that they belong to or their timestamp is
older than the column dropped_at value. Originally this check was
applied to row markers as well which caused them to be always dropped
since there is no column in the schema representing these markers.
This patch makes sure that the check whether colum is alive is performed
only if the cell is not a row marker.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1453289300-28607-1-git-send-email-pdziepak@scylladb.com>
2016-01-20 12:35:41 +01:00
Raphael S. Carvalho
c318f3baa3 sstables: fix sstable::data_stream_at
After 63967db8, offset is ignored when creating a input stream.
Found the problem after sstable_test failed recently.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <56ece21ff6e043e224eb2a6e76cdd422b94821b0.1453232689.git.raphaelsc@scylladb.com>
2016-01-20 09:35:57 +02:00
Raphael S. Carvalho
3bd240d9e8 compaction: add ability to stop an ongoing compaction
That's needed for nodetool stop, which is called to stop all ongoing
compaction. The implementation is about informing an ongoing compaction
that it was asked to stop, so the compaction itself will trigger an
exception. Compaction manager will catch this exception and re-schedule
the compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Raphael S. Carvalho
ec4c73d451 compaction: rename compaction_stats to compaction_info
compaction_info makes more sense because this structure doesn't
only store stats about ongoing compaction. Soon, we will add
information to it about whether or not an user asked to stop the
respective ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Glauber Costa
63967db8bf sstables: always use a file_*_stream_options in our readers and writes
Instead of using the APIs that explicitly pass things like buffer_size,
always use the options instance instead.

This will make it easier to pass extra options in the future.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <5b04e60ab469c319a17a522694e5bedf806702fe.1453219530.git.glauber@scylladb.com>
2016-01-19 18:26:37 +02:00
Glauber Costa
c3ac5257b5 sstables: don't repeat file_writer creation all the time
When this code was originally written, we used to operate on a generic
output_stream. We created a file output stream, and then moved it into
the generic object.

Many patches and reworks later, we now have a file_writer object, but
that pattern was never reworked.

So in a couple of places we have something like this:

    f = file_object acquired by open_file_dma
    auto out = file_writer(std::move(f), 4096);
    auto w = make_shared<file_writer>(std::move(out));

The last statement is just totally redundant. make_shared can create
an object from its parameters without trouble, so we can just pass
the parameter list directly to it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <c01801a1fdf37f8ea9a3e5c52cd424e35ba0a80d.1453219530.git.glauber@scylladb.com>
2016-01-19 18:26:36 +02:00
Raphael S. Carvalho
0c67b1d22b compaction: filter out mutation that doesn't belong to shard
When compacting sstable, mutation that doesn't belong to current shard
should be filtered out. Otherwise, mutation would be duplicated in
all shards that share the sstable being compacted.
sstable_test will now run with -c1 because arbitrary keys are chosen
for sstables to be compacted, so test could fail because of mutations
being filtered out.

fixes #527.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1acc2e8b9c66fb9c0c601b05e3ae4353e514ead5.1453140657.git.raphaelsc@scylladb.com>
2016-01-19 10:16:41 +01:00
Pekka Enberg
7d3a3bd201 Merge "column family cleanup support" from Raphael
"This patch is intended to add support to column family cleanup, which will
 make 'nodetool cleanup' possible.

 Why is this feature needed? Remove irrelevant data from a node that loses part
 of its token range to a newly added node."
2016-01-18 10:15:05 +02:00
Paweł Dziepak
cfc0a132a9 sstable: handle multi-cell vs atomic incompatibilities
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Paweł Dziepak
581271a243 sstables: ignore data belonging to dropped columns
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Tomasz Grabiec
ccd609185f sstables: Add ability to wait for async sstable cleanup tasks
This patch adds a function which waits for the background cleanup work
which is started from sstable destructors.

We wait for those cleanups on reactor exit so that unit tests don't
leak. This fixes erratic ASAN complaint about memory leak when running
schema_change_test in debug mode:

    Indirect leak of 64 byte(s) in 1 object(s) allocated from:
         0x7fab24413912 in operator new(unsigned long) (/lib64/libasan.so.2+0x99912)
         0x1776aeb in make_unique<continuation<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> >, future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /usr/include/c++/5.1.1/bits/unique_ptr.h:765
         0x1752b69 in schedule<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:513
        0x1711365 in schedule<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:690
        0x16d0474 in then_wrapped<future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>, future<> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:880
        0x1696e9c in handle_exception<sstables::sstable::~sstable()::<lambda(auto:52)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:1012
        0x1638ba8 in sstables::sstable::~sstable() sstables/sstables.cc:1619

The leak is about allocations related to close() syscall tasks invoked
from sstable destructor, which were not waited for.

Message-Id: <1452783887-25244-1-git-send-email-tgrabiec@scylladb.com>
2016-01-15 11:32:15 +02:00
Raphael S. Carvalho
d44a5d1e94 compaction: filter out compacting sstables
The implementation is about storing generation of compacting sstables
in an unordered set per column family, so before strategy is called,
compaction manager will filter out compacting sstables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 01:18:29 -02:00
Raphael S. Carvalho
9c13c1c738 compaction: move compaction execution from strategy to manager
Currently, compaction strategy is the responsible for both getting the
sstables selected for compaction and running compaction.
Moving the code that runs compaction from strategy to manager is a big
improvement, which will also make possible for the compaction manager
to keep track of which sstables are being compacted at a moment.
This change will also be needed for cleanup and concurrent compaction
on the same column family.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 00:04:27 -02:00
Raphael S. Carvalho
ed80ed82ef sstables: prepare compact_sstables to work with cleanup
Cleanup is about rewriting a sstable discarding any keys that
are irrelevant, i.e. keys that don't belong to current node.
Parameter cleanup was added to compact_sstables.
If set to true, irrelevant code such as the one that updates
compaction history will be skipped. Logic was also added to
discard irrelevant keys.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 21:43:40 -02:00
Tomasz Grabiec
4e5a52d6fa db: Make read interface schema version aware
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.

Interface on shard level accepts schema_ptr, across nodes we use
table_schema_version UUID. To transfer schema_ptr across shards, we
use global_schema_ptr.

Because schema is identified with UUID across nodes, requestors must
be prepared for being queried for the definition of the schema. They
must hold a live schema_ptr around the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.

Schema requesting across nodes is currently stubbed (throws runtime
exception).
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
5184381a0b memtable: Deconstify memtable in readers
We want to upgrade entries on read and for that we need mutating
permission.
2016-01-11 10:34:51 +01:00
Avi Kivity
0c755d2c94 db: reduce log spam when ignoring an sstable
With 10 sstables/shard and 50 shards, we get ~10*50*50 messages = 25,000
log messages about sstables being ignored.  This is not reasonable.

Reduce the log level to debug, and move the message to database.cc,
because at its original location, the containing function has nothing to
do with the message itself.

Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Message-Id: <1452181687-7665-1-git-send-email-avi@scylladb.com>
2016-01-07 19:23:25 +02:00
Glauber Costa
74fbd8fac0 do not call open_file_dma directly
We have an API that wraps open_file_dma which we use in some places, but in
many other places we call the reactor version directly.

This patch changes the latter to match the former. It will have the added benefit
of allowing us to make easier changes to these interfaces if needed.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <29296e4ec6f5e84361992028fe3f27adc569f139.1451950408.git.glauber@scylladb.com>
2016-01-05 10:37:57 +02:00
Avi Kivity
e9400dfa96 Revert "sstable: Initialize super class of malformed_sstable_exception"
This reverts commit d69dc32c92d63057edf9f84aa57ca53b2a6e37e4; it does nothing
and does not address issue #669.
2016-01-05 10:21:00 +02:00
Benoît Canet
d69dc32c92 sstable: Initialize super class of malformed_sstable_exception
This exception was not caught properly as a std::exception
by report_failed_future call to report_exception because the
superclass std::exception was not initialized.

Fixes #669.

Signed-off-by: Benoît Canet <benoit@scylladb.com>
2016-01-05 09:54:36 +02:00
Raphael S. Carvalho
b7d36af26f compaction: fix max_purgeable calculation
max_purgeable was being incorrectly calculated because the code
that creates vector of uncompacted sstables was wrong.
This value is used to determine whether or not a tombstone can
be purged.
Operand < is supposed to be used instead in the callback passed
as third parameter to boost::set_difference.
This fix is a step towards closing the issue #676.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-29 09:30:08 +02:00
Vlad Zolotarov
33552829b2 core: use steady_clock where monotinic clock is required
Use steady_clock instead of high_resolution_clock where monotonic
clock is required. high_resolution_clock is essentially a
system_clock (Wall Clock) therefore may not to be assumed monotonic
since Wall Clock may move backwards due to time/date adjustments.

Fixes issue #638

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-27 18:07:53 +02:00
Tomasz Grabiec
0862d2f531 Merge branch 'pdziepak/fix-sstables-key_reader-663/v2'
From Paweł:

"This series fixes sstables::key_reader not respecting range inclusiveness
if the bounds were the keys that were present in the index summary.

Fixes #663."
2015-12-18 17:35:09 +01:00
Paweł Dziepak
18b8d7cccc sstables: respect range inclusiveness in key_reader
When choosing a relevant range of buckets it wasn't taken into account
whether the range bounds are inclusive or not. That may have resulted in
more buckets being read than necessary which was a condition not
expected by the code responsible from looking for a relevant keys inside
the buckets.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-18 17:24:26 +01:00
Pekka Enberg
eeadf601e6 Merge "cleanups and improvements" from Raphael 2015-12-18 13:45:11 +02:00
Pekka Enberg
40e8a9c99c sstables/compaction: Fix compilation error with GCC 4.9.2
I am sure it's a compiler issue but I am not ready to give up and
upgrade just yet:

  sstables/compaction.cc:307:55: error: converting to ‘std::unordered_map<int, long int>’ from initializer list would use explicit constructor ‘std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::unordered_map(std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::size_type, const hasher&, const key_equal&, const allocator_type&) [with _Key = int; _Tp = long int; _Hash = std::hash<int>; _Pred = std::equal_to<int>; _Alloc = std::allocator<std::pair<const int, long int> >; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::size_type = long unsigned int; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::hasher = std::hash<int>; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::key_equal = std::equal_to<int>; std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc>::allocator_type = std::allocator<std::pair<const int, long int> >]’
                 stats->start_size, stats->end_size, {});
2015-12-16 10:03:14 +02:00
Raphael S. Carvalho
193ede68f3 compaction: register and deregister compaction_stats
That's important for compaction stats API that will need stats
data of each ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:32 -02:00
Raphael S. Carvalho
1fba394dd0 sstables: store keyspace and cf in compaction_stats
The reason behind this change is that we will need ks and cf
for the compaction stats API.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:50:02 -02:00
Raphael S. Carvalho
ac1a67c8bc sstables: move compaction_stats to header file
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-15 09:49:45 -02:00
Raphael S. Carvalho
a2fb0ec9a3 sstables: update compaction history at the end of compaction
When compaction job finishes, call function to update the system
table COMPACTION_HISTORY. That's also needed for the compaction
history API.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 14:20:03 -02:00
Raphael S. Carvalho
0fa194c844 sstables: remove outdated comment
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:43:53 -02:00