Commit Graph

535 Commits

Author SHA1 Message Date
Tomasz Grabiec
a921479e71 Merge tag '807-v3' from https://github.com/avikivity/scylla
From Avi:

This patchset introduces a linearization context for managed_bytes objects.

Within this context, any scattered managed_bytes (found only in lsa regions,
so limited to memtable and cache) are auto-linearized for the lifetime of
the context.   This ensures that key and value lookups can use fast
contiguous iterators instead of using slow discontiguous iterators (or
crashing, as is the case now).
2016-02-16 14:29:48 +01:00
Avi Kivity
ce74718950 Merge "Preparation for specifying query result format in IDL" from Tomasz 2016-02-15 19:41:18 +02:00
Raphael S. Carvalho
59bbe98c21 sstables: keep track of compacting sstables in compacton manager itself
Avi says:
"Something like unordered_set<unsigned long> is error prone, because ints
tend to mix up (also, need to use a sized type, unsigned long varies among
machines)."

With that in mind, it's better if we keep track of compacting sstables in
a unordered_set<shared_sstable>.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <249f0fd4cfcf786cf3c37a79978f7743d07f48ad.1455120811.git.raphaelsc@scylladb.com>
2016-02-15 18:35:43 +02:00
Tomasz Grabiec
9d11968ad8 Rename serialization_format to cql_serialization_format 2016-02-15 16:53:56 +01:00
Raphael S. Carvalho
a487ef1ff3 sstables: improve log message when a sstable is sealed
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <e391243212d83347b1b50c728bee24f6a2ecc950.1455230788.git.raphaelsc@scylladb.com>
2016-02-14 12:05:16 +02:00
Raphael S. Carvalho
ed61fe5831 sstables: make compaction stop report user-friendly
When scylla stopped an ongoing compaction, the event was reported
as an error. This patch introduces a specialized exception for
compaction stop so that the event can be handled appropriately.

Before:
ERROR [shard 0] compaction_manager - compaction failed: read exception:
std::runtime_error (Compaction for keyspace1/standard1 was deliberately
stopped.)

After:
INFO  [shard 0] compaction_manager - compaction info: Compaction for
keyspace1/standard1 was stopped due to shutdown.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1f85d4e5c24d23a1b4e7e0370a2cffc97cbc6d44.1455034236.git.raphaelsc@scylladb.com>
2016-02-11 12:16:53 +02:00
Avi Kivity
3c60310e38 key: relax some APIs to accept partition_key_view instead of const partition_key&
Using a partition_key_view can save an allocation in some cases.  We will
make use of it when we linearize a partition_key; during the process we
are given a simple byte pointer, and constructing a partition_key from that
requires an allocation.
2016-02-09 19:55:13 +02:00
Avi Kivity
f3ca597a01 Merge "Sstable cleanup fixes" from Tomasz
"  - Added waiting for async cleanup on clean shutdown

  - Crash in the middle of sstable removal doesn't leave system in a non-bootable state"
2016-02-04 12:36:13 +02:00
Tomasz Grabiec
c7ef3703cc sstable: Make sstable deletion never leave sstable set in a non-bootable state
Refs #860
Refs #802

An sstable file set with any component missing is interpreted as a
critical error during boot. Currently sstable removal procedure could
leave the files in a non-bootable state if the process crashed after
TOC was removed but before all components were removed as well.

To solve this problem, start the removal by renaming the TOC file to a
so called "temporary TOC". Upon boot such kind of TOC file is
interpreted as an sstable which is safe to remove. This kind of TOC
was added before to deal with a similar scenario but in the opposite
direction - when writing a new sstable.
2016-02-03 17:36:17 +01:00
Tomasz Grabiec
c8a98b487c sstables: Remove coupling-hiding duplication 2016-02-03 17:36:17 +01:00
Tomasz Grabiec
355874281a sstables: Do not register exit hooks from static initializer
Fixes #868.

Registerring exit hooks while reactor is already iterating over exit
hooks is not allowed and currently leads to undefined behavior
observed in #868. While we should make the failure more user friendly,
registering exit hooks concurrently with shutdown will not be allowed.

We don't expect exit hooks to be registered after exit starts because
this would violate the guarantee which says that exit hooks are
executed in reverse order of registration. Starting exit sequence in
the middle of initialization sequence would result in use after free
errors. Btw, I'm not sure if currently there's anything which prevents
this

To solve this problem, move the exit hook to initilization
sequence. In case of tests, the cleanup has to be called explicitly.
2016-02-03 17:35:50 +01:00
Raphael S. Carvalho
4041f8cffc compaction: stop all ongoing compaction during shutdown
Currently, we wait for ongoing compaction during shutdown, but
that may take 'forever' if compacting huge sstables with a slow
disk. Compaction of huge sstables will take a considerable amount
of time even with fast disks. Therefore, all ongoing compaction
should be stopped during shutdown.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <3370f17ce4274df417ea60651f33fc5d4de91199.1454441286.git.raphaelsc@scylladb.com>
2016-02-03 10:18:51 +02:00
Raphael S. Carvalho
cf22c827f9 compaction_manager: fix assertion when stopping task
Task is stopped by closing gate and forcing it to exit via gate
exception. The problem is that task->compacting_cf may be set to
the column family being compacted, and compaction_manager::remove
would see it and try to stop the same task again, which would
lead to problems. The fix is to clean task->compacting_cf when
stopping task.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <3473e93c1a107a619322769d65fa020529b5501b.1454441286.git.raphaelsc@scylladb.com>
2016-02-03 10:18:15 +02:00
Raphael S. Carvalho
a46aa47ab1 make sstables::compact_sstables return list of created sstables
Now, sstables::compact_sstables() receives as input a list of sstables
to be compacted, and outputs a list of sstables generated by compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0d8397f0395ce560a7c83cccf6e897a7f464d030.1454110234.git.raphaelsc@scylladb.com>
2016-01-31 12:39:20 +02:00
Raphael S. Carvalho
ee84f310d9 move deletion of sstables generated by interrupted compaction
This deletion should be handled by sstables::compact_sstables, which
is the responsible for creation of new sstables.
It also simplifies the code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <541206be2e910ab4edb1500b098eb5ebf29c6509.1454110234.git.raphaelsc@scylladb.com>
2016-01-31 12:39:20 +02:00
Glauber Costa
7214649b8a sstables: const where const is due
Some SSTable methods are not marked as const. But they should be.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <72cd3ef0157eb38e7fd48d0c989f2342cbc42f3c.1454103008.git.glauber@scylladb.com>
2016-01-31 12:36:36 +02:00
Raphael S. Carvalho
ba4260ea8f api: print proper compaction type
There are several compaction types, and we should print the correct
one when listing ongoing compaction. Currently, we only support
compaction types: COMPACTION and CLEANUP.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <c96b1508a8216bf5405b1a0b0f8489d5cc4be844.1453851299.git.raphaelsc@scylladb.com>
2016-01-28 13:47:00 +02:00
Raphael S. Carvalho
45c446d6eb compaction: pass dht::token by reference
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-27 13:25:41 -02:00
Raphael S. Carvalho
fc541e2f08 compaction: remove code to sort local ranges
storage_service::get_local_ranges returns sorted ranges, which are
not overlapping nor wrap-around. As a result, there is no need for
the consumer to do anything.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-27 13:15:36 -02:00
Glauber Costa
3f94070d4e use auto&& instead of auto& for priority classes.
By Avi's request, who reminds us that auto& is more suited for situations
in which we are assigning to the variable in question.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <87c76520f4df8b8c152e60cac3b5fba5034f0b50.1453820373.git.glauber@scylladb.com>
2016-01-26 17:00:20 +02:00
Glauber Costa
b63611e148 mark I/O operations with priority classes
After this patch, our I/O operations will be tagged into a specific priority class.

The available classes are 5, and were defined in the previous patch:

 1) memtable flush
 2) commitlog writes
 3) streaming mutation
 4) SSTable compaction
 5) CQL query

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
8e4bf025ae sstables: wire priority for read path
All the SSTable read path can now take an io_priority. The public functions will
take a default parameter which is Seastar's default priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
56c11a8109 sstables: wire priority for write path
All variants of write_component now take an io_priority. The public
interfaces are by default set to Seastar's default priority.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
03d5a89b90 sstables: mandate a buffer size parameter for data_stream_at
The only user for the default size is data_read, sitting at row.cc.
That reader wants to read and process a chunk all at once. So there's
really no reason to use the default buffer size - except that this code
is old.

We should do as we do in other single-key / single-range readers and
try to read all at once if possible, by looking at the size we received
as a parameter. Cleaning up the data_stream_at interface then comes as
a nice side effect.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Pekka Enberg
81996bd10b Merge "Improvements to compaction manager" from Raphael 2016-01-21 20:54:49 +02:00
Raphael S. Carvalho
bb909798bc compaction_manager: introduce can_submit
Purpose is to reuse code and also make it easier to read.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:42:23 -02:00
Raphael S. Carvalho
653a07d75d compaction_manager: introduce signal_less_busy_task
Purpose is to reuse code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:31:44 -02:00
Raphael S. Carvalho
2164aa8d5b move compaction manager from /utils to /sstables
Compaction manager was initially created at utils because it was
more generic, and wasn't only intended for compaction.
It was more like a task handler based on futures, but now it's
only intended to manage compaction tasks, and thus should be
moved elsewhere. /sstables is where compaction code is located.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-21 15:23:05 -02:00
Pekka Enberg
b5833e8002 Merge "Enable incremental backups option" from Vlad
"This series moves the "backup" logic into the sstable::write_components()
methods, adds a support for enabling backup for sstables flushed in the
compaction flow (in addition to a regular flushing flow which had this support
already) and enables the "incremental_backups" configuration option."

I fixed up a merge conflict with commit 5e953b5 ("Merge "Add support to
stop ongoing compaction" from Raphael").
2016-01-21 18:52:07 +02:00
Pekka Enberg
5e953b5e47 Merge "Add support to stop ongoing compaction" from Raphael
"stop compaction is about temporarily interrupting all ongoing compaction
 of a given type.
 That will also be needed for 'nodetool stop <compaction_type>'.

 The test was about starting scylla, stressing it, stopping compaction using
 the API and checking that scylla was able to recover.

 Scylla will print a message as follow for each compaction that was stopped:
 ERROR [shard 0] compaction_manager - compaction failed: read exception:
 std::runtime_error (Compaction for keyspace1/standard1 was deliberately stopped.)
 INFO  [shard 0] compaction_manager - compaction task handler sleeping for 20 seconds"
2016-01-21 18:34:10 +02:00
Vlad Zolotarov
c2ab54e9c7 sstables flushing: enable incremental backup (if requested)
Enable incremental backup when sstables are flushed if
incremental backup has been requested.

It has been enabled in the regular flushing flow before but
wasn't in the compaction flow.

This patch enables it in both places and does it using a
backup capability of sstable::write_components() method(s).

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:13:20 +02:00
Vlad Zolotarov
cb5c66f264 sstable::write_components(): add a 'backup' parameter
When 'backup' parameter is TRUE - create backup hard
links for a newly written sstables in <sstable dir>/backups/
subdirectory.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:04:45 +02:00
Raphael S. Carvalho
f001bb0f53 sstables: fix make_checksummed_file_output_stream
Arguments buffer_size and true were accidently inverted.
GCC wasn't complaning because implicit conversion of bool to
int, and vice-versa, is valid.
However, this conversion is not very safe because we could
accidentaly invert parameters.

This should fix the last problem with sstable_test.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <9478cd266006fdf8a7bd806f1c612ec9d1297c1f.1453301866.git.raphaelsc@scylladb.com>
2016-01-20 16:01:38 +01:00
Paweł Dziepak
33892943d9 sstables: do not drop row marker when reading mutation
Since 581271a243 "sstables: ignore data
belonging to dropped columns" we silently drop cells if there is no
column in the current schema that they belong to or their timestamp is
older than the column dropped_at value. Originally this check was
applied to row markers as well which caused them to be always dropped
since there is no column in the schema representing these markers.
This patch makes sure that the check whether colum is alive is performed
only if the cell is not a row marker.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1453289300-28607-1-git-send-email-pdziepak@scylladb.com>
2016-01-20 12:35:41 +01:00
Raphael S. Carvalho
c318f3baa3 sstables: fix sstable::data_stream_at
After 63967db8, offset is ignored when creating a input stream.
Found the problem after sstable_test failed recently.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <56ece21ff6e043e224eb2a6e76cdd422b94821b0.1453232689.git.raphaelsc@scylladb.com>
2016-01-20 09:35:57 +02:00
Raphael S. Carvalho
3bd240d9e8 compaction: add ability to stop an ongoing compaction
That's needed for nodetool stop, which is called to stop all ongoing
compaction. The implementation is about informing an ongoing compaction
that it was asked to stop, so the compaction itself will trigger an
exception. Compaction manager will catch this exception and re-schedule
the compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Raphael S. Carvalho
ec4c73d451 compaction: rename compaction_stats to compaction_info
compaction_info makes more sense because this structure doesn't
only store stats about ongoing compaction. Soon, we will add
information to it about whether or not an user asked to stop the
respective ongoing compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-19 23:15:18 -02:00
Glauber Costa
63967db8bf sstables: always use a file_*_stream_options in our readers and writes
Instead of using the APIs that explicitly pass things like buffer_size,
always use the options instance instead.

This will make it easier to pass extra options in the future.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <5b04e60ab469c319a17a522694e5bedf806702fe.1453219530.git.glauber@scylladb.com>
2016-01-19 18:26:37 +02:00
Glauber Costa
c3ac5257b5 sstables: don't repeat file_writer creation all the time
When this code was originally written, we used to operate on a generic
output_stream. We created a file output stream, and then moved it into
the generic object.

Many patches and reworks later, we now have a file_writer object, but
that pattern was never reworked.

So in a couple of places we have something like this:

    f = file_object acquired by open_file_dma
    auto out = file_writer(std::move(f), 4096);
    auto w = make_shared<file_writer>(std::move(out));

The last statement is just totally redundant. make_shared can create
an object from its parameters without trouble, so we can just pass
the parameter list directly to it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <c01801a1fdf37f8ea9a3e5c52cd424e35ba0a80d.1453219530.git.glauber@scylladb.com>
2016-01-19 18:26:36 +02:00
Raphael S. Carvalho
0c67b1d22b compaction: filter out mutation that doesn't belong to shard
When compacting sstable, mutation that doesn't belong to current shard
should be filtered out. Otherwise, mutation would be duplicated in
all shards that share the sstable being compacted.
sstable_test will now run with -c1 because arbitrary keys are chosen
for sstables to be compacted, so test could fail because of mutations
being filtered out.

fixes #527.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <1acc2e8b9c66fb9c0c601b05e3ae4353e514ead5.1453140657.git.raphaelsc@scylladb.com>
2016-01-19 10:16:41 +01:00
Pekka Enberg
7d3a3bd201 Merge "column family cleanup support" from Raphael
"This patch is intended to add support to column family cleanup, which will
 make 'nodetool cleanup' possible.

 Why is this feature needed? Remove irrelevant data from a node that loses part
 of its token range to a newly added node."
2016-01-18 10:15:05 +02:00
Paweł Dziepak
cfc0a132a9 sstable: handle multi-cell vs atomic incompatibilities
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Paweł Dziepak
581271a243 sstables: ignore data belonging to dropped columns
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-15 13:12:40 +01:00
Tomasz Grabiec
ccd609185f sstables: Add ability to wait for async sstable cleanup tasks
This patch adds a function which waits for the background cleanup work
which is started from sstable destructors.

We wait for those cleanups on reactor exit so that unit tests don't
leak. This fixes erratic ASAN complaint about memory leak when running
schema_change_test in debug mode:

    Indirect leak of 64 byte(s) in 1 object(s) allocated from:
         0x7fab24413912 in operator new(unsigned long) (/lib64/libasan.so.2+0x99912)
         0x1776aeb in make_unique<continuation<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> >, future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /usr/include/c++/5.1.1/bits/unique_ptr.h:765
         0x1752b69 in schedule<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:513
        0x1711365 in schedule<future<T>::then_wrapped(Func&&) [with Func = future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>; Result = future<>; T = {}]::<lambda(auto:2&&)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:690
        0x16d0474 in then_wrapped<future<T>::handle_exception(Func&&) [with Func = sstables::sstable::~sstable()::<lambda(auto:52)>; T = {}]::<lambda(auto:5&&)>, future<> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:880
        0x1696e9c in handle_exception<sstables::sstable::~sstable()::<lambda(auto:52)> > /home/tgrabiec/src/scylla2/seastar/core/future.hh:1012
        0x1638ba8 in sstables::sstable::~sstable() sstables/sstables.cc:1619

The leak is about allocations related to close() syscall tasks invoked
from sstable destructor, which were not waited for.

Message-Id: <1452783887-25244-1-git-send-email-tgrabiec@scylladb.com>
2016-01-15 11:32:15 +02:00
Raphael S. Carvalho
d44a5d1e94 compaction: filter out compacting sstables
The implementation is about storing generation of compacting sstables
in an unordered set per column family, so before strategy is called,
compaction manager will filter out compacting sstables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 01:18:29 -02:00
Raphael S. Carvalho
9c13c1c738 compaction: move compaction execution from strategy to manager
Currently, compaction strategy is the responsible for both getting the
sstables selected for compaction and running compaction.
Moving the code that runs compaction from strategy to manager is a big
improvement, which will also make possible for the compaction manager
to keep track of which sstables are being compacted at a moment.
This change will also be needed for cleanup and concurrent compaction
on the same column family.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 00:04:27 -02:00
Raphael S. Carvalho
ed80ed82ef sstables: prepare compact_sstables to work with cleanup
Cleanup is about rewriting a sstable discarding any keys that
are irrelevant, i.e. keys that don't belong to current node.
Parameter cleanup was added to compact_sstables.
If set to true, irrelevant code such as the one that updates
compaction history will be skipped. Logic was also added to
discard irrelevant keys.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 21:43:40 -02:00
Tomasz Grabiec
4e5a52d6fa db: Make read interface schema version aware
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.

Interface on shard level accepts schema_ptr, across nodes we use
table_schema_version UUID. To transfer schema_ptr across shards, we
use global_schema_ptr.

Because schema is identified with UUID across nodes, requestors must
be prepared for being queried for the definition of the schema. They
must hold a live schema_ptr around the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.

Schema requesting across nodes is currently stubbed (throws runtime
exception).
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
5184381a0b memtable: Deconstify memtable in readers
We want to upgrade entries on read and for that we need mutating
permission.
2016-01-11 10:34:51 +01:00
Avi Kivity
0c755d2c94 db: reduce log spam when ignoring an sstable
With 10 sstables/shard and 50 shards, we get ~10*50*50 messages = 25,000
log messages about sstables being ignored.  This is not reasonable.

Reduce the log level to debug, and move the message to database.cc,
because at its original location, the containing function has nothing to
do with the message itself.

Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Message-Id: <1452181687-7665-1-git-send-email-avi@scylladb.com>
2016-01-07 19:23:25 +02:00