
480 Commits

Author SHA1 Message Date
Glauber Costa
ece77cce90 database: turn sstable generation number into an optional
This patch makes sure that every time we need to create a new generation number
(the very first step in the creation of a new SSTable), the respective CF is already
initialized and populated. Failure to do so can lead to data being overwritten.
Extensive details about why this is important can be found
in Scylla's GitHub issue #1014.

Nothing should be writing to SSTables before we have had the chance to populate the
existing SSTables and calculate what the next generation number should be.

However, if that happens, we want to protect against it in a way that does not
involve overwriting existing tables. This is one of the ways to do it: every
column family starts in an unwriteable state, and when it can finally be written
to, we mark it as writeable.

Note that this *cannot* be a part of add_column_family. That function adds a column
family to a db in memory only, and if anybody is about to write to a CF, it has most
likely already been called. We need to call this explicitly when we are sure we're ready
to issue disk operations safely.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit a339296385)
2016-03-14 15:52:52 +02:00
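A minimal C++ sketch of the guard this commit describes, assuming each table tracks its highest known generation in a `std::optional` and is explicitly marked ready for writes after population. The class and method names are hypothetical, and real Scylla also spreads generations across shards, which is left out here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical, simplified model: a column family that refuses to hand out
// new sstable generation numbers until its existing sstables are loaded.
// Per-shard striding of generations is intentionally omitted.
class column_family_sketch {
    std::optional<std::int64_t> _highest_generation; // empty until populated
    bool _writeable = false;
public:
    // Called for each sstable found on disk while populating the table.
    void note_existing_generation(std::int64_t gen) {
        if (!_highest_generation || gen > *_highest_generation) {
            _highest_generation = gen;
        }
    }

    // Called explicitly once population is done and disk writes are safe.
    void mark_ready_for_writes() {
        if (!_highest_generation) {
            _highest_generation = 0; // empty table: first new generation is 1
        }
        _writeable = true;
    }

    bool is_writeable() const { return _writeable; }

    // First step in creating a new sstable. Refusing here is what keeps a
    // premature write from reusing (and overwriting) an existing generation.
    std::int64_t calculate_generation_for_new_table() {
        if (!_writeable) {
            throw std::logic_error("column family is not writeable yet");
        }
        return ++*_highest_generation;
    }
};
```

Usage: a table that loaded generations 3 and 7 from disk hands out 8 next; before `mark_ready_for_writes()` it refuses outright rather than guessing a number that might collide.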
Glauber Costa
e885eacbe4 column_family: do not open code generation calculation
We already have a function that wraps this; reuse it. This FIXME is still
relevant, so just move it there. Let's not lose it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 94e90d4a17)
2016-03-14 15:51:06 +02:00
Glauber Costa
3f67277804 column_family: remove mutation_count
We use memory usage as a threshold these days, and nowhere is _mutation_count
checked. Get rid of it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 46fdeec60a)
2016-03-14 15:50:57 +02:00
Tomasz Grabiec
dba2b617e7 db: Fix error handling in populate_keyspace()
When find_uuid() failed, Scylla would terminate with:

  Exiting on unhandled exception of type 'std::out_of_range': _Map_base::at

But we are supposed to ignore directories for unknown column
families. The try {} catch block is doing just that when
no_such_column_family is thrown from the find_column_family() call
which follows find_uuid(). Fix by converting std::out_of_range to
no_such_column_family.

Message-Id: <1456056280-3933-1-git-send-email-tgrabiec@scylladb.com>
2016-03-03 11:37:26 +02:00
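A small C++ sketch of the fix, assuming a plain `std::map` from (keyspace, table) names to UUID strings. `no_such_column_family` here is a local stand-in, not Scylla's real exception class, and `count_loadable_tables` is a hypothetical simplification of the populate loop.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Scylla's no_such_column_family exception.
struct no_such_column_family : std::runtime_error {
    using std::runtime_error::runtime_error;
};

using ks_cf = std::pair<std::string, std::string>;
using uuid_map = std::map<ks_cf, std::string>; // (keyspace, table) -> UUID

// std::map::at throws std::out_of_range on a miss. Translate it into the
// exception that callers already treat as "unknown table".
std::string find_uuid(const uuid_map& uuids, const std::string& ks, const std::string& cf) {
    try {
        return uuids.at({ks, cf});
    } catch (const std::out_of_range&) {
        throw no_such_column_family("no such column family: " + ks + "." + cf);
    }
}

// Simplified populate loop: directories of unknown tables are skipped
// instead of terminating the process.
int count_loadable_tables(const uuid_map& uuids, const std::string& ks,
                          const std::vector<std::string>& table_dirs) {
    int loaded = 0;
    for (const auto& cf : table_dirs) {
        try {
            find_uuid(uuids, ks, cf);
            ++loaded;
        } catch (const no_such_column_family&) {
            // Unknown table: ignore its directory.
        }
    }
    return loaded;
}
```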
Calle Wilund
04c19344de database: Fix use and assumptions about pending compactions
Fixes #934 - faulty assert in discard_sstables

run_with_compaction_disabled removes a CF from the compaction
manager queue. discard_sstables wants to assert on this, but looks
at the wrong counters.

pending_compactions is an indicator of how much interested parties
want a CF compacted (again and again). It should not be considered
an indicator of compactions actually being done.

This modifies the usage slightly so that:
1.) The counter is always incremented, even if compaction is disallowed.
    The counter's value at the end of run_with_compaction_disabled is then
    used instead as an indicator of whether compaction should be
    re-triggered (if compactions finished, it will be zero).
2.) The use and purpose of the pending counter are documented, and a
    method is added to re-add a CF to compaction for r_w_c_d above.
3.) discard_sstables now asserts on the right things.

Message-Id: <1456332824-23349-1-git-send-email-calle@scylladb.com>
2016-03-03 10:51:27 +02:00
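A hedged C++ model of the counter semantics in items 1.) and 2.), with hypothetical names. It assumes that a finished compaction satisfies all outstanding requests (so the counter resets to zero), and it leaves out the actual compaction manager queue.

```cpp
#include <cassert>

// Hypothetical model: the counter measures interest in compacting a table,
// not compactions in progress.
class pending_compactions_sketch {
    int _pending = 0;
    bool _disabled = false;
public:
    // The request is always counted, even while compaction is disabled.
    // Returns true if it would be submitted to the compaction manager now.
    bool request_compaction() {
        ++_pending;
        return !_disabled;
    }

    // Assumption: one finished compaction satisfies all outstanding requests.
    void compaction_finished() { _pending = 0; }

    int pending() const { return _pending; }

    // Runs func with compaction disabled. The return value reports whether
    // compaction should be re-triggered (requests arrived in the meantime).
    template <typename Func>
    bool run_with_compaction_disabled(Func&& func) {
        _disabled = true;
        struct reenable {
            bool& flag;
            ~reenable() { flag = false; } // re-enable even if func throws
        } guard{_disabled};
        func();
        return _pending > 0;
    }
};
```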
Calle Wilund
873f87430d database: Check sstable dir name UUID part when populating CF
Fixes #870
Only load sstables from CF directories that match the current
CF uuid.
Message-Id: <1454938450-4338-1-git-send-email-calle@scylladb.com>
2016-02-08 14:48:19 +01:00
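A sketch of the directory check, assuming table directories are named `<table>-<uuid with dashes removed>`. CQL table names are limited to letters, digits and underscores, so splitting at the last `-` is safe under that assumption. Function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Removes the dashes from a textual UUID ("5f3a...-0000-..." -> "5f3a...0000...").
std::string strip_dashes(std::string s) {
    s.erase(std::remove(s.begin(), s.end(), '-'), s.end());
    return s;
}

// Load sstables only from a directory whose name part matches the table and
// whose UUID part matches the table's current UUID; any other UUID is skipped.
bool should_load_from(const std::string& dir_name, const std::string& table,
                      const std::string& table_uuid) {
    const auto pos = dir_name.rfind('-');
    if (pos == std::string::npos) {
        return false; // not in the assumed "<table>-<uuid>" form
    }
    return dir_name.substr(0, pos) == table &&
           dir_name.substr(pos + 1) == strip_dashes(table_uuid);
}
```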
Avi Kivity
f3ca597a01 Merge "Sstable cleanup fixes" from Tomasz
"  - Added waiting for async cleanup on clean shutdown

  - Crash in the middle of sstable removal doesn't leave system in a non-bootable state"
2016-02-04 12:36:13 +02:00
Tomasz Grabiec
136c9d9247 sstables: Improve error message in case of generation duplication
Refs #870.
2016-02-03 17:35:50 +01:00
Raphael S. Carvalho
a46aa47ab1 make sstables::compact_sstables return list of created sstables
Now, sstables::compact_sstables() receives as input a list of sstables
to be compacted, and outputs a list of sstables generated by compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <0d8397f0395ce560a7c83cccf6e897a7f464d030.1454110234.git.raphaelsc@scylladb.com>
2016-01-31 12:39:20 +02:00
Raphael S. Carvalho
ee84f310d9 move deletion of sstables generated by interrupted compaction
This deletion should be handled by sstables::compact_sstables, which
is responsible for creating new sstables.
It also simplifies the code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <541206be2e910ab4edb1500b098eb5ebf29c6509.1454110234.git.raphaelsc@scylladb.com>
2016-01-31 12:39:20 +02:00
Raphael S. Carvalho
3b7970baff compaction: delete generated sstables in event of an interrupt
Generated sstables may be either fully or partially written.
Compaction is interrupted if it was deliberately asked to stop (stop API)
or if it was forced to do so by a failure, e.g. running out of disk space.
We need to explicitly delete sstables generated by a compaction
that was interrupted. Otherwise, such sstables will waste disk space and
even worsen read performance, which degrades as the number of generations
to look at increases.

Fixes #852.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <49212dbf485598ae839c8e174e28299f7127f63e.1453912119.git.raphaelsc@scylladb.com>
2016-01-28 14:05:57 +02:00
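A C++ sketch combining this commit with the two later ones listed above it (compact_sstables returning the sstables it created, and owning their deletion on interruption). The directory is modelled as an in-memory `std::set` of file names, all names are hypothetical, and deletion of the input sstables after a successful compaction is omitted.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the table's sstable directory.
using sstable_dir = std::set<std::string>;

// Returns the sstables it created. If interrupted (stop request, out of disk
// space, ...), it deletes every output produced so far, fully or partially
// written, and then propagates the error.
std::vector<std::string> compact_sketch(sstable_dir& dir, int outputs,
                                        const std::function<void(int)>& on_output_written) {
    std::vector<std::string> created;
    try {
        for (int i = 0; i < outputs; ++i) {
            std::string name = "out-" + std::to_string(i + 1) + "-Data.db";
            dir.insert(name);        // the file exists from this point on
            created.push_back(name);
            on_output_written(i);    // may throw to simulate an interrupt
        }
    } catch (...) {
        for (const auto& name : created) {
            dir.erase(name);
        }
        throw;
    }
    return created;
}
```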
Tomasz Grabiec
9fa62af96b database: Move implementation to .cc
Message-Id: <1453980679-27226-1-git-send-email-tgrabiec@scylladb.com>
2016-01-28 13:35:33 +02:00
Glauber Costa
3f94070d4e use auto&& instead of auto& for priority classes.
Done at Avi's request. He reminds us that auto& is better suited to situations
in which we assign to the variable in question.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <87c76520f4df8b8c152e60cac3b5fba5034f0b50.1453820373.git.glauber@scylladb.com>
2016-01-26 17:00:20 +02:00
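For context, one general C++ rule bears on this choice: `auto&&` binds to whatever the initializer yields, while `auto&` cannot bind to a non-const temporary. The getters below are hypothetical stand-ins for the priority-class accessors.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// Hypothetical accessors: one returns a reference to a long-lived object,
// the other returns a temporary by value.
const std::string& class_by_reference() {
    static const std::string name = "memtable";
    return name;
}
std::string class_by_value() { return "compaction"; }

std::size_t bind_both() {
    auto&& a = class_by_reference(); // deduces const std::string&
    auto&& b = class_by_value();     // deduces std::string&&; lifetime extended
    static_assert(std::is_same_v<decltype(a), const std::string&>);
    static_assert(std::is_same_v<decltype(b), std::string&&>);
    // auto& c = class_by_value();   // ill-formed: auto& cannot bind a non-const rvalue
    return a.size() + b.size();
}
```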
Glauber Costa
b63611e148 mark I/O operations with priority classes
After this patch, our I/O operations will be tagged into a specific priority class.

There are five available classes, defined in the previous patch:

 1) memtable flush
 2) commitlog writes
 3) streaming mutation
 4) SSTable compaction
 5) CQL query

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
f6cfb04d61 add a priority class to mutation readers
SSTables already have a priority argument wired to their read path. However,
most of our reads do not call that interface directly, but employ the services
of a mutation reader instead.

Some of those readers will be used to read through a mutation_source, and those
have to be patched as well.

Right now, whenever we need to pass a class, we pass Seastar's default priority
class.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
15336e7eb7 key_source: turn it into a class
Its definition as a lambda function is inconvenient, because it does not allow
us to use default values for parameters.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Glauber Costa
58fdae33bd mutation_source: turn it into a class
Its definition as a lambda function is inconvenient, because it does not allow
us to use default values for parameters.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
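Why a class helps, sketched in C++: default arguments are not part of a function type, so a `std::function<...>` alias cannot carry one, but a wrapper class's `operator()` can. The `priority` enum and `reader` type are hypothetical placeholders (the commits above default to Seastar's default priority class instead).

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical placeholders for the real priority class and reader types.
enum class priority { default_class, query, compaction };
using reader = std::string;

// Thin wrapper over std::function whose call operator declares a default
// argument, so existing call sites can keep passing only the range.
class mutation_source_sketch {
    std::function<reader(const std::string&, priority)> _fn;
public:
    explicit mutation_source_sketch(std::function<reader(const std::string&, priority)> fn)
        : _fn(std::move(fn)) {}

    reader operator()(const std::string& range,
                      priority pc = priority::default_class) const {
        return _fn(range, pc);
    }
};
```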
Vlad Zolotarov
c2ab54e9c7 sstables flushing: enable incremental backup (if requested)
Enable incremental backup when sstables are flushed if
incremental backup has been requested.

It was previously enabled in the regular flushing flow, but not
in the compaction flow.

This patch enables it in both places, using the backup
capability of the sstable::write_components() method(s).

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-21 12:13:20 +02:00
Tomasz Grabiec
06d1f4b584 database: Print table name when printing mutation
2016-01-19 13:46:28 +01:00
Tomasz Grabiec
52073d619c database: Add trace-level logging of applied mutations
2016-01-19 13:46:28 +01:00
Pekka Enberg
7d3a3bd201 Merge "column family cleanup support" from Raphael
"This patch is intended to add support to column family cleanup, which will
 make 'nodetool cleanup' possible.

 Why is this feature needed? Remove irrelevant data from a node that loses part
 of its token range to a newly added node."
2016-01-18 10:15:05 +02:00
Paweł Dziepak
18d0a57bf4 commitlog: use commitlog entry writer and reader
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-01-13 10:20:06 +01:00
Raphael S. Carvalho
a5c90194f5 db: add support to clean up a column family
Cleanup is a procedure that discards irrelevant keys from
all sstables of a column family, thus saving disk space.
Scylla cleans up an sstable by using the compaction code, with
that sstable as the only input.
The compaction manager was changed to become aware of cleanup, so
that it is able to schedule cleanup requests and also knows
how to handle them properly.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 03:53:04 -02:00
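A simplified C++ sketch of cleanup as a single-input rewrite, assuming partitions are represented only by their 64-bit tokens and the node's ownership by a list of non-wrapping `(start, end]` ranges. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using token = std::int64_t;
// Hypothetical representation of an owned range: (start, end], non-wrapping.
using token_range = std::pair<token, token>;

bool owned(token t, const std::vector<token_range>& ranges) {
    for (const auto& [start, end] : ranges) {
        if (t > start && t <= end) {
            return true;
        }
    }
    return false;
}

// "Compacts" one sstable into a new one, keeping only partitions whose token
// still belongs to this node; everything else is dropped to save disk space.
std::vector<token> cleanup_sstable(const std::vector<token>& partitions,
                                   const std::vector<token_range>& ranges) {
    std::vector<token> out;
    for (token t : partitions) {
        if (owned(t, ranges)) {
            out.push_back(t);
        }
    }
    return out;
}
```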
Raphael S. Carvalho
9c13c1c738 compaction: move compaction execution from strategy to manager
Currently, the compaction strategy is responsible for both selecting the
sstables for compaction and running compaction.
Moving the code that runs compaction from strategy to manager is a big
improvement, which will also make it possible for the compaction manager
to keep track of which sstables are being compacted at any moment.
This change will also be needed for cleanup and for concurrent compaction
on the same column family.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-12 00:04:27 -02:00
Raphael S. Carvalho
5c674091dc db: move code that rebuilds sstable list to a function
That code will be used by column family cleanup, so let's put
that code into a function. This change also improves the code
readability.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 19:51:04 -02:00
Raphael S. Carvalho
58189dd489 db: move generation calculation code to a function
Code that calculates generation should be put in a function.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2016-01-11 19:51:02 -02:00
Tomasz Grabiec
8deb3f18d3 query_processor: Invalidate prepared statements when columns change
Replicates https://issues.apache.org/jira/browse/CASSANDRA-7910 :

"Prepare a statement with a wildcard in the select clause.
2. Alter the table - add a column
3. execute the prepared statement
Expected result - get all the columns including the new column
Actual result - get the columns except the new column"
2016-01-11 10:34:55 +01:00
Tomasz Grabiec
c6a52bed73 db: Fail when attempting to mutate using not synced schema
2016-01-11 10:34:53 +01:00
Tomasz Grabiec
f0d886893d db: Mark new schemas as synced
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
8164902c84 schema_tables: Change column_family schema on schema sync
Notifications are not implemented yet.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
d81a46d7b5 column_family: Add schema setters
There is one current schema for a given column_family. Entries in
memtables and cache can be at any of the previous schemas, but they're
always upgraded to the current schema on access.
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
4e5a52d6fa db: Make read interface schema version aware
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.

The shard-level interface accepts a schema_ptr; across nodes we use
the table_schema_version UUID. To transfer a schema_ptr across shards, we
use global_schema_ptr.

Because schemas are identified by UUID across nodes, requestors must
be prepared to be queried for the definition of the schema. They
must hold a live schema_ptr for the duration of the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.

Requesting schemas across nodes is currently stubbed (it throws a runtime
exception).
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
036974e19b Make mutation interfaces support multiple versions
Schema is tracked per entry in memtable and cache. Entries are
upgraded lazily on access. Incoming mutations are upgraded to the table's
current schema on the given shard.

Mutating nodes need to keep the schema_ptr alive in case the schema version is
requested by the target node.
2016-01-11 10:34:51 +01:00
Tomasz Grabiec
9eef4d1651 db: Learn schema versions when adding tables
2016-01-11 10:34:51 +01:00
Tomasz Grabiec
dbb7b7ebe3 db: Move system keyspace initialization to init_system_keyspace()
2016-01-08 21:10:26 +01:00
Avi Kivity
0c755d2c94 db: reduce log spam when ignoring an sstable
With 10 sstables/shard and 50 shards, we get ~10*50*50 messages = 25,000
log messages about sstables being ignored.  This is not reasonable.

Reduce the log level to debug, and move the message to database.cc,
because at its original location, the containing function has nothing to
do with the message itself.

Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Message-Id: <1452181687-7665-1-git-send-email-avi@scylladb.com>
2016-01-07 19:23:25 +02:00
Vlad Zolotarov
07f8549683 database: filter out manifest.json files
Filter out manifest.json files when reading sstables during
bootup and when loading new sstables ('nodetool refresh').

Fixes issue #529

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1451911734-26511-3-git-send-email-vladz@cloudius-systems.com>
2016-01-07 15:56:02 +02:00
Vlad Zolotarov
c5aa2d6f1a database: lister: add a filtering option
Add the ability to pass a filter functor that receives the full path
of a directory entry and returns a boolean: true if the entry
should be enumerated, false if it should be filtered out.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Message-Id: <1451911734-26511-2-git-send-email-vladz@cloudius-systems.com>
2016-01-07 15:56:01 +02:00
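A sketch of the filtering option in C++17, with the lister reduced to a function over already-read `std::filesystem::path` entries. `lister_filter`, `list_entries` and `not_manifest` are hypothetical names; the manifest.json filter mirrors the intent of commit 07f8549683 above.

```cpp
#include <cassert>
#include <filesystem>
#include <functional>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical filter type: gets the full path of a directory entry and
// returns true if the entry should be enumerated, false to filter it out.
using lister_filter = std::function<bool(const fs::path&)>;

// Simplified lister over already-read entries; the default filter accepts
// everything, so existing callers are unaffected.
std::vector<fs::path> list_entries(const std::vector<fs::path>& entries,
                                   const lister_filter& filter = [](const fs::path&) { return true; }) {
    std::vector<fs::path> out;
    for (const auto& p : entries) {
        if (filter(p)) {
            out.push_back(p);
        }
    }
    return out;
}

// Skip manifest.json when scanning a table directory for sstables.
bool not_manifest(const fs::path& p) {
    return p.filename() != "manifest.json";
}
```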
Pekka Enberg
f4bdec4d09 Merge "Support for deleting all snapshots" from Vlad
"Add support for deleting all snapshots of all keyspaces."

Fixes #639.
2016-01-05 15:42:44 +02:00
Glauber Costa
74fbd8fac0 do not call open_file_dma directly
We have an API that wraps open_file_dma which we use in some places, but in
many other places we call the reactor version directly.

This patch changes the latter to match the former. It has the added benefit
of making it easier to change these interfaces if needed.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <29296e4ec6f5e84361992028fe3f27adc569f139.1451950408.git.glauber@scylladb.com>
2016-01-05 10:37:57 +02:00
Vlad Zolotarov
7bb2b2408b database::clear_snapshot(): added support for deleting all snapshots
When 'nodetool clearsnapshot' is given no parameters it should
remove all existing snapshots.

Fixes issue #639

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-03 14:22:25 +02:00
Vlad Zolotarov
d5920705b8 service::storage_service: move clear_snapshot() code to 'database' class
service::storage_service::clear_snapshot() was built around _db.local()
calls so it makes more sense to move its code into the 'database' class
instead of calling _db.local().bla_bla() all the time.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-01-03 14:22:17 +02:00
Vlad Zolotarov
756de38a9d database: actually check that a snapshot directory exists
Actually check that a snapshot directory with a given tag
exists instead of just checking that a 'snapshot' directory
exists.

Fixes issue #689

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-12-29 12:59:00 +01:00
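A C++17 sketch of both snapshot commits (this tag check and the clear-all behaviour of 7bb2b2408b above), assuming the layout `<table_dir>/snapshots/<tag>/`. Function names are hypothetical. An empty tag must be rejected explicitly in the existence check: `snapshots / ""` names the snapshots directory itself, which would reduce the check to the one this commit replaces.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// True only if a snapshot with this exact tag exists, not merely the
// "snapshots" directory.
bool snapshot_exists(const fs::path& table_dir, const std::string& tag) {
    return !tag.empty() && fs::is_directory(table_dir / "snapshots" / tag);
}

// Removes the snapshot with the given tag, or every snapshot when the tag is
// empty (as 'nodetool clearsnapshot' with no parameters should).
// Returns the number of filesystem entries removed (0 if nothing existed).
std::uintmax_t clear_snapshot_sketch(const fs::path& table_dir, const std::string& tag) {
    const fs::path snapshots = table_dir / "snapshots";
    if (tag.empty()) {
        return fs::remove_all(snapshots);
    }
    return fs::remove_all(snapshots / tag);
}
```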
Avi Kivity
41bd266ddd db: provide more information on "Unrecognized error" while loading sstables
This information can be used to understand the root cause of the failure.

Refs #692.
2015-12-29 10:23:32 +02:00
Pekka Enberg
eeadf601e6 Merge "cleanups and improvements" from Raphael
2015-12-18 13:45:11 +02:00
Pekka Enberg
e56bf8933f Improve not implemented errors
Print out the function name where we're throwing the exception from to
make it easier to debug such exceptions.
2015-12-18 10:51:37 +01:00
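A hypothetical C++ helper showing the idea: capture the enclosing function's name at the throw site. `__func__` is standard; GCC and Clang also provide `__PRETTY_FUNCTION__` with the full signature. The macro and helper names are illustrative, not Scylla's.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: embed the calling function's name in the error so a
// "not implemented" exception points straight at its origin.
[[noreturn]] inline void throw_not_implemented(const char* function) {
    throw std::runtime_error(std::string("not implemented: ") + function);
}

// __func__ expands to the name of the function the macro is used in.
#define NOT_IMPLEMENTED() throw_not_implemented(__func__)

int unfinished_feature() {
    NOT_IMPLEMENTED();
}
```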
Raphael S. Carvalho
41be378ff1 db: fix build of sstable list in column_family::compact_sstables
The last two loops were incorrectly nested inside the first one. That's a
bug because a new sstable may be emplaced more than once in the
sstable list, which can cause several problems. mark_for_deletion
may also be called more than once for compacted sstables; however,
it is idempotent.
Found this issue while auditing the code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-16 17:46:17 +02:00
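A sketch of the corrected control flow, with sstables reduced to integer generations and hypothetical names: the first loop filters the current list, and the other two run once, after it.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

using generation = int;

std::vector<generation> rebuild_sstable_list(const std::vector<generation>& current,
                                             const std::vector<generation>& compacted,
                                             const std::vector<generation>& created,
                                             std::set<generation>& marked_for_deletion) {
    std::vector<generation> result;
    // Loop 1: keep every current sstable that was not a compaction input.
    for (generation g : current) {
        if (std::find(compacted.begin(), compacted.end(), g) == compacted.end()) {
            result.push_back(g);
        }
    }
    // Loops 2 and 3 run once, after loop 1. Nested inside it, each new
    // sstable would be added once per current sstable.
    for (generation g : created) {
        result.push_back(g);
    }
    for (generation g : compacted) {
        marked_for_deletion.insert(g); // idempotent, like mark_for_deletion
    }
    return result;
}
```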
Raphael S. Carvalho
6142efaedb db: fix indentation
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:43:34 -02:00
Raphael S. Carvalho
7bbc1b49b6 db: add missing sstable::mark_for_deletion call
If an sstable doesn't belong to the current shard, mark_for_deletion
should still be called so that the deletion manager keeps working.
It doesn't mean that the sstable will be deleted, but that the
sstable is not relevant to the current shard, so it can be
deleted by the deletion manager in the future.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-12-14 12:42:26 -02:00
Amnon Heiman
2086c651ba column_family: get_snapshot_details should return empty map for no snapshots
If there is no snapshot directory for the specific column family,
get_snapshot_details should return an empty map.

This patch checks that the directory exists before trying to iterate over
it.

Fixes #619

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-12-07 12:51:04 +01:00