This patch makes sure that every time we need to create a new generation number
(the very first step in the creation of a new SSTable), the respective CF is already
initialized and populated. Failure to do so can lead to data being overwritten.
Extensive details about why this is important can be found
in Scylla's GitHub issue #1014.
Nothing should be writing to SSTables before we have the chance to populate the
existing SSTables and calculate what the next generation number should be.
However, if that happens, we want to protect against it in a way that does not
involve overwriting existing tables. This is one of the ways to do it: every
column family starts in an unwriteable state, and when it can finally be written
to, we mark it as writeable.
Note that this *cannot* be a part of add_column_family. That adds a column family
to a db in memory only, and if anybody is about to write to a CF, that was most
likely already called. We need to call this explicitly when we are sure we're ready
to issue disk operations safely.
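A minimal sketch of the idea, not the code in this patch. The class and member names (column_family_sketch, note_existing_generation, mark_ready_for_writes, next_generation) are assumptions made for illustration. The column family refuses to hand out a generation number until it has been explicitly marked writeable:

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a column family; all names are illustrative.
class column_family_sketch {
    bool _writeable = false;               // every CF starts unwriteable
    std::int64_t _highest_generation = 0;  // highest generation seen on disk
public:
    // Called while populating: record a generation found among existing SSTables.
    void note_existing_generation(std::int64_t gen) {
        if (gen > _highest_generation) {
            _highest_generation = gen;
        }
    }

    // Called explicitly once the existing SSTables have been populated.
    void mark_ready_for_writes() { _writeable = true; }

    // First step of creating a new SSTable.
    std::int64_t next_generation() {
        if (!_writeable) {
            // Refuse rather than risk reusing a generation that exists on disk.
            throw std::logic_error("column family is not ready for writes");
        }
        return ++_highest_generation;
    }
};
```

Throwing here is one possible choice; an assertion would serve the same purpose of failing loudly instead of overwriting an existing table.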
Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit a339296385)
We already have a function that wraps this, re-use it. This FIXME is still
relevant, so just move it there. Let's not lose it.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 94e90d4a17)
We use memory usage as a threshold these days, and nowhere is _mutation_count
checked. Get rid of it.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
(cherry picked from commit 46fdeec60a)
When find_uuid() fails Scylla would terminate with:
Exiting on unhandled exception of type 'std::out_of_range': _Map_base::at
But we are supposed to ignore directories for unknown column
families. The try {} catch block is doing just that when
no_such_column_family is thrown from the find_column_family() call
which follows find_uuid(). Fix by converting std::out_of_range to
no_such_column_family.
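The conversion can be sketched as follows. This is an illustration under assumptions, not the actual Scylla code: the registry class, the int id type and the _sketch names are made up, and only the exception translation mirrors the description above.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Illustrative exception type standing in for no_such_column_family.
struct no_such_column_family_sketch : std::runtime_error {
    explicit no_such_column_family_sketch(const std::string& name)
        : std::runtime_error("no such column family: " + name) {}
};

// Hypothetical lookup table from a column family name to an id.
class cf_registry_sketch {
    std::map<std::string, int> _ids;
public:
    void add(const std::string& name, int id) { _ids[name] = id; }

    // std::map::at() throws std::out_of_range for an unknown key; translate
    // it so that callers only have to handle one exception type.
    int find_uuid(const std::string& name) const {
        try {
            return _ids.at(name);
        } catch (const std::out_of_range&) {
            throw no_such_column_family_sketch(name);
        }
    }
};

// Caller that ignores directories belonging to unknown column families.
inline bool try_load_directory(const cf_registry_sketch& reg, const std::string& name) {
    try {
        reg.find_uuid(name);
        return true;
    } catch (const no_such_column_family_sketch&) {
        return false;  // unknown CF: skip the directory instead of terminating
    }
}
```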
Message-Id: <1456056280-3933-1-git-send-email-tgrabiec@scylladb.com>
Fixes #934 - faulty assert in discard_sstables
run_with_compaction_disabled removes a CF from the compaction
manager queue. discard_sstables wants to assert on this, but looks
at the wrong counters.
pending_compactions is an indicator of how much interested parties
want a CF compacted (again and again). It should not be considered
an indicator of compactions actually being done.
This modifies the usage slightly so that:
1.) The counter is always incremented, even if compaction is disallowed.
The counter's value at the end of run_with_compaction_disabled is then
used as an indicator of whether a compaction should be
re-triggered. (If compactions finished, it will be zero.)
2.) Document the use and purpose of the pending counter, and add
method to re-add CF to compaction for r_w_c_d above.
3.) discard_sstables now asserts on the right things.
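A rough sketch of the counter semantics described above. Everything here is a simplified assumption: the class name, the synchronous std::function callback and the submitted() counter stand in for the real, asynchronous compaction manager interaction.

```cpp
#include <functional>

// Hypothetical model of a CF's compaction bookkeeping; names are illustrative.
class cf_compaction_sketch {
    int _pending_compactions = 0;  // how often a compaction was asked for
    int _compaction_disabled = 0;  // > 0 while compaction is disallowed
    int _submitted = 0;            // requests handed to the (imaginary) manager

    void submit() { ++_submitted; }
public:
    // Always count the request, even when compaction is disallowed.
    void trigger_compaction() {
        ++_pending_compactions;
        if (_compaction_disabled == 0) {
            submit();
        }
    }

    // The manager reports that it has caught up with this CF.
    void compaction_finished() { _pending_compactions = 0; }

    // Note: a real version would also restore the state if fn throws.
    void run_with_compaction_disabled(const std::function<void()>& fn) {
        ++_compaction_disabled;
        fn();
        --_compaction_disabled;
        // A non-zero counter means somebody asked for compaction meanwhile,
        // so the CF is re-added to the manager.
        if (_compaction_disabled == 0 && _pending_compactions > 0) {
            submit();
        }
    }

    int pending() const { return _pending_compactions; }
    int submitted() const { return _submitted; }
};
```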
Message-Id: <1456332824-23349-1-git-send-email-calle@scylladb.com>
Generated sstables may be either fully or partially written.
Compaction is interrupted if it was deliberately asked to stop (stop API)
or it was forced to do so in the event of a failure, e.g. out of disk space.
There is a need to explicitly delete sstables generated by a compaction
that was interrupted. Otherwise, such sstables will waste disk space and
even worsen read performance, which degrades as number of generations
to look at increases.
Fixes #852.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <49212dbf485598ae839c8e174e28299f7127f63e.1453912119.git.raphaelsc@scylladb.com>
After this patch, our I/O operations will be tagged into a specific priority class.
There are 5 available classes, which were defined in the previous patch:
1) memtable flush
2) commitlog writes
3) streaming mutation
4) SSTable compaction
5) CQL query
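As an illustration only, tagging can be thought of as attaching one of the five classes to each I/O request. The enum, struct and function names below are assumptions and do not reflect Seastar's or Scylla's actual types.

```cpp
#include <string>
#include <utility>

// The five classes listed above, as a hypothetical enum.
enum class io_class_sketch {
    memtable_flush,
    commitlog,
    streaming,
    compaction,
    query,
};

inline const char* name_of(io_class_sketch c) {
    switch (c) {
    case io_class_sketch::memtable_flush: return "memtable flush";
    case io_class_sketch::commitlog:      return "commitlog writes";
    case io_class_sketch::streaming:      return "streaming mutation";
    case io_class_sketch::compaction:     return "SSTable compaction";
    case io_class_sketch::query:          return "CQL query";
    }
    return "unknown";
}

// Hypothetical I/O request carrying its priority class as a tag.
struct io_request_sketch {
    std::string path;
    io_class_sketch pc;
};

// Each call site picks the class that matches the work it is doing.
inline io_request_sketch make_flush_write(std::string path) {
    return io_request_sketch{std::move(path), io_class_sketch::memtable_flush};
}
```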
Signed-off-by: Glauber Costa <glauber@scylladb.com>
SSTables already have a priority argument wired to their read path. However,
most of our reads do not call that interface directly, but employ the services
of a mutation reader instead.
Some of those readers will be used to read through a mutation_source, and those
have to be patched as well.
Right now, whenever we need to pass a class, we pass Seastar's default priority
class.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Its definition as a lambda function is inconvenient, because it does not allow
us to use default values for parameters.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Enable incremental backup when sstables are flushed if
incremental backup has been requested.
It has been enabled in the regular flushing flow before but
wasn't in the compaction flow.
This patch enables it in both places and does it using a
backup capability of sstable::write_components() method(s).
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
"This patch is intended to add support to column family cleanup, which will
make 'nodetool cleanup' possible.
Why is this feature needed? Remove irrelevant data from a node that loses part
of its token range to a newly added node."
Cleanup is a procedure that will discard irrelevant keys from
all sstables of a column family, thus saving disk space.
Scylla will clean up a sstable by using compaction code, in
which this sstable will be the only input used.
Compaction manager was changed to become aware of cleanup, such
that it will be able to schedule cleanup requests and also know
how to handle them properly.
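A simplified sketch of what cleanup does to a single sstable. The types are assumptions: tokens are plain integers, owned ranges are non-wrapping (start, end] pairs, and an sstable is reduced to the list of tokens it contains.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using token_sketch = std::int64_t;
// (start, end] range, assumed non-wrapping to keep the sketch short.
using token_range_sketch = std::pair<token_sketch, token_sketch>;

inline bool is_owned(token_sketch t, const std::vector<token_range_sketch>& ranges) {
    for (const auto& r : ranges) {
        if (t > r.first && t <= r.second) {
            return true;
        }
    }
    return false;
}

// "Compacts" one input sstable, keeping only keys the node still owns.
inline std::vector<token_sketch> cleanup_sketch(
        const std::vector<token_sketch>& sstable,
        const std::vector<token_range_sketch>& owned_ranges) {
    std::vector<token_sketch> out;
    for (token_sketch t : sstable) {
        if (is_owned(t, owned_ranges)) {
            out.push_back(t);
        }
    }
    return out;
}
```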
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently, the compaction strategy is responsible for both getting the
sstables selected for compaction and running compaction.
Moving the code that runs compaction from strategy to manager is a big
improvement, which will also make it possible for the compaction manager
to keep track of which sstables are being compacted at a given moment.
This change will also be needed for cleanup and concurrent compaction
on the same column family.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That code will be used by column family cleanup, so let's put
that code into a function. This change also improves the code
readability.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Replicates https://issues.apache.org/jira/browse/CASSANDRA-7910 :
"1. Prepare a statement with a wildcard in the select clause.
2. Alter the table - add a column
3. execute the prepared statement
Expected result - get all the columns including the new column
Actual result - get the columns except the new column"
There is one current schema for a given column_family. Entries in
memtables and cache can be at any of the previous schemas, but they're
always upgraded to current schema on access.
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.
Interface on shard level accepts schema_ptr, across nodes we use
table_schema_version UUID. To transfer schema_ptr across shards, we
use global_schema_ptr.
Because schema is identified with UUID across nodes, requestors must
be prepared for being queried for the definition of the schema. They
must hold a live schema_ptr around the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.
Schema requesting across nodes is currently stubbed (throws runtime
exception).
Schema is tracked in memtable and cache per-entry. Entries are
upgraded lazily on access. Incoming mutations are upgraded to table's
current schema on given shard.
Mutating nodes need to keep schema_ptr alive in case schema version is
requested by target node.
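The lazy per-entry upgrade can be sketched like this. It is a toy model under assumptions: schemas are a version number plus column names, cells are strings, an empty string stands in for a missing value, and all _sketch names are made up. Only the upgrade on read is shown, not the upgrade of incoming mutations.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

struct schema_sketch {
    int version;
    std::vector<std::string> columns;
};

// Each entry remembers the schema version it was written with.
struct entry_sketch {
    int schema_version;
    std::map<std::string, std::string> cells;
};

// Rewrite an entry so that it conforms to the current schema.
inline void upgrade_entry(entry_sketch& e, const schema_sketch& current) {
    if (e.schema_version == current.version) {
        return;  // already up to date
    }
    std::map<std::string, std::string> upgraded;
    for (const auto& col : current.columns) {
        auto it = e.cells.find(col);
        // Columns unknown to the old schema get an empty placeholder.
        upgraded[col] = (it != e.cells.end()) ? it->second : std::string();
    }
    e.cells = std::move(upgraded);
    e.schema_version = current.version;
}

class memtable_sketch {
    std::map<std::string, entry_sketch> _entries;
public:
    void apply(const std::string& key, entry_sketch e) { _entries[key] = std::move(e); }

    // Entries are upgraded lazily, only when they are accessed.
    const entry_sketch& read(const std::string& key, const schema_sketch& current) {
        entry_sketch& e = _entries.at(key);
        upgrade_entry(e, current);
        return e;
    }
};
```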
With 10 sstables/shard and 50 shards, we get ~10*50*50 messages = 25,000
log messages about sstables being ignored. This is not reasonable.
Reduce the log level to debug, and move the message to database.cc,
because at its original location, the containing function has nothing to
do with the message itself.
Reviewed-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Message-Id: <1452181687-7665-1-git-send-email-avi@scylladb.com>
We have an API that wraps open_file_dma which we use in some places, but in
many other places we call the reactor version directly.
This patch changes the latter to match the former. It will have the added benefit
of allowing us to make easier changes to these interfaces if needed.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <29296e4ec6f5e84361992028fe3f27adc569f139.1451950408.git.glauber@scylladb.com>
When 'nodetool clearsnapshot' is given no parameters it should
remove all existing snapshots.
Fixes issue #639
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
service::storage_service::clear_snapshot() was built around _db.local()
calls so it makes more sense to move its code into the 'database' class
instead of calling _db.local().bla_bla() all the time.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Actually check that a snapshot directory with a given tag
exists instead of just checking that a 'snapshot' directory
exists.
Fixes issue #689
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
The last two loops were incorrectly inside the first one. That's a
bug because a new sstable may be emplaced more than once in the
sstable list, which can cause several problems. mark_for_deletion
may also be called more than once for compacted sstables, however,
it is idempotent.
Found this issue while auditing the code.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
If a sstable doesn't belong to current shard, mark_for_deletion
should be called for the deletion manager to still work.
It doesn't mean that the sstable will be deleted, but that the
sstable is not relevant to the current shard, thus it can be
deleted by the deletion manager in the future.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
If there is no snapshot directory for the specific column family,
get_snapshot_details should return an empty map.
This patch checks that a directory exists before trying to iterate over
it.
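A sketch of the check using std::filesystem (C++17) rather than the actual Scylla code. The function name, the "snapshots" subdirectory name and the tag-to-file-count result are assumptions for illustration.

```cpp
#include <cstdint>
#include <filesystem>
#include <map>
#include <string>

namespace fs = std::filesystem;

// Returns snapshot tag -> number of regular files in that snapshot.
// An absent snapshot directory yields an empty map instead of an error.
inline std::map<std::string, std::uintmax_t> snapshot_details_sketch(const fs::path& cf_dir) {
    std::map<std::string, std::uintmax_t> details;
    const fs::path snapshots = cf_dir / "snapshots";
    // is_directory() returns false for a missing path, so check it first;
    // constructing a directory_iterator on a missing path would throw.
    if (!fs::is_directory(snapshots)) {
        return details;
    }
    for (const auto& tag_dir : fs::directory_iterator(snapshots)) {
        if (!tag_dir.is_directory()) {
            continue;
        }
        std::uintmax_t files = 0;
        for (const auto& f : fs::directory_iterator(tag_dir.path())) {
            if (f.is_regular_file()) {
                ++files;
            }
        }
        details[tag_dir.path().filename().string()] = files;
    }
    return details;
}
```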
Fixes #619
Signed-off-by: Amnon Heiman <amnon@scylladb.com>