In preparation for adding listener state to migration manager, use
sharded<> for migration manager.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
This message is printed when we are about to run the strategy code
which may not decide to compact anything. Compaction is already
properly logged in sstables::compact_sstables().
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Add CF UUID validation to update table paths to make us behave like
Origin for parallel table creation.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Abillity to enable/disable specific sub-modules - this settings do not
affect system tables which are allways persisted,cached and written to
commitlog
enable-in-memory-data-store marks if tables will be written/read to/from
disk
enable-commitllog marks if tables will be written to commitlog
enable-cache marks if tables will be written/read to/from cache
Please note in-memory-data-store does not change the read path so "old"
sstables are still read and cache may be used to cache their data
Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
We should only call column_family::start after the checks because
if a check failed, column_family would be destroyed without
column_family::stop being called first, and that would lead to
a problem, such as _compaction_done future not being resolved.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
So far, automatic compaction was disabled, but now that we support
size-tiered strategy, the default compaction strategy algorithm,
we could definitely enable automatic compaction by default.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Currently, compaction will no longer happen for a column family which
a compaction failed for some unexpected reason.
We want to implement a retry policy that will sleep for a while until
the next compaction attempt. This patch implements retry policy for
compaction using exponential_backoff_retry.
With exponential_backoff_retry, the sleep time grows exponentially
with the number of retries until the maximum sleep time is reached.
For compaction specifically, the base sleep time will be 5 seconds and
the maximum sleeping time will be 300 seconds, i.e. 5 minutes.
If compaction succeeded after a retry, the sleep time will be reset to
the base sleep time.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
We must wait for the system tables to be loaded on all shards before
populating the other keyspaces, or we might miss some keyspaces or column
families. This is hinted at by the fact that we use storage_proxy, which
isn't usable until the system keyspace is ready.
Credit to Tomek for identifying the problem and the fix.
Actually we should rethrow exceptions because they are needed for
keep_doing() to finish. Otherwise, the future _compaction_done
will never be resolved.
This reverts commit 89698b0d1c.
Support to compaction strategy options was recently added.
Previously, we were using default values in compaction strategy for
options, but now we can use the options defined in the schema.
Currently, we only support size-tiered strategy, so let's start
with it.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
broken_semaphore and seastar::gate_closed_exception exceptions are
used for regular termination of compaction fiber, which otherwise
would live forever. We shouldn't re-throw these exceptions, but
instead only print a log message.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
compact_all_sstables is about selecting all available sstables
for compaction and executing a compaction code on them.
This compaction code was moved to a more generic function called
compact_sstables, which will compact a list of given sstables.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Tomek pointed out that we shouldn't be passing a reference to commitlog every
time we use the add_column_family interface, because that will at times pass a
reference to a null object.
Test that, and pass no_commitlog if there is none.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
As the name implies, this patch introduces the concept of automatic
compaction for sstables.
Compaction task is triggered whenever a new sstable is written.
Concurrent compaction on the same column family isn't supported, so
compaction may be postponed if there is an ongoing compression.
In addition, seastar::gate is used both to prevent a new compaction
from starting and to wait for an ongoing compaction to finish, when
the system is asked for a shutdown.
This patch also introduces an abstract class for compaction strategy,
which is really useful for supporting multiple strategies.
Currently, null and major compaction strategies are supported.
As the name implies, null compaction strategy does nothing.
Major compaction strategy is about compacting all sstables into one.
This strategy may end up being helpful when adding support to major
compaction via nodetool.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
In much of our column_families APIs, we need to pass a pointer to the database.
The only reason we do that, is so we can properly handle the commit log entries
after we seal the current memtables into sstables.
Now that we store a pointer to the commit log in the CF itself at the time it
is created, we no longer have to do it. As a result, the APIs are a lot
cleaner, with no gratuitous parameters.
My motivation for this was the flush method, but as a result, apply() also gets
cleaner.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
When we create a column family, we can pass as an extra parameter, the
commitlog - or lack thereof. Because the commitlog is optional to begin with -
it won't exist if we don't call init_commitlog, we can have this to be empty
meaning no commit log.
The creation of a column family should be always done through
add_column_family. And if that is the case, we have the database's commitlog
right there and can get the pointer through the db. Only tests are not creating
the column family this way, and for them, it is fine.
We want to do that, because some column family operations will use the commit log.
Right now, they are forcing us to add parameters to APIs that would be much cleaner
without it. So while separation is good, this level of coupling is a net win as it
allows us to clean up some visible APIs.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
The logger class constructor registers itself with the logger registry,
in order to enable dynamically setting log levels. However, since
thread_local variables may be (and are) initialized at the time of first
use, when the program starts up no loggers are registered.
Fix by making loggers global, not thread_local. This requires that the
registry use locking to prevent registration happening on different threads
from corrupting the registry.
Note that technically global variables can also be initialized at the
point of first use, and there is no portable way for classes to self-register.
However this is the best we can do.
Mutation query differs from data query in that returns information
needed to reconcile data slice with that retruned by other data
sources.
There is a generic mutation_query() algorithm introduced, which can
work with any mutation_source.
database::query_mutations() is a shard-local interface for mutation
queries.
The reconcilable_result is introduced as a medium for mutation query
results. It piggy backs on frozen_mutation as a medium for
reconcilable data.
When partition has no live regular rows, but has some data live in the
static row, then it should appear in the results, even though we
didn't select any static column.
To reproduce:
create table cf (k blob, c blob, v blob, s1 blob static, primary key (k, c));
update cf set s1 = 0x01 where k = 0x01;
update cf set s1 = 0x02 where k = 0x02;
select k from cf;
The "select" statement should return 2 rows, but was returning 0.
The following query worked fine, because static columns were included:
select * from cf;
The data query should contain only live data, so we shouldn't write a
partition entry if it's supposed to be absent from the results. We
can'r tell that though until we've processed all the data. To solve
this problem, query result writer is using an optimistic approach,
where the partition header will be retracted from the buffer
(cheaply), if it turns out there's no live data in it.
"We don't want multiple shards to respond with the same data. Higher level code
assumes that shard data is non-overlapping. It's cheaper to drop duplicates as
soon as possible. Memtable reader for example will never have overlapping
data, so cache hitting queries will never need to pay for this. Compaction
process may also rely on this."
We always operate on the local storage proxy so pass it by reference.
This simplifies DEFINITIONS_UPDATE message handler where all we have is
a "this" pointer to the local storage proxy.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
Aside from guaranteeing that we will always leave correctly, it will also
allow us to change the implementation of the enter / leave pair without
disrupting existing code.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
After compaction, remove the source sstables. This cannot be done
immediately, as ongoing reads might be using them, so we mark the sstable
as "to be deleted", and when all references to this sstable are lost and
the object is destroy, we see this flag and delete the on-disk files.
This patch doesn't change the low-level compact_sstables() (which doesn't
mark its input sstables for deletion), but rather the higher-level example
"strategy" column_family::compact_all_sstables(). I thought we might want
to do this to allow in the future strategies that might only mark the input
sstables for deletion after doing perhaps other steps and to be sure it
doesn't want to abort the compaction and return to the old files. If we
decide this isn't needed, we can easily move the mark_for_deletion() call
to compact_sstables().
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
One of the find_schema variants calls a find_uuid() that throws out_of_range,
without converting it to no_such_column_family first. This results in
std::terminate() being called due to exception specifications.
Fix by converting the exception.
Same as for keyspaces, and we will reuse most of the code. CFs that are
found on disk upon bootstrap are now recreated in memory.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
This patch bootstraps the keyspaces found in system sstables and make
our in-memory structures reflect them. It tries to reuse as much code
as we can from db::legacy_system_tables, but keeping everything local
and without applying any mutations to the database - since the latter
would only unnecessarily complicate the write path.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>