Commit Graph

53948 Commits

Author SHA1 Message Date
Glauber Costa
19bb50f450 api: implement take_snapshot
Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:58:31 +02:00
Glauber Costa
21f84d77fc storage_service: delete a snapshot
This patch provides a storage service API to delete a snapshot. Because all
keyspaces and CFs are visible in all shards, we can fetch the list of
keyspaces in the present shard and issue the filesystem operations in that
same shard.

That simplifies the code tremendously, and because there are no operations we
need to do before the filesystem ones (as there are in the create snapshot
case), we need no synchronization. Even easier.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:58:31 +02:00
Glauber Costa
2f2a4e83e0 storage_service: take a snapshot of a particular column family
Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:58:30 +02:00
Glauber Costa
fe3164714f storage_service: take a snapshot of a group of keyspaces
Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:58:30 +02:00
Glauber Costa
d236b01b48 snapshots: check existence of snapshots
We go to the filesystem to check if the snapshot exists. This should make us
robust against deletions of existing snapshots from the filesystem.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:58:26 +02:00
Glauber Costa
d3aef2c1a5 database: support clear snapshot
This allows us to delete an existing snapshot. It works at the column
family level, and removing the snapshot from the list of keyspace snapshots
needs to happen only when all CFs are processed. Therefore, that is provided
as a separate operation.

The filesystem code is a bit ugly: it can be made better by making our file
lister more generic. First step would be to call it walker, not lister...

For now, we'll use the fact that there are mostly two levels in the snapshot
hierarchy to our advantage and avoid a full recursion. Using the same lambda
for all calls would require us to provide a separate class to handle the
state, which is part of making this generic.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:38:14 +02:00
Glauber Costa
500ee99c93 file lister: allow for more than one directory type
There are situations in which we would like to match more than one directory
entry type. One example would be a recursive delete operation: we need to
delete the files inside directories and the directories themselves, but we
still don't want a "delete all", since finding anything other than a directory
or a file is an error, and we should treat it as such.

Since there aren't that many types, it should be OK performance-wise to just
use a list. I am using an unordered_set here just because it is easy enough,
but we could relax that later if needed. In any case, users of the
interface should not worry about it: the decision is abstracted away
into lister::dir_entry_types.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-20 15:38:14 +02:00
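The idea in the commit above (walk a tree, accept only an explicit set of entry types, and treat anything else as an error) can be sketched roughly as follows. This is an illustration only, not the lister code from the patch: it uses std::filesystem instead of the project's asynchronous file lister, and `entry_type`, `classify` and `remove_tree` are hypothetical names. Only the `dir_entry_types` name is taken from the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <stdexcept>
#include <unordered_set>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical stand-in for lister::dir_entry_types: the set of entry
// kinds the caller is willing to see during the walk.
enum class entry_type { regular, directory };
using dir_entry_types = std::unordered_set<entry_type>;

// Classify an entry without following symlinks; anything that is neither
// a regular file nor a directory is reported as an error.
inline entry_type classify(const fs::directory_entry& e) {
    const fs::file_status st = e.symlink_status();
    if (fs::is_regular_file(st)) return entry_type::regular;
    if (fs::is_directory(st)) return entry_type::directory;
    throw std::runtime_error("unexpected entry: " + e.path().string());
}

// Remove everything below `root`, then `root` itself. Entries whose type is
// not in `expected` cause an exception instead of being silently deleted.
// Returns the number of entries removed, including `root`.
inline std::size_t remove_tree(const fs::path& root, const dir_entry_types& expected) {
    // Snapshot the listing first so we never modify a directory mid-iteration.
    std::vector<fs::directory_entry> entries;
    for (const auto& e : fs::directory_iterator(root)) {
        entries.push_back(e);
    }
    std::size_t removed = 0;
    for (const auto& e : entries) {
        const entry_type t = classify(e);
        if (expected.count(t) == 0) {
            throw std::runtime_error("entry type not allowed: " + e.path().string());
        }
        if (t == entry_type::directory) {
            removed += remove_tree(e.path(), expected);
        } else {
            fs::remove(e.path());
            ++removed;
        }
    }
    fs::remove(root);
    return removed + 1;
}
```

A recursive delete would pass both `entry_type::regular` and `entry_type::directory`; a caller that expects only files would pass a one-element set and get an error as soon as a directory shows up.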
Asias He
9e5ee17f4a storage_service: Implement rebuild 2015-10-20 21:32:30 +08:00
Asias He
31b50a83d1 storage_service: Implement excise 2015-10-20 21:32:30 +08:00
Asias He
0cf112501e storage_service: Implement restore_replica_count 2015-10-20 21:32:30 +08:00
Asias He
0ebcb1ddef storage_service: Stub send_replication_notification
Needed by restore_replica_count.
2015-10-20 21:32:30 +08:00
Asias He
1c480554eb storage_service: Stub get_new_source_ranges
Needed by restore_replica_count.
2015-10-20 21:32:30 +08:00
Asias He
955e766a49 storage_service: Partially implement decommission 2015-10-20 21:32:30 +08:00
Asias He
893849f8af storage_service: Stub unbootstrap 2015-10-20 21:32:30 +08:00
Asias He
9ebae12614 storage_service: Partially implement remove_node
restoreReplicaCount and excise are missing.
2015-10-20 21:32:30 +08:00
Asias He
142f29483a token_metadata: Implement add_leaving_endpoint 2015-10-20 21:32:30 +08:00
Asias He
f1bc882b90 storage_service: Implement get_changed_ranges_for_leaving 2015-10-20 21:32:30 +08:00
Asias He
937474bf14 abstract_replication_strategy: Make calculate_natural_endpoints public
It is used by storage_service.
2015-10-20 21:32:30 +08:00
Asias He
c5e35ac57e storage_service: Enable _replicating_nodes and _removing_node members 2015-10-20 21:32:30 +08:00
Asias He
f30fbd53ff storage_service: Start to use pending_range_calculator_service 2015-10-20 21:32:30 +08:00
Asias He
934c963d85 init: Init pending_range_calculator_service 2015-10-20 21:32:29 +08:00
Asias He
8d6200c036 service: Convert PendingRangeCalculatorService.java to C++ 2015-10-20 21:32:29 +08:00
Asias He
a5d91519f2 service: Import PendingRangeCalculatorService.java 2015-10-20 21:32:29 +08:00
Asias He
c96bc8bbd2 token_metadata: Implement calculate_pending_ranges 2015-10-20 21:32:14 +08:00
Asias He
a6065397d9 token_metadata: Implement clone_after_all_left 2015-10-20 20:38:43 +08:00
Avi Kivity
5abdc4323a Merge seastar upstream
* seastar 46fd389...6af5a0d (2):
  > sharded: do not try to call nonexistent delete callback
  > use real argv0 in dpdk initialization
2015-10-20 15:19:35 +03:00
Tomasz Grabiec
67d0f9c7df lsa: Restore heap invariant before calling _segments.erase()
This is certainly the right thing to do and seems to fix #403. However
I didn't manage to convince myself that this would cause problems for
binomial_heap, given that binomial_heap::erase() calls siftup()
anyway:

    void erase(handle_type handle)
    {
        node_pointer n = handle.node_;
        siftup(n, force_inf());
        top_element = n;
        pop();
    }

    void increase (handle_type handle)
    {
        node_pointer n = handle.node_;
        siftup(n, *this);

        update_top_element();
        sanity_check();
    }
2015-10-20 15:18:05 +03:00
Asias He
08b762111e storage_service: Fix ignored future in on_dead
All the on_xxx interfaces in i_endpoint_state_change_subscriber are
called within a seastar::async context.
2015-10-20 15:16:03 +03:00
Amnon Heiman
1e8752d55e API: Fix a confusion in the storage service snapshot details
The snapshot key and the keyspace were mixed up in the snapshot details.
This patch fixes that.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-20 13:53:09 +03:00
Avi Kivity
2575a4602e Merge "Fix for snapshots/create_links and shared SSTables" from Glauber
"Those are fixes needed for the snapshotting process itself. I have bundled this
in the create_snapshot series before to avoid a rebase, but since I will have to
rewrite that to get rid of the snapshot manager (and go to the filesystem),
I am sending those out on their own."
2015-10-20 13:49:17 +03:00
Pekka Enberg
3a3c7f7e79 configure.py: Propagate CFLAGS to seastar config
We need to propagate CFLAGS to seastar config for things like
DEBUG_SHARED_PTR to work.
2015-10-20 10:57:26 +03:00
Amnon Heiman
91d396760e API: add a workaround for the get schema_versions
This adds a workaround for get schema_versions: it returns only the
schema version of the local node. This is a temporary workaround until
describe_schema_versions is implemented.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-20 10:56:21 +03:00
Amnon Heiman
383d7ccf4d Add the get cluster and partitioner names to the API
This adds the implementation for the get cluster name and get
partitioner name to the storage_service API.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-20 10:54:55 +03:00
Amnon Heiman
77b4fc74cd gossiper needs to set the cluster name on all shards
The API can call any of the gossiper shards to get the cluster name, so
the initialization needs to set it in all of them.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-20 10:52:14 +03:00
Amnon Heiman
ff67285091 gossiper: make the get cluster name and partitioner public
The API needs the cluster and the partitioner names, so the methods are
now public.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-20 10:21:19 +03:00
Calle Wilund
786d66cacf commitlog: Fix use-after-free
Remove "finally". Just use a then_wrapped. Which it was originally, before
"handle_exception" was introduced to seastar. Oh, the irony...
2015-10-20 09:56:40 +03:00
Calle Wilund
02732f19f2 database: Handle CF flush with no high replay_position
We occasionally generate memtables that are not empty, yet have no
high replay_position set. (Typical case is CL replay, but apparently
there are others).

Moreover, we can do this repeatedly, and thus get caught in the flush
queue ordering restrictions.

Solve this by treating a flush without replay_position as a flush at the
highest running position, i.e. "last" in queue. Note that this will not
affect the actual flush operation, nor CL callbacks, only anyone waiting
for the operation(s) to complete.
2015-10-20 08:24:04 +02:00
Calle Wilund
31fca82213 flush_queue_test: Add test for multiple ops per key 2015-10-20 08:24:04 +02:00
Calle Wilund
62c0be376c flush_queue: Ease key restriction and allow multiple calls on each key
As long as we guarantee that the execution order of the post ops is
upheld, we can allow insertion of multiple ops on the same key.

Implemented by adding a ref count to each position.

The restriction then becomes that an added key must either be larger
than any already existing key, _OR_ already exist. In the latter case,
we still know that we have not finished this position and signaled
"upwards".
2015-10-20 08:24:04 +02:00
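The key rule described in the commit above (a new key must be larger than every existing key, or already be present, with a reference count per position) can be modeled with a small synchronous sketch. This is not the flush_queue implementation: the real class is asynchronous and runs post ops, while `position_tracker`, `add`, `finish` and `pending` are hypothetical names used only to show the bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>

// Hypothetical model of the relaxed key restriction: one reference count per
// position, positions kept in sorted order.
template <typename Key>
class position_tracker {
    std::map<Key, std::size_t> _refs;
public:
    // Register one more operation at position `k`.
    // Allowed if `k` already exists, or if it is larger than every existing key.
    void add(const Key& k) {
        auto it = _refs.find(k);
        if (it != _refs.end()) {
            ++it->second;  // same position again: just bump the count
            return;
        }
        if (!_refs.empty() && !(_refs.rbegin()->first < k)) {
            throw std::invalid_argument("key is below the highest pending position");
        }
        _refs.emplace(k, 1);
    }

    // Mark one operation at `k` as done. Returns true only when this was the
    // last operation at that position, i.e. when the position can be retired.
    bool finish(const Key& k) {
        auto it = _refs.find(k);
        if (it == _refs.end()) {
            throw std::out_of_range("unknown position");
        }
        if (--it->second > 0) {
            return false;
        }
        _refs.erase(it);
        return true;
    }

    // Number of operations still outstanding at `k`.
    std::size_t pending(const Key& k) const {
        auto it = _refs.find(k);
        return it == _refs.end() ? 0 : it->second;
    }
};
```

Because an existing key is only accepted while its count is non-zero, a repeated insert can never land on a position that has already been retired.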
Calle Wilund
ca451acb41 commitlog: Fix use-after-free
Remove "finally". Just use a then_wrapped. Which it was originally, before
"handle_exception" was introduced to seastar. Oh, the irony...
2015-10-20 08:02:46 +02:00
Avi Kivity
2ccb5feabd Merge "Support nodetool cfhistogram"
"This series adds the missing estimated histogram to the column family and to
the API so the nodetool cfhistogram would work."
2015-10-19 17:11:46 +03:00
Raphael S. Carvalho
28ef8feffa tests: fix test for leveled compaction
test_setup::do_with_test_directory is missing. For some reason,
the test wasn't failing without it until now. Adding it is the
correct thing to do anyway.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2015-10-19 17:07:08 +03:00
Avi Kivity
f4706c7050 Merge "initial support to leveled compaction" from Raphael
"This patchset introduces leveled compaction to Scylla.
We don't handle all corner cases yet, but we already have the strategy
and compaction working as expected. Test cases were written and I also
tested the stability with a load of cassandra-stress.

Leveled compaction may output more than one sstable because there is
a limit on the size of sstables (160M by default).
Handling of partial compaction is still something to be
worked on.

Anyway, it will not be a big problem. Why? Suppose that a leveled
compaction will generate 2 sstables, and scylla is interrupted after
the first sstable is completely written but before the second one is
completely written. The next boot will delete the second sstable,
because it was partially written, but will not do anything with the
first one as it was completely written.
As a result, we will have two sstables with redundant data."
2015-10-19 16:17:45 +03:00
Amnon Heiman
5998ee718a API: Add logger API implementation to the system API
This patch adds the ability to set one or all log levels, get a log level,
and get all logger names.

After this patch the following URLs will be available:
GET/POST
/system/logger
/system/logger/{name}

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-19 15:05:46 +03:00
Amnon Heiman
ba1e6adf2a Adding the system swagger definition
The system API will include system-related commands. Currently it holds
the logger-related API, with definitions for the following commands:
get_all_logger_names
set_all_logger_level
get_logger_level
set_logger_level

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-19 14:59:54 +03:00
Amnon Heiman
521d9b62dd Add a level_name function to logger
This is a helper function that returns a log level name. It will be used
by the API to report the log levels.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-19 14:59:54 +03:00
Glauber Costa
218fdebbeb snapshot: do not allow exceptions in snapshot creation hang us
With the distribute-and-sync method we are using, if an exception happens in
the snapshot creation for any reason (think file permissions, etc.), that will
just hang the server, since our shard won't do the necessary work to
synchronize and note that we have done our part (or tried to) in snapshot
creation.

Make the then clause a finally, so that the sync part is always executed.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-19 13:37:02 +02:00
Glauber Costa
9083a0e5a7 snapshots: fix generation of snapshots with shared sstables
create_links will fail in one of the shards if one of the SSTables happens to
be shared. It should be fine if the link already exists, so let's just ignore
that case.

Signed-off-by: Glauber Costa <glommer@scylladb.com>
2015-10-19 13:37:01 +02:00
Gleb Natapov
156f760663 storage_proxy: drop Origin's truncate code from comment
Drop already translated code.
2015-10-19 14:06:06 +03:00
Gleb Natapov
ae42ec7832 storage_proxy: actually sort endpoints in get_live_sorted_endpoints() 2015-10-19 13:38:37 +03:00