307 Commits

Author SHA1 Message Date
Glauber Costa
89366dc2c2 sstables: do not accept files with missing TOC.
We can catch most errors when we try to load an sstable. But if the TOC file is
the one missing, we won't try to load the sstable at all. This case is still an
invalid case, but it is way easier for us to treat it by waiting for all files
to be loaded, and then checking if we saw a file during scan_dir, without its
corresponding TOC.

Fixes #114

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-16 15:21:40 +03:00
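The check described in commit 89366dc2c2 can be sketched roughly as follows. This is a minimal Python illustration, not the project's C++ code. The function name `find_sstables_missing_toc` and the simplified naming scheme (everything before the last `-` identifies the sstable, and the TOC component is named `TOC.txt`) are assumptions made for the example.

```python
from collections import defaultdict


def find_sstables_missing_toc(filenames):
    """Return the sstable prefixes that were seen on disk without a TOC file.

    Assumption for this sketch: a file is named "<prefix>-<component>",
    e.g. "ks-cf-ka-1-Data.db", and the TOC component is "TOC.txt".
    """
    components_by_sstable = defaultdict(set)
    # First pass: record every component seen during the directory scan.
    for name in filenames:
        prefix, component = name.rsplit("-", 1)
        components_by_sstable[prefix].add(component)
    # Second pass, after all files were seen: anything without a TOC is invalid.
    return sorted(
        prefix
        for prefix, components in components_by_sstable.items()
        if "TOC.txt" not in components
    )


scanned = [
    "ks-cf-ka-1-Data.db",
    "ks-cf-ka-1-TOC.txt",
    "ks-cf-ka-2-Data.db",  # no matching TOC
]
missing = find_sstables_missing_toc(scanned)
```

Deferring the check until the whole directory has been scanned means the result does not depend on the order in which files are reported.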
Glauber Costa
0650579ace sstables: refuse to boot on corrupted sstables
We are currently skipping them, which is dangerous.

Fixes #115

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-16 15:21:38 +03:00
Raphael S. Carvalho
9823164c89 db: introduce compaction manager
Currently, each column family creates a fiber to handle compaction requests
in parallel to the system. If there are N column families, N compactions
could be running in parallel, which is definitely horrible.

To solve that problem, a per-database compaction manager is introduced here.

Compaction manager is a feature used to service compaction requests from N
column families. Parallelism is made available by creating more than one
fiber to service the requests. That being said, N compaction requests will
be served by M fibers.

A submitted compaction request goes to a job queue shared between
all fibers, and the fiber with the fewest pending jobs will be
signalled.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-11 17:25:46 +03:00
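The idea in commit 9823164c89 of N compaction requests being served by M fibers can be illustrated with a small sketch. It uses Python's asyncio tasks as a stand-in for Seastar fibers. The name `run_compaction_manager` is hypothetical, and the sketch only models the shared job queue; it does not model signalling the fiber with the fewest pending jobs.

```python
import asyncio


async def run_compaction_manager(requests, num_workers):
    """Serve every request in `requests` with `num_workers` concurrent workers."""
    queue = asyncio.Queue()
    for name in requests:
        queue.put_nowait(name)

    served = {}  # column family name -> index of the worker that served it

    async def worker(index):
        while True:
            name = await queue.get()
            try:
                await asyncio.sleep(0)  # placeholder for the real compaction work
                served[name] = index
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker(i)) for i in range(num_workers)]
    await queue.join()  # wait until every queued request has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return served


# Five column families (N = 5) served by two workers (M = 2).
served = asyncio.run(
    run_compaction_manager([f"cf{i}" for i in range(5)], num_workers=2)
)
```

The number of compactions running at once is bounded by the worker count rather than by the number of column families.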
Avi Kivity
1016b21089 cache: improve preloading of flushed memtable mutations
If a mutation definitely doesn't exist in all sstables, then we can
certainly load it into the cache.
2015-08-09 22:46:08 +03:00
Glauber Costa
c2a0232048 database: generate UUIDs compatible with Cassandra 2.1.8
Without this, Cassandra won't even try to read our sstables. The containing
directories will be ignored.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:56 -05:00
Glauber Costa
c8ca9b376d database: change default sstable version
Let's change the default generated tables to ka, which is the one that is present
in Origin

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:55 -05:00
Glauber Costa
2d1b965f91 database: change filename parser to also accept ka
A ka file has a slightly different name on disk. Change the
parser so we can deal with both

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:55 -05:00
Glauber Costa
cd8c9ad288 sstables: add ks and cf name to sstable constructor
When a schema is available, we use it. However, we have, by now, way too many
tests. Some of them use tables for which we don't even know the schema. It would
have been a massive amount of work to require a schema for all of them - so I am
keeping both constructors around.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:55 -05:00
Glauber Costa
77e06c3ab1 sstables: remove name parameter
It is currently only used to log a message, and for that we have an sstable
method that will do just fine. Passing the name itself only forces it to be
carried along through all the captures. Remove it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:53 -05:00
Raphael S. Carvalho
64fcd16c0c db: adding data to column family statistics for API
Adding required data for column family API to be implemented.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-06 17:38:59 +03:00
Avi Kivity
48a1ce28fc Merge "Switch to log-structured allocator" from Tomasz 2015-08-06 15:45:39 +03:00
Tomasz Grabiec
926509525f row_cache: Switch to using LSA 2015-08-06 14:05:16 +02:00
Tomasz Grabiec
18ec9c3643 db: Move column_family::flush() to source file 2015-08-06 14:05:16 +02:00
Tomasz Grabiec
3b92ba2857 db: Add memtable flush logging 2015-08-06 14:05:16 +02:00
Pekka Enberg
dae1119796 database: Fix create keyspace ASan error
ASan does not like commit 05c23c7f73
("database: Add create_keyspace_on_all() helper"):

  ==8112==WARNING: AddressSanitizer failed to allocate 0x7f88b84fc690 bytes
  ==8112==AddressSanitizer's allocator is terminating the process instead of returning 0
  ==8112==If you don't like this behavior set allocator_may_return_null=1
  ==8112==Sanitizer CHECK failed: ../../../../libsanitizer/sanitizer_common/sanitizer_allocator.cc:147 ((0)) != (0) (0, 0)

I was not able to determine the source of the bug. Make ASan happy by
reverting the code movement and using the "cpu zero" trick we use for
table creation.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-06 13:02:58 +03:00
Glauber Costa
fb34ac7f65 database: fix scan_dir
When probing for the type, I have made the classical mistake of using
as a parameter part of a structure that is moved into the capture. That
is what broke our tests.

But also, when stat'ing, de.name will give us only the component relative to
the current path. We need to add the directory so the stat will succeed.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-06 10:10:56 +03:00
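The second fix in commit fb34ac7f65, together with the type probing from ece8f01d06, can be sketched as below. This is a Python illustration of the general pattern, not the project's scan_dir code. The function name `classify_entries` is hypothetical, and `os.listdir` is used deliberately because, like a directory entry without a type, it yields only names relative to the scanned directory.

```python
import os
import stat
import tempfile


def classify_entries(directory):
    """Map each entry name in `directory` to "directory", "regular" or "other"."""
    kinds = {}
    for name in os.listdir(directory):
        # The name is relative to `directory`; stat needs the joined path,
        # otherwise it would be resolved against the current working directory.
        full_path = os.path.join(directory, name)
        mode = os.lstat(full_path).st_mode
        if stat.S_ISDIR(mode):
            kinds[name] = "directory"
        elif stat.S_ISREG(mode):
            kinds[name] = "regular"
        else:
            kinds[name] = "other"
    return kinds


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "ks"))
    with open(os.path.join(root, "TOC.txt"), "w") as f:
        f.write("")
    kinds = classify_entries(root)
```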
Glauber Costa
ece8f01d06 database: make sure a type is present.
Our directory scanner currently requires a type to be passed, and we have a
FIXME saying that we should stat when there is none. In some filesystems,
in particular, XFS, getdents won't return a type, meaning we should manually
probe it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-05 22:52:07 +03:00
Avi Kivity
522f23b830 Merge "Schema table cleanups" from Pekka
"Clean up the schema table code. Be explicit that we don't support
Cassandra 3.0 and eliminate some dead code."
2015-08-05 15:09:59 +03:00
Raphael S. Carvalho
3ddb9be984 db: fix compaction on an empty column family
When forcing a compaction on a column family with no sstables, an
assert will fail because there are no sstables to be compacted.
This problem is fixed by ignoring a compaction request when no
sstable is provided.

Fixes #61.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-05 14:04:22 +03:00
Pekka Enberg
99a80050e3 db: Rename legacy_schema_tables to schema_tables
There's nothing legacy about it so rename legacy_schema_tables to
schema_tables. The naming comes from a Cassandra 3.x development branch
which is not relevant for us in the near future.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 13:56:47 +03:00
Avi Kivity
55ca295154 Merge "Initial CQL event support" from Pekka
"This series implements initial support for CQL events. We introduce
migration_listener hook in migration manager as well as event notifier
in the CQL server that's built on top of it to send out the events via
CQL binary protocol. We also wire up create keyspace events to the
system so subscribed clients are notified when a new keyspace is
created.

There's still more work to be done to support all the events. That
requires some work to restructure existing code so it's better to merge
this initial series now and avoid future code conflicts."
2015-08-05 12:56:37 +03:00
Pekka Enberg
618ba067bf database: Wire up create keyspace listener hook
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 11:50:52 +03:00
Pekka Enberg
05c23c7f73 database: Add create_keyspace_on_all() helper
Add a create_keyspace_on_all() helper which is needed for sending just
one event notification per created keyspace, not one per shard.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 11:50:52 +03:00
Paweł Dziepak
8a0d21b8b8 query: support option distinct in partition_slice
In the case of SELECT DISTINCT statements we are not interested in clustering
keys at all. The only important information is whether the partition key
exists and what's in the static row (if it exists).

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-04 15:39:42 +02:00
Pekka Enberg
a3c95235e6 migration_manager: Make stateful with sharded<>
In preparation for adding listener state to migration manager, use
sharded<> for migration manager.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-04 11:23:23 +03:00
Raphael S. Carvalho
34eaeedff2 db: remove imprecise log message about compaction
This message is printed when we are about to run the strategy code
which may not decide to compact anything. Compaction is already
properly logged in sstables::compact_sstables().

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-04 00:15:50 +03:00
Avi Kivity
9663ea86e9 db: fix test for whether an sstable includes a shard's range
Spotted by Raphael and Nadav.
2015-08-04 00:14:22 +03:00
Avi Kivity
c1a2831d41 db: ignore sstables that clearly don't belong to this shard 2015-08-03 20:17:41 +03:00
Pekka Enberg
e22f5a1cd7 database: Add CF UUID validation to update
Add CF UUID validation to update table paths to make us behave like
Origin for parallel table creation.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-03 13:41:16 +03:00
Pekka Enberg
0b762338c1 database: Futurize update_column_family()
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-03 13:41:16 +03:00
Shlomi Livne
199f4d2545 Add enable-in-memory-data-store,enable-commitlog,enable-cache config
Ability to enable/disable specific sub-modules. These settings do not
affect system tables, which are always persisted, cached, and written to
the commitlog.

enable-in-memory-data-store marks whether tables will be written to and read
from disk
enable-commitlog marks whether tables will be written to the commitlog
enable-cache marks whether tables will be written to and read from the cache

Please note in-memory-data-store does not change the read path so "old"
sstables are still read and cache may be used to cache their data

Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-08-02 17:19:30 +03:00
Avi Kivity
98ec451d6a Extract range<> into its own header
It's not just for queries any more.
2015-08-02 16:07:42 +03:00
Raphael S. Carvalho
6bc822dd71 db: fix problem with initialization of a column family
We should only call column_family::start after the checks because
if a check failed, column_family would be destroyed without
column_family::stop being called first, and that would lead to
a problem, such as _compaction_done future not being resolved.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-31 13:03:07 +03:00
Raphael S. Carvalho
d791438a43 db: enable automatic compaction by default
So far, automatic compaction was disabled, but now that we support
size-tiered strategy, the default compaction strategy algorithm,
we could definitely enable automatic compaction by default.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-29 19:02:16 +03:00
Raphael S. Carvalho
5a70c8c8f4 db: implement retry policy for compaction
Currently, compaction will no longer happen for a column family for which
a compaction failed for some unexpected reason.
We want to implement a retry policy that will sleep for a while until
the next compaction attempt. This patch implements retry policy for
compaction using exponential_backoff_retry.
With exponential_backoff_retry, the sleep time grows exponentially
with the number of retries until the maximum sleep time is reached.
For compaction specifically, the base sleep time will be 5 seconds and
the maximum sleeping time will be 300 seconds, i.e. 5 minutes.
If compaction succeeded after a retry, the sleep time will be reset to
the base sleep time.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-28 18:24:04 -03:00
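The retry policy described in commit 5a70c8c8f4 can be sketched as follows. The 5 second base and 300 second maximum come from the commit message. The class below is a Python illustration rather than the project's exponential_backoff_retry, and the growth factor of 2 is an assumption, since the message only says that the sleep time grows exponentially.

```python
class ExponentialBackoffRetry:
    """Track the sleep time between retries, capped at `max_sleep` seconds."""

    def __init__(self, base_sleep, max_sleep):
        self.base_sleep = base_sleep
        self.max_sleep = max_sleep
        self.sleep_time = base_sleep

    def next_sleep(self):
        """Return the time to sleep now and grow the time for the next retry."""
        current = self.sleep_time
        # Assumed growth factor of 2; the cap keeps the delay bounded.
        self.sleep_time = min(self.sleep_time * 2, self.max_sleep)
        return current

    def reset(self):
        """Go back to the base sleep time, e.g. after a successful compaction."""
        self.sleep_time = self.base_sleep


retry = ExponentialBackoffRetry(base_sleep=5, max_sleep=300)
delays = [retry.next_sleep() for _ in range(8)]
retry.reset()
delay_after_success = retry.next_sleep()
```

Under the doubling assumption, the cap is reached on the seventh consecutive failure, after which every retry waits the full 5 minutes until a compaction succeeds.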
Avi Kivity
b3b0d5150a main: fix system tables vs. sstables initialization race
We must wait for the system tables to be loaded on all shards before
populating the other keyspaces, or we might miss some keyspaces or column
families.  This is hinted at by the fact that we use storage_proxy, which
isn't usable until the system keyspace is ready.

Credit to Tomek for identifying the problem and the fix.
2015-07-28 09:49:11 +02:00
Avi Kivity
2e745bebad Merge "use compaction strategy options" from Raphael 2015-07-27 17:06:43 +03:00
Raphael S. Carvalho
15bbb71b7b db: handle compaction exception outside keep doing
Otherwise, we would needlessly handle it twice.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-24 19:12:34 -03:00
Raphael S. Carvalho
5f89f80ae5 Revert "db: dont rethrow exceptions for termination of compaction fiber"
Actually we should rethrow exceptions because they are needed for
keep_doing() to finish. Otherwise, the future _compaction_done
will never be resolved.

This reverts commit 89698b0d1c.
2015-07-24 19:07:47 -03:00
Raphael S. Carvalho
634d00511b compaction: use compaction options in strategy
Support for compaction strategy options was recently added.
Previously, we were using default values in compaction strategy for
options, but now we can use the options defined in the schema.
Currently, we only support size-tiered strategy, so let's start
with it.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-23 15:26:47 -03:00
Glauber Costa
d1496944d9 sstables: handle compaction strategy
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-23 00:02:11 -04:00
Avi Kivity
8870bf1bf8 Merge "Handling of non-full partition range queries" from Tomasz 2015-07-22 15:18:02 +03:00
Tomasz Grabiec
f9da612581 memtable: Implement range queries 2015-07-22 13:14:33 +02:00
Tomasz Grabiec
152582a869 sstables: Add read_range_rows() variant which takes a partition_range 2015-07-22 13:13:38 +02:00
Pekka Enberg
791031fbc7 database: Extract update_schema_version_and_announce() function
It's needed in storage proxy.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-22 11:57:00 +03:00
Tomasz Grabiec
0b0ea04958 range: Remove start_value() and end_value()
It's easy to miss that they may be undefined. start() and end(), which
return optional<bound> const&, make it clear.
2015-07-22 10:27:47 +02:00
Tomasz Grabiec
4a18693a23 db: Remove dead code 2015-07-22 10:27:47 +02:00
Raphael S. Carvalho
89698b0d1c db: dont rethrow exceptions for termination of compaction fiber
broken_semaphore and seastar::gate_closed_exception exceptions are
used for regular termination of compaction fiber, which otherwise
would live forever. We shouldn't re-throw these exceptions, but
instead only print a log message.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-22 11:23:58 +03:00
Avi Kivity
8ba5d19db5 db: avoid ubsan false-positive in query_state move constructor
The value is moved before initialization due to a do_with().  It's harmless,
but better to silence the warning.
2015-07-21 12:19:54 +03:00
Raphael S. Carvalho
6ae3ffa319 database: add get_sstables to column_family
Returns all sstables added to a given column_family.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-20 10:08:09 -03:00