Commit Graph

328 Commits

Author SHA1 Message Date
Avi Kivity
5f62f7a288 Revert "Merge "Commit log replay" from Calle"
Due to test breakage.

This reverts commit 43a4491043, reversing
changes made to 5dcf1ab71a.
2015-08-27 12:39:08 +03:00
Avi Kivity
0fff367230 Merge "test for compaction metadata's ancestors" from Raphael 2015-08-27 11:07:53 +03:00
Avi Kivity
4e3c9c5493 Merge "compaction manager fixes" from Raphael 2015-08-27 11:05:26 +03:00
Avi Kivity
43a4491043 Merge "Commit log replay" from Calle
"Initial implementation/transposition of commit log replay.

* Changes replay position to be shard aware
* Commit log segment IDs now follow basically the same scheme as origin:
  max(previous ID, wall clock time in ms) + shard info (for us)
* SStables now use the DB definition of replay_position.
* Stores and propagates (compaction) flush replay positions in sstables
* If CL segments are left over from a previous run, they and the existing
  sstables are inspected for a high water mark, and the segments are then
  replayed from those marks to restore mutations potentially lost in a crash
* Note that a CPU count change is "handled" insofar as shard matching is
  per the _previous_ run's shards, not the current ones.

Known limitations:
* Mutations deserialized from old CL segments are _not_ fully validated
  against existing schemas.
* System::truncated_at (not currently used) does not handle sharding AFAIK,
  so watermark IDs coming from there are dubious.
* Mutations that fail to apply (invalid, broken) are not placed in blob files
  like origin. Partly because I am lazy, but also partly because our serial
  format differs, and we currently have no tools to do anything useful with it.
* No replay filtering (Origin allows a system property to designate a filter
  file, detailing which keyspaces/CFs to replay). Partly because we have no
  system properties.

There is no unit test for the commit log replayer (yet), because I could
not really come up with a good one given the existing test infrastructure
(it is tricky to kill things at just the "right" moment).
The functionality is verified by manual testing, i.e. running scylla,
building up data (cassandra-stress), kill -9 + restart.
This of course does not fully validate whether the resulting DB is
100% valid compared to the one at the time of the kill -9, but it at least
verified that replay took place and mutations were applied.
(Note that origin also lacks validity testing)"
2015-08-27 10:53:36 +03:00
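The segment ID scheme in the merge message above (max of the previous ID and wall clock milliseconds, plus shard info) can be sketched as follows. This is a minimal illustration, not the actual commitlog code: the bit layout (8 shard bits, 56 counter bits) and the names `segment_id_generator`, `shard_of` and `counter_of` are hypothetical, and the `+ 1` on the previous value is an assumption added so that two segments opened in the same millisecond still get distinct IDs.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical layout: top 8 bits = shard, low 56 bits = ms-based counter.
constexpr unsigned shard_bits = 8;
constexpr unsigned counter_bits = 64 - shard_bits;
constexpr std::uint64_t counter_mask = (std::uint64_t(1) << counter_bits) - 1;

class segment_id_generator {
    unsigned _shard;
    std::uint64_t _last = 0;  // last counter value handed out by this shard
public:
    explicit segment_id_generator(unsigned shard) : _shard(shard) {
        assert(shard < (1u << shard_bits));
    }
    // max(previous + 1, wall clock ms), tagged with the shard in the high bits.
    std::uint64_t next(std::uint64_t now_ms) {
        _last = std::max(_last + 1, now_ms & counter_mask);
        return (std::uint64_t(_shard) << counter_bits) | _last;
    }
    std::uint64_t next() {
        auto now = std::chrono::system_clock::now().time_since_epoch();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(now).count();
        return next(static_cast<std::uint64_t>(ms));
    }
};

// Replay needs to know which (previous-run) shard wrote a segment.
inline unsigned shard_of(std::uint64_t id) { return unsigned(id >> counter_bits); }
inline std::uint64_t counter_of(std::uint64_t id) { return id & counter_mask; }
```

Because the shard lives in the ID itself, segments from all shards can share one directory, which is what commit 5524da8f18 below relies on.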
Avi Kivity
e6965c520d Merge "Adding the ownership support to storage_service" from Amnon
"This series adds the missing code from origin to support this functionality.
While doing so, some methods were changed to be const where that was more
appropriate, and a few const versions of methods were added where both
variants were required."
2015-08-25 20:13:33 +03:00
Amnon Heiman
b5ceef451e keyspace: Add the get_non_system_keyspaces and expose the replication
This patch adds get_non_system_keyspaces, as found in origin, and
exposes the replication strategy via the get_replication_strategy
method.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-25 19:39:13 +03:00
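A rough sketch of the two additions in b5ceef451e: filtering out system keyspaces and exposing a keyspace's replication strategy read-only. The `keyspace` and `replication_strategy` types here are simplified stand-ins, and treating only `"system"` as a system keyspace is an assumption made for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins; the real types carry much more state.
struct replication_strategy {
    std::string class_name;
};

class keyspace {
    std::unique_ptr<replication_strategy> _strategy;
public:
    explicit keyspace(std::string strategy)
        : _strategy(std::make_unique<replication_strategy>(replication_strategy{std::move(strategy)})) {}
    // Read-only access to the strategy, in the spirit of get_replication_strategy().
    const replication_strategy& get_replication_strategy() const { return *_strategy; }
};

// Assumption: only "system" counts as a system keyspace in this sketch.
const std::set<std::string> system_keyspaces = {"system"};

std::vector<std::string> get_non_system_keyspaces(const std::map<std::string, keyspace>& keyspaces) {
    std::vector<std::string> result;
    for (const auto& entry : keyspaces) {
        if (system_keyspaces.count(entry.first) == 0) {
            result.push_back(entry.first);
        }
    }
    return result;
}
```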
Vlad Zolotarov
08e7736f0b database::find_column_family(): init the exception with the readable message
Make the exceptions created inside database::find_column_family() return
a readable message from their what() method.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2015-08-25 18:00:19 +03:00
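The idea behind 08e7736f0b is to build the message when the exception is constructed, so `what()` returns something useful. A minimal sketch, assuming a hypothetical `no_such_column_family` type derived from `std::runtime_error` and a toy `database` keyed by (keyspace, table); the message wording is made up.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical exception: the readable message is fixed at construction,
// so what() never returns an empty or generic string.
class no_such_column_family : public std::runtime_error {
public:
    no_such_column_family(const std::string& ks, const std::string& cf)
        : std::runtime_error("Can't find a column family " + cf + " in keyspace " + ks) {}
};

struct column_family { int id; };

class database {
    std::map<std::pair<std::string, std::string>, column_family> _cfs;
public:
    void add(std::string ks, std::string cf, column_family c) {
        _cfs.emplace(std::make_pair(std::move(ks), std::move(cf)), c);
    }
    column_family& find_column_family(const std::string& ks, const std::string& cf) {
        auto it = _cfs.find({ks, cf});
        if (it == _cfs.end()) {
            throw no_such_column_family(ks, cf);
        }
        return it->second;
    }
};
```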
Calle Wilund
df8d7a8295 Database: Add "flush_all_memtables" 2015-08-25 09:41:56 +02:00
Calle Wilund
5524da8f18 Database: do not create shard-specific dirs for commitlog
New ID scheme allows for a single dir for all segments from all shards.
2015-08-25 09:40:52 +02:00
Avi Kivity
4390be3956 Rename 'negative_mutation_reader' to 'partition_presence_checker'
Suggested by Tomek.
2015-08-24 18:03:22 +03:00
Raphael S. Carvalho
c65af6e188 api: add get_unleveled_sstables to column family api
Add an API function that returns the count of sstables in L0 if the leveled
compaction strategy is enabled, and 0 otherwise. Currently we don't
support the leveled compaction strategy, so the function always
returns zero.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-24 11:56:31 -03:00
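For reference, the general semantics described in c65af6e188 (count L0 sstables under leveled compaction, 0 for any other strategy) look roughly like this. The enum and the `sstable_info` struct are hypothetical simplifications; as the commit notes, the real function currently always returns zero because leveled compaction is not supported yet.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum class compaction_strategy_type { size_tiered, leveled };

struct sstable_info { unsigned level; };  // hypothetical: only the level matters here

// Number of sstables still in L0 under leveled compaction; 0 for other strategies.
std::int64_t get_unleveled_sstables(compaction_strategy_type strategy,
                                    const std::vector<sstable_info>& sstables) {
    if (strategy != compaction_strategy_type::leveled) {
        return 0;
    }
    std::int64_t n = 0;
    for (const auto& s : sstables) {
        if (s.level == 0) {
            ++n;
        }
    }
    return n;
}
```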
Raphael S. Carvalho
4c9c144987 compaction_manager: avoid concurrent compaction on the same cf
It was noticed that the same sstable files could be selected for
compaction if concurrent compaction happens on the same cf.
That's possible because the compaction manager uses 2 tasks for
handling compactions.

The solution is to not duplicate a cf in the compaction manager queue,
and to re-schedule compaction for a cf if needed.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-24 11:11:47 -03:00
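A single-threaded sketch of the fix in 4c9c144987: a cf is never queued twice, and a request that arrives while that cf is being compacted is remembered and re-submitted when the compaction finishes, so two workers can never pick the same cf at once. The `compaction_queue` class and its method names are hypothetical; the real code uses Seastar tasks rather than explicit `take()`/`done()` calls.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <set>

struct column_family {};  // placeholder

class compaction_queue {
    std::deque<column_family*> _queue;
    std::set<column_family*> _queued;       // cfs currently waiting in _queue
    std::set<column_family*> _running;      // cfs being compacted right now
    std::set<column_family*> _needs_rerun;  // submitted again while running
public:
    void submit(column_family* cf) {
        if (_running.count(cf)) {
            _needs_rerun.insert(cf);        // compact again once the current run ends
            return;
        }
        if (_queued.insert(cf).second) {    // never queue the same cf twice
            _queue.push_back(cf);
        }
    }
    // Called by a worker task; returns nullptr when there is nothing to do.
    column_family* take() {
        if (_queue.empty()) {
            return nullptr;
        }
        column_family* cf = _queue.front();
        _queue.pop_front();
        _queued.erase(cf);
        _running.insert(cf);
        return cf;
    }
    void done(column_family* cf) {
        _running.erase(cf);
        if (_needs_rerun.erase(cf)) {
            submit(cf);                     // re-schedule instead of running concurrently
        }
    }
    std::size_t size() const { return _queue.size(); }
};
```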
Avi Kivity
8a4648761c tests: make test cql environment use volatile system keyspace
Prevents hangs due to the database not being able to persist a memtable.

Tested-by: Asias He <asias@cloudius-systems.com>
2015-08-24 13:50:22 +03:00
Avi Kivity
6f11322220 db: move annoying log on non-durable cf to quieter place
Fixes #174.
2015-08-23 23:12:07 +03:00
Avi Kivity
c01bc16f58 db: don't give up flushing a memtable on error
We must try again, or the memtable's memory will never be reclaimed.
2015-08-19 19:36:41 +03:00
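The retry policy from c01bc16f58 can be sketched as a loop that only returns once the flush succeeds. This is a synchronous illustration, assuming a hypothetical `flush_memtable_with_retry` helper and an arbitrary fixed 1 ms delay; the real code is asynchronous and would log each failure.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <stdexcept>
#include <thread>

// Keep retrying until the flush succeeds: giving up would leave the
// memtable's memory allocated forever. The fixed delay is an arbitrary choice.
void flush_memtable_with_retry(const std::function<void()>& write_sstable,
                               std::chrono::milliseconds delay = std::chrono::milliseconds(1)) {
    for (;;) {
        try {
            write_sstable();
            return;  // success: the memtable can now be released
        } catch (const std::exception&) {
            // A real system would log the error here before retrying.
            std::this_thread::sleep_for(delay);
        }
    }
}
```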
Avi Kivity
6846909533 db: extract sstable flushing code to a function 2015-08-19 19:36:41 +03:00
Avi Kivity
5bf5476beb db: add collectd counter for dirty memory 2015-08-19 19:36:41 +03:00
Avi Kivity
c175025bb6 db: place all memtables into a single region_group
We can use this to track the amount of unevictable memory in the
system.
2015-08-19 19:36:41 +03:00
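The point of c175025bb6 is that a shared group gives one number for all memtable memory, which 5bf5476beb then exports as a collectd counter. A toy version of that accounting, with hypothetical `region_group` and `region` classes that only track byte counts (the real LSA region groups do far more):

```cpp
#include <cassert>
#include <cstddef>

// Aggregates the memory of every region attached to it.
class region_group {
    std::size_t _total = 0;
public:
    void increase(std::size_t n) { _total += n; }
    void decrease(std::size_t n) { assert(n <= _total); _total -= n; }
    std::size_t memory_used() const { return _total; }
};

// One memtable's allocation region, reporting every change to its group.
class region {
    region_group& _group;
    std::size_t _used = 0;
public:
    explicit region(region_group& g) : _group(g) {}
    region(const region&) = delete;
    region& operator=(const region&) = delete;
    ~region() { _group.decrease(_used); }
    void allocate(std::size_t n) { _used += n; _group.increase(n); }
    void free(std::size_t n) { assert(n <= _used); _used -= n; _group.decrease(n); }
};
```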
Avi Kivity
7b67b04822 db: wire up max memtable size configuration 2015-08-19 13:17:27 +03:00
Avi Kivity
176ab06f77 db: demote commitlog reordering detected log message to debug
It's less rare than we thought and also less interesting.
2015-08-19 09:26:23 +03:00
Raphael S. Carvalho
820ba6f4d2 adapt compaction manager for column family removal
We need a way to remove a column family from the compaction manager
because when dropping a column family we need to make sure that the
compaction manager doesn't hold a reference to it anymore.

So the compaction manager queue now holds column families, allowing us
to cancel requests pertaining to a column family being dropped.
There may be an ongoing compaction for the column family being
dropped, so we also need to wait for its termination.

Testcase for compaction manager was also adapted and improved.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-18 11:38:06 +03:00
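The removal protocol from 820ba6f4d2 (cancel queued requests for the dropped cf, then wait for any ongoing compaction of it) can be sketched with a mutex and a condition variable. These are swapped in for the Seastar futures the real code uses, and all names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <set>

struct column_family {};  // placeholder

class compaction_manager {
    std::mutex _mutex;
    std::condition_variable _done;
    std::deque<column_family*> _queue;  // pending requests, at most one per cf
    std::set<column_family*> _running;  // cfs with a compaction in progress
public:
    void submit(column_family* cf) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (std::find(_queue.begin(), _queue.end(), cf) == _queue.end()) {
            _queue.push_back(cf);
        }
    }
    // A worker takes the next request; nullptr if the queue is empty.
    column_family* start_next() {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_queue.empty()) {
            return nullptr;
        }
        column_family* cf = _queue.front();
        _queue.pop_front();
        _running.insert(cf);
        return cf;
    }
    void finish(column_family* cf) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _running.erase(cf);
        }
        _done.notify_all();
    }
    // Called when dropping cf: cancel its queued requests, then wait until an
    // ongoing compaction of cf (if any) has terminated.
    void remove(column_family* cf) {
        std::unique_lock<std::mutex> lock(_mutex);
        _queue.erase(std::remove(_queue.begin(), _queue.end(), cf), _queue.end());
        _done.wait(lock, [&] { return _running.count(cf) == 0; });
    }
    std::size_t pending() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _queue.size();
    }
};
```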
Glauber Costa
89366dc2c2 sstables: do not accept files with missing TOC.
We can catch most errors when we try to load an sstable. But if the TOC file is
the one missing, we won't try to load the sstable at all. This case is still an
invalid case, but it is much easier for us to handle it by waiting for all files
to be loaded, and then checking whether we saw a file during scan_dir without its
corresponding TOC.

Fixes #114

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-16 15:21:40 +03:00
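The deferred check in 89366dc2c2 amounts to: scan everything first, then reject any generation for which no TOC was seen. A sketch under the assumption that scanned files have already been reduced to (generation, component) pairs via a hypothetical `scanned_file` struct; `TOC.txt` is the TOC component's name.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical: each scanned file reduced to its generation and component name.
struct scanned_file {
    long generation;
    std::string component;
};

// After the whole directory was scanned, every generation seen must have a TOC.
void verify_toc_present(const std::vector<scanned_file>& files) {
    std::map<long, bool> has_toc;  // generation -> TOC seen?
    for (const auto& f : files) {
        bool& seen = has_toc[f.generation];  // default-inserts false
        if (f.component == "TOC.txt") {
            seen = true;
        }
    }
    for (const auto& entry : has_toc) {
        if (!entry.second) {
            throw std::runtime_error("sstable generation " + std::to_string(entry.first) +
                                     " has no TOC");
        }
    }
}
```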
Glauber Costa
0650579ace sstables: refuse to boot on corrupted sstables
We are now skipping them. That's dangerous.

Fixes #115

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-16 15:21:38 +03:00
Raphael S. Carvalho
9823164c89 db: introduce compaction manager
Currently, each column family creates a fiber to handle compaction requests
in parallel to the system. If there are N column families, N compactions
could be running in parallel, which is definitely horrible.

To solve that problem, a per-database compaction manager is introduced here.

Compaction manager is a feature used to service compaction requests from N
column families. Parallelism is made available by creating more than one
fiber to service the requests. That being said, N compaction requests will
be served by M fibers.

A compaction request being submitted will go to a job queue shared between
all fibers, and the fiber with the lowest amount of pending jobs will be
signalled.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-11 17:25:46 +03:00
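The dispatch rule in 9823164c89 (one job queue shared by M fibers; on submit, signal the fiber with the fewest pending jobs) might look like this in a simplified, synchronous form. `worker`, `submit` and `serve` are hypothetical names; `submit` returns the index of the worker that would be signalled instead of actually waking a fiber.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

struct column_family {};  // placeholder

struct worker {
    std::size_t pending = 0;  // jobs this fiber was signalled for but has not served yet
};

class compaction_manager {
    std::deque<column_family*> _jobs;  // shared by all workers
    std::vector<worker> _workers;
public:
    explicit compaction_manager(std::size_t n_workers) : _workers(n_workers) {
        assert(n_workers > 0);
    }
    // Enqueue a request and return the index of the worker to signal.
    std::size_t submit(column_family* cf) {
        _jobs.push_back(cf);
        auto it = std::min_element(_workers.begin(), _workers.end(),
                                   [](const worker& a, const worker& b) { return a.pending < b.pending; });
        ++it->pending;
        return static_cast<std::size_t>(it - _workers.begin());
    }
    // A signalled worker serves the oldest job from the shared queue.
    column_family* serve(std::size_t w) {
        assert(_workers[w].pending > 0 && !_jobs.empty());
        --_workers[w].pending;
        column_family* cf = _jobs.front();
        _jobs.pop_front();
        return cf;
    }
};
```

With N column families and M workers, at most M compactions run at once, instead of one per column family.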
Avi Kivity
1016b21089 cache: improve preloading of flushed memtable mutations
If a mutation definitely doesn't exist in any sstable, then we can
certainly load it into the cache.
2015-08-09 22:46:08 +03:00
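The reasoning in 1016b21089: a partition that is not yet cached may only be inserted from a flushed memtable when no sstable can hold older data for it, because otherwise the cached copy would be incomplete. A sketch with a toy data model (partition = map of clustering key to value); the `presence_result` enum and `update_cache_after_flush` are hypothetical, loosely modelled on the `partition_presence_checker` named in 4390be3956.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

enum class presence_result { definitely_doesnt_exist, maybe_exists };
using presence_checker = std::function<presence_result(const std::string& key)>;

// Toy model: a partition maps clustering keys to values.
using partition = std::map<std::string, std::string>;

void update_cache_after_flush(std::map<std::string, partition>& cache,
                              const std::map<std::string, partition>& flushed,
                              const presence_checker& in_sstables) {
    for (const auto& [key, p] : flushed) {
        auto it = cache.find(key);
        if (it != cache.end()) {
            // Already cached (hence complete): merge the newer rows into it.
            for (const auto& [ck, v] : p) {
                it->second.insert_or_assign(ck, v);
            }
        } else if (in_sstables(key) == presence_result::definitely_doesnt_exist) {
            // No older data anywhere: the memtable holds the whole partition.
            cache.emplace(key, p);
        }
        // Otherwise sstables may hold more rows; caching only this part would be wrong.
    }
}
```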
Glauber Costa
c2a0232048 database: generate UUIDs compatible with Cassandra 2.1.8
Without this, Cassandra won't even try to read our sstables. The containing
directories will be ignored.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:56 -05:00
Glauber Costa
c8ca9b376d database: change default sstable version
Let's change the default generated tables to ka, which is the one that is present
in Origin.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:55 -05:00
Glauber Costa
2d1b965f91 database: change filename parser to also accept ka
A ka file has a slightly different name on disk. Change the
parser so we can deal with both.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:55 -05:00
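A parser accepting both formats from 2d1b965f91 could be sketched with `std::regex`. The two layouts are assumptions based on Cassandra's naming conventions as I understand them (`la-<generation>-big-<component>` and `<keyspace>-<table>-ka-<generation>-<component>`); `sstable_entry` and `parse_sstable_filename` are hypothetical names.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>

struct sstable_entry {
    std::string version;    // "la" or "ka"
    long generation;
    std::string component;  // e.g. "Data.db", "TOC.txt"
    std::string ks, cf;     // only encoded in ka names
};

// Assumed layouts: "la-<gen>-big-<component>" and "<ks>-<cf>-ka-<gen>-<component>".
std::optional<sstable_entry> parse_sstable_filename(const std::string& name) {
    static const std::regex la(R"(la-(\d+)-big-(.+))");
    static const std::regex ka(R"(([^-]+)-([^-]+)-ka-(\d+)-(.+))");
    std::smatch m;
    if (std::regex_match(name, m, la)) {
        return sstable_entry{"la", std::stol(m[1].str()), m[2].str(), "", ""};
    }
    if (std::regex_match(name, m, ka)) {
        return sstable_entry{"ka", std::stol(m[3].str()), m[4].str(), m[1].str(), m[2].str()};
    }
    return std::nullopt;  // not an sstable component we recognize
}
```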
Glauber Costa
cd8c9ad288 sstables: add ks and cf name to sstable constructor
When a schema is available, we use it. However, we have, by now, way too many
tests. Some of them use tables for which we don't even know the schema. It would
have been a massive amount of work to require a schema for all of them - so I am
keeping both constructors around.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:55 -05:00
Glauber Costa
77e06c3ab1 sstables: remove name parameter
It is currently only used to log a message, and for that we have an sstable
method that will do just fine. Keeping the name around just means it has to be
passed along through all the captures. Remove it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 08:31:53 -05:00
Raphael S. Carvalho
64fcd16c0c db: adding data to column family statistics for API
Adding required data for column family API to be implemented.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-06 17:38:59 +03:00
Avi Kivity
48a1ce28fc Merge "Switch to log-structured allocator" from Tomasz 2015-08-06 15:45:39 +03:00
Tomasz Grabiec
926509525f row_cache: Switch to using LSA 2015-08-06 14:05:16 +02:00
Tomasz Grabiec
18ec9c3643 db: Move column_family::flush() to source file 2015-08-06 14:05:16 +02:00
Tomasz Grabiec
3b92ba2857 db: Add memtable flush logging 2015-08-06 14:05:16 +02:00
Pekka Enberg
dae1119796 database: Fix create keyspace ASan error
ASan does not like commit 05c23c7f73
("database: Add create_keyspace_on_all() helper"):

  ==8112==WARNING: AddressSanitizer failed to allocate 0x7f88b84fc690 bytes
  ==8112==AddressSanitizer's allocator is terminating the process instead of returning 0
  ==8112==If you don't like this behavior set allocator_may_return_null=1
  ==8112==Sanitizer CHECK failed: ../../../../libsanitizer/sanitizer_common/sanitizer_allocator.cc:147 ((0)) != (0) (0, 0)

I was not able to determine the source of the bug. Make ASan happy by
reverting the code movement and using the "cpu zero" trick we use for
table creation.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-06 13:02:58 +03:00
Glauber Costa
fb34ac7f65 database: fix scan_dir
When probing for the type, I made the classic mistake of using, as a
parameter, part of a structure that is moved into the capture. That
is what broke our tests.

But also, when stat'ing, de.name will give us only the component relative to
the current path. We need to add the directory so the stat will succeed.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-06 10:10:56 +03:00
Glauber Costa
ece8f01d06 database: make sure a type is present.
Our directory scanner currently requires a type to be passed, and we have a
FIXME saying that we should stat when there is none. On some filesystems,
in particular XFS, getdents won't return a type, meaning we must manually
probe it.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-05 22:52:07 +03:00
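The type probing added in ece8f01d06, together with the path fix from fb34ac7f65 above, can be sketched with plain POSIX calls: when `d_type` is `DT_UNKNOWN`, fall back to `lstat()`, and build the full path first, because `d_name` is only the name within the scanned directory. `probe_type` and `entry_type` are hypothetical names; using `lstat()` (not following symlinks) is a choice made for this sketch.

```cpp
#include <cassert>
#include <dirent.h>
#include <optional>
#include <string>
#include <sys/stat.h>
#include <sys/types.h>

enum class entry_type { directory, regular, other };

// d_type may be DT_UNKNOWN (e.g. on XFS); then stat the entry ourselves.
std::optional<entry_type> probe_type(const std::string& dir, const dirent& de) {
    if (de.d_type == DT_UNKNOWN) {
        struct stat st;
        // d_name is relative to dir, so the directory must be prepended.
        std::string path = dir + "/" + de.d_name;
        if (::lstat(path.c_str(), &st) != 0) {
            return std::nullopt;
        }
        if (S_ISDIR(st.st_mode)) return entry_type::directory;
        if (S_ISREG(st.st_mode)) return entry_type::regular;
        return entry_type::other;
    }
    if (de.d_type == DT_DIR) return entry_type::directory;
    if (de.d_type == DT_REG) return entry_type::regular;
    return entry_type::other;
}
```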
Avi Kivity
522f23b830 Merge "Schema table cleanups" from Pekka
"Clean up the schema table code. Be explicit that we don't support
Cassandra 3.0 and eliminate some dead code."
2015-08-05 15:09:59 +03:00
Raphael S. Carvalho
3ddb9be984 db: fix compaction on an empty column family
When forcing a compaction on a column family with no sstables, an
assert will fail because there are no sstables to be compacted.
This problem is fixed by ignoring a compaction request when no
sstable is provided.

Fixes #61.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
Reviewed-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-05 14:04:22 +03:00
Pekka Enberg
99a80050e3 db: Rename legacy_schema_tables to schema_tables
There's nothing legacy about it so rename legacy_schema_tables to
schema_tables. The naming comes from a Cassandra 3.x development branch
which is not relevant for us in the near future.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 13:56:47 +03:00
Avi Kivity
55ca295154 Merge "Initial CQL event support" from Pekka
"This series implements initial support for CQL events. We introduce
migration_listener hook in migration manager as well as event notifier
in the CQL server that's built on top of it to send out the events via
CQL binary protocol. We also wire up create keyspace events to the
system so subscribed clients are notified when a new keyspace is
created.

There's still more work to be done to support all the events. That
requires some work to restructure existing code so it's better to merge
this initial series now and avoid future code conflicts."
2015-08-05 12:56:37 +03:00
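The hook structure described in 55ca295154 is an observer: the migration manager keeps a list of listeners, and the CQL server's event notifier is one of them. A minimal sketch with hypothetical names, covering only the create-keyspace event; the notifier merely records strings where the real one would encode a SCHEMA_CHANGE event for subscribed clients.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical listener interface; only keyspace creation is covered.
class migration_listener {
public:
    virtual ~migration_listener() = default;
    virtual void on_create_keyspace(const std::string& ks_name) = 0;
};

class migration_manager {
    std::vector<migration_listener*> _listeners;
public:
    void register_listener(migration_listener* l) { _listeners.push_back(l); }
    void unregister_listener(migration_listener* l) {
        _listeners.erase(std::remove(_listeners.begin(), _listeners.end(), l), _listeners.end());
    }
    void notify_create_keyspace(const std::string& ks_name) {
        for (auto* l : _listeners) {
            l->on_create_keyspace(ks_name);
        }
    }
};

// Stand-in for the CQL server's event notifier: records what it would send.
class event_notifier : public migration_listener {
public:
    std::vector<std::string> sent;
    void on_create_keyspace(const std::string& ks_name) override {
        sent.push_back("CREATED KEYSPACE " + ks_name);
    }
};
```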
Pekka Enberg
618ba067bf database: Wire up create keyspace listener hook
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 11:50:52 +03:00
Pekka Enberg
05c23c7f73 database: Add create_keyspace_on_all() helper
Add a create_keyspace_on_all() helper which is needed for sending just
one event notification per created keyspace, not one per shard.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 11:50:52 +03:00
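The motivation for create_keyspace_on_all() in 05c23c7f73 is that the change must be applied on every shard while the event fires only once. A synchronous sketch with a hypothetical `shard_db` and a `notify` callback; in Seastar the per-shard step would be an invoke_on_all() call followed by a single notification.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

struct shard_db { std::set<std::string> keyspaces; };  // hypothetical per-shard state

// Apply the change on every shard, then notify exactly once.
void create_keyspace_on_all(std::vector<shard_db>& shards, const std::string& ks,
                            const std::function<void(const std::string&)>& notify) {
    for (auto& s : shards) {
        s.keyspaces.insert(ks);
    }
    notify(ks);  // once per keyspace, not once per shard
}
```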
Paweł Dziepak
8a0d21b8b8 query: support option distinct in partition_slice
In the case of SELECT DISTINCT statements we are not interested in clustering
keys at all. The only important information is whether the partition key
exists and what's in the static row (if it exists).

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-04 15:39:42 +02:00
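What the distinct option in 8a0d21b8b8 changes can be shown on a toy in-memory model: with DISTINCT, each partition yields one row carrying only its key and static row, and clustering rows are skipped. `partition`, `partition_slice::distinct` and `query` here are hypothetical simplifications, not the real query types.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

struct partition {
    std::string key;
    std::optional<std::string> static_row;
    std::vector<std::string> clustering_rows;
};

struct partition_slice {
    bool distinct = false;  // hypothetical flag standing in for the "distinct" option
};

struct result_row {
    std::string key;
    std::optional<std::string> static_row;
    std::optional<std::string> clustering_row;
};

std::vector<result_row> query(const std::vector<partition>& data, const partition_slice& slice) {
    std::vector<result_row> out;
    for (const auto& p : data) {
        if (slice.distinct) {
            // One row per partition: key and static row only; clustering rows are not read.
            out.push_back({p.key, p.static_row, std::nullopt});
            continue;
        }
        for (const auto& r : p.clustering_rows) {
            out.push_back({p.key, p.static_row, r});
        }
    }
    return out;
}
```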
Pekka Enberg
a3c95235e6 migration_manager: Make stateful with sharded<>
In preparation for adding listener state to migration manager, use
sharded<> for migration manager.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-04 11:23:23 +03:00
Raphael S. Carvalho
34eaeedff2 db: remove imprecise log message about compaction
This message is printed when we are about to run the strategy code
which may not decide to compact anything. Compaction is already
properly logged in sstables::compact_sstables().

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-04 00:15:50 +03:00
Avi Kivity
9663ea86e9 db: fix test for whether an sstable includes a shard's range
Spotted by Raphael and Nadav.
2015-08-04 00:14:22 +03:00
Avi Kivity
c1a2831d41 db: ignore sstables that clearly don't belong to this shard 2015-08-03 20:17:41 +03:00
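Deciding whether an sstable belongs to a shard (c1a2831d41, and the fix in 9663ea86e9 above) reduces to an overlap test between two closed token ranges. The log does not say what the original mistake was; this sketch only shows a correct overlap check and notes a common pitfall, namely testing whether one range's endpoints fall inside the other, which misses an sstable spanning the whole shard range. `token_range` and `overlaps` are hypothetical, with tokens assumed to be signed 64-bit values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical: tokens as signed 64-bit values, ranges inclusive at both ends.
struct token_range {
    std::int64_t first;
    std::int64_t last;
};

// Two closed ranges overlap iff each one starts no later than the other ends.
bool overlaps(token_range a, token_range b) {
    return a.first <= b.last && b.first <= a.last;
}
```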
Pekka Enberg
e22f5a1cd7 database: Add CF UUID validation to update
Add CF UUID validation to update table paths to make us behave like
Origin for parallel table creation.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-03 13:41:16 +03:00