Commit Graph

240 Commits

Author SHA1 Message Date
Avi Kivity
4390be3956 Rename 'negative_mutation_reader' to 'partition_presence_checker'
Suggested by Tomek.
2015-08-24 18:03:22 +03:00
Avi Kivity
8a4648761c tests: make test cql environment use volatile system keyspace
Prevents hangs due to the database not being able to persist a memtable.

Tested-by: Asias He <asias@cloudius-systems.com>
2015-08-24 13:50:22 +03:00
Avi Kivity
83d5c7e7c8 Merge 2015-08-24 10:58:39 +03:00
Avi Kivity
855ef838a9 db: fix use-after-free with region_group
_dirty_memory_region_group is used by the column_family's memtables, but
is destroyed before them.

Fix by changing the destruction order.

Fixes #175.
2015-08-24 10:51:03 +03:00
Avi Kivity
0afbdf4aa7 Merge "Add row related methods to the cache_service API" from Amnon
"This series expose statistics from the row_cache in the cache_service API.
After this series the following methods will be available:
get_row_hits
get_row_requests
get_row_hit_rate
get_row_size
get_row_entries"
2015-08-23 15:46:07 +03:00
Avi Kivity
c01bc16f58 db: don't give up flushing a memtable on error
We must try again, or the memtable's memory will never be reclaimed.
2015-08-19 19:36:41 +03:00
Avi Kivity
6846909533 db: extract sstable flushing code to a function 2015-08-19 19:36:41 +03:00
Avi Kivity
5bf5476beb db: add collectd counter for dirty memory 2015-08-19 19:36:41 +03:00
Avi Kivity
c175025bb6 db: place all memtables into a single region_group
We can use this to track the amount of unevictable memory in the
system.
2015-08-19 19:36:41 +03:00
Avi Kivity
7b67b04822 db: wire up max memtable size configuration 2015-08-19 13:17:27 +03:00
Avi Kivity
c317391f62 db: trigger memtable flush based on actual memory usage
Rather than using _mutation_count as a poor proxy.
2015-08-19 12:59:52 +03:00
Raphael S. Carvalho
820ba6f4d2 adapt compaction manager for column family removal
We need a way to remove a column family from the compaction manager
because when dropping a column family we need to make sure that the
compaction manager doesn't hold a reference to it anymore.

So compaction manager queue is now of column_family, allowing us
to cancel requests pertaining to a column family being dropped.
There may be an ongoing compaction for the column family being
dropped, so we also need to wait for its termination.

Testcase for compaction manager was also adapted and improved.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-18 11:38:06 +03:00
Amnon Heiman
361b2377bb Expose the row_cache in the column_family
This expose the row_cache in the column family, it will be used by the
API to get the row_cache statistic information.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-17 19:42:23 +03:00
Avi Kivity
608c0b8460 Merge "initial work on compaction manager API" from Rapahel 2015-08-17 17:24:13 +03:00
Avi Kivity
eb09eddee5 Merge "Adding sampled histogram" from Amnon
"Histograms are used to collect latency information, in Origin, many of the
operations are timed, this is a potential performance issue. This series adds
an option to sample the operations, where small amount will be timed and the
most will only be counted.

This will give an estimation for the statistics, while keeping an accurate
count of the total events and have neglectible performance impact.

The first to use the modified histogram are the column family for their read
and write."

Conflicts:
	database.hh
2015-08-16 17:15:24 +03:00
Glauber Costa
89366dc2c2 sstables: do not accept files with missing TOC.
We can catch most errors when we try to load an sstable. But if the TOC file is
the one missing, we won't try to load the sstable at all. This case is still an
invalid case, but it is way easier for us to treat it by waiting for all files
to be loaded, and then checking if we saw a file during scan_dir, without its
corresponding TOC.

Fixes #114

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-16 15:21:40 +03:00
Raphael S. Carvalho
077ac1cce1 db: add method to retrieve compaction_manager
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-12 15:10:25 -03:00
Raphael S. Carvalho
9823164c89 db: introduce compaction manager
Currently, each column family creates a fiber to handle compaction requests
in parallel to the system. If there are N column families, N compactions
could be running in parallel, which is definitely horrible.

To solve that problem, a per-database compaction manager is introduced here.

Compaction manager is a feature used to service compaction requests from N
column families. Parallelism is made available by creating more than one
fiber to service the requests. That being said, N compaction requests will
be served by M fibers.

A compaction request being submitted will go to a job queue shared between
all fibers, and the fiber with the lowest amount of pending jobs will be
signalled.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-11 17:25:46 +03:00
Amnon Heiman
4329377556 column_family to use histogram for read and write latency
With the use of sparse histogram, the read and write counters in the
column_family stats can be used.

The total impact on performanc should be neglectible.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-11 10:21:22 +03:00
Avi Kivity
1016b21089 cache: improve preloading of flushed memtable mutations
If a mutation definitely doesn't exist in all sstables, then we can
certainly load it into the cache.
2015-08-09 22:46:08 +03:00
Raphael S. Carvalho
64fcd16c0c db: adding data to column family statistics for API
Adding required data for column family API to be implemented.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-06 17:38:59 +03:00
Avi Kivity
48a1ce28fc Merge "Switch to log-structured allocator" from Tomasz 2015-08-06 15:45:39 +03:00
Tomasz Grabiec
18ec9c3643 db: Move column_family::flush() to source file 2015-08-06 14:05:16 +02:00
Pekka Enberg
dae1119796 database: Fix create keyspace ASan error
ASan does not like commit 05c23c7f73
("database: Add create_keyspace_on_all() helper"):

  ==8112==WARNING: AddressSanitizer failed to allocate 0x7f88b84fc690 bytes
  ==8112==AddressSanitizer's allocator is terminating the process instead of returning 0
  ==8112==If you don't like this behavior set allocator_may_return_null=1
  ==8112==Sanitizer CHECK failed: ../../../../libsanitizer/sanitizer_common/sanitizer_allocator.cc:147 ((0)) != (0) (0, 0)

I was not able to determine the source of the bug. Make ASan happy by
reverting the code movement and using the "cpu zero" trick we use for
table creation.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-06 13:02:58 +03:00
Pekka Enberg
05c23c7f73 database: Add create_keyspace_on_all() helper
Add a create_keyspace_on_all() helper which is needed for sending just
one event notification per created keyspace, not one per shard.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 11:50:52 +03:00
Avi Kivity
a9cc6d7be1 Merge "Parallel CQL table creation improvements" from Pekka
"This series improves parallel CQL table creation to validate CF UUID.
This should bring us closer to Origin behavior when there's multiple
cassandra-stress processes started at the same time."
2015-08-03 14:37:29 +03:00
Pekka Enberg
0b762338c1 database: Futurize update_column_family()
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-03 13:41:16 +03:00
Amnon Heiman
0d7fe9bd89 Adding stats to column_family
This adds the stats object to column_family.

It set the write counter in the write path and support the pending_flush
counter.

The stats object contains information for switch_count, number of
pending flushes, and counters for read, write, and range.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-03 11:09:37 +03:00
Shlomi Livne
199f4d2545 Add enable-in-memory-data-store,enable-commitlog,enable-cache config
Abillity to enable/disable specific sub-modules - this settings do not
affect system tables which are allways persisted,cached and written to
commitlog

enable-in-memory-data-store marks if tables will be written/read to/from
disk
enable-commitllog marks if tables will be written to commitlog
enable-cache marks if tables will be written/read to/from cache

Please note in-memory-data-store does not change the read path so "old"
sstables are still read and cache may be used to cache their data

Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-08-02 17:19:30 +03:00
Raphael S. Carvalho
d791438a43 db: enable automatic compaction by default
So far, automatic compaction was disabled, but now that we support
size-tiered strategy, the default compaction strategy algorithm,
we could definitely enable automatic compaction by default.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-29 19:02:16 +03:00
Raphael S. Carvalho
5a70c8c8f4 db: implement retry policy for compaction
Currently, compaction will no longer happen for a column family which
a compaction failed for some unexpected reason.
We want to implement a retry policy that will sleep for a while until
the next compaction attempt. This patch implements retry policy for
compaction using exponential_backoff_retry.
With exponential_backoff_retry, the sleep time grows exponentially
with the number of retries until the maximum sleep time is reached.
For compaction specifically, the base sleep time will be 5 seconds and
the maximum sleeping time will be 300 seconds, i.e. 5 minutes.
If compaction succeeded after a retry, the sleep time will be reset to
the base sleep time.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-28 18:24:04 -03:00
Avi Kivity
b3b0d5150a main: fix system tables vs. sstables initialization race
We must wait for the system tables to be loaded on all shards before
populating the other keyspaces, or we might miss some keyspaces or column
families.  This is hinted at by the fact that we use storage_proxy, which
isn't usable until the system keyspace is ready.

Credit to Tomek for identifying the problem and the fix.
2015-07-28 09:49:11 +02:00
Pekka Enberg
791031fbc7 database: Extract update_schema_version_and_announce() function
It's needed in storage proxy.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-22 11:57:00 +03:00
Raphael S. Carvalho
6ae3ffa319 database: add get_sstables to column_family
Returns all sstables added to a given column_family.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-20 10:08:09 -03:00
Raphael S. Carvalho
ebbc7aa43e database: add compact_sstables to column_family
compact_all_sstables is about selecting all available sstables
for compaction and executing a compaction code on them.
This compaction code was moved to a more generic function called
compact_sstables, which will compact a list of given sstables.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-20 10:08:02 -03:00
Pekka Enberg
81cddec777 database: Add versioning support
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-07-16 14:53:30 +03:00
Raphael S. Carvalho
719898d0e5 introduce automatic compaction
As the name implies, this patch introduces the concept of automatic
compaction for sstables.

Compaction task is triggered whenever a new sstable is written.
Concurrent compaction on the same column family isn't supported, so
compaction may be postponed if there is an ongoing compression.
In addition, seastar::gate is used both to prevent a new compaction
from starting and to wait for an ongoing compaction to finish, when
the system is asked for a shutdown.

This patch also introduces an abstract class for compaction strategy,
which is really useful for supporting multiple strategies.
Currently, null and major compaction strategies are supported.
As the name implies, null compaction strategy does nothing.
Major compaction strategy is about compacting all sstables into one.
This strategy may end up being helpful when adding support to major
compaction via nodetool.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-07-16 12:00:12 +03:00
Glauber Costa
04c0fbcb8c remove calls to seal_active_memtable
It should not be called directly: externall callers should be calling flush()
instead.

To be sure it doesn't happen again, make seal_active_memtable private.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-15 10:24:20 -04:00
Glauber Costa
9c464aff9b database: clean up various APIs
In much of our column_families APIs, we need to pass a pointer to the database.
The only reason we do that, is so we can properly handle the commit log entries
after we seal the current memtables into sstables.

Now that we store a pointer to the commit log in the CF itself at the time it
is created, we no longer have to do it. As a result, the APIs are a lot
cleaner, with no gratuitous parameters.

My motivation for this was the flush method, but as a result, apply() also gets
cleaner.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-15 10:24:20 -04:00
Glauber Costa
ad46daa6aa column_family: add the commitlog as a parameter
When we create a column family, we can pass as an extra parameter, the
commitlog - or lack thereof. Because the commitlog is optional to begin with -
it won't exist if we don't call init_commitlog, we can have this to be empty
meaning no commit log.

The creation of a column family should be always done through
add_column_family. And if that is the case, we have the database's commitlog
right there and can get the pointer through the db. Only tests are not creating
the column family this way, and for them, it is fine.

We want to do that, because some column family operations will use the commit log.
Right now, they are forcing us to add parameters to APIs that would be much cleaner
without it. So while separation is good, this level of coupling is a net win as it
allows us to clean up some visible APIs.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-15 10:24:20 -04:00
Tomasz Grabiec
9bea6aa0a3 db: Introduce mutation query interface
Mutation query differs from data query in that returns information
needed to reconcile data slice with that retruned by other data
sources.

There is a generic mutation_query() algorithm introduced, which can
work with any mutation_source.

database::query_mutations() is a shard-local interface for mutation
queries.

The reconcilable_result is introduced as a medium for mutation query
results. It piggy backs on frozen_mutation as a medium for
reconcilable data.
2015-07-12 12:51:38 +02:00
Glauber Costa
5545e08bf7 database: introduce flush method
We will have to flush it from other places as well, so wrap the flushing code
into a method - specially because the current code has issues and it will be
easier to deal with it if it is in a single place.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-07 11:38:23 -04:00
Avi Kivity
b27f93af97 Merge "Adding implementation to the storage_service" from Amnon
"The storage_service API contains many function, the actuall implementation will
be added gradually."
2015-06-29 14:57:45 +03:00
Glauber Costa
b3a4eb83d3 database: keyspaces loading
This patch bootstraps the keyspaces found in system sstables and make
our in-memory structures reflect them. It tries to reuse as much code
as we can from db::legacy_system_tables, but keeping everything local
and without applying any mutations to the database - since the latter
would only unnecessarily complicate the write path.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-25 09:50:55 -04:00
Glauber Costa
c0ad0dc1d3 database: pass storage proxy to init_data_directory
In order for us to call some function from db::legacy_schema_tables, we need
a working storage proxy. We will use those functions in order to leverage the
work done in keyspace / table creation from mutations

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-25 09:50:55 -04:00
Amnon Heiman
0151ba4372 database: changing the signature of get_config
get_config returns a const reference to the configuration object inside
the database object because it returns a const referent it could be
const. This is helpful when the call is made from a const reference to the
database.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-06-25 11:48:58 +03:00
Nadav Har'El
3f5114e415 compaction: compact_all_sstables demo function
This is an example of how to use the low-level compact_sstable() function
to compact all the sstables of one column family into one. It is not a
full-fledged "compaction strategy" but the real ones can be based on this
example.

Among the things that this code doesn't do yet is to delete the old
sstables. In the future, this should happen automatically in the sstable
destructor when all the references to the sstable get deleted.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-06-25 11:05:35 +03:00
Glauber Costa
4d07c952cc remove global create_keyspace
There is only one user in tree, and as Tomek pointed out, it is buggy. The
reason is that ksm is a shard-local structure, and it is currently used
indiscriminately by all shards which can easily lead to problems.

We could fix it with some tricks, but it is way better and safer to make sure
the callers are doing it right instead. There is only one caller, so let's fix
that.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-24 12:44:05 -04:00
Glauber Costa
f0f04892d3 database: no longer replicate create_keyspace code
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-06-24 12:44:05 -04:00
Avi Kivity
3d22623a6b Merge "Flush schema changes to disk" from Glauber
"This is the current patchset to flush and persist schema changes to disk.
It is not perfect, in the sense that older changes still in flight won't be
waited for. But as we discussed - at this moment we'll just note that, and
leave the fix for later"
2015-06-24 17:08:33 +03:00