261 Commits

Author SHA1 Message Date
Amnon Heiman
089bd6a5bd column family: Expose the compaction strategy
This exposes the compaction strategy object.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-09-12 08:35:34 +03:00
Amnon Heiman
3af683e6f4 column family: add estimate read, write
This adds an estimated read and estimated write histogram to the column
family stats object.
2015-09-12 08:35:03 +03:00
Amnon Heiman
dd7638cfa9 Expose the dirty_memory_region_group in database and add occupancy to
column_family

This patch adds a getter for the dirty_memory_region_group in the
database object and adds an occupancy method to column family that
returns the total occupancy of all the memtables in the column family.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-09-10 00:22:08 +03:00
Avi Kivity
b96018411b Merge "Fix flush in the middle of scanning bug" from Tomasz
Fixes #309.

Conflicts:
	sstables/sstables.cc
2015-09-09 11:56:04 +03:00
Tomasz Grabiec
320ff132f8 sstables: Relax header dependencies 2015-09-09 10:07:43 +02:00
Gleb Natapov
df468504b6 schema_table: convert code to use distributed<storage_proxy> instead of storage_proxy&
All database code was converted to it when storage_proxy was made
distributed, but then new code was written to use storage_proxy& again.
Passing distributed<> object is safer since it can be passed between
shards safely. There was a patch to fix one such case yesterday, I found
one more while converting.
2015-09-09 10:19:30 +03:00
Tomasz Grabiec
c623fbe1f7 database: Keep sstable as lw_shared_ptr<> from the beginning
Allows us to save on indentation, and we need it as shared anyway later.
2015-09-08 10:19:19 +02:00
Calle Wilund
380649eb66 Database: Add commitlog flush handler to switch memtables to disk
Initiates flushing of CF:s to sstable on CL disk overflow (flush req)
2015-09-07 13:21:46 +02:00
Avi Kivity
349015a269 Merge "Fix migration manager logging" from Pekka
"Fix migration manager logging to output what origin does. Fixes #112."
2015-08-31 16:27:49 +03:00
Calle Wilund
987454d012 Database: Add "flush_all_memtables" 2015-08-31 14:29:50 +02:00
Pekka Enberg
03e0bcd8cb database: Add operator<< for keyspace_metadata
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-31 13:35:19 +03:00
Pekka Enberg
04a65ec06f database: Add keyspace_metadata::validate() helper
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-31 11:54:56 +03:00
Avi Kivity
012fd41fc0 db: hard dirty memory limit
Unlike cache, dirty memory cannot be evicted at will, so we must limit it.

This patch establishes a hard limit of 50% of all memory.  Above that,
new requests are not allowed to start.  This allows the system some time
to clean up memory.

Note that we will need more fine-grained bandwidth control than this;
the hard limit is the last line of defense against running out of reclaimable
memory.

Tested with a mixed read/write load; after reads start to dominate writes
(due to the proliferation of small sstables, and the inability of compaction
to keep up), dirty memory usage starts to climb until the hard stop prevents
it from climbing further and ooming the server.
2015-08-28 14:47:17 +02:00
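The hard limit described in 012fd41fc0 can be pictured as an admission gate in front of new requests. The following is only a minimal sketch under stated assumptions: the class `dirty_memory_gate` and its methods are hypothetical names, not Scylla's code, and the real patch tracks dirty memory through a region group rather than a plain counter.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical admission gate; names are illustrative, not taken from Scylla.
class dirty_memory_gate {
    std::size_t _limit;      // hard limit in bytes: 50% of total memory
    std::size_t _dirty = 0;  // bytes held by memtables that are not yet flushed
public:
    explicit dirty_memory_gate(std::size_t total_memory)
        : _limit(total_memory / 2) {
        assert(total_memory > 0);
    }
    // New requests may start only while dirty memory is at or below the limit.
    bool may_start_request() const { return _dirty <= _limit; }
    // Account for memory dirtied by an admitted write.
    void on_write(std::size_t bytes) { _dirty += bytes; }
    // Account for memory released by a completed flush (clamped at zero).
    void on_flush(std::size_t bytes) { _dirty -= bytes < _dirty ? bytes : _dirty; }
    std::size_t dirty() const { return _dirty; }
};
```

A caller would check `may_start_request()` before admitting a write and hold the request back until a flush lowers the dirty total again.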
Avi Kivity
5f62f7a288 Revert "Merge "Commit log replay" from Calle"
Due to test breakage.

This reverts commit 43a4491043, reversing
changes made to 5dcf1ab71a.
2015-08-27 12:39:08 +03:00
Avi Kivity
0fff367230 Merge "test for compaction metadata's ancestors" from Raphael 2015-08-27 11:07:53 +03:00
Avi Kivity
4e3c9c5493 Merge "compaction manager fixes" from Raphael 2015-08-27 11:05:26 +03:00
Avi Kivity
43a4491043 Merge "Commit log replay" from Calle
"Initial implementation/transposition of commit log replay.

* Changes replay position to be shard aware
* Commit log segment ID:s now follow basically the same scheme as origin;
  max(previous ID, wall clock time in ms) + shard info (for us)
* SStables now use the DB definition of replay_position.
* Stores and propagates (compaction) flush replay positions in sstables
* If CL segments are left over from a previous run, they, and existing
  sstables are inspected for high water mark, and then replayed from
  those marks to amend mutations potentially lost in a crash
* Note that CPU count change is "handled" in so much that shard matching is
  per _previous_ run's shards, not current.

Known limitations:
* Mutations deserialized from old CL segments are _not_ fully validated
  against existing schemas.
* System::truncated_at (not currently used) does not handle sharding afaik,
  so watermark ID:s coming from there are dubious.
* Mutations that fail to apply (invalid, broken) are not placed in blob files
  like origin. Partly because I am lazy, but also partly because our serial
  format differs, and we currently have no tools to do anything useful with it
* No replay filtering (Origin allows a system property to designate a filter
  file, detailing which keyspace/cf:s to replay). Partly because we have no
  system properties.

There is no unit test for the commit log replayer (yet).
Because I could not really come up with a good one given the test
infrastructure that exists (tricky to kill stuff just "right").
The functionality is verified by manual testing, i.e. running scylla,
building up data (cassandra-stress), kill -9 + restart.
This of course does not really fully validate whether the resulting DB is
100% valid compared to the one at k-9, but at least it verified that replay
took place, and mutations were applied.
(Note that origin also lacks validity testing)"
2015-08-27 10:53:36 +03:00
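The segment ID scheme mentioned in 43a4491043 ("max(previous ID, wall clock time in ms) + shard info") could look roughly like the sketch below. The bit layout is an assumption made for illustration: the commit message does not say how the shard is encoded, and both `segment_id_generator` and `shard_shift` are hypothetical names. Bumping the previous value by one is also an assumption, added so that IDs stay unique when the clock does not advance.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Assumption: the shard number lives in the top 8 bits of the ID.
constexpr unsigned shard_shift = 56;

struct segment_id_generator {
    std::uint64_t shard;          // shard that owns the segments
    std::uint64_t last_base = 0;  // time-derived part of the previous ID

    std::uint64_t next(std::uint64_t now_ms) {
        // Never go backwards: take the later of "previous + 1" and the clock.
        last_base = std::max(last_base + 1, now_ms);
        assert(last_base < (std::uint64_t(1) << shard_shift));
        return (shard << shard_shift) | last_base;
    }
};
```

Keeping the shard in the ID is what lets a replayer match a leftover segment to the shard that wrote it in the previous run.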
Amnon Heiman
b5ceef451e keyspace: Add the get_non_system_keyspaces and expose the replication
This patch adds the get_non_system_keyspaces method found in origin and
exposes the replication strategy with the get_replication_strategy
method.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-25 19:39:13 +03:00
Calle Wilund
df8d7a8295 Database: Add "flush_all_memtables" 2015-08-25 09:41:56 +02:00
Avi Kivity
4390be3956 Rename 'negative_mutation_reader' to 'partition_presence_checker'
Suggested by Tomek.
2015-08-24 18:03:22 +03:00
Raphael S. Carvalho
c65af6e188 api: add get_unleveled_sstables to column family api
Add an API function that returns the count of sstables in L0 if leveled
compaction strategy is enabled, 0 otherwise. Currently, we don't
support leveled compaction strategy, so the function that returns the
count of sstables in L0 always returns zero.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-24 11:56:31 -03:00
Raphael S. Carvalho
4c9c144987 compaction_manager: avoid concurrent compaction on the same cf
It was noticed that the same sstable files could be selected for
compaction if concurrent compaction happens on the same cf.
That's possible because compaction manager uses 2 tasks for
handling compactions.

Solution is to not duplicate cf in the compaction manager queue,
and re-schedule compaction for a cf if needed.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-24 11:11:47 -03:00
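The fix in 4c9c144987 (keep each cf in the queue at most once, and re-schedule it afterwards if needed) can be sketched as a queue that rejects duplicates. This is an illustrative model only: `compaction_queue` is a hypothetical name, and column families are represented by plain strings rather than column_family objects.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical model of a compaction job queue that never holds a cf twice.
class compaction_queue {
    std::deque<std::string> _pending;  // column families waiting for compaction
public:
    // Returns false, and queues nothing, if the cf is already pending.
    bool submit(const std::string& cf) {
        if (std::find(_pending.begin(), _pending.end(), cf) != _pending.end()) {
            return false;
        }
        _pending.push_back(cf);
        return true;
    }
    // Hands the oldest pending cf to a compaction task.
    std::string pop() {
        assert(!_pending.empty());
        std::string cf = std::move(_pending.front());
        _pending.pop_front();
        return cf;
    }
    std::size_t size() const { return _pending.size(); }
};
```

Once a task has popped a cf and finished compacting it, the cf can be submitted again, which corresponds to the re-scheduling step in the commit message.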
Avi Kivity
8a4648761c tests: make test cql environment use volatile system keyspace
Prevents hangs due to the database not being able to persist a memtable.

Tested-by: Asias He <asias@cloudius-systems.com>
2015-08-24 13:50:22 +03:00
Avi Kivity
83d5c7e7c8 Merge 2015-08-24 10:58:39 +03:00
Avi Kivity
855ef838a9 db: fix use-after-free with region_group
_dirty_memory_region_group is used by the column_family's memtables, but
is destroyed before them.

Fix by changing the destruction order.

Fixes #175.
2015-08-24 10:51:03 +03:00
Avi Kivity
0afbdf4aa7 Merge "Add row related methods to the cache_service API" from Amnon
"This series exposes statistics from the row_cache in the cache_service API.
After this series the following methods will be available:
get_row_hits
get_row_requests
get_row_hit_rate
get_row_size
get_row_entries"
2015-08-23 15:46:07 +03:00
Avi Kivity
c01bc16f58 db: don't give up flushing a memtable on error
We must try again, or the memtable's memory will never be reclaimed.
2015-08-19 19:36:41 +03:00
Avi Kivity
6846909533 db: extract sstable flushing code to a function 2015-08-19 19:36:41 +03:00
Avi Kivity
5bf5476beb db: add collectd counter for dirty memory 2015-08-19 19:36:41 +03:00
Avi Kivity
c175025bb6 db: place all memtables into a single region_group
We can use this to track the amount of unevictable memory in the
system.
2015-08-19 19:36:41 +03:00
Avi Kivity
7b67b04822 db: wire up max memtable size configuration 2015-08-19 13:17:27 +03:00
Avi Kivity
c317391f62 db: trigger memtable flush based on actual memory usage
Rather than using _mutation_count as a poor proxy.
2015-08-19 12:59:52 +03:00
Raphael S. Carvalho
820ba6f4d2 adapt compaction manager for column family removal
We need a way to remove a column family from the compaction manager
because when dropping a column family we need to make sure that the
compaction manager doesn't hold a reference to it anymore.

So compaction manager queue is now of column_family, allowing us
to cancel requests pertaining to a column family being dropped.
There may be an ongoing compaction for the column family being
dropped, so we also need to wait for its termination.

Testcase for compaction manager was also adapted and improved.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-18 11:38:06 +03:00
Amnon Heiman
361b2377bb Expose the row_cache in the column_family
This exposes the row_cache in the column family; it will be used by the
API to get the row_cache statistics.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-17 19:42:23 +03:00
Avi Kivity
608c0b8460 Merge "initial work on compaction manager API" from Raphael 2015-08-17 17:24:13 +03:00
Avi Kivity
eb09eddee5 Merge "Adding sampled histogram" from Amnon
"Histograms are used to collect latency information. In Origin, many of the
operations are timed, which is a potential performance issue. This series adds
an option to sample the operations, where a small fraction is timed and the
rest are only counted.

This gives an estimate of the statistics while keeping an accurate
count of the total events, with negligible performance impact.

The first users of the modified histogram are the column families, for their
reads and writes."

Conflicts:
	database.hh
2015-08-16 17:15:24 +03:00
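The sampling idea from eb09eddee5 (time a small fraction of operations, count all of them) can be sketched as follows. The class `sampled_histogram`, its method names, and the "one in every N" sampling policy are assumptions for illustration; the actual series may choose samples differently and keeps full histogram buckets rather than a running mean.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sampled latency tracker: exact event count, estimated latency.
class sampled_histogram {
    std::uint64_t _sample_every;          // time one operation out of this many
    std::uint64_t _count = 0;             // exact number of operations
    std::uint64_t _sampled = 0;           // number of operations that were timed
    std::uint64_t _sampled_total_us = 0;  // sum of the timed latencies
public:
    explicit sampled_histogram(std::uint64_t sample_every)
        : _sample_every(sample_every) {
        assert(sample_every > 0);
    }
    // True when the next operation should be timed.
    bool should_time() const { return _count % _sample_every == 0; }
    // Record an operation that was only counted.
    void mark() { ++_count; }
    // Record an operation that was timed.
    void mark(std::uint64_t latency_us) {
        ++_count;
        ++_sampled;
        _sampled_total_us += latency_us;
    }
    std::uint64_t count() const { return _count; }
    std::uint64_t sampled() const { return _sampled; }
    // Mean latency estimated from the timed subset only.
    std::uint64_t mean_us() const { return _sampled ? _sampled_total_us / _sampled : 0; }
};
```

The caller asks `should_time()` before each operation and takes clock readings only when it returns true, so the cost of timing is paid on a small fraction of operations.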
Glauber Costa
89366dc2c2 sstables: do not accept files with missing TOC.
We can catch most errors when we try to load an sstable. But if the TOC file is
the one missing, we won't try to load the sstable at all. This case is still an
invalid case, but it is way easier for us to treat it by waiting for all files
to be loaded, and then checking if we saw a file during scan_dir, without its
corresponding TOC.

Fixes #114

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-16 15:21:40 +03:00
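The check described in 89366dc2c2 (wait until every file has been seen, then look for sstables without a TOC) might be sketched like this. The function `generations_missing_toc`, the `scanned_file` alias, and the use of the bare label "TOC" as the component name are hypothetical simplifications; the real scan works on directory entries and sstable descriptors.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// One scanned file: (sstable generation, component name). Hypothetical model.
using scanned_file = std::pair<long, std::string>;

// Returns the generations for which some file was seen but no TOC component.
std::vector<long> generations_missing_toc(const std::vector<scanned_file>& files) {
    std::map<long, std::set<std::string>> components;
    for (const auto& f : files) {
        assert(!f.second.empty());
        components[f.first].insert(f.second);
    }
    std::vector<long> missing;
    for (const auto& entry : components) {
        if (entry.second.count("TOC") == 0) {
            missing.push_back(entry.first);
        }
    }
    return missing;  // ascending order, because std::map iterates by key
}
```

Running the check only after the whole scan finishes avoids depending on the order in which the directory listing returns the files.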
Raphael S. Carvalho
077ac1cce1 db: add method to retrieve compaction_manager
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-12 15:10:25 -03:00
Raphael S. Carvalho
9823164c89 db: introduce compaction manager
Currently, each column family creates a fiber to handle compaction requests
in parallel to the system. If there are N column families, N compactions
could be running in parallel, which is definitely horrible.

To solve that problem, a per-database compaction manager is introduced here.

Compaction manager is a feature used to service compaction requests from N
column families. Parallelism is made available by creating more than one
fiber to service the requests. That being said, N compaction requests will
be served by M fibers.

A compaction request being submitted will go to a job queue shared between
all fibers, and the fiber with the lowest amount of pending jobs will be
signalled.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-11 17:25:46 +03:00
Amnon Heiman
4329377556 column_family to use histogram for read and write latency
With the use of sparse histogram, the read and write counters in the
column_family stats can be used.

The total impact on performance should be negligible.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-11 10:21:22 +03:00
Avi Kivity
1016b21089 cache: improve preloading of flushed memtable mutations
If a mutation definitely doesn't exist in all sstables, then we can
certainly load it into the cache.
2015-08-09 22:46:08 +03:00
Raphael S. Carvalho
64fcd16c0c db: adding data to column family statistics for API
Adding required data for column family API to be implemented.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-06 17:38:59 +03:00
Avi Kivity
48a1ce28fc Merge "Switch to log-structured allocator" from Tomasz 2015-08-06 15:45:39 +03:00
Tomasz Grabiec
18ec9c3643 db: Move column_family::flush() to source file 2015-08-06 14:05:16 +02:00
Pekka Enberg
dae1119796 database: Fix create keyspace ASan error
ASan does not like commit 05c23c7f73
("database: Add create_keyspace_on_all() helper"):

  ==8112==WARNING: AddressSanitizer failed to allocate 0x7f88b84fc690 bytes
  ==8112==AddressSanitizer's allocator is terminating the process instead of returning 0
  ==8112==If you don't like this behavior set allocator_may_return_null=1
  ==8112==Sanitizer CHECK failed: ../../../../libsanitizer/sanitizer_common/sanitizer_allocator.cc:147 ((0)) != (0) (0, 0)

I was not able to determine the source of the bug. Make ASan happy by
reverting the code movement and using the "cpu zero" trick we use for
table creation.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-06 13:02:58 +03:00
Pekka Enberg
05c23c7f73 database: Add create_keyspace_on_all() helper
Add a create_keyspace_on_all() helper which is needed for sending just
one event notification per created keyspace, not one per shard.

Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-05 11:50:52 +03:00
Avi Kivity
a9cc6d7be1 Merge "Parallel CQL table creation improvements" from Pekka
"This series improves parallel CQL table creation to validate CF UUID.
This should bring us closer to Origin behavior when there's multiple
cassandra-stress processes started at the same time."
2015-08-03 14:37:29 +03:00
Pekka Enberg
0b762338c1 database: Futurize update_column_family()
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
2015-08-03 13:41:16 +03:00
Amnon Heiman
0d7fe9bd89 Adding stats to column_family
This adds the stats object to column_family.

It sets the write counter in the write path and supports the pending_flush
counter.

The stats object contains information for switch_count, number of
pending flushes, and counters for read, write, and range.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-08-03 11:09:37 +03:00
Shlomi Livne
199f4d2545 Add enable-in-memory-data-store,enable-commitlog,enable-cache config
Ability to enable/disable specific sub-modules. These settings do not
affect system tables, which are always persisted, cached, and written to
the commitlog.

enable-in-memory-data-store marks if tables will be written/read to/from
disk
enable-commitlog marks if tables will be written to commitlog
enable-cache marks if tables will be written/read to/from cache

Please note in-memory-data-store does not change the read path so "old"
sstables are still read and cache may be used to cache their data

Signed-off-by: Shlomi Livne <shlomi@cloudius-systems.com>
2015-08-02 17:19:30 +03:00