scylladb

Author	SHA1	Message	Date
Calle Wilund	d614143f5e	Commitlog/database: Fixup series "Commit log flush request on disk overflow" Also at seastar-dev: calle/commitlog_flush_v3 (And, yes, this time I _did_ update the remote!) Refs #262 Commit of original series was done on stale version (v2) due to author's inability to multitask and update git repos. v3: * Removed future<> return value from callbacks. I.e. flush callback is now only fully synchronous over actual call	2015-09-07 21:29:19 +03:00
Avi Kivity	dee9060b12	Merge "Commit log flush request on disk overflow" from Calle "Fixes #262 Handles CL disk size exceeding configured max size by calling flush handlers for each dirty CF id / high replay_position mark. (Instead of uncontrolled delete as previously). * Increased default max disk size to 8GB. Same as Origin/scylla.yaml (so no real change, but synced). * Divide the max disk size by cpus (so sum of all shards == max) * Abstract flush callbacks in CL * Handler in DB that initiates memtable->sstable writes when called. Note that the flush request is done "synchronously" in new_segment() (i.e. when getting a new segment and crossing threshold). This is however more or less congruent with Origin, which will do a request-sync in the corresponding case. Actual dealing with the request should at least in production code however be done async, and in DB it is, i.e. we initiate sstable writes. Hopefully they finish soon, and CL segments will be released (before next segment is allocated). If the flush request does _not_ eventually result in any CF:s becoming clean and segments released we could potentially be issuing flushes repeatedly, but never more often than on every new segment."	2015-09-07 18:46:48 +03:00
Calle Wilund	380649eb66	Database: Add commitlog flush handler to switch memtables to disk Initiates flushing of CF:s to sstable on CL disk overflow (flush req)	2015-09-07 13:21:46 +02:00
Tomasz Grabiec	802a9db9b0	Fix spelling of 'definitely_doesnt_exist'	2015-09-06 21:24:58 +02:00
Glauber Costa	0fc2995b54	database: initialize sst field The reader has a field for the sstable, but we are not initializing it, so it can be destroyed before we finish our job. It seems to work here, but transposing this code to the test case crashed it. So this means at some point we will crash here as well. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-09-02 06:01:38 +03:00
Paweł Dziepak	9ab44d6754	database: log row::max_vector_size and internal_count Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>	2015-08-31 17:29:16 +02:00
Avi Kivity	349015a269	Merge "Fix migration manager logging" from Pekka "Fix migration manager logging to output what origin does. Fixes #112."	2015-08-31 16:27:49 +03:00
Avi Kivity	f2a79aa7f6	Merge Prepare for closing sstables, part 1 Read-ahead will require that we close input_streams. As part of that we have to close sstables, and mutation_readers (which encapsulate input_streams). This is part 1 of a patchset series to do that. (The overarching goal is to enable read-ahead for sstables, see #244) Conflicts: sstables/compaction.cc	2015-08-31 16:15:18 +03:00
Avi Kivity	7090dffe91	mutation_reader: switch to a class based implementation Using a lambda for implementing a mutation_reader is nifty, but does not allow us to add methods. Switch to a class-based implementation in anticipation of adding a close() method.	2015-08-31 15:53:53 +03:00
Calle Wilund	987454d012	Database: Add "flush_all_memtables"	2015-08-31 14:29:50 +02:00
Calle Wilund	f14e3cf8d0	Database: do not create shard-specific dirs for commitlog New ID scheme allows for a single dir for all segments from all shards.	2015-08-31 14:29:46 +02:00
Pekka Enberg	03e0bcd8cb	database: Add operator<< for keyspace_metadata Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-08-31 13:35:19 +03:00
Pekka Enberg	04a65ec06f	database: Add keyspace_metadata::validate() helper Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-08-31 11:54:56 +03:00
Avi Kivity	012fd41fc0	db: hard dirty memory limit Unlike cache, dirty memory cannot be evicted at will, so we must limit it. This patch establishes a hard limit of 50% of all memory. Above that, new requests are not allowed to start. This allows the system some time to clean up memory. Note that we will need more fine-grained bandwidth control than this; the hard limit is the last line of defense against running out of reclaimable memory. Tested with a mixed read/write load; after reads start to dominate writes (due to the proliferation of small sstables and the inability of compaction to keep up), dirty memory usage starts to climb until the hard stop prevents it from climbing further and ooming the server.	2015-08-28 14:47:17 +02:00
Avi Kivity	5f62f7a288	Revert "Merge "Commit log replay" from Calle" Due to test breakage. This reverts commit `43a4491043`, reversing changes made to `5dcf1ab71a`.	2015-08-27 12:39:08 +03:00
Avi Kivity	0fff367230	Merge "test for compaction metadata's ancestors" from Raphael	2015-08-27 11:07:53 +03:00
Avi Kivity	4e3c9c5493	Merge "compaction manager fixes" from Raphael	2015-08-27 11:05:26 +03:00
Avi Kivity	43a4491043	Merge "Commit log replay" from Calle "Initial implementation/transposition of commit log replay. * Changes replay position to be shard aware * Commit log segment ID:s now follow basically the same scheme as origin; max(previous ID, wall clock time in ms) + shard info (for us) * SStables now use the DB definition of replay_position. * Stores and propagates (compaction) flush replay positions in sstables * If CL segments are left over from a previous run, they, and existing sstables are inspected for high water mark, and then replayed from those marks to amend mutations potentially lost in a crash * Note that CPU count change is "handled" in so much that shard matching is per _previous_ run's shards, not current. Known limitations: * Mutations deserialized from old CL segments are _not_ fully validated against existing schemas. * System::truncated_at (not currently used) does not handle sharding afaik, so watermark ID:s coming from there are dubious. * Mutations that fail to apply (invalid, broken) are not placed in blob files like origin. Partly because I am lazy, but also partly because our serial format differs, and we currently have no tools to do anything useful with it * No replay filtering (Origin allows a system property to designate a filter file, detailing which keyspace/cf:s to replay). Partly because we have no system properties. There is no unit test for the commit log replayer (yet). Because I could not really come up with a good one given the test infrastructure that exists (tricky to kill stuff just "right"). The functionality is verified by manual testing, i.e. running scylla, building up data (cassandra-stress), kill -9 + restart. This of course does not really fully validate whether the resulting DB is 100% valid compared to the one at k-9, but at least it verified that replay took place, and mutations were applied. (Note that origin also lacks validity testing)"	2015-08-27 10:53:36 +03:00
Avi Kivity	e6965c520d	Merge "Adding the ownership support to storage_service" from Amnon "This series adds the missing code from origin to support this functionality. While doing so, some methods were changed to be const where that was more appropriate, and a few const versions of methods were added where both variations were required."	2015-08-25 20:13:33 +03:00
Amnon Heiman	b5ceef451e	keyspace: Add the get_non_system_keyspaces and expose the replication This patch adds the get_non_system_keyspaces that found in origin and expose the replication strategy. With the get_replication_strategy method. Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>	2015-08-25 19:39:13 +03:00
Vlad Zolotarov	08e7736f0b	database::find_column_family(): init the exception with the readable message Make the exceptions created inside database::find_column_family() return a readable message from their what() method. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2015-08-25 18:00:19 +03:00
Calle Wilund	df8d7a8295	Database: Add "flush_all_memtables"	2015-08-25 09:41:56 +02:00
Calle Wilund	5524da8f18	Database: do not create shard-specific dirs for commitlog New ID scheme allows for a single dir for all segments from all shards.	2015-08-25 09:40:52 +02:00
Avi Kivity	4390be3956	Rename 'negative_mutation_reader' to 'partition_presence_checker' Suggested by Tomek.	2015-08-24 18:03:22 +03:00
Raphael S. Carvalho	c65af6e188	api: add get_unleveled_sstables to column family api Adding to API function to return count of sstables in L0 if leveled compaction strategy is enabled, 0 otherwise. Currently, we don't support leveled compaction strategy, so function to return count of sstables in L0 always return zero. Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>	2015-08-24 11:56:31 -03:00
Raphael S. Carvalho	4c9c144987	compaction_manager: avoid concurrent compaction on the same cf It was noticed that the same sstable files could be selected for compaction if concurrent compaction happens on the same cf. That's possible because compaction manager uses 2 tasks for handling compactions. Solution is to not duplicate cf in the compaction manager queue, and re-schedule compaction for a cf if needed. Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>	2015-08-24 11:11:47 -03:00
Avi Kivity	8a4648761c	tests: make test cql environment use volatile system keyspace Prevents hangs due to the database not being able to persist a memtable. Tested-by: Asias He <asias@cloudius-systems.com>	2015-08-24 13:50:22 +03:00
Avi Kivity	6f11322220	db: move annoying log on non-durable cf to quieter place Fixes #174.	2015-08-23 23:12:07 +03:00
Avi Kivity	c01bc16f58	db: don't give up flushing a memtable on error We must try again, or the memtable's memory will never be reclaimed.	2015-08-19 19:36:41 +03:00
Avi Kivity	6846909533	db: extract sstable flushing code to a function	2015-08-19 19:36:41 +03:00
Avi Kivity	5bf5476beb	db: add collectd counter for dirty memory	2015-08-19 19:36:41 +03:00
Avi Kivity	c175025bb6	db: place all memtables into a single region_group We can use this to track the amount of unevictable memory in the system.	2015-08-19 19:36:41 +03:00
Avi Kivity	7b67b04822	db: wire up max memtable size configuration	2015-08-19 13:17:27 +03:00
Avi Kivity	176ab06f77	db: demote commitlog reordering detected log message to debug It's less rare than we thought and also less interesting.	2015-08-19 09:26:23 +03:00
Raphael S. Carvalho	820ba6f4d2	adapt compaction manager for column family removal We need a way to remove a column family from the compaction manager because when dropping a column family we need to make sure that the compaction manager doesn't hold a reference to it anymore. So compaction manager queue is now of column_family, allowing us to cancel requests pertaining to a column family being dropped. There may be an ongoing compaction for the column family being dropped, so we also need to wait for its termination. Testcase for compaction manager was also adapted and improved. Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>	2015-08-18 11:38:06 +03:00
Glauber Costa	89366dc2c2	sstables: do not accept files with missing TOC. We can catch most errors when we try to load an sstable. But if the TOC file is the one missing, we won't try to load the sstable at all. This case is still an invalid case, but it is way easier for us to treat it by waiting for all files to be loaded, and then checking if we saw a file during scan_dir, without its corresponding TOC. Fixes #114 Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-16 15:21:40 +03:00
Glauber Costa	0650579ace	sstables: refuse to boot on corrupted sstables We are now skipping them. That's dangerous. Fixes #115 Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-16 15:21:38 +03:00
Raphael S. Carvalho	9823164c89	db: introduce compaction manager Currently, each column family creates a fiber to handle compaction requests in parallel to the system. If there are N column families, N compactions could be running in parallel, which is definitely horrible. To solve that problem, a per-database compaction manager is introduced here. Compaction manager is a feature used to service compaction requests from N column families. Parallelism is made available by creating more than one fiber to service the requests. That being said, N compaction requests will be served by M fibers. A compaction request being submitted will go to a job queue shared between all fibers, and the fiber with the lowest amount of pending jobs will be signalled. Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>	2015-08-11 17:25:46 +03:00
Avi Kivity	1016b21089	cache: improve preloading of flushed memtable mutations If a mutation definitely doesn't exist in all sstables, then we can certainly load it into the cache.	2015-08-09 22:46:08 +03:00
Glauber Costa	c2a0232048	database: generate UUIDs compatible with Cassandra 2.1.8 Without this, Cassandra won't even try to read our sstables. The containing directories will be ignored. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-07 08:31:56 -05:00
Glauber Costa	c8ca9b376d	database: change default sstable version Let's change the default generated tables to ka, which is the one that is present in Origin Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-07 08:31:55 -05:00
Glauber Costa	2d1b965f91	database: change filename parser to also accept ka A ka file has a slightly different name on disk. Change the parser so we can deal with both Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-07 08:31:55 -05:00
Glauber Costa	cd8c9ad288	sstables: add ks and cf name to sstable constructor When a schema is available, we use it. However, we have, by now, way too many tests. Some of them use tables for which we don't even know the schema. It would have been a massive amount of work to require a schema for all of them - so I am keeping both constructors around. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-07 08:31:55 -05:00
Glauber Costa	77e06c3ab1	sstables: remove name parameter It is currently only used to log a message, and for that we have an sstable method that will do just fine. Using the name itself just makes it being passed along throughout the captures. Remove it. Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>	2015-08-07 08:31:53 -05:00
Raphael S. Carvalho	64fcd16c0c	db: adding data to column family statistics for API Adding required data for column family API to be implemented. Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>	2015-08-06 17:38:59 +03:00
Avi Kivity	48a1ce28fc	Merge "Switch to log-structured allocator" from Tomasz	2015-08-06 15:45:39 +03:00
Tomasz Grabiec	926509525f	row_cache: Switch to using LSA	2015-08-06 14:05:16 +02:00
Tomasz Grabiec	18ec9c3643	db: Move column_family::flush() to source file	2015-08-06 14:05:16 +02:00
Tomasz Grabiec	3b92ba2857	db: Add memtable flush logging	2015-08-06 14:05:16 +02:00
Pekka Enberg	dae1119796	database: Fix create keyspace ASan error ASan does not like commit `05c23c7f73` ("database: Add create_keyspace_on_all() helper"): ==8112==WARNING: AddressSanitizer failed to allocate 0x7f88b84fc690 bytes ==8112==AddressSanitizer's allocator is terminating the process instead of returning 0 ==8112==If you don't like this behavior set allocator_may_return_null=1 ==8112==Sanitizer CHECK failed: ../../../../libsanitizer/sanitizer_common/sanitizer_allocator.cc:147 ((0)) != (0) (0, 0) I was not able to determine the source of the bug. Make ASan happy by reverting the code movement and using the "cpu zero" trick we use for table creation. Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>	2015-08-06 13:02:58 +03:00

342 Commits