A race condition happens when two or more shards try to delete
the same partial sstable. So the problem doesn't affect scylla
when it boots with a single shard.
To fix this problem, shard 0 will be made responsible for
deleting a partial sstable.
Fixes #359.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
If an sstable is irrelevant for a shard, delete it. The deletion will
only complete when all shards agree (either ignore the sstable or
delete it after compaction).
In the event of a compaction failure, run_compaction would be called
more than once for a request, which could result in an underflow
of the pending_compactions stat.
Let's fix that by only decreasing it if compaction succeeded.
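A minimal standalone sketch of the idea (the struct and function names are illustrative, not Scylla's actual code; failure is modeled as a thrown exception):

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>

// Hypothetical sketch: the pending count is decremented only after the
// compaction body returns successfully, so re-running a failed request
// cannot underflow the counter.
struct compaction_stats {
    long pending_compactions = 0;
};

inline void run_compaction(compaction_stats& stats,
                           const std::function<void()>& do_compact) {
    do_compact();                    // may throw; counter untouched on failure
    --stats.pending_compactions;     // reached only on success
}
```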
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
When populating a column family, we will now delete all components
of an sstable with a temporary TOC file. An sstable with a temporary
TOC file was only partially written, and can be safely
deleted because the respective data is either saved in the commit
log, or in the compacted sstables in case the partial sstable
is the result of a compaction.
The deletion procedure is guarded against power failure by only deleting
the temporary TOC file after all other components have been deleted.
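The ordering can be sketched like this (function and argument names are hypothetical, not the real Scylla API; std::filesystem stands in for seastar's async file operations):

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical sketch of the power-failure-safe ordering: every other
// component is removed first, and the temporary TOC file is removed
// last.  A crash at any point leaves the TOC marker on disk, so the
// next boot still sees a partial sstable and can simply retry cleanup.
inline void delete_partial_sstable(const std::vector<std::string>& components,
                                   const std::string& temp_toc) {
    for (const auto& path : components) {
        std::filesystem::remove(path);   // idempotent: missing files are fine
    }
    std::filesystem::remove(temp_toc);   // the marker disappears only at the end
}
```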
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
When populating a cf, we should also check for an sstable with a
temporary TOC file, and act accordingly. For the time being,
we will only refuse to boot. Subsequent work is to gather all
files of an sstable with a temporary TOC file and delete them.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
column_family
This patch adds a getter for the dirty_memory_region_group in the
database object and adds an occupancy method to column_family that
returns the total occupancy of all the memtables in the column family.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Fixes #309.
When scanning, memtable readers detect that the memtable was flushed,
which means that it started being moved to cache, and they fall back
to reading from the memtable's sstable.
Eventually we should combine memtable and cache contents
so that, as long as data is not evicted, we won't do IO. We do not
support scanning in cache yet, though, so there is no point in doing
this now, and it is not trivial.
Deleting sstables is tricky, since they can be shared across shards.
This patchset introduces an sstable deletion agreement table, that records
the agreement of shards to delete an sstable. Sstables are only deleted
after all shards have agreed.
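The agreement can be modeled with a per-sstable vote count (a deliberately simplified, single-structure sketch; the real table coordinates across shards):

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Illustrative sketch (not the actual Scylla table): each shard that is
// finished with a shared sstable records its vote; the sstable becomes
// deletable only once every shard has voted.
struct deletion_agreement_table {
    unsigned shard_count;
    std::unordered_map<std::string, unsigned> votes;

    // Returns true iff the caller is the last shard to agree, i.e. the
    // sstable may now actually be removed from disk.
    bool agree(const std::string& sstable_name) {
        return ++votes[sstable_name] == shard_count;
    }
};
```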
With this, we can change core count across boots.
Fixes#53.
All database code was converted to use the distributed<> object when
storage_proxy was made distributed, but then new code was written to use
storage_proxy& again. Passing the distributed<> object is safer, since it
can be passed between shards safely. There was a patch to fix one such
case yesterday; I found one more while converting.
"Refs #293
* Add commitlog::sync_all_segments, which explicitly forces all pending
disk writes.
* Only delete segments from disk iff they are marked clean. Thus on partial
shutdown or whatnot, even if the CL is destroyed (destructor runs), disk files
not yet clean vis-à-vis sstables are preserved and replayable.
* Do a sync_all_segments first of all in database::stop.
Exactly what not to stop in main I leave up to others' discretion, or at least
another patch."
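The clean-segment rule can be sketched as a simple filter (the types and names here are illustrative, not the actual commitlog code):

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative model: a segment still holding mutations that have not
// yet reached an sstable is "dirty" and must survive shutdown so it can
// be replayed on the next start; only clean segments may be deleted.
struct segment {
    std::string path;
    bool clean;   // true once every mutation in it is persisted in sstables
};

inline std::vector<std::string>
segments_to_delete(const std::vector<segment>& segments) {
    std::vector<std::string> out;
    for (const auto& s : segments) {
        if (s.clean) {
            out.push_back(s.path);   // dirty segments are kept for replay
        }
    }
    return out;
}
```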
From Pawel:
This series makes compaction remove items that are no longer needed:
- expired cells are changed into tombstones
- items covered by higher level tombstones are removed
- expired tombstones are removed if possible
Fixes #70.
Fixes #71.
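A toy model of the three rules (the cell type and purge conditions are simplified assumptions; real tombstone GC must also check that no other sstable still holds data the tombstone shadows):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative decision function for what compaction writes out for a
// cell, given "now" and the gc grace period.  Not Scylla's real types.
enum class fate { keep, write_tombstone, drop };

struct cell {
    int64_t timestamp;      // write time
    int64_t expiry;         // TTL deadline; INT64_MAX if no TTL
    bool is_tombstone;
};

inline fate compact_cell(const cell& c, int64_t now, int64_t gc_grace,
                         int64_t covering_tombstone_ts /* INT64_MIN if none */) {
    if (c.timestamp <= covering_tombstone_ts) {
        return fate::drop;                    // shadowed by a higher-level tombstone
    }
    if (c.is_tombstone) {
        // an expired tombstone can be purged once gc grace has passed
        return now >= c.expiry + gc_grace ? fate::drop : fate::keep;
    }
    if (now >= c.expiry) {
        return fate::write_tombstone;         // expired cell becomes a tombstone
    }
    return fate::keep;
}
```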
Refs #293
Iff one desires to _not_ shut down stuff cleanly, still running this first
in database::stop will at least ensure that mutations already in CL transit
will end up on disk and be replayable.
Also at seastar-dev: calle/commitlog_flush_v3
(And, yes, this time I _did_ update the remote!)
Refs #262
The commit of the original series was done on a stale version (v2) due to
the author's inability to multitask and update git repos.
v3:
* Removed the future<> return value from callbacks. I.e. the flush callback
is now only fully synchronous over the actual call.
"Fixes #262
Handles CL disk size exceeding the configured max size by calling flush
handlers for each dirty CF id / high replay_position mark (instead of
uncontrolled deletion as previously).
* Increased default max disk size to 8GB. Same as Origin/scylla.yaml (so no
real change, but synced).
* Divide the max disk size by cpus (so sum of all shards == max)
* Abstract flush callbacks in CL
* Handler in DB that initiates memtable->sstable writes when called.
Note that the flush request is done "synchronously" in new_segment() (i.e.
when getting a new segment and crossing the threshold). This is however more or
less congruent with Origin, which will do a request-sync in the corresponding
case.
Actually dealing with the request should, at least in production code, be
done async, and in DB it is, i.e. we initiate sstable writes. Hopefully
they finish soon, and CL segments will be released (before the next segment is
allocated).
If the flush request does _not_ eventually result in any CFs becoming
clean and segments released, we could potentially be issuing flushes
repeatedly, but never more often than on every new segment."
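The sizing arithmetic can be sketched as follows (names are illustrative; the real logic lives inside the commitlog segment manager):

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sketch of the sizing described above: the configured max
// is split evenly across shards, and a flush is requested when allocating
// a new segment would cross the shard's budget.
inline size_t shard_max_size(size_t total_max, unsigned shard_count) {
    return total_max / shard_count;   // sum over all shards == total_max
}

inline bool should_request_flush(size_t bytes_on_disk, size_t segment_size,
                                 size_t shard_budget) {
    return bytes_on_disk + segment_size > shard_budget;
}
```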
The reader has a field for the sstable, but we are not initializing it, so
the sstable can be destroyed before we finish our job. It seems to work here,
but transposing this code to the test case crashed it. So this means that at
some point we will crash here as well.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Read-ahead will require that we close input_streams. As part of that
we have to close sstables, and mutation_readers (which encapsulate
input_streams). This is part 1 of a patchset series to do that.
(The overarching goal is to enable read-ahead for sstables, see #244)
Conflicts:
sstables/compaction.cc
Using a lambda for implementing a mutation_reader is nifty, but does not
allow us to add methods.
Switch to a class-based implementation in anticipation of adding a close()
method.
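The motivation can be shown with a simplified, future-free sketch: a class keeps the lambda's call syntax via operator() while also offering the extra method (the types here are illustrative, not the real mutation_reader):

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>

// Illustrative sketch: wrapping the lambda in a class preserves the
// call syntax but lets us add close() (and later, more methods).
class mutation_reader {
    std::function<std::optional<std::string>()> _next;
    bool _closed = false;
public:
    explicit mutation_reader(std::function<std::optional<std::string>()> next)
        : _next(std::move(next)) {}

    // same call syntax as the lambda it replaces
    std::optional<std::string> operator()() { return _next(); }

    void close() { _closed = true; }   // the method a bare lambda cannot have
    bool closed() const { return _closed; }
};
```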
Unlike cache, dirty memory cannot be evicted at will, so we must limit it.
This patch establishes a hard limit of 50% of all memory. Above that,
new requests are not allowed to start. This allows the system some time
to clean up memory.
Note that we will need more fine-grained bandwidth control than this;
the hard limit is the last line of defense against running out of
reclaimable memory.
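The admission check amounts to the following (an illustrative sketch, not the actual accounting code):

```cpp
#include <cassert>
#include <cstddef>

// Illustrative sketch of the hard limit described above: new requests
// are refused while dirty memory sits at or above half of total memory,
// giving the system time to clean up.
inline bool admit_request(size_t dirty_bytes, size_t total_bytes) {
    return dirty_bytes < total_bytes / 2;   // hard limit: 50% of all memory
}
```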
Tested with a mixed read/write load; after reads start to dominate writes
(due to the proliferation of small sstables, and the inability of compaction
to keep up), dirty memory usage starts to climb until the hard stop prevents
it from climbing further and OOMing the server.
"Initial implementation/transposition of commit log replay.
* Changes replay position to be shard aware
* Commit log segment IDs now follow basically the same scheme as origin;
max(previous ID, wall clock time in ms) + shard info (for us)
* SStables now use the DB definition of replay_position.
* Stores and propagates (compaction) flush replay positions in sstables
* If CL segments are left over from a previous run, they, and existing
sstables, are inspected for the high water mark, and then replayed from
those marks to amend mutations potentially lost in a crash
* Note that a CPU count change is "handled" insofar as shard matching is
per the _previous_ run's shards, not the current ones.
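The segment ID scheme in the list above can be sketched as follows (the layout is illustrative; the `+ 1` is an assumption to keep IDs strictly increasing when the clock stalls or moves backwards):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Illustrative sketch: the base is max(previous ID, wall clock in ms),
// so IDs stay monotonic even if the clock moves backwards, and the shard
// is encoded alongside so segments from different shards never collide.
struct segment_id {
    uint64_t base;     // max(previous base + 1, wall clock time in ms)
    unsigned shard;    // which shard wrote this segment
};

inline segment_id next_segment_id(const segment_id& prev, uint64_t now_ms,
                                  unsigned shard) {
    return { std::max(prev.base + 1, now_ms), shard };
}
```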
Known limitations:
* Mutations deserialized from old CL segments are _not_ fully validated
against existing schemas.
* System::truncated_at (not currently used) does not handle sharding afaik,
so watermark IDs coming from there are dubious.
* Mutations that fail to apply (invalid, broken) are not placed in blob files
like origin. Partly because I am lazy, but also partly because our
serialization format differs, and we currently have no tools to do anything
useful with it.
* No replay filtering (Origin allows a system property to designate a filter
file, detailing which keyspace/cfs to replay). Partly because we have no
system properties.
There is no unit test for the commit log replayer (yet), because I could not
really come up with a good one given the test infrastructure that exists
(it is tricky to kill stuff just "right").
The functionality is verified by manual testing, i.e. running scylla,
building up data (cassandra-stress), then kill -9 + restart.
This of course does not fully validate whether the resulting DB is
100% valid compared to the one at kill -9, but it at least verified that
replay took place, and mutations were applied.
(Note that origin also lacks validity testing)"
"This series adds the missing code from origin to support this functionality.
While doing so, some methods were changed to be const when that was more
appropriate, and a few const versions of methods were added when both
variations were required."
This patch adds the get_non_system_keyspaces method found in origin and
exposes the replication strategy via the get_replication_strategy method.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Make the exceptions created inside database::find_column_family() return
a readable message from their what() method.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Add an API function to return the count of sstables in L0 if the leveled
compaction strategy is enabled, and 0 otherwise. Currently, we don't
support the leveled compaction strategy, so the function always returns
zero.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
It was noticed that the same sstable files could be selected for
compaction if concurrent compactions happen on the same cf.
That's possible because the compaction manager uses 2 tasks for
handling compactions.
The solution is to not duplicate a cf in the compaction manager queue,
and to re-schedule compaction for a cf if needed.
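The queue discipline can be sketched as follows (a simplified model, not the real compaction_manager; std::string stands in for a column_family reference):

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <string>

// Illustrative sketch: a cf is enqueued at most once; a request arriving
// while the cf is already queued simply does nothing extra, so two tasks
// can never pick up the same cf's sstables concurrently.
struct compaction_queue {
    std::deque<std::string> queue;
    std::set<std::string> queued;

    void submit(const std::string& cf) {
        if (queued.insert(cf).second) {   // not already pending
            queue.push_back(cf);
        }                                 // else: duplicate suppressed
    }

    std::string pop() {
        std::string cf = queue.front();
        queue.pop_front();
        queued.erase(cf);                 // cf may be re-scheduled now
        return cf;
    }
};
```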
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>