Refs #356
Pre-allocates N segments from the timer task. N is "adaptive" in that it is
increased (up to a max) every time segment acquisition is forced to allocate
a new segment instead of picking one from the pre-allocated (reserve) list.
The idea is that it is easier to adapt how many segments we consume per timer
quantum than the timer quantum itself.
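A rough sketch of the adaptive reserve described above (a toy model; `segment_reserve` and its fields are illustrative names, not the actual commitlog code):

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model of the adaptive reserve: the timer task tops the reserve up to
// `target`, and a miss in the allocation path grows `target` (capped at
// `max`) so the next timer quantum pre-allocates more.
struct segment_reserve {
    std::deque<int> free;   // stand-in for pre-allocated segments
    size_t target = 2;      // how many segments the timer keeps around
    size_t max = 16;        // upper bound on adaptation
    int next_id = 0;

    void timer_tick() {     // runs once per timer quantum
        while (free.size() < target) {
            free.push_back(next_id++);  // "allocate" a reserve segment
        }
    }

    int acquire() {
        if (!free.empty()) {
            int s = free.front();
            free.pop_front();
            return s;
        }
        // Reserve miss: allocate inline and adapt the target upward.
        if (target < max) {
            ++target;
        }
        return next_id++;
    }
};
```

The point being that a miss only grows `target`; the blocking allocation work stays on the timer path after warmup.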
Also does the disk pressure check and flush from the timer task now. Note that
the check is still done at most once per new segment.
Also some logging cleanup/improvement to make behaviour easier to trace.
Reserve segments start out at zero length, and are still deleted when finished.
This is because otherwise we'd still have to clear the file to be able to
properly parse it later (given that it can be a "half" file due to power fail
etc). This might need revisiting as well.
With this patch, there should be no case (except flush starvation) where
"add_mutation" actually waits for a (potentially) blocking (disk) op.
Note that since the amount of reserve is increased as needed, there will
be occasional cases where a new segment is created in the alloc path
until the system finds equilibrium. But this should only happen during a brief
warmup.
Refs #356
* Move sync time setting to sync initiate to help prevent double syncs
* Change add_mutation to only do an explicit sync-with-wait if the time
elapsed since the last one is 2x the sync window
* Do not wait for sync when moving to new segment in alloc path
* Initialize _sync_time properly.
* Add some tracing log messages to help debug
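The 2x-sync-window rule above could be sketched like this (a hypothetical `sync_policy` type, not the actual commitlog code):

```cpp
#include <cassert>
#include <chrono>

using clk = std::chrono::steady_clock;

// add_mutation only waits for an explicit sync when the last initiated
// sync is older than twice the configured sync period; otherwise the
// periodic timer is trusted to catch up. Setting last_sync at sync
// *initiation* (not completion) is what prevents double syncs.
struct sync_policy {
    clk::duration sync_period;
    clk::time_point last_sync;  // set when a sync is initiated

    bool should_wait_for_sync(clk::time_point now) const {
        return (now - last_sync) >= 2 * sync_period;
    }
};
```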
* Removes the previous, accidental fix that got committed.
* Instead, just do not give RPs to replayed mutations. This is the same as in
Origin, and just as correct or more so, since we intend to flush the data to
sstables ASAP anyway
After killing scylla in the middle of a write, the next scylla
instance failed to finish commit log replay, showing the following
error message:
scylla: core/future.hh:448: void promise<T>::set_value(A&& ...)
[with A = {}; T = {}]: Assertion `_state' failed.
After a long debug session, I figured out that check_valid_rp() was
triggering the exception replay_position_reordered_exception, which
means replay position reordering.
Looking at 8b9a63a3c6, I noticed that database::apply is guarded
against reordering, but the commitlog replay code is not.
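The guard in question can be illustrated roughly as follows (a toy model; the real `check_valid_rp` and `replay_position` live in the scylla codebase and differ in detail):

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// A replay position (segment id, offset) must be monotonically
// increasing as mutations are applied; seeing a smaller one than the
// last applied position indicates reordering.
struct replay_position {
    uint64_t id;   // segment id
    uint32_t pos;  // offset within segment
    bool operator<(const replay_position& o) const {
        return id < o.id || (id == o.id && pos < o.pos);
    }
};

struct replay_state {
    replay_position last{0, 0};
    void check_valid_rp(replay_position rp) {
        if (rp < last) {
            // In scylla this is replay_position_reordered_exception.
            throw std::runtime_error("replay_position reordered");
        }
        last = rp;
    }
};
```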
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
We need to send out the notification for all created keyspaces, not just
for the first one.
Spotted during code inspection.
Signed-off-by: Pekka Enberg <penberg@cloudius-systems.com>
"It moves the API configuration from the command line arguments to the general
config; it also moves the api-doc directory to be configurable instead of
hard-coded."
All database code was converted to it when storage_proxy was made
distributed, but then new code was written to use storage_proxy& again.
Passing a distributed<> object is safer since it can be passed between
shards safely. There was a patch to fix one such case yesterday; I found
one more while converting.
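Why a distributed<> reference is shard-safe while a plain reference is not can be shown with a minimal stand-in for seastar's `distributed<>` (everything here is a toy model, not the seastar API):

```cpp
#include <cassert>
#include <vector>

// Stand-in for the current shard id (seastar would provide this).
thread_local int current_shard = 0;

// The wrapper holds one instance per shard and local() resolves to the
// current shard's copy, so the wrapper reference itself carries no
// shard affinity -- unlike a T&, which pins one shard's instance.
template <typename T>
struct distributed {
    std::vector<T> instances;
    explicit distributed(int shards) : instances(shards) {}
    T& local() { return instances[current_shard]; }
};

struct storage_proxy { int writes = 0; };
```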
"Refs #293
* Add a commitlog::sync_all_segments, that explicitly forces all pending
disk writes
* Only delete segments from disk IFF they are marked clean. Thus on partial
shutdown or whatnot, even if the CL is destroyed (destructor runs), disk files
not yet clean vis-à-vis sstables are preserved and replayable
* Do a sync_all_segments first of all in database::stop.
Exactly what not to stop in main I leave up to others' discretion, or at least
another patch."
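The clean/dirty deletion rule above amounts to something like this (a toy sketch with hypothetical names):

```cpp
#include <cassert>
#include <set>

// A segment tracks which CFs still have un-flushed data in it; its
// file may be deleted on close only if that set is empty, i.e. the
// segment is "clean" vis-à-vis sstables. Otherwise the file is kept
// on disk for replay.
struct segment {
    std::set<int> dirty_cfs;  // CFs whose data is not yet in sstables
    void mark_clean(int cf) { dirty_cfs.erase(cf); }
    bool delete_file_on_close() const { return dirty_cfs.empty(); }
};
```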
From Pawel:
This series makes compaction remove items that are no longer live:
- expired cells are changed into tombstones
- items covered by higher level tombstones are removed
- expired tombstones are removed if possible
Fixes #70.
Fixes #71.
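The three rules in the list above can be sketched with a toy cell model (all names hypothetical; the real compaction logic also tracks deletion times and local deletion markers):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

struct cell {
    int64_t timestamp;
    std::optional<int64_t> expiry;  // set for live cells with a TTL
    bool is_tombstone = false;
};

// Returns the cell as it should appear after compaction, or nullopt if
// it can be dropped entirely.
std::optional<cell> compact(cell c, int64_t now,
                            std::optional<int64_t> covering_tombstone_ts,
                            int64_t gc_grace) {
    if (!c.is_tombstone && c.expiry && *c.expiry <= now) {
        c.is_tombstone = true;  // expired cell -> tombstone
        c.expiry.reset();
    }
    if (covering_tombstone_ts && c.timestamp <= *covering_tombstone_ts) {
        return std::nullopt;    // shadowed by a higher-level tombstone
    }
    if (c.is_tombstone && c.timestamp + gc_grace <= now) {
        return std::nullopt;    // expired tombstone, purgeable
    }
    return c;
}
```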
This adds the API configuration parameters to the configuration, so it
will be possible to take them from the configuration file or from the
command line.
The following configuration options were defined:
api_port
api_address
api_ui_dir
api_doc_dir
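In scylla.yaml form, the new options might look like this (values purely illustrative):

```yaml
# Hypothetical scylla.yaml fragment exercising the new options
api_port: 10000
api_address: 127.0.0.1
api_ui_dir: swagger-ui/dist/
api_doc_dir: api/api-doc/
```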
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Also at seastar-dev: calle/commitlog_flush_v3
(And, yes, this time I _did_ update the remote!)
Refs #262
Commit of the original series was done on a stale version (v2) due to the
author's inability to multitask and update git repos.
v3:
* Removed future<> return value from callbacks. I.e. the flush callback is now
only fully synchronous over the actual call
"Fixes #262
Handles CL disk size exceeding the configured max size by calling flush
handlers for each dirty CF id / high replay_position mark (instead of the
uncontrolled delete done previously).
* Increased default max disk size to 8GB. Same as Origin/scylla.yaml (so no
real change, but synced).
* Divide the max disk size by cpus (so sum of all shards == max)
* Abstract flush callbacks in CL
* Handler in DB that initiates memtable->sstable writes when called.
Note that the flush request is done "synchronously" in new_segment() (i.e.
when getting a new segment and crossing the threshold). This is however more or
less congruent with Origin, which will do a request-sync in the corresponding
case.
Actually dealing with the request should however, at least in production code,
be done asynchronously, and in DB it is, i.e. we initiate sstable writes.
Hopefully they finish soon, and CL segments will be released (before the next
segment is allocated).
If the flush request does _not_ eventually result in any CFs becoming
clean and segments released, we could potentially be issuing flushes
repeatedly, but never more often than on every new segment."
* Do not throw away commitlog segments on disk size overflow.
Issue a flush request instead (i.e. calculate the RP we want to free up to,
and for all dirty CFs, do a request).
"Abstracted" as a registerable callback, i.e. it is the DB's responsibility
to actually do something with it.
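The callback abstraction described above could look roughly like this (a toy model; names like `register_flush_handler` are illustrative, not the actual CL API):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>

// When total CL disk usage crosses the per-shard max, the commitlog
// does not delete segments; it invokes a registered callback for every
// dirty CF with the replay position it wants freed up to. The DB side
// registers a handler that initiates memtable->sstable flushes.
struct commitlog {
    using flush_handler = std::function<void(int cf_id, uint64_t rp)>;
    flush_handler on_flush;
    std::map<int, uint64_t> dirty;  // cf_id -> highest replay position
    uint64_t disk_usage = 0;
    uint64_t max_disk_size = 0;

    void register_flush_handler(flush_handler h) { on_flush = std::move(h); }

    void maybe_request_flush() {
        if (disk_usage <= max_disk_size || !on_flush) {
            return;
        }
        for (auto& [cf, rp] : dirty) {  // ask every dirty CF to flush
            on_flush(cf, rp);
        }
    }
};
```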
Right now, gossip returns hard coded cluster and partitioner name.
    sstring get_cluster_name() {
        // FIXME: DatabaseDescriptor.getClusterName()
        return "my_cluster_name";
    }

    sstring get_partitioner_name() {
        // FIXME: DatabaseDescriptor.getPartitionerName()
        return "my_partitioner_name";
    }
Fix it by setting the correct names from the config options.
With this,

    cqlsh 127.0.0.$i -e "SELECT * from system.local;"

returns the correct cluster_name.
Fixes #291
"This series deals with copies and moves of mutation. The former are dealt
with by adding std::move() and missing 'mutable' (in case of lambdas). The
latter are improved by storing mutation_partition externally thus removing
the need for moving mutation_partition each time mutation is moved.
Storing mutation_partition externally is obviously trading the cost of
move constructor for the cost of allocation which shows in perf_mutation
results since mutations aren't moved in that test.
perf_mutation (-c 1):
before: 3289520.06 tps
after: 3183023.37 tps
diff: -3.24%
perf_simple_query (read):
before: 526954.05 tps
after: 577225.16 tps
diff +9.54%
perf_simple_query (write):
before: 731832.70 tps
after: 734923.60 tps
diff: +0.42%
Fixes #150 (well, not completely)."
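The move-vs-allocation trade-off described above can be shown with a toy model (not the real mutation type):

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Keeping the (large) partition state behind a unique_ptr makes moving
// a mutation a pointer transfer, at the cost of one allocation per
// mutation -- which is exactly why perf_mutation (no moves) regresses
// slightly while the query paths (lots of moves) improve.
struct mutation_partition {
    std::vector<int> rows;  // stand-in for the heavy per-partition state
};

struct mutation {
    std::unique_ptr<mutation_partition> p =
        std::make_unique<mutation_partition>();  // up-front allocation
    mutation() = default;
    mutation(mutation&&) = default;              // move = pointer transfer
    mutation& operator=(mutation&&) = default;
};
```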
Fixes #266
Some callsites are fine: if we just get the message and process it, as is the
case with check_health for instance, msg will be alive and all is good. But if
we return a future inside the processing, msg must be kept alive. Classic bug,
appearing again.
Pekka saw this in practice in another bug. We haven't seen anything that is
related to this, but it is certainly wrong.
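The lifetime bug pattern described above, in a minimal non-seastar form (illustrative only):

```cpp
#include <functional>
#include <string>
#include <utility>

// If processing returns deferred work that refers to msg, msg must be
// moved into the continuation. Capturing by reference (the bug) leaves
// a dangling reference once the caller's frame unwinds:
//
//   return [&msg] { return msg; };   // BAD: dangles after return
//
// Moving it in keeps msg alive exactly as long as the deferred work:
std::function<std::string()> process(std::string msg) {
    return [m = std::move(msg)] { return m; };  // GOOD: owns the message
}
```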
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Fixes#99
Adding missing commitlog metrics to the rest API.
v2: Mis-sent (clumsy fingers)
v3: Use map_reduce0 + subroutine for nicer code
v4: rebased on current master
v5: rebased yet again.
Since the _second_ file in this previous patch set was committed, and is
dependent on this very change below to even compile, some expediency might be
warranted.
* Fixes#247
* Re-introduce test_allocation_failure, but allow for the "failure" to not
happen. I.e. if run with low memory settings, the test will check that
allocation failure is graceful. With lots of memory it will check partial
write.
* Make it more like Origin, i.e. based on wall clock time of app start
* Encode the shard ID in the RP segment ID, to ensure RPs and segment names
  are unique per shard (as in Origin)
* Note: removed commitlog_test:test_allocation_failure because with
segments limited to 4GB -> mutation limited to 2GB, actually forcing
a fail is not guaranteed or even likely.