scylladb

Author	SHA1	Message	Date
Raphael S. Carvalho	ef18b1162b	sstables/compaction_manager: rename and better explain reshard function submit doesn't properly describe the function and also improve explanation of the relationship between function itself and its job parameter. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170912032034.23043-1-raphaelsc@scylladb.com>	2017-09-12 12:25:17 +03:00
Tomasz Grabiec	3f527e028d	Merge "Reduce dependencies on sstables.hh" from Avi This patchset reduces includes of sstables.hh, reducing compile time by both reducing the amount of code compiled, and the amount of needless recompiles caused by false dependencies. It does so by replacing lw_shared_ptr<sstable>, which requires a complete class, with a new custom type shared_sstable, which allows an incomplete sstable class definition. * https://github.com/avikivity/scylla deps2/v2.1 database: change truncate() to flush while compaction is disabled database: make run_with_compaction_disabled() a non-template database: add indirection to compaction_manager instance database: remove dependency on compaction.hh and compaction_manager.hh size_estimates_virtual_reader.hh: add missing include system_keyspace: add missing include main: add missing include storage_service: add missing include repair: add missing include compaction.hh: add missig include and forward declaration compaction_manager: add missing include shared_index_lists.hh: add missing include perf_fast_forward: add missing include sstable_mutation_test: add missing include sstables: extract version and format enum into a separate header file database.hh: add missing forward declaration for foreign_sstable_open_info cql_test_env: add forward declaration database: make column_family::disable_sstable_write() out-of-line sstables: introduce make_sstable() as a shortcut for make_lw_shared<sstable> treewide: use shared_sstable, make_sstable in place of lw_shared_ptr<sstable> sstables: use support for lw_shared_ptr with incomplete type for shared_sstable sstables: reduce dependencies streaming: remove unneeded includes	2017-09-12 09:56:46 +02:00
Tomasz Grabiec	ee1e7732a6	database: Create tables with continuous cache When table is created, it doesn't contain any data, so we can mark the whole data range as continuous in cache. This way reads will immediately hit, and flushes will populate. If sstables are later attached, the attaching process is supposed to invalidate affected ranges (and it does). Fixes #2536. Message-Id: <1505200269-4031-1-git-send-email-tgrabiec@scylladb.com>	2017-09-12 10:53:07 +03:00
Avi Kivity	f7023501d6	treewide: use shared_sstable, make_sstable in place of lw_shared_ptr<sstable> Since shared_sstable is going to be its own type soon, we can't use the old alias.	2017-09-12 10:43:05 +03:00
Avi Kivity	88b91c84a1	database: make column_family::disable_sstable_write() out-of-line Reduces dependencies.	2017-09-12 10:43:05 +03:00
Avi Kivity	9b540eccb0	database: remove dependency on compaction.hh and compaction_manager.hh	2017-09-11 20:09:45 +03:00
Avi Kivity	f9c8c1ddc2	database: add indirection to compaction_manager instance Allows making it forward-declared later on, reducing dependencies.	2017-09-11 20:09:45 +03:00
Avi Kivity	9d0aaa941a	database: make run_with_compaction_disabled() a non-template Allows reducing dependencies down the line, and un-templating non-performance-critical functions is a good thing.	2017-09-11 20:09:45 +03:00
Avi Kivity	6b5514a3df	database: change truncate() to flush while compaction is disabled In preparation to make run_with_compaction_disabled() a non-template, we want to remove any non-copyable captures (so the function can be an std::function, which requires copyability). Move the flush within the compaction disabled region. This changes the behavior, but it shouldn't matter.	2017-09-11 20:09:45 +03:00
Paweł Dziepak	e401d2d50b	db: reject non-Scylla counter sstables in flush_upload_dir Scylla already refuses to load counter sstables that do not have Scylla component. However, if this happens because of 'nodetool refresh' command the existing protection will trigger after sstables have been moved to the data directory. This is too later, so an additional check is added when the upload directory is scanned.	2017-09-06 12:04:26 +01:00
Paweł Dziepak	6a5e8bace1	db: disallow loading non-Scylla counter sstables Scylla does not support local and remote counter shards. This means that it is unsafe to directly load sstables that may contain them.	2017-09-06 12:03:58 +01:00
Tomasz Grabiec	d22fdf4261	row_cache: Improve safety of cache updates Cache imposes requirements on how updates to the on-disk mutation source are made: 1) each change to the on-disk muation source must be followed by cache synchronization reflecting that change 2) The two must be serialized with other synchronizations 3) must have strong failure guarantees (atomicity) Because of that, sstable list update and cache synchronization must be done under a lock, and cache synchronization cannot fail to synchronize. Normally cache synchronization achieves no-failure thing by wiping the cache (which is noexcept) in case failure is detect. There are some setup steps hoever which cannot be skipped, e.g. taking a lock followed by switching cache to use the new snapshot. That truly cannot fail. The lock inside cache synchronizers is redundant, since the user needs to take it anyway around the combined operation. In order to make ensuring strong exception guarantees easier, and making the cache interface easier to use correctly, this patch moves the control of the combined update into the cache. This is done by having cache::update() et al accept a callback (external_updater) which is supposed to perform modiciation of the underlying mutation source when invoked. This is in-line with the layering. Cache is layered on top of the on-disk mutation source (it wraps it) and reading has to go through cache. After the patch, modification also goes through cache. This way more of cache's requirements can be confined to its implementation. The failure semantics of update() and other synchronizers needed to change due to strong exception guaratnees. Now if it fails, it means the update was not performed, neither to the cache nor to the underlying mutation source. The database::_cache_update_sem goes away, serialization is done internally by the cache. The external_updater needs to have strong exception guarantees. This requirement is not new. It is however currently violated in some places. This patch marks those callbacks as noexcept and leaves a FIXME. Those should be fixed, but that's not in the scope of this patch. Aborting is still better than corrupting the state. Fixes #2754. Also fixes the following test failure: tests/row_cache_test.cc(949): fatal error: in "test_update_failure": critical check it->second.equal(*s, mopt->partition()) has failed which started to trigger after commit `318423d50b`. Thread stack allocation may fail, in which case we did not do the necessary invalidation.	2017-09-04 10:04:29 +02:00
Tomasz Grabiec	bf75b882ae	database: Add non-throwing try_trigger_compaction()	2017-09-04 10:04:29 +02:00
Tomasz Grabiec	116d4ae02b	database: Make add_sstable() have strong exception guarantees If insert() fails, we left the database with stats updated, but sstable not being attached.	2017-09-04 10:04:29 +02:00
Tomasz Grabiec	56e3ce05db	row_cache: Don't require presence checker to be supplied externally The API is simpler and safer this way.	2017-09-04 10:04:29 +02:00
Tomasz Grabiec	df787afe6a	database: Supply presence checker in sstable snapshots	2017-09-04 10:04:29 +02:00
Tomasz Grabiec	ab8632b225	database: Add missing serialization of sstable set udpate and cache invalidation Commit `e3ad676433` missed a few places. It is required to serialize sstable list update and cache synchronization in order to preserve partition update isolation. Fixes #2746.	2017-09-04 10:04:29 +02:00
Glauber Costa	e642aee3f7	database: wait for asynchronous operations to end before closing CF This was part of "add gate for generic async operations to column family" but somehow didn't make it into the final patch. Add the missing piece. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20170830164205.4497-1-glauber@scylladb.com>	2017-08-31 11:16:30 +03:00
Tzach Livyatan	12fb975282	Fix typos in metrics description Fixes #2658 Signed-off-by: Tzach Livyatan <tzach@scylladb.com> Message-Id: <20170803121732.19640-1-tzach@scylladb.com>	2017-08-28 10:48:28 +03:00
Glauber Costa	83323e155e	database: add gate for generic async operations to column family run_with_compaction_disabled(), which is called by truncate, has a pretty large defer point in remove(). When the code gets to finally execute, we can't guarantee that the column family will still be alive. That is true in particular if we issued a drop table command following truncate: by the time truncate gets to resume, the CF will be gone. Before the column family is dropped, it will always call its stop() method, which means we have an opportunity to do some waiting there. We already wait for flushes and current compactions to end. Traditionally, we have been solving similar problems by adding a gate that will catch asynchronous operations and making sure that potentially asynchronous operations will enter the gate before executing. Let's do the same thing here. We will close() the gate during stop(). Fixes #2726 Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-08-24 13:12:57 -04:00
Glauber Costa	d090e7be35	database: make sure that column family is always stopped when dropped truncate can throw exceptions. If it does, cf->stop() will never be called because it is contained in a .then clause instead of finally. One of the things that truncate does - in a finally block of its own - is initiate a final compaction. If it returns an exception nobody will wait for that compaction to finish (since cf->stop() is the one doing that) and we'll crash. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2017-08-24 13:01:47 -04:00
Amnon Heiman	abbd78367c	Add configuration to disable per keyspace and column family metrics The number of keysapce and column family metrics reported is proportional to the number of shards times the number of keysapce/column families. This can cause a performance issue both on the reporting system and on the collecting system. This patch adds a configuration flag (set to false by default) to enable or disable those metrics. Fixes #2701 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20170821113843.1036-1-amnon@scylladb.com>	2017-08-22 19:19:54 +03:00
Raphael S. Carvalho	10eaa2339e	compaction: Make resharding go through compaction manager Two reasons for this change: 1) every compaction should be multiplexed to manager which in turn will make decision when to schedule. improvements on it will immediately benefit every existing compaction type. 2) active tasks metric will now track ongoing reshard jobs. Fixes #2671. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20170817224334.6402-1-raphaelsc@scylladb.com>	2017-08-20 11:35:14 +03:00
Botond Dénes	e70cfc8f36	incremental_reader_selector: account for possibly disengaged lower bound In addition to the constructor (fixed previously) the check for no sstables on the first call to select() also has to be prepared for the lower bound of the range being disengaged. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4ab1296c71814fcd492996fa36fd00fd7bbbbc7f.1502949875.git.bdenes@scylladb.com>	2017-08-17 10:07:26 +03:00
Botond Dénes	af83b7f57b	incremental_reader_selector: use lazy_deref instead of tertiary operator Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <4f4b884c6a1f517bd654f3b27608d854b17a66e1.1502948635.git.bdenes@scylladb.com>	2017-08-17 08:45:46 +03:00
Avi Kivity	8df6dd1fa0	database: make incremental_reader_selector robust vs. full-range partition_range incremental_reader_selector assumes the partition_range it receives has a lower bound, but it was seen in mutation_test that this is not so. Fix by checking whether the bound exists or not. Message-Id: <20170815095852.14149-1-avi@scylladb.com>	2017-08-15 11:03:22 +01:00
Duarte Nunes	7fb6a74302	combined_mutation_reader: Drop exhausted readers if not in FF mode Exhausted readers can be fast forwarded, so we have to keep them around. However, if the current reader is not fast forwardable, then we can drop those readers and their buffers. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-08-14 14:37:27 +02:00
Botond Dénes	9ee9988097	Add combined_mutation_reader_test unit test	2017-08-10 12:38:10 +03:00
Botond Dénes	3e97a5cd6b	Remove range_sstable_reader range_sstable_reader is replaced with combined_mutation_reader, using the incremental_reader_selector.	2017-08-10 12:38:10 +03:00
Botond Dénes	bfc74f1312	Add incremental_reader_selector incremental_reader_selector is a specialization of reader_selector for the case when sstables have narrow and/or disjoint token ranges. To exploit this it creates new readers on-demand when their sstable's token range intersects with the current ring position.	2017-08-10 12:38:02 +03:00
Botond Dénes	94fc550e68	sstable_set::incremental_selector: select() now returns a selection A seletion contains - in addition to the list of sstables - a next_token which is a hint as to what is the next best token to call select() with. This should be the smallest token such that at the next call to select() the least number of new sstables will be returned, without skipping any.	2017-08-09 16:27:33 +03:00
Glauber Costa	4a911879a3	add active streaming reads metric In commit `f38e4ff3f`, we have separated streaming reads from normal reads for the purpose of determining the maximum number of reads going on. However, we'll now be totally unaware of how many reads will be happening on behalf of streaming and that can be important information when debugging issues. This patch adds this metric so we don't fly blind. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <1501909973-32519-1-git-send-email-glauber@scylladb.com>	2017-08-05 11:06:37 +03:00
Avi Kivity	f38e4ff3f9	database: prevent streaming reads from blocking normal reads Streaming reads and normal reads share a semaphore, so if a bunch of streaming reads use all available slots, no normal reads can proceed. Fix by assigning streaming reads their own semaphore; they will compete with normal reads once issued, and the I/O scheduler will determine the winner. Fixes #2663. Message-Id: <20170802153107.939-1-avi@scylladb.com>	2017-08-03 10:23:01 +01:00
Avi Kivity	911536960a	database: remove streaming read queue length limit If we fail a streaming read due queue overload, we will fail the entire repair. Remove the limit for streaming, and trust the caller (repair) to have bounded concurrency. Fixes #2659. Message-Id: <20170802143448.28311-1-avi@scylladb.com>	2017-08-03 10:21:07 +01:00
Duarte Nunes	a85232dd82	Fix compilation errors on GCC 6 GCC 6 inconsistently requires explicitly calling a member function through "this->" for lambda functions capturing "this". Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170731143755.21970-1-duarte@scylladb.com>	2017-07-31 17:40:44 +03:00
Avi Kivity	3fe6731436	Merge "educe the effect of the latency metrics" from Amnon "This series reduce that effect in two ways: 1. Remove the latency counters from the system keyspaces 2. Reduce the histogram size by limiting the maximum number of buckets and stop the last bucket." Fixes #2650. * 'amnon/remove_cf_latency_v2' of github.com:cloudius-systems/seastar-dev: database: remove latency from the system table estimated histogram: return a smaller histogram	2017-07-31 15:58:30 +03:00
Duarte Nunes	c81431ad16	column_family: Re-acquire flush permit in case of error If we fail to flush an sstable, after creating the flush_reader, then we will have released the flush permit when we retry the flush. Ensure that when retrying, we re-acquire the flush permit. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-31 12:40:19 +02:00
Duarte Nunes	9162e016da	column_family: Don't hold sstable read lock when retrying flush Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-31 12:40:19 +02:00
Duarte Nunes	1a33cc6847	sstables: Release the flush permit before fsyncing This allows a queued flush to start while we fsync the current sstable, which helps reduce the overall time new writes are blocked on dirty memory. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-31 12:40:19 +02:00
Duarte Nunes	a2b732c156	dirty_memory_manager: Refactor flush permit lifetime management This patch refactors how the flush permit lifetime is managed, dropping the current hash table in favour of a RAII approach. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-31 12:40:19 +02:00
Duarte Nunes	f647f5b14a	dirty_memory_manager: Invert permit acquisition order For an upcoming fix it is required to invert the permit acquisition order: first we acquire the background work permit and then the single flush permit. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-31 12:40:19 +02:00
Duarte Nunes	e371accac8	memtable_list: Register different seal functions for each behaviour Instead of passing a flush_behaviour to the seal function, use two different functions for each of the behaviours. This will be important in the forthcoming patches, which will require the signatures of those functions to differ. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-31 12:40:19 +02:00
Avi Kivity	e855a28fae	Revert "Merge "memtable flush: Fixes and improvements" from Duarte" This reverts commit `733a64a1df`, reversing changes made to `e11e66723a`. Breaks sstable_test and perf_fast_forward.	2017-07-31 12:44:28 +03:00
Duarte Nunes	0f1bd81523	column_family: Re-acquire flush permit in case of error If we fail to flush an sstable, after creating the flush_reader, then we will have released the flush permit when we retry the flush. Ensure that when retrying, we re-acquire the flush permit. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-27 21:09:18 +02:00
Duarte Nunes	2f4cffc7f6	column_family: Don't hold sstable read lock when retrying flush Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-27 21:09:18 +02:00
Duarte Nunes	5e64839e85	sstables: Release the flush permit before fsyncing This allows a queued flush to start while we fsync the current sstable, which helps reduce the overall time new writes are blocked on dirty memory. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-27 21:09:18 +02:00
Duarte Nunes	ef1275e9dd	dirty_memory_manager: Refactor flush permit lifetime management This patch refactors how the flush permit lifetime is managed, dropping the current hash table in favour of a RAII approach. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-27 21:09:18 +02:00
Duarte Nunes	cfc8fae33f	dirty_memory_manager: Invert permit acquisition order For an upcoming fix it is required to invert the permit acquisition order: first we acquire the background work permit and then the single flush permit. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-27 21:09:18 +02:00
Duarte Nunes	7e68e4677d	memtable_list: Register different seal functions for each behaviour Instead of passing a flush_behaviour to the seal function, use two different functions for each of the behaviours. This will be important in the forthcoming patches, which will require the signatures of those functions to differ. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-07-27 21:09:18 +02:00
Amnon Heiman	a71b9e498a	database: remove latency from the system table This patch remove the latency histograms from the system table, it also extend the already existing exclusion to all system keyspaces. It also uses the new get_histogram API to set a minimal bucket size to 100 microseconds.	2017-07-27 11:41:15 +03:00

1 2 3 4 5 ...

891 Commits