scylladb

Author	SHA1	Message	Date
Avi Kivity	1533487ba8	Merge "hinted handoff: give a sender a low priority" from Vlad " Hinted handoff should not overpower regular flows like READs, WRITEs or background activities like memtable flushes or compactions. In order to achieve this put its sending in the STREAMING CPU scheduling group and its commitlog object into the STREAMING I/O scheduling group. Fixes #3817 " * 'hinted_handoff_scheduling_groups-v2' of https://github.com/vladzcloudius/scylla: db::hints::manager: use "streaming" I/O scheduling class for reads commitlog::read_log_file(): set the a read I/O priority class explicitly db::hints::manager: add hints sender to the "streaming" CPU scheduling group	2018-10-23 16:55:05 +00:00
Vlad Zolotarov	aca0882a3f	hinted handoff: enable storing hints before starting messaging_service When messaging_service is started we may immediately receive a mutation from another node (e.g. in the MV update context). If hinted handoff is not ready to store hints at that point we may fail some of MV updates. We are going to resolve this by start()ing hints::managers before we start messaging_service and blocking hints replaying until all relevant objects are initialized. Refs #3828 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-18 16:49:58 -04:00
Vlad Zolotarov	cff4186517	db::hints::manager: add a "started" state Hinting is allowed after "started" before "stopping". Hints that attempted to be stored outside this time frame are going to be dropped. Refs #3828 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-18 16:41:36 -04:00
Vlad Zolotarov	fb513a4b23	db::hints::manager: introduce a _state Introduce a multi-bit state field. In this patch it replaces the _stopping boolean. We are going to add more states in the following patches. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-18 16:41:33 -04:00
Duarte Nunes	624472d16a	db/hints/manager: Expose current backlog Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:35:00 +01:00
Duarte Nunes	6dcb7a39d4	db/hints/manager: Move decision about blocking hints to the manager The space_watchdog enables or disables hints for the managers associated with a particular device. We encapsulate this decision inside the hints::managers by introducing the update_backlog() function. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:35:00 +01:00
Duarte Nunes	207c9c8e38	db/hints/resource_manager: Correctly account resources in space_watchdog A db::hints::resource_manager manages the resources for one or two db::hints::managers. Each of these can be using the same or different devices. The db::hints::space_watchdog periodically checks whether each manager is within their resource allocation, and if not disables it. The watchdog iterates over the managers and accounts for the total size they are using. This is wrong, since it can account in the same variable the size consumed by managers using different devices. We fix this while taking advantage of the fact that on_timer is now called in the context of a seastar::thread, instead of using future combinators. Fixes #3821 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:34:54 +01:00
Duarte Nunes	25d266bdc1	db/hints/resource_manager: Replace timer with seastar::thread Will make on_timer() much simpler to allow fixing a bug in subsequent patches. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Duarte Nunes	278aa13bb0	db/hints/resource_manager: Ensure managers are correctly registered Registering a manager for a new device used std::unordered_map::emplace(), which may not insert the specified value if one with the same key has already been added. This could happen if both managers were using the same device and the fiber deferred in-between adding them. Found during code reading. Could cause hints to not be disabled for an overloaded manager. Fixes #3822 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Duarte Nunes	9e3b09cf48	db/hints/resource_manager: Fix formatting Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Duarte Nunes	622ac734da	db/hints: Disallow moving or copying the managers Disable the copy and move ctors and assignment operators for both the hints::manager and the hints::resource_manager. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Vlad Zolotarov	5b12ec441d	db::hints::manager: use "streaming" I/O scheduling class for reads Make sure that read I/O in the context of HH sending does not overpower I/O in the context of queries, memtable flushes or compactions. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-10 15:22:43 -04:00
Vlad Zolotarov	a89188de07	commitlog::read_log_file(): set the a read I/O priority class explicitly Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-10 15:22:43 -04:00
Vlad Zolotarov	629972d586	db::hints::manager: add hints sender to the "streaming" CPU scheduling group Make sure that HH sends do not overpower (CPU wise) regular WRITEs flow. Fixes #3817 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-10 15:22:43 -04:00
Duarte Nunes	74d809f8be	db/hints/manager: Use frozen_mutation instead of mutation Instead of unfreezing a mutation from the commitlog and then freezing it again to send, just keep the read frozen mutation. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:57:30 +01:00
Duarte Nunes	6eec9748fc	db/hints/manager: Use database::find_schema() Instead of using find_column_family() and repeatedly asking for column_family::schema(), use database::find_schema() instead. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:57:30 +01:00
Asias He	4a0b561376	storage_service: Get rid of moving operation The moving operation changes a node's token to a new token. It is supported only when a node has one token. The legacy moving operation is useful in the early days before the vnode is introduced where a node has only one token. I don't think it is useful anymore. In the future, we might support adjusting the number of vnodes to rebalance the token range each node owns. Removing it simplifies the cluster operation logic and code. Fixes #3475 Message-Id: <144d3bea4140eda550770b866ec30e961933401d.1533111227.git.asias@scylladb.com>	2018-08-01 11:18:17 +03:00
Avi Kivity	512baf536f	storage_proxy: implement write timeouts Require a timeout parameter for storage_proxy::mutate_begin() and all its callers (all the way to thrift and cql modification_statement and batch_statement). This should fix spurious debug-mode test failures, where overcommit and general debug slowness result in the default timeouts being exceeded. Since the tests use infinite timeouts, they should not time out any more. Tests: unit (release), with an extra patch that aborts when a non-infinite timeout is detected. Message-Id: <20180707204424.17116-1-avi@scylladb.com>	2018-07-08 10:27:03 +01:00
Vlad Zolotarov	83ba6d84a1	db::hints::manager: implement rebalance() method Rebalance hints segments that need to be sent among all present shards. Ensure that after rebalancing the difference between the number of segments of any two shards is not greater than 1. Try to minimize the amount of "file rename" operations in order to achieve the needed result. Note: "Resharding" is a particular case of rebalancing. Tests: dtest: hintedhandoff_additional_test.py:TestHintedHandoff.hintedhandoff_rebalance_test Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-07-06 19:18:46 -04:00
Piotr Sarna	828497ad19	hints: amend a comment in device limits To make the comment less confusing, 'group of managers' is used instead of 'device'. Refs #3516 Reported-by: Vlad Zolotarov <vladz@scylladb.com> Signed-off-by: Piotr Sarna <sarna@scylladb.com> Message-Id: <60c9ab6b47195570f7ce7dff9556e3739b7ae00f.1529862547.git.sarna@scylladb.com>	2018-06-24 19:14:59 +01:00
Piotr Sarna	8b43ac3a57	hints: reserve more space for dedicated storage Reserving 10% of space for hints managers makes sense if the device is shared with other components (like /data or /commitlog). But, if hints directory is mounted on a dedicated storage, it makes sense to reserve much more - 90% was chosen as a sane limit. Whether storage is 'dedicated' or not is based on a simple check if given hints directory is a mount point. Fixes #3516 Signed-off-by: Piotr Sarna <sarna@scylladb.com>	2018-06-22 10:27:00 +02:00
Piotr Sarna	32f86ca61e	hints: add is_mountpoint function A helper function that checks whether a path is also a mount point is added. Signed-off-by: Piotr Sarna <sarna@scylladb.com>	2018-06-22 10:26:52 +02:00
Piotr Sarna	b6c1b8c5ef	hints: make space_watchdog device-aware Instead of having one static space limit for all directories, space_watchdog now keeps a per-device limit, shared among hints managers residing on the same disks. References #3516 Signed-off-by: Piotr Sarna <sarna@scylladb.com>	2018-06-22 10:26:45 +02:00
Piotr Sarna	d22668de04	hints: add device_id to manager In order to make space_watchdog device-aware, device_id field is added to hints manager. It's an equivalent of stat.st_dev and it identifies the disk that contains manager's root directory. Signed-off-by: Piotr Sarna <sarna@scylladb.com>	2018-06-22 10:26:37 +02:00
Piotr Sarna	91b5e33c6a	hints: add get_device_id function In order to distinguish which directories reside on which devices, get_device_id function is added to resource manager. Signed-off-by: Piotr Sarna <sarna@scylladb.com>	2018-06-22 10:25:47 +02:00
Piotr Sarna	6b3a97e34a	hints: fix max_shard_disk_space_size initialization Previously max_shard_disk_space_size was unconditionally initialized with the capacity of hints_directory. But, it's likely that hints_directory doesn't exist at all if hinted handoff is not enabled, which results in Scylla failing to boot. So, max_shard_disk_space_size is now initialized with the capacity of hints_for_views directory, which is always present. This commit also moves max_shard_disk_space_size to the .cc file where it belongs - resource_manager.cc. Tests: unit (release) Message-Id: <9f7b86b6452af328c05c5c6c55bfad3382e12445.1528977363.git.sarna@scylladb.com>	2018-06-14 14:24:01 +01:00
Gleb Natapov	cdf1289b43	Provide available memory size to hinted handoff resource manager during creation	2018-06-11 15:34:13 +03:00
Piotr Sarna	204bc17bd7	hints: decouple hints manager metrics from constructor Now that more than one instance of hints manager can be present at the same time, registering metrics is moved out of the constructor to prevent 'registering metrics twice' errors.	2018-06-04 09:46:06 +02:00
Piotr Sarna	f345efc79a	hints: move space_watchdog to resource manager Space watchdog is decoupled from hints manager and moved to resource manager, so it can be shared among different hints manager instances.	2018-06-04 09:46:01 +02:00
Piotr Sarna	ef40f7e628	hints: move send limiter to resource manager Send limiting semaphore is moved from hints manager to resource manager. In consequence, hints manager now keeps a reference to its resource manager.	2018-06-04 09:35:58 +02:00
Piotr Sarna	2315937854	hints: move constants to resource_manager Constants related to managing resources are moved to newly created resource_manager class. Later, this class will be used to manage (potentially shared) resources of hints managers.	2018-06-04 09:35:58 +02:00
Vlad Zolotarov	48c96d09d6	db::hints::manager: drain hints when the node is decommissioned/removed When node is decommissioned/removed it will drain all its hints and all remote nodes that have hints to it will drain their hints to this node. What "drain" means? - The node that "drains" hints to a specific destination will ignore failures and will continue sending hints till the end of the current segment, erase it and move to the next one till there are no more segments left. After all hints are drained the corresponding hints directory is removed. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-05-08 22:29:21 +01:00
Vlad Zolotarov	ec76f8a27d	db::hints::manager: add a few more trace messages Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-05-08 22:29:21 +01:00
Vlad Zolotarov	6ede32156f	db::hints::manager::end_point_hints_manager::sender: add set_stopping()/stopping() methods It's nicer to have access methods instead of working directly with enum_set methods and values. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-05-08 22:29:21 +01:00
Vlad Zolotarov	94da744f37	db::hints::manager::end_point_hints_manager::stop(): log the last exception instead of forwarding it Returning a future with an exception from end_point_manager::stop() is practically useless because the best the caller can do is to log it and continue as if it didn't happen because it has other things to shut down. Therefore in order to simplify the caller we will log the exception if it happens and will always return a non-exceptional future. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-05-08 22:29:21 +01:00
Vlad Zolotarov	8aedbf9d18	db::hints: manager.hh: cleanup: fix the comments Fix the comments that went out of sync with the current implementation. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-05-08 22:29:21 +01:00
Vlad Zolotarov	5463b58faa	db::hints::manager: rework end_point_hints_manager::stop() to use seastar::async() This simplifies the code reading and extending. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-05-08 22:29:21 +01:00
Avi Kivity	16a7650873	Merge "More extensions: commitlog + system tables" from Calle " Additional extension points. * Allows wrapping commitlog file io (including hinted handoff). * Allows system schema modification on boot, allowing extensions to inject extensions into hardcoded schemas. Note: to make commitlog file extensions work, we need to both enforce we can be notified on segment delete, and thus need to fix the old issue of hard ::unlink call in segment destructor. Segment delete is therefore moved to a batch routine, run at intervals/flush. Replay segments and hints are also deleted via the commitlog object, ensuring an extension is notified (metadata). Configurable listeners are now allowed to inject configuration object into the main config. I.e. a local object can, either by becoming a "configurable" or manually, add references to self-describing values that will be parsed from the scylla.yaml file, effectively extending it. All these wonderful abstractions courtesy of encryption of course. But super generalized! " * 'calle/commitlog_ext' of github.com:scylladb/seastar-dev: db::extensions: Allow extensions to modify (system) schemas db::commitlog: Add commitlog/hints file io extension db::commitlog: Do segment delete async + force replay delete go via CL main/init: Change configurable callbacks and calls to allow adding opts util::config_file: Add "add" config item overload	2018-03-26 16:18:22 +03:00
Calle Wilund	bb1a2c6c2e	db::commitlog: Add commitlog/hints file io extension To allow on-disk data to be augmented.	2018-03-26 11:58:27 +00:00
Calle Wilund	2bc98aebaf	db::commitlog: Do segment delete async + force replay delete go via CL Refs #2858 Push segment files to be deleted to a pending list, and process at intervals or flush-requests (or shutdown). Note that we do _not_ indiscriminately do deletes in non-anchored tasks, because we need to guarantee that finished segments are fully deleted and gone on CL shutdown, not to be mistaken for replayables. Also make sure we delete segments replayed via commitlog call, so IFF we add metadata processing for CL, we can clear it out.	2018-03-26 11:58:27 +00:00
Duarte Nunes	fb54c09e0b	service/storage_proxy: Pass pending endpoints to send_to_endpoint() This will allow us to minimize the number of mutation copies in mutate_MV(). Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180325121412.76844-1-duarte@scylladb.com>	2018-03-25 15:45:21 +03:00
José Guilherme Vanz	380bc0aa0d	Swap arguments order of mutation constructor Swap arguments in the mutation constructor keeping the same standard from the constructor variants. Refs #3084 Signed-off-by: José Guilherme Vanz <guilherme.sft@gmail.com> Message-Id: <20180120000154.3823-1-guilherme.sft@gmail.com>	2018-01-21 12:58:42 +02:00
Glauber Costa	80c4a211d8	consolidate timeout_clock At the moment, various different subsystems use their different ideas of what a timeout_clock is. This makes it a bit harder to pass timeouts between them because although most are actually a lowres_clock, that is not guaranteed to be the case. As a matter of fact, the timeout for restricted reads is expressed as nanoseconds, which is not a valid duration in the lowres_clock. As a first step towards fixing this, we'll consolidate all of the existing timeout_clocks in one, now called db::timeout_clock. Other things that tend to be expressed in terms of that clock--like the fact that the maximum time_point means no timeout and a semaphore that wait()s with that resolution are also moved to the common header. In the upcoming patch we will fix the restricted reader timeouts to be expressed in terms of the new timeout_clock. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-11 12:07:41 -05:00
Raphael S. Carvalho	928beae242	Fix compilation of db/hints/manager.cc and row_cache.cc compiler: gcc (GCC) 6.3.1 20161221 (Red Hat 6.3.1-1) Problems introduced in `f6a461c7a4` and `37b19ae6ba`, respectively. They both fail to compile due to use of method in lambda without explicit mention of this. Some of failure is fixed by not using auto in lambda parameter. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20171218222144.12297-1-raphaelsc@scylladb.com>	2017-12-19 11:15:45 +01:00
Vlad Zolotarov	51bbf18c08	db::hints::manager: initial commit Currently implemented: - Hints generation: db::hints::manager::store_hint(...). - Sending: db::hints::manager::on_timer(). TODO: - Resharding. - Node decommission. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:08:07 -05:00

45 Commits