scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-09 08:23:29 +00:00

Author	SHA1	Message	Date
Botond Dénes	fddd9a88dd	treewide: silence discarded future warnings for legit discards This patch silences those future discard warnings where it is clear that discarding the future was actually the intent of the original author, and they did the necessary precautions (handling errors). The patch also adds some trivial error handling (logging the error) in some places, which were lacking this, but otherwise look ok. No functional changes.	2019-08-26 18:54:44 +03:00
Gleb Natapov	6a4207f202	Pass service permit to storage_proxy Current cql transport code acquires a permit before processing a query and releases it when the query gets a reply, but some queries leave work behind. If the work is allowed to accumulate without any limit a server may eventually run out of memory. To prevent that the permit system should account for the background work as well. The patch is a first step in this direction. It passes a permit down to storage proxy where it will be later held by background work.	2019-08-12 10:20:43 +03:00
Piotr Sarna	ac7531d8d9	db,hints: decouple in-flight hints limits from resource manager The resource manager is used to manage common resources between various hints managers. In-flight hints used to be one of the shared resources, but it proves to cause starvation, when one manager eats the whole limit - which may be especially painful if the background materialized views hints manager starves the regular hints manager, which can in turn start failing user writes because of admission control. This patch makes the limit per-manager again, which effectively reverts the limit to its original behavior. Fixes #4483 Message-Id: <8498768e8bccbfa238e6a021f51ec0fa0bf3f7f9.1559649491.git.sarna@scylladb.com>	2019-07-12 19:21:26 +03:00
Vlad Zolotarov	f07c341efc	hints_manager: rename the state::ep_state_is_not_normal enum value Rename this state value to better reflect the reality: state::ep_state_is_not_normal -> state::ep_state_left_the_ring The manager gets to this state when the destination Node has left the ring. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-05-08 15:46:47 -04:00
Vlad Zolotarov	93ba700458	hinted handoff: fix the logic that detects that the destination node is in DN state When node is in a DN state its gossiper state may be NORMAL, SHUTDOWN or "" depending on the use case. In addition to that if node has been removed from the ring its state is also going to be removed from the gossiper_state map. Let's consider the above when deciding if node is in the DN state. Fixes #4461 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-05-08 14:53:01 -04:00
Vlad Zolotarov	274b9d8069	hinted_handoff: sender::can_send(): optimize gossiper::is_alive(ep) check gossiper::is_alive() has a lot of not needed checks (e.g. is_me(ep)) that are irrelevant for HH use case and we may safely skip them. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-25 23:16:07 -04:00
Vlad Zolotarov	74b4076ceb	hinted handoff: end_point_hints_manager::sender: use _gossiper instead of _shard_manager.local_gossiper() sender has its own reference to the local gossiper - use it. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-25 23:04:02 -04:00
Benny Halevy	5a99023d4a	treewide: use lambda for io_check of *touch_directory To prepare for a seastar change that adds an optional file_permissions parameter to touch_directory and recursive_touch_directory. This change messes up the call to io_check since the compiler can't derive the Func&& argument. Therefore, use a lambda function instead to wrap the call to {recursive_,}touch_directory. Ref #4395 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20190421085502.24729-1-bhalevy@scylladb.com>	2019-04-21 12:04:39 +03:00
Vlad Zolotarov	db2ba0df61	hinted handoff: discard corrupted segments If we discover that a current segment is corrupted there is nothing we can do about it. This patch does the following: 1) Drops the corrupted segment and moves to the next one. 2) Logs such events as ERRORs. 3) Introduces a new metrics that accounts such event. Fixes #4364 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-09 15:54:20 -04:00
Vlad Zolotarov	00fe2acb35	hinted handoff: disable "reuse_segments" Hinted handoff doesn't utilize this feature (which was developed with a commitlog in mind). Since it's enabled by default we need to explicitly disable it. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-04-09 11:13:41 -04:00
Benny Halevy	ff4d8b6e85	treewide: use std::filesystem Rather than {std::experimental,boost,seastar::compat}::filesystem On Sat, 2019-03-23 at 01:44 +0200, Avi Kivity wrote: > The intent for seastar::compat was to allow the application to choose > the C++ dialect and have seastar follow, rather than have seastar choose > the types and have the application follow (as in your patch). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-03-28 14:21:10 +02:00
Duarte Nunes	93a1c27b31	service/storage_proxy: Don't consider view hints for MV backpressure When a view replica becomes unavailable, updates to it are stored as hints at the paired base replica. This on-disk queue of pending view updates grows as long as there are view updates and the view replica remains unavailable. Currently, we take that relative queue size into account when calculating the delay for new base writes, in the context of the backpressure algorithm for materialized views. However, the way we're calculating that on-disk backlog is wrong, since we calculate it per-device and then feed it to all the hints managers for that device. This means that normal hints will show up as backlog for the view hints manager, which in turn introduces delays. This can make the view backpressure mechanism kick-in even if the cluster uses no materialized views. There's yet another way in which considering the view hints backlog is wrong: a view replica that is unavailable for some period of time can cause the backlog to grow to a point where all base writes are applied the maximum delay of 1 second. This turns a single-node failure into cluster unavailability. The fix to both issues is to simply not take this on-disk backlog into account for the backpressure algorithm. Fixes #4351 Fixes #4352 Signed-off-by: Duarte Nunes <duarte@scylladb.com> Reviewed-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20190321170418.25953-1-duarte@scylladb.com>	2019-03-24 20:29:56 +02:00
Piotr Sarna	e0fe9ce2c0	storage_proxy: add allow_hints parameter to send_to_endpoint With hints allowed, send_to_endpoint will leverage consistency level ANY to send data. Otherwise, it will use the default - cl::ONE.	2019-01-28 09:38:41 +01:00
Vlad Zolotarov	34829b8f81	hinted handoff: cache column family mappings for segments that were not sent out in full We will try to send a particular segment later (in 1s) from the place where we left off if it wasn't sent out in full before. However we may miss some of column family mappings when we get back to sending this file and start sending from some entry in the middle of it (where we left off) if we didn't save column family mappings we cached while reading this segment from its beginning. This happens because commitlog doesn't save a column family information in every entry but rather once for each unique column family (version) per "cycle" (see commitlog::segment description for more info). Therefore we have to assume that a particular column family mapping appears once in the whole segment (worst case). And therefore, when we decide to resume sending a segment we need to keep the column family mappings we accumulated so far and drop them only after we are done with this particular segment (sent it out in full). Fixes #4122 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-01-22 15:24:22 -05:00
Vlad Zolotarov	4516a8cfc4	hinted handoff: add a "discarded" metric Account the amount of hints that were discarded in the send path. This may happen for instance due to a schema change or because a hint is too old. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2019-01-22 14:11:09 -05:00
Avi Kivity	630f841e5b	hints: de-template scan_for_hints_dirs() This function is called twice, and is not doing anything performance critical, so replace the template parameter Func with std::function<>.	2019-01-20 15:55:20 +02:00
Avi Kivity	6e6372e8d2	Revert "Merge "Type-eaese gratuitous templates with functions" from Avi" This reverts commit `31c6a794e9`, reversing changes made to `4537ec7426`. It causes bad_function_calls in some situations: INFO 2019-01-20 01:41:12,164 [shard 0] database - Keyspace system: Reading CF sstable_activity id=5a1ff267-ace0-3f12-8563-cfae6103c65e version=d69820df-9d03-3cd0-91b0-c078c030b708 INFO 2019-01-20 01:41:13,952 [shard 0] legacy_schema_migrator - Moving 0 keyspaces from legacy schema tables to the new schema keyspace (system_schema) INFO 2019-01-20 01:41:13,958 [shard 0] legacy_schema_migrator - Dropping legacy schema tables INFO 2019-01-20 01:41:14,702 [shard 0] legacy_schema_migrator - Completed migration of legacy schema tables ERROR 2019-01-20 01:41:14,999 [shard 0] seastar - Exiting on unhandled exception: std::bad_function_call (bad_function_call)	2019-01-20 11:32:14 +02:00
Avi Kivity	81d004b2c0	hints: de-template scan_for_hints_dirs() This function is called twice, and is not doing anything performance critical, so replace the template parameter Func with std::function<>.	2019-01-17 18:51:46 +02:00
Avi Kivity	f02c64cadf	streaming: stream_session: remove include of db/view/view_update_from_staging_generator.hh This header, which is easily replaced with a forward declaration, introduces a dependency on database.hh everywhere. Remove it and scatter includes of database.hh in source files that really need it.	2019-01-05 17:33:25 +02:00
Duarte Nunes	b7517183fa	db/commitlog: Use fragmented buffers to read entries Leverage fragmented_temporary_buffer when reading commit log entries, avoiding large allocations. Refs #4020 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-31 13:20:37 +00:00
Avi Kivity	eae030b061	hints: reduce dependencies on db/config.hh Instead of accessing extensions via config, access it via database::extensions(). This reduces recompilations when configuration is extended.	2018-12-21 20:15:44 +00:00
Duarte Nunes	6afbec4685	db/hints: Initialize current backlog Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-12-19 22:38:30 +00:00
Calle Wilund	b35af84599	commitlog_replay: Enforce file name based id matching When reading the header chunk of a commitlog file, check the stored id value against the id derived from the file name, and ignore if mismatched. This is a prerequisite for re-using renamed commitlog files, as we can then fail-fast should one such be left on disk, instead of trying to replay it. We also check said id via the CRC check for each chunk parsed. If we find a chunk with mismatched id, we will get a CRC error for the chunk, and replay will terminate (albeit not gracefully).	2018-12-10 09:09:07 +00:00
Benny Halevy	857ff4f59a	database: directly use std::experimental::filesystem::path for lister::path Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Benny Halevy	585ac6e641	database: use std::experimental::filesystem::path for lister::path We would like to get rid of boost::filesystem and gradually replace it with std::experimental::filesystem. TODO: using namespace fs = std::experimental::filesystem, use fs::path directly, rather than lister::path Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2018-12-02 22:02:10 +02:00
Gleb Natapov	b4a8802edc	hints: make hints manager more resilient to unexpected directory content Currently if hints directory contains unexpected directories Scylla fails to start with unhandled std::invalid_argument exception. Make the manager ignore malformed files instead and try to proceed anyway. Message-Id: <20181121134618.29936-2-gleb@scylladb.com>	2018-11-21 14:53:03 +00:00
Gleb Natapov	9433d02624	hints: add auxiliary function for scanning high level hints directory We scan hints directory in two places: to search for files to replay and to search for directories to remove after resharding. The code that translates directory name to a shard is duplicated. It is simple now, so not a big issue, but in case it grows it is better to have it in one place. Message-Id: <20181121134618.29936-1-gleb@scylladb.com>	2018-11-21 14:53:03 +00:00
Avi Kivity	1533487ba8	Merge "hinted handoff: give a sender a low priority" from Vlad " Hinted handoff should not overpower regular flows like READs, WRITEs or background activities like memtable flushes or compactions. In order to achieve this put its sending in the STREAMING CPU scheduling group and its commitlog object into the STREAMING I/O scheduling group. Fixes #3817 " * 'hinted_handoff_scheduling_groups-v2' of https://github.com/vladzcloudius/scylla: db::hints::manager: use "streaming" I/O scheduling class for reads commitlog::read_log_file(): set the a read I/O priority class explicitly db::hints::manager: add hints sender to the "streaming" CPU scheduling group	2018-10-23 16:55:05 +00:00
Vlad Zolotarov	aca0882a3f	hinted handoff: enable storing hints before starting messaging_service When messaging_service is started we may immediately receive a mutation from another node (e.g. in the MV update context). If hinted handoff is not ready to store hints at that point we may fail some of MV updates. We are going to resolve this by start()ing hints::managers before we start messaging_service and blocking hints replaying until all relevant objects are initialized. Refs #3828 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-18 16:49:58 -04:00
Vlad Zolotarov	cff4186517	db::hints::manager: add a "started" state Hinting is allowed after "started" before "stopping". Hints that attempted to be stored outside this time frame are going to be dropped. Refs #3828 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-18 16:41:36 -04:00
Vlad Zolotarov	fb513a4b23	db::hints::manager: introduce a _state Introduce a multi-bit state field. In this patch it replaces the _stopping boolean. We are going to add more states in the following patches. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-18 16:41:33 -04:00
Duarte Nunes	624472d16a	db/hints/manager: Expose current backlog Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:35:00 +01:00
Duarte Nunes	6dcb7a39d4	db/hints/manager: Move decision about blocking hints to the manager The space_watchdog enables or disables hints for the managers associated with a particular device. We encapsulate this decision inside the hints::managers by introducing the update_backlog() function. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:35:00 +01:00
Duarte Nunes	207c9c8e38	db/hints/resource_manager: Correctly account resources in space_watchdog A db::hints::resource_manager manages the resources for one or two db::hints::managers. Each of these can be using the same or different devices. The db::hints::space_watchdog periodically checks whether each manager is within their resource allocation, and if not disables it. The watchdog iterates over the managers and accounts for the total size they are using. This is wrong, since it can account in the same variable the size consumed by managers using different devices. We fix this while taking advantage of the fact that on_timer is now called in the context of a seastar::thread, instead of using future combinators. Fixes #3821 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:34:54 +01:00
Duarte Nunes	25d266bdc1	db/hints/resource_manager: Replace timer with seastar::thread Will make on_timer() much simpler to allow fixing a bug in subsequent patches. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Duarte Nunes	278aa13bb0	db/hints/resource_manager: Ensure managers are correctly registered Registering a manager for a new device used std::unordered_map::emplace(), which may not insert the specified value if one with the same key has already been added. This could happen if both managers were using the same device and the fiber deferred in-between adding them. Found during code reading. Could cause hints to not be disabled for an overloaded manager. Fixes #3822 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Duarte Nunes	9e3b09cf48	db/hints/resource_manager: Fix formatting Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Duarte Nunes	622ac734da	db/hints: Disallow moving or copying the managers Disable the copy and move ctors and assignment operators for both the hints::manager and the hints::resource_manager. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-16 20:32:16 +01:00
Vlad Zolotarov	5b12ec441d	db::hints::manager: use "streaming" I/O scheduling class for reads Make sure that read I/O in the context of HH sending do not overpower I/O in the context of queries, memtable flushes or compactions. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-10 15:22:43 -04:00
Vlad Zolotarov	a89188de07	commitlog::read_log_file(): set the a read I/O priority class explicitly Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-10 15:22:43 -04:00
Vlad Zolotarov	629972d586	db::hints::manager: add hints sender to the "streaming" CPU scheduling group Make sure that HH sends do not overpower (CPU wise) regular WRITEs flow. Fixes #3817 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-10 15:22:43 -04:00
Duarte Nunes	74d809f8be	db/hints/manager: Use frozen_mutation instead of mutation Instead of unfreezing a mutation from the commitlog and then freezing it again to send, just keep the read frozen mutation. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:57:30 +01:00
Duarte Nunes	6eec9748fc	db/hints/manager: Use database::find_schema() Instead of using find_column_family() and repeatedly asking for column_family::schema(), use database::find_schema() instead. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:57:30 +01:00
Asias He	4a0b561376	storage_service: Get rid of moving operation The moving operation changes a node's token to a new token. It is supported only when a node has one token. The legacy moving operation is useful in the early days before the vnode is introduced where a node has only one token. I don't think it is useful anymore. In the future, we might support adjusting the number of vnodes to rebalance the token range each node owns. Removing it simplifies the cluster operation logic and code. Fixes #3475 Message-Id: <144d3bea4140eda550770b866ec30e961933401d.1533111227.git.asias@scylladb.com>	2018-08-01 11:18:17 +03:00
Avi Kivity	512baf536f	storage_proxy: implement write timeouts Require a timeout parameter for storage_proxy::mutate_begin() and all its callers (all the way to thrift and cql modification_statement and batch_statement). This should fix spurious debug-mode test failures, where overcommit and general debug slowness result in the default timeouts being exceeded. Since the tests use infinite timeouts, they should not time out any more. Tests: unit (release), with an extra patch that aborts when a non-infinite timeout is detected. Message-Id: <20180707204424.17116-1-avi@scylladb.com>	2018-07-08 10:27:03 +01:00
Vlad Zolotarov	83ba6d84a1	db::hints::manager: implement rebalance() method Rebalance hints segments that need to be sent among all present shards. Ensure that after rebalancing the difference between the number of segments of any two shards is not greater than 1. Try to minimize the amount of "file rename" operations in order to achieve the needed result. Note: "Resharding" is a particular case of rebalancing. Tests: dtest: hintedhandoff_additional_test.py:TestHintedHandoff.hintedhandoff_rebalance_test Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-07-06 19:18:46 -04:00
Piotr Sarna	828497ad19	hints: amend a comment in device limits To make the comment less confusing, 'group of managers' is used instead of 'device'. Refs #3516 Reported-by: Vlad Zolotarov <vladz@scylladb.com> Signed-off-by: Piotr Sarna <sarna@scylladb.com> Message-Id: <60c9ab6b47195570f7ce7dff9556e3739b7ae00f.1529862547.git.sarna@scylladb.com>	2018-06-24 19:14:59 +01:00
Piotr Sarna	8b43ac3a57	hints: reserve more space for dedicated storage Reserving 10% of space for hints managers makes sense if the device is shared with other components (like /data or /commitlog). But, if hints directory is mounted on a dedicated storage, it makes sense to reserve much more - 90% was chosen as a sane limit. Whether storage is 'dedicated' or not is based on a simple check if given hints directory is a mount point. Fixes #3516 Signed-off-by: Piotr Sarna <sarna@scylladb.com>	2018-06-22 10:27:00 +02:00
Piotr Sarna	32f86ca61e	hints: add is_mountpoint function A helper function that checks whether a path is also a mount point is added. Signed-off-by: Piotr Sarna <sarna@scylladb.com>	2018-06-22 10:26:52 +02:00
Piotr Sarna	b6c1b8c5ef	hints: make space_watchdog device-aware Instead of having one static space limit for all directories, space_watchdog now keeps a per-device limit, shared among hints managers residing on the same disks. References #3516 Signed-off-by: Piotr Sarna <sarna@scylladb.com>	2018-06-22 10:26:45 +02:00

72 Commits