Commit Graph

512 Commits

Author SHA1 Message Date
Nadav Har'El
da54d0fc7d Materialized views: fix accidental zeroing of flow-control delay
The materialized-views flow control carefully calculates the number of
microseconds by which to delay a client to slow it down to the desired
rate - but a typo (std::min instead of std::max) causes this delay to
be zeroed, which in effect completely nullifies the flow-control
algorithm.

Before this fix, experiments suggested that view flow control was
having no effect and the view backlog was not bounded at all. After this
fix, we can see the flow control having its desired effect, and the
view backlog converging.

Fixes #4143.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190226161452.498-1-nyh@scylladb.com>
2019-02-26 18:22:18 +02:00
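A minimal sketch of the bug class fixed above, with illustrative names (not Scylla's actual code): clamping a computed delay with std::min against zero always yields zero, while std::max enforces the intended lower bound.

```cpp
#include <algorithm>
#include <chrono>

using namespace std::chrono;

// Illustrative only: clamp a computed per-request delay so it never
// goes below zero.
microseconds clamp_delay(microseconds computed) {
    // The bug: std::min(computed, microseconds(0)) zeroes every
    // non-negative delay. The fix is std::max, which keeps the computed
    // delay and only floors negative values at zero.
    return std::max(computed, microseconds(0));
}
```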
Gleb Natapov
b01a659014 storage_proxy: remove old Cassandra code
Part of the code is already implemented (counters and hinted handoff).
Part of the code will probably never be implemented (triggers). And the
rest is the code that estimates the number of rows per range to determine
query parallelism, but we implemented exponential-growth algorithms instead.

Message-Id: <20190214112226.GE19055@scylladb.com>
2019-02-18 10:34:55 +02:00
Gleb Natapov
26e5700819 storage_proxy: limit the number of ranges precalculated by query_ranges_to_vnodes_generator
Do not precalculate too many ranges in advance: it requires a large
allocation and usually means that a consumer of the interface is going
to do too much work in parallel.

Fixes: #3767
2019-02-12 10:45:25 +02:00
Gleb Natapov
ecc5230de5 storage_proxy: remove old get_restricted_ranges() interface
It is not used any more.
2019-02-11 14:45:43 +02:00
Gleb Natapov
2735a85c8e storage_proxy: convert range query path to new query_ranges_to_vnodes_generator interface
2019-02-11 14:45:43 +02:00
Gleb Natapov
692a0bd000 storage_proxy: introduce new query_ranges_to_vnodes_generator interface
The get_restricted_ranges() function takes the query-provided key ranges
and divides them on vnode boundaries. It iterates over all the ranges and
calculates all the vnodes, but its users are usually interested in only
one vnode, since most likely it will be enough to populate a page; if it
is not enough, they will ask for more. This patch introduces, in place of
the function, a new interface that generates vnode ranges on demand
instead of precalculating all of them.
2019-02-11 14:45:43 +02:00
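The on-demand interface described above can be sketched roughly as follows. Names and types here are illustrative, not Scylla's actual implementation: each call yields the next subrange up to the next vnode boundary, so a caller that only needs one page does no extra work.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Illustrative stand-ins for Scylla's token ranges.
struct range { int begin; int end; };

// Hypothetical sketch of a query_ranges_to_vnodes_generator-like class:
// split a query range on vnode boundaries lazily, one subrange per call.
class ranges_to_vnodes_generator {
    std::vector<int> _boundaries; // sorted vnode boundaries
    range _query;
    size_t _i = 0;   // next boundary to consider
    int _pos;        // start of the next subrange to emit
public:
    ranges_to_vnodes_generator(range query, std::vector<int> boundaries)
        : _boundaries(std::move(boundaries)), _query(query), _pos(query.begin) {}

    // Yield the next subrange, or nullopt when the query range is exhausted.
    std::optional<range> operator()() {
        if (_pos >= _query.end) {
            return std::nullopt;
        }
        while (_i < _boundaries.size() && _boundaries[_i] <= _pos) {
            ++_i;
        }
        int end = _i < _boundaries.size()
                ? std::min(_boundaries[_i], _query.end)
                : _query.end;
        range r{_pos, end};
        _pos = end;
        return r;
    }
};
```

A caller populating a single page would invoke the generator once, and only ask again if the first vnode range did not yield enough rows.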
Piotr Sarna
e0fe9ce2c0 storage_proxy: add allow_hints parameter to send_to_endpoint
With hints allowed, send_to_endpoint will leverage consistency level ANY
to send data. Otherwise, it will use the default - cl::ONE.
2019-01-28 09:38:41 +01:00
Duarte Nunes
fa2b0384d2 Replace std::experimental types with C++17 std version.
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.

Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.

Scylla now requires GCC 8 to compile.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
2019-01-08 13:16:36 +02:00
Nadav Har'El
da090a5458 materialized views: move hints to top-level directory
While we keep ordinary hints in a directory parallel to the data directory,
we decided to keep the materialized view hints in a subdirectory of the data
directory, named "view_pending_updates". But during boot, we expect all
subdirectories of data/ to be keyspace names, and when we notice this one,
we print a warning:

   WARN: database - Skipping undefined keyspace: view_pending_updates

This spurious warning annoyed users. Moreover, we could have bigger
problems if a user actually tried to create a keyspace with that name.

So in this patch, we move the view hints to a separate top-level directory,
which defaults to /var/lib/scylla/view_hints, but as usual can be configured.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20190107142257.16342-1-nyh@scylladb.com>
2019-01-07 16:43:43 +02:00
Avi Kivity
f02c64cadf streaming: stream_session: remove include of db/view/view_update_from_staging_generator.hh
This header, which is easily replaced with a forward declaration,
introduces a dependency on database.hh everywhere. Remove it and scatter
includes of database.hh in source files that really need it.
2019-01-05 17:33:25 +02:00
Duarte Nunes
2f69ba2844 lwt: Remove Paxos-related Cassandra code
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181227112526.4180-1-duarte@scylladb.com>
2018-12-27 13:30:10 +02:00
Duarte Nunes
e6a8883228 service/storage_proxy: Protect against empty mutation when storing hint
mutation_holder::get_mutation_for() can return nullptr, so protect
against that when storing a hint.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181221194853.98775-2-duarte@scylladb.com>
2018-12-23 11:14:44 +02:00
Duarte Nunes
6c4a34f378 service/storage_proxy: Protect against empty mutation in mutation_holder
The per_destination_mutation holder can contain empty mutations,
so make sure release_mutation() skips over those.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181221194853.98775-1-duarte@scylladb.com>
2018-12-23 11:14:43 +02:00
Duarte Nunes
2d7c026d6e service/storage_proxy: Release mutation as early as possible
When delaying a base write, there is no need to hold on to the
mutation if all replicas have already replied.

We introduce mutation_holder::release_mutation(), which frees the
mutations that are no longer needed during the rest of the delay.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
756b601560 service/storage_proxy: Delay replica writes based on view update backlog
As the number of pending view updates increases we know that there’s a
mismatch between the rate at which the base receives writes and the
rate at which the view retires them. We react by applying backpressure
to decrease the rate of incoming base writes, allowing the slow view
replicas to catch up. We want to delay the client’s next writes to a
base replica. We use the base’s backlog of view updates to derive
this delay.

If we achieve CL and the backlogs of all replicas involved were last
seen to be empty, then we wouldn't delay the client's reply. However,
it could be that one of the replicas is actually overloaded, and won't
reply to many such new requests. We'll eventually start applying
backpressure to the client via the background write queue, but in
the meantime we may be dropping view updates. To mitigate this we rely
on the backlog being gossiped periodically.

Fixes #2538

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
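The delay derivation described above can be sketched as follows. This is a hedged illustration with made-up names and a deliberately simple linear formula, not Scylla's actual controller: the delay grows with the replica's relative view-update backlog and is zero when the backlog is empty, so unloaded replicas add no latency.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical sketch: derive a client-side delay from a base replica's
// view-update backlog. 'current' and 'max' are backlog sizes; the delay
// scales linearly up to 'max_delay' as the backlog fills.
std::chrono::microseconds delay_for_backlog(size_t current, size_t max,
                                            std::chrono::microseconds max_delay) {
    if (max == 0 || current == 0) {
        // Empty backlog: do not delay the client at all.
        return std::chrono::microseconds(0);
    }
    double fraction = std::min(1.0, double(current) / double(max));
    return std::chrono::microseconds(
        static_cast<std::chrono::microseconds::rep>(fraction * max_delay.count()));
}
```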
Duarte Nunes
997bdf5d98 service/storage_proxy: Get the backlog of a particular base replica
Add a function that returns the view update backlog for a particular
replica.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
819b6f3406 service/storage_proxy: Add counters for delayed base writes
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
ede5742f9b service/storage_proxy: Send view update backlog from replicas
Change the inter-node protocol so we can propagate the view update
backlog from a base replica to the coordinator through the
mutation_done and mutation_failed verbs.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
34b48e1d98 service/storage_proxy: Prepare to receive replica view update backlog
In subsequent patches, replicas will reply to the coordinator with
their view update backlog. Before introducing changes to the
messaging_service, prepare the storage_proxy to receive and store
those backlogs.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
776fdd4d1a service/storage_proxy: Expose local view update backlog
The local view update backlog is the max backlog out of the relative
memory backlog size and the relative hints backlog size.

We leverage the db::view::node_update_backlog class so we can send the
max backlog out of the node's shards.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
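The rule stated above reduces to a simple max over relative backlog sizes. A minimal sketch with hypothetical names (not the actual db::view::node_update_backlog API):

```cpp
#include <algorithm>

// Illustrative only: a relative backlog in [0.0, 1.0], where 0.0 is
// empty and 1.0 is full.
struct backlog { double relative; };

// The local view update backlog is the max of the relative memory
// backlog and the relative hints backlog; the node would then report
// the max of this value across its shards.
backlog local_view_backlog(backlog memory, backlog hints) {
    return {std::max(memory.relative, hints.relative)};
}
```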
Duarte Nunes
e33e187096 service/storage_proxy: Use near-infinite timeouts for view updates
View updates are sent with a timeout of 5 minutes, unrelated to
any user-defined value and meant as a protection mechanism. During
normal operation we don’t benefit from timing out view writes and
offloading them to the hinted-handoff queue, since they are an
internal, non-real time workload that we already spent resources on.

This value should be increased further, but that change depends on

Refs #2538
Refs #3826

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:30 +00:00
Duarte Nunes
bf4277fd8c service/storage_proxy: Remove unused send_to_endpoint() overloads
The send_to_endpoint() overloads that receive a non-frozen mutation
are no longer used.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-12-19 22:38:29 +00:00
Duarte Nunes
f8878238ed service/storage_proxy: Embed the expire timer in the response handler
Embedding the expire timer for a write response in the
abstract_write_response_handler simplifies the code as it allows
removing the rh_entry type.

It will also make the timeout easily accessible inside the handler,
for future patches.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181213111818.39983-1-duarte@scylladb.com>
2018-12-13 14:25:21 +02:00
Gleb Natapov
9fb79bf379 storage_proxy: fix crash during write timeout callback invocation
rh_entry address is captured inside timeout's callback lambda, so the
structure should not be moved after it is created. Change the code to
create rh_entry in-place instead of moving it into the map.

Fixes #3972.

Message-Id: <20181206164043.GN25283@scylladb.com>
2018-12-09 10:33:37 +02:00
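The idea behind the fix above can be sketched with hypothetical names (not the actual rh_entry code): because the timeout callback captures the entry's address, the entry must be constructed in place inside the map; building it locally, arming the callback, and then moving it into the map would leave the callback holding a dangling pointer to the moved-from local.

```cpp
#include <functional>
#include <unordered_map>

// Illustrative stand-in for the response-handler entry.
struct entry {
    std::function<void()> on_timeout;
    bool fired = false;
};

std::unordered_map<int, entry> handlers;

entry& make_entry(int id) {
    // try_emplace constructs the entry directly in the map; references
    // to unordered_map elements stay valid until the element is erased,
    // so the address captured below remains safe.
    auto it = handlers.try_emplace(id).first;
    entry* self = &it->second;
    it->second.on_timeout = [self] {
        self->fired = true; // safe: self points at the in-map entry
    };
    return it->second;
}
```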
Gleb Natapov
17197fb005 storage_proxy: store hint for CL=ANY if all nodes replied with failure
The current code assumes that a request failed if all replicas replied with
failure, but this is not true for CL=ANY requests. Take that into account.

Fixes #3565
2018-11-27 15:06:37 +02:00
Gleb Natapov
d1d04eae3c storage_proxy: complete write request early if all replicas replied with success or failure
Currently, if a write request reaches CL and all replicas replied, but some
replied with failures, the request will wait for the timeout before being
retired. Detect this case and retire the request immediately instead.

Fixes #3566
2018-11-27 14:49:37 +02:00
Gleb Natapov
76ab3d716b storage_proxy: check that write failure response comes from recognized replica
Before accounting for a failure response, we need to make sure it comes
from a replica that participates in the request.
2018-11-27 14:44:49 +02:00
Gleb Natapov
7bc68aa0eb storage_proxy: move code executed on write timeout into separate function
Currently the callback lives in a lambda, but we will want to call this
code not only on timer expiration.
2018-11-27 13:23:30 +02:00
Avi Kivity
775b7e41f4 Update seastar submodule
* seastar d59fcef...b924495 (2):
  > build: Fix protobuf generation rules
  > Merge "Restructure files" from Jesse

Includes fixup patch from Jesse:

"
Update Seastar `#include`s to reflect restructure

All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
2018-11-21 00:01:44 +02:00
Vlad Zolotarov
a87c11bad2 storage_proxy::query_result_local: create a single tracing span on a replica shard
Every creation of a tracing::global_trace_state_ptr object in place of a
tracing::trace_state_ptr, or a call to tracing::global_trace_state_ptr::get(),
creates a new tracing session (span) object.

This should never happen unless query handling moves to a different shard.

Fixes #3862

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20181018003500.10030-1-vladz@scylladb.com>
2018-10-19 16:47:17 +00:00
Vlad Zolotarov
aca0882a3f hinted handoff: enable storing hints before starting messaging_service
When messaging_service is started we may immediately receive a mutation
from another node (e.g. in the MV update context). If hinted handoff is not
ready to store hints at that point, we may fail some MV updates.

We are going to resolve this by start()ing hints::managers before we
start messaging_service and by blocking hint replay until all relevant
objects are initialized.

Refs #3828

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
2018-10-18 16:49:58 -04:00
Avi Kivity
1891779e64 Merge "db/hints: Use frozen_mutation in hinted handoff" from Duarte
"
This series changes hinted handoff to work with `frozen_mutation`s
instead of naked `mutation`s. Instead of unfreezing a mutation from
the commitlog entry and then freezing it again for sending, now we'll
just keep the read, frozen mutation.

Tests: unit(release)
"

* 'hh-manager-cleanup/v1' of https://github.com/duarten/scylla:
  db/hints/manager: Use frozen_mutation instead of mutation
  db/hints/manager: Use database::find_schema()
  db/commitlog/commitlog_entry: Allow moving the contained mutation
  service/storage_proxy: send_to_endpoint overload accepting frozen_mutation
  service/storage_proxy: Build a shared_mutation from a frozen_mutation
  service/storage_proxy: Lift frozen_mutation_and_schema
  service/storage_proxy: Allow non-const ranges in mutate_prepare()
2018-10-09 17:48:18 +03:00
Gleb Natapov
319ece8180 storage_proxy: do not pass write_stats down to send_to_live_endpoints
write_stats is referenced from the write handler, which is already
available in send_to_live_endpoints. No need to pass it down.

Message-Id: <20181009133017.GA14449@scylladb.com>
2018-10-09 16:33:53 +03:00
Gleb Natapov
207b57a892 storage_proxy: count number of timed out write attempts after CL is reached
It is useful to have this counter to investigate the reason for read
repairs. A non-zero value means that writes were lost after CL was
reached and RR is expected.

Message-Id: <20181009120900.GF22665@scylladb.com>
2018-10-09 15:17:07 +03:00
Duarte Nunes
3b6d2286e9 service/storage_proxy: send_to_endpoint overload accepting frozen_mutation
Add an overload to send_to_endpoint() which accepts a frozen_mutation.
The motivation is to allow better accounting of pending view updates,
but this change also allows some callers to avoid unfreezing already
frozen mutations.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:37:39 +01:00
Duarte Nunes
c7639f53e0 service/storage_proxy: Build a shared_mutation from a frozen_mutation
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:27:29 +01:00
Duarte Nunes
2c739f36cc service/storage_proxy: Allow non-const ranges in mutate_prepare()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-07 19:27:29 +01:00
Duarte Nunes
30d6ed8f92 service/storage_proxy: Consider target liveness in send_to_endpoint()
So that we don't attempt to send mutations to unreachable endpoints, and
instead store a hint for them, we now check the endpoint status and
populate dead_endpoints accordingly in
storage_proxy::send_to_endpoint().

Fixes #3820

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181007100640.2182-1-duarte@scylladb.com>
2018-10-07 16:04:26 +03:00
Duarte Nunes
a69d468101 service/storage_proxy: Fix formatting of send_to_endpoint()
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181006204756.32232-1-duarte@scylladb.com>
2018-10-07 11:05:32 +03:00
Calle Wilund
2996b8154f storage_proxy: Add missing re-throw in truncate_blocking
If truncation times out, we want to log it, but the exception should
not be swallowed; it should be re-thrown.

Fixes #3796.

Message-Id: <20181001112325.17809-1-calle@scylladb.com>
2018-10-01 19:07:04 +02:00
Tomasz Grabiec
82270c8699 storage_proxy: Fix misqualification of reads as foreground or background in some cases
The foreground reads metric is derived from the number of live read
executors minus the number of background reads. Background reads are
counted down when their resolver times out. However, a read executor
may still be around for a while, resulting in such reads being
accounted as foreground.

Usually, the gap in which this happens is short, because executor
reference holders time out quickly as well. That's not always the case,
though. For instance, the local read executor doesn't time out quickly
when the target shard has an overloaded CPU, and it takes a while
before the request goes through all the queues, even if IO is not
involved. Observed in #3628.

Fixes #3734.

Another problem is that all reads which received CL responses are
accounted as background until all replicas respond, but if such a read
needs reconciliation, it's still practically a foreground read and
should be accounted as such. Found during code review.

Fixes #3745.

This patch fixes both issues by rearranging accounting to track
foreground reads instead of background reads, and considering all
reads as foreground until the resulting promise is resolved.

Message-Id: <1535999620-25784-1-git-send-email-tgrabiec@scylladb.com>
2018-09-05 20:42:51 +03:00
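The rearranged accounting described above can be sketched as follows, with hypothetical names (not Scylla's actual stats structures): every read starts as foreground and is only counted down when its resulting promise resolves, instead of deriving foreground as live executors minus background.

```cpp
// Illustrative only: per-shard read accounting.
struct read_stats {
    long foreground = 0; // reads whose result promise has not yet resolved
    long reads = 0;      // total reads started
};

// Hypothetical tracker attached to each read: foreground from the start,
// decremented exactly when the client-visible result is resolved, even if
// the executor lingers waiting for slow replicas.
struct read_tracker {
    read_stats& stats;
    explicit read_tracker(read_stats& s) : stats(s) {
        ++stats.reads;
        ++stats.foreground; // every read starts as foreground
    }
    void on_result_resolved() {
        --stats.foreground; // any remaining work is now background
    }
};
```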
Botond Dénes
6486d6c8bd storage_proxy: use preferred/last replicas
2018-09-03 10:31:44 +03:00
Botond Dénes
577a06ce1b storage_proxy: add preferred/last replicas to the signature of query_partition_key_range_concurrent
2018-09-03 10:31:44 +03:00
Botond Dénes
6e59cee244 db::consistency_level::filter_for_query() add preferred_endpoints
To the second overload (the one without read-repair related params) too.
2018-09-03 10:31:44 +03:00
Botond Dénes
2f66bde26f storage_proxy: use query_mutations_from_all_shards() for range scans
2018-09-03 10:31:44 +03:00
Avi Kivity
908e497f3d storage_proxy: make _mutate_stage inherit its caller's scheduling_group
Right now, storage_proxy's mutate_stage violates isolation by running
in a plain execution_stage without a scheduling_group. This means do_mutate()
will run under the main scheduling_group, at least until we reach the database
apply execution stage, which is correct.

Fix by moving to an inheriting execution stage; this works because the
messaging service will tell RPC to set the correct execution stage for us. We
could explicitly specify statement_scheduling_group, but inheriting the
scheduling group allows us to have multiple statement scheduling groups later.
2018-08-24 19:04:49 +03:00
Gleb Natapov
7277ee2939 storage_proxy: do not fail read without speculation on connection error
After ac27d1c93b, if a read executor has just enough targets to
achieve the request's CL and a connection to one of them is dropped
during execution, a ReadFailed error will be returned immediately and
the client will not have a chance to issue a speculative read (retry). This
patch changes the code to not return the ReadFailed error immediately, but
to wait for the timeout instead, giving the client a chance to issue a
speculative read in case the read executor does not have additional
targets to send speculative reads to by itself.

Fixes #3699.
Message-Id: <20180819131646.GK2326@scylladb.com>
2018-08-20 10:12:31 +03:00
Duarte Nunes
a025bf6a7d Merge seastar upstream
Seastar introduced a "compat" namespace, which conflicts with Scylla's
own "compat" namespaces. The merge thus includes changes to scope
uses of Scylla's "compat" namespaces.

* seastar 8ad870f...9bb1611  (5):
  > util/variant_utils: Ensure variant_cast behaves well with rvalues
  > util/std-compat: Fix infinite recursion
  > doc/tutorial: Undo namespace changes
  > util/variant_utils: Add cast_variant()
  > Add compatbility with C++17's library types

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-08-14 13:07:09 +01:00
Nadav Har'El
25bd139508 cross-tree: clean up use of std::random_device()
std::random_device() uses the relatively slow /dev/urandom, and we rarely if
ever intend to use it directly - we normally want to use it to seed a faster
random_engine (a pseudo-random number generator).

In many places in the code, we first created a random_device variable, and then
used it to create a random_engine variable. However, this practice created the
risk of a programmer accidentally using the random_device object instead of the
random_engine object, because both have the same API; this hurts performance.

This risk materialized in just two places in the code, utils/uuid.cc and
gms/gossiper.cc. A patch for uuid.cc was sent previously by Pawel and is
not included in this patch; the fix for gossiper.{cc,hh} is included here.

To avoid risking the same mistake in the future, this patch switches across the
code to an idiom where the random_device object is not *named*, so cannot be
accidentally used. We use the following idiom:

   std::default_random_engine _engine{std::random_device{}()};

Here std::random_device{}() creates the random device (/dev/urandom) and pulls
a random integer from it. It then uses this seed to create the random_engine
(the pseudo-random number generator). The std::random_device{} object is
temporary and unnamed, and cannot be unintentionally used directly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180726154958.4405-1-nyh@scylladb.com>
2018-07-26 16:54:58 +01:00
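A self-contained form of the idiom from the message above, wrapped in a hypothetical helper for illustration: the std::random_device is an unnamed temporary, used only to pull one seed, so the slow device cannot be used by mistake later.

```cpp
#include <random>

// Illustrative helper (not from the Scylla tree): seed a fast
// pseudo-random engine from an unnamed, temporary std::random_device.
// The device exists only for the duration of this expression, so no
// named variable with the same API as the engine is left around.
std::default_random_engine make_engine() {
    return std::default_random_engine{std::random_device{}()};
}
```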
Avi Kivity
bea1f715dc storage_proxy: count cross-shard operations
Count operations which were started on one shard and
performed on another, due to a non-shard-aware driver
and/or RPC.
Message-Id: <20180723155118.8545-1-avi@scylladb.com>
2018-07-25 16:21:04 +01:00