scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-22 01:20:39 +00:00

Author	SHA1	Message	Date
Vlad Zolotarov	aca0882a3f	hinted handoff: enable storing hints before starting messaging_service When messaging_service is started we may immediately receive a mutation from another node (e.g. in the MV update context). If hinted handoff is not ready to store hints at that point we may fail some of MV updates. We are going to resolve this by start()ing hints::managers before we start messaging_service and blocking hints replaying until all relevant objects are initialized. Refs #3828 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-10-18 16:49:58 -04:00
Avi Kivity	1891779e64	Merge "db/hints: Use frozen_mutation in hinted handoff" from Duarte " This series changes hinted handoff to work with `frozen_mutation`s instead of naked `mutation`s. Instead of unfreezing a mutation from the commitlog entry and then freezing it again for sending, now we'll just keep the read, frozen mutation. Tests: unit(release) " * 'hh-manager-cleanup/v1' of https://github.com/duarten/scylla: db/hints/manager: Use frozen_mutation instead of mutation db/hints/manager: Use database::find_schema() db/commitlog/commitlog_entry: Allow moving the contained mutation service/storage_proxy: send_to_endpoint overload accepting frozen_mutation service/storage_proxy: Build a shared_mutation from a frozen_mutation service/storage_proxy: Lift frozen_mutation_and_schema service/storage_proxy: Allow non-const ranges in mutate_prepare()	2018-10-09 17:48:18 +03:00
Gleb Natapov	319ece8180	storage_proxy: do not pass write_stats down to send_to_live_endpoints write_stats is referenced from write handler which is available in send_to_live_endpoints already. No need to pass it down. Message-Id: <20181009133017.GA14449@scylladb.com>	2018-10-09 16:33:53 +03:00
Duarte Nunes	3b6d2286e9	service/storage_proxy: send_to_endpoint overload accepting frozen_mutation Add an overload to send_to_endpoint() which accepts a frozen_mutation. The motivation is to allow better accounting of pending view updates, but this change also allows some callers to avoid unfreezing already frozen mutations. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:37:39 +01:00
Duarte Nunes	9e14412528	service/storage_proxy: Lift frozen_mutation_and_schema Lift frozen_mutation_and_schema to frozen_mutation.hh, since other subsystems using frozen_mutations will likely want to pass it around together with the schema. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:27:29 +01:00
Duarte Nunes	2c739f36cc	service/storage_proxy: Allow non-const ranges in mutate_prepare() Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-10-07 19:27:29 +01:00
Piotr Sarna	c41e0ade6c	storage_proxy: make get_restricted_ranges public This function is useful for splitting ranges in indexed queries.	2018-09-27 15:29:28 +02:00
Botond Dénes	577a06ce1b	storage_proxy: add preferred/last replicas to the signature of query_partition_key_range_concurrent	2018-09-03 10:31:44 +03:00
Avi Kivity	908e497f3d	storage_proxy: make _mutate_stage inherit its caller's scheduling_group Right now, storage_proxy's mutate_stage violates isolation by running in a plain execution_stage without a scheduling_group. This means do_mutate() will run under the main scheduling_group, at least until we reach the database apply execution stage, which is correct. Fix by moving to an inheriting execution stage; this works because the messaging service will tell RPC to set the correct execution stage for us. We could explicitly specify statement_scheduling_group, but inheriting the scheduling group allows us to have multiple statement scheduling groups, later.	2018-08-24 19:04:49 +03:00
Duarte Nunes	a025bf6a7d	Merge seastar upstream Seastar introduced a "compat" namespace, which conflicts with Scylla's own "compat" namespaces. The merge thus includes changes to scope uses of Scylla's "compat" namespaces. * seastar 8ad870f...9bb1611 (5): > util/variant_utils: Ensure variant_cast behaves well with rvalues > util/std-compat: Fix infinite recursion > doc/tutorial: Undo namespace changes > util/variant_utils: Add cast_variant() > Add compatbility with C++17's library types Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-08-14 13:07:09 +01:00
Avi Kivity	512baf536f	storage_proxy: implement write timeouts Require a timeout parameter for storage_proxy::mutate_begin() and all its callers (all the way to thrift and cql modification_statement and batch_statement). This should fix spurious debug-mode test failures, where overcommit and general debug slowness result in the default timeouts being exceeded. Since the tests use infinite timeouts, they should not time out any more. Tests: unit (release), with an extra patch that aborts when a non-infinite timeout is detected. Message-Id: <20180707204424.17116-1-avi@scylladb.com>	2018-07-08 10:27:03 +01:00
Gleb Natapov	19e7493d5b	storage_proxy: initialize write response id counter from wall clock value Initializing the write response id to the same value on each reboot may cause a stale id to be taken for an active one if the node restarts after sending only a couple of write requests and before receiving replies. On the next reboot it will start assigning ids from the same value, and receiving old replies will confuse it. Mitigate this by setting the initial id to the wall clock value in milliseconds. It will not solve the problem completely, but will mitigate it.	2018-07-01 17:24:40 +03:00
Gleb Natapov	ac88935baa	Provide available memory size to storage_proxy object during creation	2018-06-11 15:34:13 +03:00
Piotr Sarna	f12fdcffdb	storage_proxy: restore optional hinted handoff Since hinted handoff for materialized views is now a separate entity, regular hinted handoff can go back to being optional.	2018-06-04 09:46:06 +02:00
Piotr Sarna	a6aae369da	storage_proxy: add hints manager for views This commit adds a separate hints manager that serves only failed materialized view updates.	2018-06-04 09:46:06 +02:00
Piotr Sarna	ef40f7e628	hints: move send limiter to resource manager Send limiting semaphore is moved from hints manager to resource manager. In consequence, hints manager now keeps a reference to its resource manager.	2018-06-04 09:35:58 +02:00
Piotr Sarna	ffe52681ea	storage_proxy: add mv stats to write handler Previous patch for issue 3416 did not cover passing write stats to write response handler, which results in some write stats being incorrectly counted as user write stats, while they belong to materialized views. This one fixes that by passing correct write stats reference to write response handler constructor. Also at: https://github.com/psarna/scylla/commits/fix_3416_again Closes #3416 Message-Id: <53ef3cc96ccadfdad8992d92ed6a41473419eb0a.1527510473.git.sarna@scylladb.com>	2018-05-28 17:50:49 +01:00
Piotr Sarna	1d590b3ca4	storage_proxy: decouple write_stats from stats This commit extracts metrics related to writes from stats structure, so it can be easily replaced later, e.g. for materialized view metrics. References #3385 References #3416	2018-05-22 16:52:58 +02:00
Piotr Sarna	f5d6326ced	storage_proxy: enable hinted handoff for materialized views This commit initializes and enables hinted handoff for materialized views, even if HH is not explicitly turned on in config. User writes still use hinted handoff only if it is explicitly enabled, while materialized views are allowed to use it unconditionally in order to store failed replica updates somewhere. Fixes #3383	2018-05-21 17:09:27 +02:00
Duarte Nunes	a23bda3393	Merge 'Implement separate timeout for range queries' from Avi " This patchset implements separate timeouts for range queries, and lays the foundations for separate timeouts for other query types. While the feature in itself is worthy, the real motivation is to have the timeouts decided by the caller, instead of storage_proxy. This in turn is required to disentangle each layer behaving differently depending on whether the query is internal or not; instead, the goal is to have each caller declare its needs in terms of consistency level and timeouts, and have the lower layers implement its requirements instead of making their own decisions. Fixes #3013. Tests: unit (release) " * tag '3013/v1.1' of https://github.com/avikivity/scylla: storage_proxy: remove default_query_timeout() storage_proxy: don't use default timeouts query_options: augment with timeout_config thrift: configure thrift transport and handler with a timeout_config transport: configure native transport with a timeout_config cql3: define and populate timeout_config_selector timeout_config: introduce timeout configuration	2018-05-13 20:05:50 +02:00
Botond Dénes	ddd70dc113	Use dht::token_range alias for last/preferred replicas Use the pre-existing type alias instead of fully spelling out the type everywhere.	2018-05-10 06:22:39 +03:00
Botond Dénes	52affa2a61	storage_proxy::coordinator_query_result: merge constructors into one w/ default params	2018-05-10 06:22:39 +03:00
Vlad Zolotarov	48c96d09d6	db::hints::manager: drain hints when the node is decommissioned/removed When a node is decommissioned/removed it will drain all its hints, and all remote nodes that have hints to it will drain their hints to this node. What does "drain" mean? - The node that "drains" hints to a specific destination will ignore failures and will continue sending hints till the end of the current segment, erase it and move to the next one till there are no more segments left. After all hints are drained the corresponding hints directory is removed. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2018-05-08 22:29:21 +01:00
Avi Kivity	c8a6fe3044	storage_proxy: remove default_query_timeout() No longer used.	2018-04-30 13:19:53 +03:00
Avi Kivity	d8dd7e05a7	storage_proxy: don't use default timeouts Require all callers to supply timeouts instead of relying on defaults. Since all callers now have the timeouts set up, they can easily supply them.	2018-04-30 13:19:53 +03:00
Tomasz Grabiec	52c61df930	Relax includes To avoid unnecessary recompilations. Message-Id: <1522168295-994-1-git-send-email-tgrabiec@scylladb.com>	2018-03-28 10:49:07 +03:00
Duarte Nunes	fb54c09e0b	service/storage_proxy: Pass pending endpoints to send_to_endpoint() This will allow us to minimize the number of mutation copies in mutate_MV(). Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20180325121412.76844-1-duarte@scylladb.com>	2018-03-25 15:45:21 +03:00
Botond Dénes	eee9bda85b	Make the read-repair decision only once Make the read-repair decision on the first page of a paged-query and use it for all the remaining pages. This helps querier-cache hit-rates as reads to nodes will be sent consistently throughout the query.	2018-03-19 16:29:43 +02:00
Botond Dénes	2e2abf6edb	storage_proxy: add coordinator_query_options and coordinator_query_result As yet more parameters and return-values are about to be added to all storage_proxy::query_* methods we need a way that scales better than changing the signatures every time. To this end we aggregate all non-mandatory query parameters into `coordinator_query_options` and all return values into `coordinator_query_result`. This way new fields can be simply added to the respective structs while the signatures of the methods themselves and their client code can remain unchanged.	2018-03-19 15:17:35 +02:00
Botond Dénes	aaf67bcbaa	Consider preferred replicas when choosing endpoints for query_singular() Propagate the preferred_replicas to db::filter_for_query() and consider them when selecting the endpoints. The algorithm for selecting the endpoints is as follows: * Compute the intersection of the endpoint candidates and the preferred endpoints. * If this yields a set of endpoints that already satisfies the CL requirements use this set. * Otherwise select the remaining endpoints according to the load-balancing strategy, just like before.	2018-03-13 10:34:34 +02:00
Botond Dénes	eac597d726	Add preferred and last replicas to the signature of query() preferred_replicas are added to the parameters and last_replicas are added to the return type. The preferred replicas will be used as a hint for the selection of the replicas to send the read requests to. The last replicas (returned) are the replicas actually selected for the read. This will allow queries to consistently hit the same replicas for each page thus reusing readers created on these replicas. For convenience a query() overload is provided that doesn't take or return the preferred and last replicas. This patch only adds the parameters and propagates them down to query_singular() and query_partition_key_range(). The code to actually use these preferred-replicas will be added in later patches. The reason for separating this is to reduce noise and improve reviewability for those functional changes later.	2018-03-13 10:34:34 +02:00
Duarte Nunes	440ea56010	message/messaging_service: Specify algorithm when requesting digest While not strictly needed, specify which algorithm to use when requesting a digest from a remote node. This is more flexible than relying on a cluster wide feature, although that's what we'll do in subsequent patches. It also makes the verb more consistent with the data request. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 01:02:50 +00:00
Duarte Nunes	6b4b429883	query-result: Introduce class result_options Introduce class result_options to carry result options through the request pipeline, which at this point mean the result type and the digest algorithm. This class allows us to encapsulate the concrete digest algorithm to use. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-02-01 00:22:50 +00:00
Glauber Costa	08a0c3714c	allow request-specific read timeouts in storage proxy reads Timeouts are a global property. However, for tables in keyspaces like the system keyspace, we don't want to uphold that timeout--in fact, we want no timeout there at all. We already apply such configuration for requests waiting in the queued sstable queue: system keyspace requests won't be removed. However, the storage proxy will insert its own timeouts in those requests, causing them to fail. This patch changes the storage proxy read layer so that the timeout is applied based on the column family configuration, which is in turn inherited from the keyspace configuration. This matches our usual way of passing db parameters down. In terms of implementation, we can either move the timeout inside the abstract read executor or keep it external. The former is a bit cleaner; the latter has the nice property that all executors generated will share the exact same timeout point. In this patch, we chose the latter. We are also careful to propagate the timeout information to the replica. So even if we are talking about the local replica, when we add the request to the concurrency queue, we will do it in accordance with the timeout specified by the storage proxy layer. After this patch, Scylla is able to start just fine with very low timeouts--since read timeouts in the system keyspace are now ignored. Fixes #2462 Implementation notes, and general comments about open discussion in 2462: * Because we are not bypassing the timeout, just setting it high enough, I consider the concerns about the batchlog moot: if we fail for any other reason that will be propagated. Last case, because the timeout is per-CF, we could do what we do for the dirty memory manager and move the batchlog alone to use a different timeout setting. * Storage proxy likes specifying its timeouts as a time_point, whereas when we get low enough as to deal with the read_concurrency_config, we are talking about deltas. So at some point we need to convert time_points to durations. We do that in the database query functions. v2: - use per-request instead of per-table timeouts. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2018-01-12 07:43:21 -05:00
Nadav Har'El	73aad5736f	Fix compilation of tests/cql_test_env.cc In commit `1f4f71e619`, an stdx::optional<std::vector<sstring>> parameter was added to storage_proxy's constructor. However, this parameter was not made optional, and tests/cql_test_env.cc failed to compile because it didn't provide this parameter. This patch makes this parameter optional (if missing, it's like an empty stdx::optional) so cql_test_env.cc compiles. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20171218132121.18782-1-nyh@scylladb.com>	2017-12-18 15:32:54 +02:00
Vlad Zolotarov	1f4f71e619	main + storage_service: wire up hints generation Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2017-12-14 15:08:11 -05:00
Gleb Natapov	ddf117535a	storage_proxy: add counters for speculative reads Fixes #3030 Message-Id: <20171206143611.8756-1-gleb@scylladb.com>	2017-12-06 16:38:16 +02:00
Gleb Natapov	16964de1f3	storage_proxy: fail read/write requests early if it cannot be completed due to errors If errors make reaching CL impossible a request can be aborted earlier without waiting for timeout.	2017-12-05 16:46:25 +02:00
Gleb Natapov	d0d8bdf615	storage_proxy: remove unused parameter from get_restricted_ranges() function Message-Id: <20170911084653.GH24167@scylladb.com>	2017-09-11 11:58:44 +02:00
Duarte Nunes	ec75eac37d	ring_position_exponential_vector_sharder: Take ranges by rvalue Avoids some copies. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170814093310.29200-1-duarte@scylladb.com>	2017-08-14 12:55:43 +03:00
Gleb Natapov	3b7d8c8767	storage_proxy: add capability to read data/digest for non singular ranges Currently only mutation_data read supports non singular ranges. This patch extends data/digest reads to support them too.	2017-08-03 10:35:09 +03:00
Gleb Natapov	69c5526301	messaging_service: return cache hit ratio as part of data read	2017-06-13 09:57:14 +03:00
Paweł Dziepak	cfde2ad5b4	storage_proxy: make mutate() an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	00b42c477f	storage_proxy: count counter updates for which the node was a leader	2017-03-02 09:05:12 +00:00
Paweł Dziepak	277501f42f	db: propagate tracing state for counter writes	2017-03-02 09:05:10 +00:00
Paweł Dziepak	25173f8095	db: propagate timeout for counter writes	2017-03-02 09:05:10 +00:00
Paweł Dziepak	426345e1d4	storage_proxy: avoid excessive mutation freezes	2017-03-01 16:33:36 +00:00
Calle Wilund	0a4edca756	counters/cql: allow wormholing actual counter values (with shards) via cql Adds yet another magic function "SCYLLA_COUNTER_SHARD_LIST", indicating that argument value, which must be a list of tuples <int, UUID, long, long>, should be inserted as an actual counter value, not update. This of course to allow counters to be read from sstable loader. Note that we also need to allow timestamps for counter mutations, as well as convince the counter code itself to treat the data as already baked. So ugly wormhole galore. v2: * Changed flag names * More explicit wormholing, bypassing normal counter path, to avoid read-before-write etc * throw exceptions on unhandled shard types in marshalling v3: * Added counter id ordering check * Added batch statement check for mixing normal and raw counter updates Message-Id: <1487683665-23426-2-git-send-email-calle@scylladb.com>	2017-02-22 09:19:46 +00:00
Nadav Har'El	f2fd81ece0	materialized views: function to send a mutation to endpoint Add a function for sending one mutation to one remote replica owning this mutation. This is needed for materialized views, where each base replica sends each view mutation to one particular view replica. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2017-02-06 13:36:45 +01:00
Gleb Natapov	3c372525ed	storage_proxy: use storage_proxy clock instead of explicit lowres_clock Merge commit `45b6070832` used a butchered version of the storage_proxy patch to adjust to the rpc timer change instead of the one I've sent. This patch fixes the differences. Message-Id: <20170206095237.GA7691@scylladb.com>	2017-02-06 12:51:36 +02:00

166 Commits