scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-31 20:16:43 +00:00

Author	SHA1	Message	Date
Botond Dénes	6486d6c8bd	storage_proxy: use preferred/last replicas	2018-09-03 10:31:44 +03:00
Botond Dénes	577a06ce1b	storage_proxy: add preferred/last replicas to the signature of query_partition_key_range_concurrent	2018-09-03 10:31:44 +03:00
Botond Dénes	6e59cee244	db::consistency_level::filter_for_query() add preferred_endpoints To the second overload (the one without read-repair related params) too.	2018-09-03 10:31:44 +03:00
Botond Dénes	2f66bde26f	storage_proxy: use query_mutations_from_all_shards() for range scans	2018-09-03 10:31:44 +03:00
Paweł Dziepak	6f1c3e6945	Merge "Convert more execution_stages to inherit scheduling_groups" from Avi " Previous work (`71471bb322`) converted the CQL layer to inheriting execution stages, paving the way to multiple users sharing the front-end. This patchset does the same thing to the back-end, converting more execution stages to preserve the caller's scheduling_group. Since RPC now (`8c993e0728`) assigns the correct scheduling group within the replica, we can extend that work so a statement is executed with the same scheduling group all the way to sstable parsing, even if we cross nodes in the process. This improves performance isolation and paves the way to multi-user SLA guarantees. " * tag 'inherit-sched_group/v1' of https://github.com/avikivity/scylla: database: make database's mutation apply stage inherit its scheduling group from the caller database: make database::_mutation_query_stage inherit the scheduling group database: make database::_data_query_stage inheriting its caller's scheduling_group storage_proxy: make _mutate_stage inherit its caller's scheduling_group	2018-08-28 13:49:31 +01:00
Avi Kivity	5792a59c96	migration_manager: downgrade frightening "Can't send migration request" ERROR This error is transient, since as soon as the node is up we will be able to send the migration request. Downgrade it to a warning to reduce anxiety among people who actually read the logs (like QA). The message is also badly worded as no one can guess what a migration request is, but that is left to another patch. Fixes #3706. Message-Id: <20180821070200.18691-1-avi@scylladb.com>	2018-08-27 14:49:36 +02:00
Avi Kivity	908e497f3d	storage_proxy: make _mutate_stage inherit its caller's scheduling_group Right now, storage_proxy's mutate_stage violates isolation by running in a plain execution_stage without a scheduling_group. This means do_mutate() will run under the main scheduling_group, at least until we reach the database apply execution stage, which is correct. Fix by moving to an inheriting execution stage; this works because the messaging service will tell RPC to set the correct execution stage for us. We could explicitly specify statement_scheduling_group, but inheriting the scheduling group allows us to have multiple statement scheduling groups later.	2018-08-24 19:04:49 +03:00
Gleb Natapov	7277ee2939	storage_proxy: do not fail read without speculation on connection error After `ac27d1c93b` if a read executor has just enough targets to achieve request's CL and a connection to one of them will be dropped during execution ReadFailed error will be returned immediately and client will not have a chance to issue speculative read (retry). The patch changes the code to not return ReadFailed error immediately, but wait for timeout instead and give a client chance to issue speculative read in case read executor does not have additional targets to send speculative reads to by itself. Fixes #3699. Message-Id: <20180819131646.GK2326@scylladb.com>	2018-08-20 10:12:31 +03:00
Duarte Nunes	a025bf6a7d	Merge seastar upstream Seastar introduced a "compat" namespace, which conflicts with Scylla's own "compat" namespaces. The merge thus includes changes to scope uses of Scylla's "compat" namespaces. * seastar 8ad870f...9bb1611 (5): > util/variant_utils: Ensure variant_cast behaves well with rvalues > util/std-compat: Fix infinite recursion > doc/tutorial: Undo namespace changes > util/variant_utils: Add cast_variant() > Add compatibility with C++17's library types Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-08-14 13:07:09 +01:00
Piotr Sarna	8c18aaa511	cql3: pass query options to restrictions filter Query options may contain bound values needed for checking filtering restrictions. Previously, empty query_options{} were used, which caused prepared statements to fail. Fixes #3677	2018-08-09 17:44:45 +02:00
Amnon Heiman	80b1ef0f47	storage_service: Add nodes_status related metrics This patch adds a metric for a node own operation mode, the operation_mode metric represent the enum modes as gauge values according to: UNKNOWN = 0, STARTING = 1, JOINING = 2, NORMAL = 3, LEAVING = 4, DECOMMISSIONED = 5, DRAINING = 6, DRAINED = 7, MOVING = 8 Fixes: #3482 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20180806142706.23579-1-amnon@scylladb.com>	2018-08-06 18:19:56 +03:00
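The enum-to-gauge mapping listed in commit 80b1ef0f47 can be sketched as follows. This is a minimal illustration, not the code from the patch: the type name `operation_mode` and the helper `operation_mode_gauge()` are assumptions, and only the enumerator names and numeric values come from the commit message.

```cpp
#include <cstdint>

// Hypothetical enum name; the enumerators and their values are the ones
// listed in the commit message.
enum class operation_mode : uint8_t {
    UNKNOWN = 0, STARTING = 1, JOINING = 2, NORMAL = 3, LEAVING = 4,
    DECOMMISSIONED = 5, DRAINING = 6, DRAINED = 7, MOVING = 8,
};

// Hypothetical helper: the numeric value a gauge would report for a mode.
constexpr uint64_t operation_mode_gauge(operation_mode m) {
    return static_cast<uint64_t>(m);
}

// A node in NORMAL mode would be reported as 3.
static_assert(operation_mode_gauge(operation_mode::NORMAL) == 3,
              "NORMAL maps to gauge value 3");
```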
Asias He	95849371aa	range_streamer: Remove unordered_multimap usage We need the mapping between dht::token_range to std::vector<inet_address> and inet_address to dht::token_range_vector in various places. Currently, we use std::unordered_multimap and convert to std::unordered_map. It is better to use std::unordered_map in the first place. The changes like below: - Change from std::unordered_multimap<dht::token_range, inet_address> to std::unordered_map<dht::token_range, std::vector<inet_address>> - Change from std::unordered_multimap<inet_address, dht::token_range> to std::unordered_map<inet_address, dht::token_range_vector> Message-Id: <b8ecc41775e46ec064db3ee07510c404583390aa.1533106019.git.asias@scylladb.com>	2018-08-01 13:01:41 +03:00
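The container change described in commit 95849371aa can be sketched as below. This is an illustration under stated assumptions: `token_range` and `endpoint` are plain `std::string` stand-ins for `dht::token_range` and `inet_address`, and `group_endpoints()` is a hypothetical helper, not a function from the patch.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using token_range = std::string;  // stand-in for dht::token_range
using endpoint = std::string;     // stand-in for inet_address

// Build the range -> endpoints mapping directly as a map of vectors,
// instead of filling an unordered_multimap and converting it afterwards.
std::unordered_map<token_range, std::vector<endpoint>>
group_endpoints(const std::vector<std::pair<token_range, endpoint>>& pairs) {
    std::unordered_map<token_range, std::vector<endpoint>> out;
    for (const auto& p : pairs) {
        out[p.first].push_back(p.second);  // creates the vector on first use
    }
    return out;
}

void group_endpoints_example() {
    std::vector<std::pair<token_range, endpoint>> pairs;
    pairs.emplace_back("(0,100]", "10.0.0.1");
    pairs.emplace_back("(0,100]", "10.0.0.2");
    pairs.emplace_back("(100,200]", "10.0.0.3");
    auto by_range = group_endpoints(pairs);
    assert(by_range.size() == 2);
    assert(by_range["(0,100]"].size() == 2);
}
```

With the map-of-vectors shape, all endpoints for a range are reachable through a single lookup, which is the form the commit message says callers needed anyway.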
Gleb Natapov	44a6afad8c	cache_hitrate_calculator: fix race when new table is added during calculations The calculation consists of several parts with preemption point between them, so a table can be added while calculation is ongoing. Do not assume that table exists in intermediate data structure. Fixes #3636 Message-Id: <20180801093147.GD23569@scylladb.com>	2018-08-01 12:45:03 +03:00
Asias He	4a0b561376	storage_service: Get rid of moving operation The moving operation changes a node's token to a new token. It is supported only when a node has one token. The legacy moving operation was useful in the early days, before vnodes were introduced, when a node had only one token. I don't think it is useful anymore. In the future, we might support adjusting the number of vnodes to rebalance the token range each node owns. Removing it simplifies the cluster operation logic and code. Fixes #3475 Message-Id: <144d3bea4140eda550770b866ec30e961933401d.1533111227.git.asias@scylladb.com>	2018-08-01 11:18:17 +03:00
Asias He	02befb6474	gossip: Log seeds seen It is useful for debugging bootstrap issues, especially for large clusters. Also do not use `_seeds` as the set_seeds function parameter, since there is a class member called _seeds. Refs #3417 Message-Id: <15e6bdf06376949ced1bdb845f810da09266783d.1532474820.git.asias@scylladb.com>	2018-08-01 10:57:56 +03:00
Avi Kivity	a4c9330bfc	Merge "Optimise paged queries" from Paweł " This series adds some optimisations to the paging logic, that attempt to close the performance gap between paged and not paged queries. The former are more complex so always are going to be slower, but the performance loss was unacceptably large. Fixes #3619. Performance with paging: ./perf_paging_before ./perf_paging_after diff read 271246.13 312815.49 15.3% Without paging: ./perf_nopaging_before ./perf_nopaging_after diff read 343732.17 342575.77 -0.3% Tests: unit(release), dtests(paging_test.py, paging_additional_test.py) " * tag 'optimise-paging/v1' of https://github.com/pdziepak/scylla: cql3: select statement: don't copy metadata if not needed cql3: query_options: make simple getter inlineable cql3: metadata: avoid copying column information query_pager: avoid visiting result_view if not needed query::result_view: add get_last_partition_and_clustering_key() query::result_reader: fix const correctness tests/uuid: add more tests including make_randm_uuid() utils: uuid: don't use std::random_device()	2018-07-26 19:24:03 +03:00
Nadav Har'El	25bd139508	cross-tree: clean up use of std::random_device() std::random_device() uses the relatively slow /dev/urandom, and we rarely if ever intend to use it directly - we normally want to use it to seed a faster random_engine (a pseudo-random number generator). In many places in the code, we first created a random_device variable, and then used it to create a random_engine variable. However, this practice created the risk of a programmer accidentally using the random_device object instead of the random_engine object, because both have the same API; this hurts performance. This risk materialized in just two places in the code, utils/uuid.cc and gms/gossiper.cc. A patch for uuid.cc was sent previously by Pawel and is not included in this patch, and the fix for gossiper.{cc,hh} is included here. To avoid risking the same mistake in the future, this patch switches across the code to an idiom where the random_device object is not named, so cannot be accidentally used. We use the following idiom: std::default_random_engine _engine{std::random_device{}()}; Here std::random_device{}() creates the random device (/dev/urandom) and pulls a random integer from it. It then uses this seed to create the random_engine (the pseudo-random number generator). The std::random_device{} object is temporary and unnamed, and cannot be unintentionally used directly. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20180726154958.4405-1-nyh@scylladb.com>	2018-07-26 16:54:58 +01:00
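The seeding idiom quoted in commit 25bd139508 can be shown in a small self-contained class. The class `token_picker` and its `pick()` method are hypothetical, introduced only for illustration; the member initializer is the idiom from the commit message.

```cpp
#include <cassert>
#include <random>

// Hypothetical class; only the _engine initializer is taken from the commit.
class token_picker {
    // The std::random_device temporary is unnamed, so later code cannot
    // draw from the slow device by mistake; it can only reach _engine.
    std::default_random_engine _engine{std::random_device{}()};
public:
    // Returns a pseudo-random integer in the closed interval [lo, hi].
    int pick(int lo, int hi) {
        assert(lo <= hi);
        std::uniform_int_distribution<int> dist(lo, hi);
        return dist(_engine);
    }
};
```

Because no `std::random_device` variable exists in scope, a call such as `dist(device)` cannot compile, which is the accident the patch aims to rule out.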
Paweł Dziepak	757d9e3b5d	query_pager: avoid visiting result_view if not needed query::result_visitor provides get_last_partition_and_clustering_key() which allows getting those without iterating through the whole result. Moreover, row count may be precomputed in the result, if it isn't there is query::result_view::count_partitions_and_rows() for getting it.	2018-07-26 12:14:48 +01:00
Avi Kivity	bea1f715dc	storage_proxy: count cross-shard operations Count operations which were started on one shard and were performed on another, due to non-shard-aware driver and/or RPC. Message-Id: <20180723155118.8545-1-avi@scylladb.com>	2018-07-25 16:21:04 +01:00
Botond Dénes	cc4acb6e26	storage_proxy: use the original row limits for the final results merging `query_partition_key_range()` does the final result merging and trimming (if necessary) to make sure we don't send more rows to the client than requested. This merging and trimming is done by a continuation attached to the `query_partition_key_range_concurrent()` which does the actual querying. The continuation captures by value the `row_limit` and `partition_limit` fields of the `query::read_command` object of the query. This has an unexpected consequence. The lambda object is constructed after the call to `query_partition_key_range_concurrent()` returns. If this call doesn't defer, any modifications made to the read command object by `query_partition_key_range_concurrent()` will be visible to the lambda. This is undesirable because `query_partition_key_range_concurrent()` updates the read command object directly as the vnodes are traversed, which in turn will result in the lambda doing the final trimming according to a decremented `row_limits`, which will cause the paging logic to declare the query as exhausted prematurely because the page will not be full. To avoid all this, make a copy of the relevant limit fields before `query_partition_key_range_concurrent()` is called and pass these copies to the continuation, thus ensuring that the final trimming will be done according to the original page limits. Spotted while investigating a dtest failure on my 1865/range-scans/v2 branch. On that branch the way range scans are executed on replicas is completely refactored. These changes apparently reduce the number of continuations in the read path to the point where an entire page can be filled without deferring, thus causing the problem to surface. Fixes #3605. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <f11e80a6bf8089d49ba3c112b25a69edf1a92231.1531743940.git.bdenes@scylladb.com>	2018-07-16 16:54:50 +03:00
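The ordering problem described in commit cc4acb6e26 can be reproduced without Seastar. The sketch below is a simplified model, not the storage_proxy code: `read_command` is a one-field stand-in for `query::read_command`, `query_concurrent()` is a hypothetical stand-in for `query_partition_key_range_concurrent()` that completes without deferring, and the two `trim_limit_*` functions are illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct read_command { uint32_t row_limit; };  // stand-in for query::read_command

// Stand-in for the concurrent range query: it decrements the limit in place
// as rows are fetched and finishes immediately (no deferring).
uint32_t query_concurrent(read_command& cmd, uint32_t rows_available) {
    const uint32_t fetched = std::min(cmd.row_limit, rows_available);
    cmd.row_limit -= fetched;
    return fetched;
}

// Problematic shape: the lambda is built after the call returns, so the
// by-value capture copies the already decremented limit.
uint32_t trim_limit_buggy(read_command cmd, uint32_t rows_available) {
    query_concurrent(cmd, rows_available);
    auto trim = [limit = cmd.row_limit] { return limit; };
    return trim();
}

// Fixed shape: copy the limit before the call and hand the copy to the lambda.
uint32_t trim_limit_fixed(read_command cmd, uint32_t rows_available) {
    const uint32_t original_limit = cmd.row_limit;
    query_concurrent(cmd, rows_available);
    auto trim = [original_limit] { return original_limit; };
    return trim();
}

void trim_limit_example() {
    // Page limit 100, 60 rows fetched before the continuation is built.
    assert(trim_limit_buggy(read_command{100}, 60) == 40);
    assert(trim_limit_fixed(read_command{100}, 60) == 100);
}
```

In the first variant the final trimming would use 40 instead of 100, which corresponds to the short page that made the pager consider the query exhausted.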
Asias He	71e22fe981	storage_service: Introduce STREAM_WITH_RPC_STREAM feature With this feature, the node supports scylla streaming using the rpc streaming.	2018-07-13 08:36:47 +08:00
Gleb Natapov	617666efb0	storage_proxy: use logger's exception printer to report read failure Use existing exception pretty printer since it handles nested exceptions. Message-Id: <20180709122826.GT28899@scylladb.com>	2018-07-09 15:31:14 +03:00
Gleb Natapov	ac27d1c93b	storage_proxy: fix rpc connection failure handling by read operation Currently rpc::closed_error is not counted towards replica failure during read and thus read operation waits for timeout even if one of the nodes dies. Fix this by counting rpc::closed_error towards failed attempts. Fixes #3590. Message-Id: <20180708123522.GC28899@scylladb.com>	2018-07-09 10:05:31 +03:00
Avi Kivity	512baf536f	storage_proxy: implement write timeouts Require a timeout parameter for storage_proxy::mutate_begin() and all its callers (all the way to thrift and cql modification_statement and batch_statement). This should fix spurious debug-mode test failures, where overcommit and general debug slowness result in the default timeouts being exceeded. Since the tests use infinite timeouts, they should not time out any more. Tests: unit (release), with an extra patch that aborts when a non-infinite timeout is detected. Message-Id: <20180707204424.17116-1-avi@scylladb.com>	2018-07-08 10:27:03 +01:00
Piotr Sarna	03f2f8633b	cql3: add updating ALLOW FILTERING metrics Metrics related to ALLOW FILTERING queries are now properly updated on read requests.	2018-07-06 12:00:29 +02:00
Duarte Nunes	c126b00793	Merge 'ALLOW FILTERING support' from Piotr " The main idea of this series is to provide a filtering_visitor as a specialised result_set_builder::visitor implementation that keeps restriction info and applies it on query results. Also, since allow_filtering checking is not correct now (e.g. #2025) on select_statement level, this series tries to fix any issues related to it. Still in TODO: * handling CONTAINS relation in single column restriction filtering * handling multi-column restrictions - especially EQ, which can be split into multiple single-column restrictions * more tests - it's never enough; especially esoteric cases like filtering queries which also use secondary indexes, paging tests, etc. Tests: unit (release) " * 'allow_filtering_6' of https://github.com/psarna/scylla: tests: add allow_filtering tests to cql_query_test cql3: enable ALLOW FILTERING service: add filtering_pager cql3: optimize filtering partition keys and static rows cql3: add filtering visitor cql3: move result_set_builder functions to header cql3: amend need_filtering() cql3: add single column primary key restrictions getters cql3: expose single column primary key restrictions cql3: add needs_filtering to primary key restrictions cql3: add simpler single_column_restriction::is_satisfied_by	2018-07-05 10:18:08 +01:00
Piotr Sarna	7b018f6fd6	service: add filtering_pager For paged results of an 'ALLOW FILTERING' query, a filtering pager is provided. It's based on a filtering_visitor for result_builder.	2018-07-05 10:50:43 +02:00
Botond Dénes	8084ce3a8e	query_pager: use query::is_single_partition() to check for singular range Use query::is_single_partition() to check whether the queried ranges are singular or not. The current method of using `dht::partition_range::is_singular()` is incorrect, as it is possible to build a singular range that doesn't represent a single partition. `query::is_single_partition()` correctly checks for this so use it instead. Found during code-review. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <f671f107e8069910a2f84b14c8d22638333d571c.1530675889.git.bdenes@scylladb.com>	2018-07-04 10:04:50 +01:00
Botond Dénes	59a30f0684	query_pager: be prepared to _ranges being empty do_fetch_page() checks in the beginning whether there is a saved query state already, meaning this is not the first page. If there is not, it checks whether the query is for a single partition or a range scan to decide whether to enable stateful queries or not. This check assumed that there is at least one range in _ranges, which will not hold under some circumstances. Add a check for _ranges being empty. Fixes: #3564 Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <cbe64473f8013967a93ef7b2104c7ca0507afac9.1530610709.git.bdenes@scylladb.com>	2018-07-03 11:05:01 +01:00
Gleb Natapov	19e7493d5b	storage_proxy: initialize write response id counter from wall clock value Initializing the write response id to the same value on each reboot may cause a stale id to be taken for an active one if the node restarts after sending only a couple of write requests and before receiving replies. On the next reboot it will start assigning ids from the same value, and receiving old replies will confuse it. Mitigate this by setting the initial id to the wall clock value in milliseconds. It will not solve the problem completely, but will mitigate it.	2018-07-01 17:24:40 +03:00
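The mitigation in commit 19e7493d5b amounts to seeding a counter from the wall clock. A minimal sketch, assuming `std::chrono::system_clock` as the clock source; `initial_response_id()` and `response_id_generator` are hypothetical names, not taken from the patch.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: wall clock time in milliseconds since the clock's epoch.
uint64_t initial_response_id() {
    using namespace std::chrono;
    const auto since_epoch = system_clock::now().time_since_epoch();
    return static_cast<uint64_t>(duration_cast<milliseconds>(since_epoch).count());
}

// Hypothetical generator: ids start at the wall clock value, so a restarted
// node is unlikely to reuse ids that are still in flight from its previous run.
struct response_id_generator {
    uint64_t _next = initial_response_id();
    uint64_t next() { return _next++; }
};

void response_id_example() {
    response_id_generator gen;
    const uint64_t first = gen.next();
    assert(gen.next() == first + 1);  // ids are handed out sequentially
}
```

As the commit message notes, this only reduces the chance of a collision: a node that issues ids faster than one per millisecond and then restarts quickly could still overlap with its earlier range.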
Gleb Natapov	569437aaa5	storage_proxy: drop virtual from signal(gms::inet_address) The function is not overridden, so should not be virtual.	2018-07-01 16:35:59 +03:00
Gleb Natapov	5ee09e5f3b	storage_proxy: do not assert on getting an unexpected write reply In theory we should not get a write reply from a node we did not send a write to, but in practice a stale reply can be received if a node reboots between sending a write and getting a reply. Do not assert, but log a warning instead and ignore the reply. Fixes: #3153	2018-07-01 16:35:09 +03:00
Paweł Dziepak	6bd71015e7	storage_proxy: use mutation_partition_view::{first, last}_row_key()	2018-06-28 22:11:19 +01:00
Asias He	bb4d361cf6	storage_service: Limit number of REPLICATION_FINISHED verb can retry In the removenode operation, if the messaging service is stopped, e.g., due to disk io error isolation, the node can keep retrying the REPLICATION_FINISHED verb infinitely. A Scylla log full of such messages was observed: [shard 0] storage_service - Fail to send REPLICATION_FINISHED to $IP:0: seastar::rpc::closed_error (connection is closed) To fix, limit the number of retries. Tests: update_cluster_layout_tests.py Fixes #3542 Message-Id: <638d392d6b39cc2dd2b175d7f000e7fb1d474f87.1529927816.git.asias@scylladb.com>	2018-06-28 19:54:01 +01:00
Vladimir Krivopalov	82f76b0947	Use std::reference_wrapper instead of a plain reference in bound_view. The presence of a plain reference prohibits the bound_view class from being copyable. The trick employed to work around that was to use 'placement new' for copy-assigning bound_view objects, but this approach is ill-formed and causes undefined behaviour for classes that have const and/or reference members. The solution is to use a std::reference_wrapper instead. Signed-off-by: Vladimir Krivopalov <vladimir@scylladb.com> Message-Id: <a0c951649c7aef2f66612fc006c44f8a33713931.1530113273.git.vladimir@scylladb.com>	2018-06-28 11:24:06 +01:00
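The effect described in commit 82f76b0947 can be checked with type traits. The two structs below are hypothetical stand-ins for `bound_view`, not its real definition; they show that a plain reference member removes copy assignment while `std::reference_wrapper` keeps it.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <type_traits>

// Hypothetical stand-ins for bound_view, before and after the change.
struct view_with_ref {
    const std::string& prefix;  // reference member: copy assignment is deleted
    int weight;
};

struct view_with_wrapper {
    std::reference_wrapper<const std::string> prefix;  // rebindable on assignment
    int weight;
};

static_assert(!std::is_copy_assignable<view_with_ref>::value,
              "a reference member deletes the implicit copy assignment");
static_assert(std::is_copy_assignable<view_with_wrapper>::value,
              "reference_wrapper keeps the class copy-assignable");

void view_assignment_example() {
    const std::string a = "a", b = "b";
    view_with_wrapper x{std::cref(a), 1};
    view_with_wrapper y{std::cref(b), 2};
    x = y;                             // rebinds x.prefix to b, no placement new
    assert(&x.prefix.get() == &b);
    assert(x.weight == 2);
}
```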
Paweł Dziepak	1cf3cb285f	pager: add fetch_page_generator() fetch_page_generator() is an equivalent of fetch_page(), but instead of building a cql3::result_set it returns a cql3::result_generator().	2018-06-25 09:21:47 +01:00
Paweł Dziepak	f6fe831d49	pager: make the visitor handle_result() accepts a template parameter	2018-06-25 09:21:47 +01:00
Paweł Dziepak	fc87ca5926	pager: make query_result_visitor base class a template parameter So far query_result_visitor was tied to result_set_builder. The goal is to enable result_generator to work with paged queries as well so we need to decouple them.	2018-06-25 09:21:47 +01:00
Paweł Dziepak	dc9a65ea76	pager: make myvistor a member class of query_pager It is going to become a class template.	2018-06-25 09:21:47 +01:00
Paweł Dziepak	319b2cde7e	pager: make shared pointers to selection constant Shared pointers make code harder to reason about, it is not easy to get rid of them in this piece of the code, but we can restore at least a bit of sanity by adding consts.	2018-06-25 09:21:47 +01:00
Paweł Dziepak	327d3de51e	pager: merge query_pager and query_pagers::impl There is just a single implementation of query_pager and there is no reason to make anything virtual. Devirtualising this code will allow higher layers to pass visitors via templates.	2018-06-25 09:21:47 +01:00
Piotr Sarna	b6c1b8c5ef	hints: make space_watchdog device-aware Instead of having one static space limit for all directories, space_watchdog now keeps a per-device limit, shared among hints managers residing on the same disks. References #3516 Signed-off-by: Piotr Sarna <sarna@scylladb.com>	2018-06-22 10:26:45 +02:00
Gleb Natapov	f53ae2d07f	storage_service: avoid "ignored future" message during schema check failure Message-Id: <20180620134402.GQ1918@scylladb.com>	2018-06-20 18:53:47 +03:00
Piotr Sarna	6b3a97e34a	hints: fix max_shard_disk_space_size initialization Previously max_shard_disk_space_size was unconditionally initialized with the capacity of hints_directory. But, it's likely that hints_directory doesn't exist at all if hinted handoff is not enabled, which results in Scylla failing to boot. So, max_shard_disk_space_size is now initialized with the capacity of hints_for_views directory, which is always present. This commit also moves max_shard_disk_space_size to the .cc file where it belongs - resource_manager.cc. Tests: unit (release) Message-Id: <9f7b86b6452af328c05c5c6c55bfad3382e12445.1528977363.git.sarna@scylladb.com>	2018-06-14 14:24:01 +01:00
Gleb Natapov	894673ac14	Provide cql max request limit to cql server object during creation	2018-06-11 15:34:14 +03:00
Gleb Natapov	cdf1289b43	Provide available memory size to hinted handoff resource manager during creation	2018-06-11 15:34:13 +03:00
Gleb Natapov	ac88935baa	Provide available memory size to storage_proxy object during creation	2018-06-11 15:34:13 +03:00
Vlad Zolotarov	12e3e4fb2a	service::client_state::has_access(): make readable_system_resources an std::unordered_set There is no reason to use an std::set for it since we don't care about the ordering - only about the existence of a particular entry. A hash table will be more efficient for this use case. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <1528220892-5784-2-git-send-email-vladz@scylladb.com>	2018-06-06 15:29:29 +03:00
Piotr Sarna	f12fdcffdb	storage_proxy: restore optional hinted handoff Since hinted handoff for materialized views is now a separate entity, regular hinted handoff can go back to being optional.	2018-06-04 09:46:06 +02:00
Piotr Sarna	a6aae369da	storage_proxy: add hints manager for views This commit adds a separate hints manager that serves only failed materialized view updates.	2018-06-04 09:46:06 +02:00


1247 Commits