scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-03 13:37:04 +00:00

Author	SHA1	Message	Date
Gleb Natapov	3039e4c7de	storage_proxy: stop range query with limit after the limit is reached	2016-05-02 15:10:15 +03:00
Gleb Natapov	41c586313a	storage_proxy: fix calculation of concurrency queried ranges	2016-05-02 15:10:15 +03:00
Gleb Natapov	c364ab9121	storage_proxy: add logging for range query row count estimation	2016-05-02 15:10:15 +03:00
Vlad Zolotarov	9bf8253412	storage_proxy: add read requests split counters Add split (local Nodes, external Nodes aggregated per Nodes' DCs) counters for the following read categories: - data reads - digest reads - mutation data reads Each category gets attempts, completions and errors metrics. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-04-21 11:28:19 +03:00
Vlad Zolotarov	cbcbdc3b4a	storage_proxy: add split counters for writes Added split metrics for operations on a local Node and on external Nodes aggregated per Nodes' DCs. Added separate split counters for: - total writes attempts/errors - read repair write attempts (there is no easy way to separate errors at the moment) Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-04-21 11:28:15 +03:00
Vlad Zolotarov	c92654b281	storage_proxy: add counters for received and forwarded mutations Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-04-21 11:27:29 +03:00
Gleb Natapov	9801d69d53	storage_proxy: add query result row count to brief format Report number of rows in brief reporting format, but only if we can count them without linearizing result's buffer.	2016-04-14 19:26:00 +03:00
Gleb Natapov	53993527ed	storage_proxy: move verbose query result printing into separate logger If query result is large tracing cannot be done since printing the result takes too much time and space.	2016-04-14 19:26:00 +03:00
Gleb Natapov	46e5d05220	storage_proxy: cleanup query logging. Since commit `c1cffd06` the logger catches errors internally, so there is no need to catch most of them at the top level. Only those that can happen during parameter evaluation can reach here. Change parameters to not throw too.	2016-04-14 19:26:00 +03:00
Gleb Natapov	6f13715f8c	storage_proxy: add logging to read executor creation path Message-Id: <1460549369-29523-4-git-send-email-gleb@scylladb.com>	2016-04-14 14:58:02 +03:00
Gleb Natapov	14ecadb247	storage_proxy: add logging for mutation write path Message-Id: <1460549369-29523-3-git-send-email-gleb@scylladb.com>	2016-04-14 14:57:29 +03:00
Gleb Natapov	dfdbb1e703	storage_proxy: move hack to make coordinator most preferable node for read into sorting function This is kind of sorting, so it belongs there, but it also fixes a bug in storage_proxy::get_read_executor() that assumes filter_for_query() does not change the order of nodes in all_nodes when an extra replica is chosen. Otherwise, if the coordinator ip happens to be last in all_nodes, it will be chosen as the extra replica and will be queried twice. Message-Id: <1460549369-29523-1-git-send-email-gleb@scylladb.com>	2016-04-14 14:56:21 +03:00
Pekka Enberg	64c9ebb962	Merge "More exception safety fixes" from Paweł "This is the second part of exception safety fixes for issues discovered using memory allocation failure injector."	2016-04-12 08:08:00 +03:00
Paweł Dziepak	d53354947c	storage_proxy: mark hint_to_dead_endpoints() noexcept Hints are currently unimplemented but there is code depending on the fact that hint_to_dead_endpoints() doesn't throw. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-04-12 00:06:10 +01:00
Paweł Dziepak	b75c4098f2	storage_proxy: catch all errors in abstract_read_executor::execute() Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-04-11 23:52:13 +01:00
Gleb Natapov	3734dcbace	storage_proxy: cleanup data_read_resolver::resolve() live_row_count is summed several times in the same function. Do it only once. -- v1->v2: - call get() on std::reference_wrapper<std::vector<partition>> to get to reference for moving out of it. Message-Id: <20160411123829.GE21479@scylladb.com>	2016-04-11 17:13:48 +02:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Paweł Dziepak	3e0555809e	storage_proxy: catch all exceptions in read executor abstract_read_executor::reconcile() is supposed to make sure that _result_promise is eventually set to either a result or an exception. That may not happen however if reconciliation throws any exception since only read timeouts are being caught. When that happens the continuation chain becomes stuck. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-31 16:38:41 +01:00
Glauber Costa	5fa866223d	streaming: add incoming streaming mutations to a different sstable Keeping the mutations coming from the streaming process as mutations like any other have a number of advantages - and that's why we do it. However, this makes it impossible for Seastar's I/O scheduler to differentiate between incoming requests from clients, and those who are arriving from peers in the streaming process. As a result, if the streaming mutations consume a significant fraction of the total mutations, and we happen to be using the disk at its limits, we are in no position to provide any guarantees - defeating the whole purpose of the scheduler. To implement that, we'll keep a separate set of memtables that will contain only streaming mutations. We don't have to do it this way, but doing so makes life a lot easier. In particular, to write an SSTable, our API requires (because the filter requires), that a good estimate on the number of partitions is informed in advance. The partitions also need to be sorted. We could write mutations directly to disk, but the above conditions couldn't be met without significant effort. In particular, because mutations can be arriving from multiple peer nodes, we can't really sort them without keeping a staging area anyway. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-03-23 09:13:00 -04:00
Paweł Dziepak	9f3893980a	move SCHEMA_CHECK registration to migration_manager The verb is just for reporting and debugging purposes, but it is better not to register it until it can return a meaningful value. Besides, it really belongs to the migration manager subsystem anyway. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1458037053-14836-1-git-send-email-pdziepak@scylladb.com>	2016-03-15 12:24:37 +02:00
Asias He	883d8cb8fd	storage_service: Move REPLICATION_FINISHED verb to storage_service It belongs to storage_service not storage_proxy.	2016-03-15 16:13:22 +08:00
Gleb Natapov	5076f4878b	main: Defer storage proxy RPC verb registration after commitlog replay Message-Id: <20160315071229.GM6117@scylladb.com>	2016-03-15 09:18:12 +02:00
Pekka Enberg	1429213b4c	main: Defer migration manager RPC verb registration after commitlog replay Defer registering migration manager RPC verbs after commitlog has been replayed so that our own schema is fully loaded before other nodes start querying it or sending schema updates. Message-Id: <1457971028-7325-1-git-send-email-penberg@scylladb.com>	2016-03-14 18:03:16 +01:00
Paweł Dziepak	82d2a2dccb	specify whether query::result, result_digest or both are needed Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-11 18:27:13 +00:00
Paweł Dziepak	46079f763b	query: add keys and tombstones to result digest Query result digest is used to verify that all replicas have the same data. Therefore, it needs to contain more information than the query result itself in order to ensure proper detection of disagreements. Generally, adding clustering keys to the digest regardless of whether the client asked for them will guarantee correctness. However, adding tombstones as well improves the chances of early detection of nodes containing stale data. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-11 18:27:13 +00:00
Paweł Dziepak	77dbe3c12f	storage_proxy: fix reconciliation with limits Currently, if there is a disagreement between replicas we get mutations from all of them, merge these mutations and send the result to the client, difference between the result and the mutation sent by a particular replica is sent back to repair it. Unfortunately, that may not suffice to provide user with correct results in case of disagreements. Consider the following scenario: create table cf(p int, c int, r int, primary key(p, c)); node1: p=0, c=1, r=1 (timestamp = 1) p=0, c=2, r=2 (timestamp = 2) node2: p=0, c=1, r=tombstone (timestamp = 2) p=0, c=2, r=1 (timestamp = 1) query: select r from cf limit 1; Let's assume there are no row markers. node1 will send only outdated cell (p=0, c=1, r=1) while node2 will send both tombstone for c=1 and outdated cell (p=0, c=2, r=1). A disagreement will be detected, the replies will be merged and the coordinator will respond to the client with result r=1, while the correct answer is r=2. The solution proposed in this patch is to attempt to detect cases when the problem may occur and retry queries with a larger limit, which results in replicas providing more information. The detection logic is simple: the partition key and clustering key of the last row in the reconciled result are compared with the partition keys and clustering keys of the last rows of replies from replicas (except short reads). If the (pk, ck) of the replica last row is smaller than the (pk, ck) of the reconciled result the query is retried. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-11 18:26:33 +00:00
Gleb Natapov	f242c6395c	storage_proxy: add counter for retried reads Message-Id: <20160309130453.GF2253@scylladb.com>	2016-03-09 14:09:42 +01:00
Gleb Natapov	ce6d1a242a	storage_proxy: fix background_reads counter background_reads collectd counter was not always properly decremented. Fix it and streamline background read repair error handling. Message-Id: <20160307182255.GI4849@scylladb.com>	2016-03-07 19:41:09 +01:00
Gleb Natapov	2d092bbd32	storage_proxy: send read requests with timeout No need to wait for replies long after request is timed out. Message-Id: <1457351304-28721-2-git-send-email-gleb@scylladb.com>	2016-03-07 14:00:11 +01:00
Gleb Natapov	4122422d19	storage_proxy: always wait for digest read resolver done future Currently it is waited upon only if background read repair check is needed and this causes an unhandled exception warning to be printed if it enters failed state. Fix this by always waiting on it, but doing anything beyond ignoring an exception only if check is needed. Message-Id: <1457351304-28721-1-git-send-email-gleb@scylladb.com>	2016-03-07 14:00:09 +01:00
Gleb Natapov	626c9d046b	fix EACH_QUORUM handling during bootstrapping Currently write acknowledgements handling does not take bootstrapping node into account for CL=EACH_QUORUM. The patch fixes it. Fixes #994 Message-Id: <20160307121620.GR2253@scylladb.com>	2016-03-07 13:56:34 +01:00
Gleb Natapov	f59415b3c6	Take pending endpoints into account while checking for sufficient live nodes During bootstrapping additional copies of data have to be made to ensure that CL level is met (see CASSANDRA-833 for details). Our code does that, but it does not take into account that the bootstrapping node can be dead, which may cause a request to proceed even though there are not enough live nodes for it to be completed. In such a case the request neither completes nor times out, so it appears to be stuck from the CQL layer's POV. The patch fixes this by taking pending nodes into account while checking that there are enough live nodes for the operation to proceed. Fixes #965 Message-Id: <20160303165250.GG2253@scylladb.com>	2016-03-07 13:30:13 +01:00
Gleb Natapov	b89b6f442b	storage_proxy: fix race between read cl completion and timeout in digest resolver If timeout happens after cl promise is fulfilled, but before continuation runs it removes all the data that cl continuation needs to calculate result. Fix this by calculating result immediately and returning it in cl promise instead of delaying this work until continuation runs. This has a nice side effect of simplifying digest mismatch handling and making it exception free. Fixes #977. Message-Id: <1457015870-2106-3-git-send-email-gleb@scylladb.com>	2016-03-03 16:48:28 +02:00
Gleb Natapov	e4ac5157bc	storage_proxy: store only one data reply in digest resolver. Read executor may ask for more than one data reply during digest resolving stage, but only one result is actually needed to satisfy a query, so no need to store all of them. Message-Id: <1457015870-2106-2-git-send-email-gleb@scylladb.com>	2016-03-03 16:47:53 +02:00
Gleb Natapov	69b61b81ce	storage_proxy: fix cl achieved condition in digest resolver timeout handler In digest resolver for cl to be achieved it is not enough to get correct number of replies, but also to have data reply among them. The condition in digest timeout does not check that, fortunately we have a variable that we set to true when cl is achieved, so use it instead. Message-Id: <1457015870-2106-1-git-send-email-gleb@scylladb.com>	2016-03-03 16:47:11 +02:00
Pekka Enberg	6d7e14a53a	Merge "Implement describe_schema_versions" from Paweł "This series implements describe_schema_versions so that nodetool describecluster can return proper schema information for the whole cluster. It involves adding a new verb SCHEMA_CHECK, which is used to get the schema version for a given node, and a simple map-reduce that uses that verb to get info from the whole cluster. This fixes #677, fixes #684, and fixes #472."	2016-03-02 16:02:53 +02:00
Paweł Dziepak	ca68c36c8c	storage_proxy: handle SCHEMA_CHECK verb Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-02 12:49:54 +00:00
Paweł Dziepak	bdc23ae5b5	remove db/serializer.hh includes Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-03-02 09:07:09 +00:00
Gleb Natapov	22d2b9a2dc	Yield execution in mutation_result_merger mutation_result_merger::get can run for a long time. Make it yield execution from time to time. Message-Id: <1456674046-14502-1-git-send-email-gleb@scylladb.com>	2016-02-28 17:55:33 +02:00
Gleb Natapov	32e9f1ecd4	Fix read_timeouts storage_proxy counter Read timeouts are not counted now. The patch fixes it. Message-Id: <20160228133315.GN6705@scylladb.com>	2016-02-28 15:34:42 +02:00
Calle Wilund	590ec1674b	truncate: Require timestamp join-function to ensure equal values Fixes #937 In fixing #884, truncation not truncating memtables properly, time stamping in truncate was made shard-local. This however breaks the snapshot logic, since for all shards in a truncate, the sstables should snapshot to the same location. This patch adds a required function argument to truncate (and by extension drop_column_family) that produces a time stamp in a "join" fashion (i.e. same on all shards), and utilizes the joinpoint type in caller to do so. Message-Id: <1456332856-23395-2-git-send-email-calle@scylladb.com>	2016-02-24 18:59:31 +02:00
Avi Kivity	1f752446d2	Merge "Truncation format & fixes" from Calle "Fixes #884 Fixes #895 Also at seastar-dev: calle/truncate_more 1.) Change truncation records to be stored with IDL serialization 2.) Fix db::serializers encoding of replay_position 3.) Detect attempted reading of Origin truncation records, and instead of crashing, ignore and warn. 4.) Change truncation time stamps to be generated per-shard, _after_ CF flush is done, otherwise data in memtables at flush would be retained/replayed on next start. Retain the highest time stamp generated. Note for (3): This patch set does _not_ clear out origin records automatically. This is because I feel that is a somewhat drastic and irreversible thing to do. If we want to avail the user of a means to get rid of the (3) warning, we should probably tell him to either use cqlsh, or add an API call for this, so he can do it explicitly. "	2016-02-15 11:39:56 +02:00
Tomasz Grabiec	456275e06a	storage_proxy: Simplify condition Message-Id: <1455288472-30538-1-git-send-email-tgrabiec@scylladb.com>	2016-02-14 11:22:15 +02:00
Calle Wilund	18203a4244	database::truncate/drop: Move time stamp generation to shard Fixes #884 Time stamps for truncation must be generated after flush, either by splitting the truncate into two (or more) for-each-shard operations, or simply by doing time stamping per shard (this solution). We generate TS on each shard after flushing, and then rely on the actual stored value to be the highest time point generated. This should however, from batch replay point of view, be functionally equivalent. And not a problem.	2016-02-09 15:45:37 +00:00
Gleb Natapov	63a5aa6122	prevent superfluous frozen_mutation copying Sometimes frozen_mutation is copied while it can be moved instead. Fix those cases. Message-Id: <20160204165708.GI6705@scylladb.com>	2016-02-07 10:54:16 +02:00
Gleb Natapov	049ae37d08	storage_proxy: change collectd to show foreground mutation instead of overall mutation count It is much easier to see what is going on this way otherwise graphs for bg mutations and overall mutations are very close with usual scaling for many workloads. Message-Id: <20160204083452.GH6705@scylladb.com>	2016-02-04 14:58:56 +02:00
Gleb Natapov	b4b560e0fc	change result_digest to hold std::array instead of a std::vector Digest size is fixed, so no need to use std::vector to hold it. Message-Id: <20160203102530.GU6705@scylladb.com>	2016-02-03 12:27:39 +02:00
Glauber Costa	f6cfb04d61	add a priority class to mutation readers SSTables already have a priority argument wired to their read path. However, most of our reads do not call that interface directly, but employ the services of a mutation reader instead. Some of those readers will be used to read through a mutation_source, and those have to be patched as well. Right now, whenever we need to pass a class, we pass Seastar's default priority class. Signed-off-by: Glauber Costa <glauber@scylladb.com>	2016-01-25 15:20:38 -05:00
Gleb Natapov	dde2e80a20	storage_proxy: remove batchlog synchronously Wait for batchlog removal before completing a query otherwise batchlog removal queries may accumulate. Still ignore an error if it happens since it is not critical, but log it. Message-Id: <20160118095642.GB6705@scylladb.com>	2016-01-18 12:38:12 +02:00
Avi Kivity	d5050e4c6a	storage_proxy: make MUTATION and MUTATION_DONE verbs synchronous at the server side While MUTATION and MUTATION_DONE are asynchronous by nature (when a MUTATION completes, it sends a MUTATION_DONE message instead of responding synchronously), we still want them to be synchronous at the server side wrt. the RPC server itself. This is because RPC accounts for resources consumed by the handler only while the handler is executing; if we return immediately, and let the code execute asynchronously, RPC believes no resources are consumed and can instantiate more handlers than the shard has resources for. Fix by changing the return type of the handlers to future<no_wait_type> (from a plain no_wait_type), and making that future complete when local processing is over. Ref #596. Message-Id: <1453048967-5286-1-git-send-email-avi@scylladb.com>	2016-01-18 09:59:34 +02:00
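
The retry check described in commit 77dbe3c12f ("storage_proxy: fix reconciliation with limits") can be sketched as below. This is a minimal Python illustration, not ScyllaDB's C++ implementation: `ReplicaReply`, `needs_retry`, and the simplified integer `(pk, ck)` key are hypothetical, and the `short_read` flag is assumed to be supplied by the caller rather than derived here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical simplified row key: (partition key, clustering key).
# Python tuples compare lexicographically, matching "(pk, ck) is smaller".
RowKey = Tuple[int, int]


@dataclass
class ReplicaReply:
    last_row: Optional[RowKey]  # key of the last row the replica sent, None if empty
    short_read: bool            # assumed flag: replica reported a short read


def needs_retry(reconciled_last: Optional[RowKey], replies: List[ReplicaReply]) -> bool:
    """Return True if a non-short reply ends before the reconciled result's last row.

    Such a replica may hold rows between its own last row and the reconciled
    last row that it never sent, so the merged result might be wrong.
    """
    if reconciled_last is None:
        return False
    for reply in replies:
        # Short reads are excluded, as in the commit message.
        if reply.short_read or reply.last_row is None:
            continue
        if reply.last_row < reconciled_last:
            return True
    return False
```

With the commit's scenario, node1's last row is (0, 1) while the reconciled result ends at (0, 2), so `needs_retry((0, 2), [ReplicaReply((0, 1), False), ReplicaReply((0, 2), False)])` returns `True`, and the query would be reissued with a larger limit.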


262 Commits