scylladb

Author	SHA1	Message	Date
Gleb Natapov	b4c368a6bc	storage_proxy: update correct statistics on range reads Fixes #2167 Message-Id: <20170405094119.GM8197@scylladb.com>	2017-04-09 18:16:06 +03:00
Avi Kivity	27c42359bc	Merge seastar upstream * seastar 6b21197...2ebe842 (6): > Merge "Various improvements to execution stages" from Paweł > app-template: allow apps to specify a name for help message > bool_class: avoid initializing object of incomplete type > app-template: make sure we can still get help with required options > prometheus: Http handler that returns prometheus 0.4 protobuf or text format > Update DPDK to 17.02 Includes patch from Pawel to adjust to updated execution_stage interface.	2017-03-26 10:50:21 +03:00
Amnon Heiman	295a981c61	storage_proxy: metrics should have unique name Metrics should have their unique name. This patch changes throttled_writes of the queue length to current_throttled_writes. Without it, metrics will be reported twice under the same name, which may cause errors in the prometheus server. This could be related to scylladb/seastar#250 Fixes #2163. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <20170314081456.6392-1-amnon@scylladb.com>	2017-03-14 11:19:39 +02:00
Paweł Dziepak	cfde2ad5b4	storage_proxy: make mutate() an execution stage	2017-03-09 09:27:43 +00:00
Paweł Dziepak	00b42c477f	storage_proxy: count counter updates for which the node was a leader	2017-03-02 09:05:12 +00:00
Paweł Dziepak	cf193f4b41	storage_proxy: use counter-specific timeout for writes	2017-03-02 09:05:12 +00:00
Paweł Dziepak	d177160f90	storage_proxy: transform counter timeouts to mutation_write_timeout_exception	2017-03-02 09:05:12 +00:00
Paweł Dziepak	774241648d	db: add more tracing events for counter writes	2017-03-02 09:05:10 +00:00
Paweł Dziepak	277501f42f	db: propagate tracing state for counter writes	2017-03-02 09:05:10 +00:00
Paweł Dziepak	25173f8095	db: propagate timeout for counter writes	2017-03-02 09:05:10 +00:00
Paweł Dziepak	426345e1d4	storage_proxy: avoid excessive mutation freezes	2017-03-01 16:33:36 +00:00
Paweł Dziepak	f10eb952d0	coordinator: do not apply counter write twice on leader	2017-03-01 16:33:36 +00:00
Calle Wilund	0a4edca756	counters/cql: allow wormholing actual counter values (with shards) via cql Adds yet another magic function "SCYLLA_COUNTER_SHARD_LIST", indicating that argument value, which must be a list of tuples <int, UUID, long, long>, should be inserted as an actual counter value, not update. This of course to allow counters to be read from sstable loader. Note that we also need to allow timestamps for counter mutations, as well as convince the counter code itself to treat the data as already baked. So ugly wormhole galore. v2: * Changed flag names * More explicit wormholing, bypassing normal counter path, to avoid read-before-write etc * throw exceptions on unhandled shard types in marshalling v3: * Added counter id ordering check * Added batch statement check for mixing normal and raw counter updates Message-Id: <1487683665-23426-2-git-send-email-calle@scylladb.com>	2017-02-22 09:19:46 +00:00
Gleb Natapov	bb72425b61	storage_proxy: fix send_to_endpoint() to use correct create_write_response_handler() overload There are several problems with storage_proxy::send_to_endpoint right now. It uses create_write_response_handler() overload that is specific to read repair which is suboptimal and creates incorrect logs, it does not process errors and it does not hold storage_proxy object until write is complete. The patch fixes all of the problems. Message-Id: <20170208101949.GA19474@scylladb.com> Reviewed-by: Nadav Har'El <nyh@scylladb.com>	2017-02-12 10:46:13 +02:00
Avi Kivity	9530bac2d6	Merge "Adding metrics using histogram and labels" from Amnon "This series uses the newly added histogram and label support to add metrics to the storage_proxy and to the column_family. This would add latency and histogram and the missing metrics from column family." * 'amnon/histogram_metrics' of github.com:cloudius-systems/seastar-dev: database: add metrics registration for the column family storage_proxy: add read and write latency histogram estimated_histogram: returns a metrics histogram	2017-02-09 11:40:57 +02:00
Amnon Heiman	2cf13c26e2	storage_proxy: add read and write latency histogram Register the read and write latency histogram on the metrics layer. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2017-02-06 17:54:47 +02:00
Nadav Har'El	f2fd81ece0	materialized views: function to send a mutation to endpoint Add a function for sending one mutation to one remote replica owning this mutation. This is needed for materialized views, where each base replica sends each view mutation to one particular view replica. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2017-02-06 13:36:45 +01:00
Gleb Natapov	3c372525ed	storage_proxy: use storage_proxy clock instead of explicit lowres_clock Merge commit `45b6070832` used butchered version of storage_proxy patch to adjust to rpc timer change instead the one I've sent. This patch fixes the differences. Message-Id: <20170206095237.GA7691@scylladb.com>	2017-02-06 12:51:36 +02:00
Paweł Dziepak	1e8814f5ce	storage_proxy: support counter updates	2017-02-02 10:35:14 +00:00
Paweł Dziepak	c14c6b753b	storage_proxy: add get_live_endpoints()	2017-02-02 10:35:14 +00:00
Amnon Heiman	45b6070832	Merge seastar upstream * seastar 397685c...c1dbd89 (13): > lowres_clock: drop cache-line alignment for _timer > net/packet: add missing include > Merge "Adding histogram and description support" from Amnon > reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&' > Set the option '--server' of tests/tcp_sctp_client to be required > core/memory: Remove superfluous assignment > core/memory: Remove dead code > core/reactor: Use logger instead of cerr > fix inverted logic in overprovision parameter > rpc: fix timeout checking condition > rpc: use lowres_clock instead of high resolution one > semaphore: make semaphore's clock configurable > rpc: detect timedout outgoing packets earlier Includes treewide change to accommodate rpc changing its timeout clock to lowres_clock. Includes fixup from Amnon: collectd api should use the metrics getters As part of the preparation for the change in the metrics layer, this changes the way the collectd api uses the metrics value to use the getters instead of calling the member directly. This will be important when the internal implementation changes from union to variant. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>	2017-02-01 14:39:08 +02:00
Gleb Natapov	6e4817137e	storage_proxy: report foreground reads instead of reads The reason is the same as why foreground writes are reported instead of total writes (`049ae37d08`): It is much easier to see what is going on this way. Also fixes a typo in a counter's description. Fixes #1217 Message-Id: <20170129093412.GS11469@scylladb.com>	2017-01-29 12:40:56 +02:00
Gleb Natapov	64660397fc	storage_proxy: move operation type information from counter's name to a label Makes it much more flexible to view the data in various ways in Grafana. Message-Id: <20170126102746.GL11469@scylladb.com>	2017-01-26 12:38:29 +02:00
Gleb Natapov	ccee01f352	storage_proxy: put datacenter name into a label instead of counter's name Having datacenter name as a label makes it possible to create Prometheus board for the counters. Message-Id: <20170124132051.GX11469@scylladb.com>	2017-01-24 15:27:34 +02:00
Amnon Heiman	e19fa02a17	remove scollectd from headers As the metrics migration progressed, some includes of scollectd.hh were left behind. Because of the nature of the scollectd implementation, those includes bring a lot of code with them into the header files and eventually into many source files. This patch removes those includes and adds a missing include to storage_proxy.cc. That the compiler didn't complain indicates the problematic nature of those includes in the first place. Before this patch, a change in metrics.hh would cause 169 files to recompile; after this change, 17. Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1484667536-2185-1-git-send-email-amnon@scylladb.com>	2017-01-17 17:39:47 +02:00
Tomasz Grabiec	3c3a4358ae	storage_proxy: Fix capturing of on-stack variable by reference partition_range_count was accepted by do_with callback by value and then captured by reference by async code, thus invoking use after destroy. Message-Id: <1484317846-14485-1-git-send-email-tgrabiec@scylladb.com>	2017-01-16 11:49:11 +02:00
Tomasz Grabiec	66547e7d7c	storage_proxy: Add missing initialization of _short_read_allowed Dropped by `a1cafed370` ("storage_proxy: handle range scans of sparsely populated tables"). Fixes the failure in update_cluster_layout_tests.TestUpdateClusterLayout test. Message-Id: <1484317450-13525-1-git-send-email-tgrabiec@scylladb.com>	2017-01-13 16:47:54 +02:00
Tomasz Grabiec	1e8151b4f2	storage_proxy: Fix use-after-free on one_or_two_partition_ranges query_mutations_locally() takes one_or_two_partition_ranges by reference and requires, indirectly, that it is kept alive until operation resolves. However, we were passing expiring value to it, the result of unwrap(). Fixes dtest failure in consistent_bootstrap_test.py:TestBootstrapConsistency.consistent_reads_after_bootstrap_test Another potential problem was that we were dereferencing "s" in the same expression which move-constructs an argument out of it. Message-Id: <1484222759-4967-1-git-send-email-tgrabiec@scylladb.com>	2017-01-12 15:10:51 +02:00
Gleb Natapov	76aed548e3	storage_proxy: add replica side counters for data read Message-Id: <20170112085907.GN11469@scylladb.com>	2017-01-12 11:41:04 +02:00
Avi Kivity	8f36dca6f1	storage_proxy: prevent short read due to buffer size limit from being swallowed during range scan mutation_result_merger::get() assumes that the merged result may be a short read if at least one of the partial results is a short read (in other words, if none of the partial results is a short read, then the merged result is also not a short read). However this is not true; because we update the memory accounter incrementally, we may stop scanning early. All the partial results are full; but we did not scan the entire range. Fix by changing the short_read variable initialization from `no` (which assumes we'll encounter a short read indication when processing one of the batches) to `this->short_read()`, which also takes into account the memory accounter. Fixes #2001. Message-Id: <20170108111315.17877-1-avi@scylladb.com>	2017-01-09 09:21:43 +00:00
Avi Kivity	eb520e7352	storage_proxy: fix result ordering for parallel partition range scans During a range scan, we try to avoid sorting according to partition range when we can do so. This is when we scan fewer than smp::count shards -- each shard's range is strictly ordered with respect to the others. However, we use the wrong key for the sort -- we use the shard number. But if we started at shard s > 0 and wrapped around to shard 0, then shard 0's range will be after the range belonging to shard s, but will sort before it. Fix by storing the iteration order as the sort key. We use that when we know that shards do not overlap (shards < smp::count) and the index within the source partition range vector when they do. Fixes #1998. Message-Id: <20170105114253.17492-1-avi@scylladb.com>	2017-01-05 12:51:37 +01:00
Gleb Natapov	4ca58959ad	storage_proxy: do not deref unengaged stdx::optional Fixes intentional short reads. Message-Id: <20161227142133.GE1829@scylladb.com>	2016-12-27 16:30:03 +02:00
Paweł Dziepak	e6d27ac529	query: introduce result_memory_accounter::foreign_state Range queries used to be performed sequentially and the shard performing part of the read was reading state of the merger's memory accounter directly. Now, they may be performed in parallel so it is safer to just pass relevant data by value to the interested shards so that they are not reading something that another shard is modifying at the same time. Since the query is done in parallel there is a chance of overread. However, the parallelism is high only in sparsely populated tables and that's when the overread is a less serious problem.	2016-12-22 17:16:24 +01:00
Paweł Dziepak	49d675223e	storage_proxy: fix short reads in parallel range queries Since `a1cafed370` "storage_proxy: handle range scans of sparsely populated tables" nonsingular range queries may be performed in parallel on multiple shards. The consequence of this is that results may be added to the merger out of order. This requires more complex logic for handling short reads. As soon as mutation_result_merger gets a short read it starts to discard all subsequently received results that are known to contain partitions with larger keys. Then when the final result is being prepared the merger may need to combine and sort results whose ordering is not known. If at least one of these results is a short one, all partitions with larger keys are removed. Due to the request being performed in parallel it is possible that even though there was a short read the merger has got enough live data to satisfy the specified limits. If this has happened the short read flag is not set on the final result.	2016-12-22 17:16:24 +01:00
Paweł Dziepak	1a52569f7d	storage_proxy: pass maximum result size to replicas We may want to change the default individual result size limit in the future. If it is provided by the coordinator and not hardcoded in the replicas this can be done without causing data query digest mismatches or wasteful mutation query results.	2016-12-22 17:16:23 +01:00
Paweł Dziepak	aa083d3d85	result_memory_limiter: split new_read() to new_{data, mutation}_read() For data queries it is very important that all replicas get limited in the same place (this includes replicas returning only digest). That's why they shouldn't be affected by per-shard result memory limit. Moreover, we should make sure that individual memory limits are the same, making the coordinator provide it for replicas which allow to safely change it in the future. Mutation queries are not as sensitive but it is still beneficial to make sure that all replicas use the same individual limit.	2016-12-22 13:35:04 +01:00
Paweł Dziepak	a7a454c388	storage_proxy: fix _is_short_read computation	2016-12-22 13:35:04 +01:00
Paweł Dziepak	8c1e4a707c	storage_proxy: disallow short reads if got no live rows If after reconciliation the coordinator ends up with no live rows and short reads are allowed a retry may not make any progress if replicas end their reads in the same place. The solution is to disallow short reads on retries which are caused by final result having no live rows.	2016-12-22 13:35:04 +01:00
Paweł Dziepak	6db262446f	storage_proxy: don't stop after result with no live rows mutation_result_merger merges results from different shards and stops as soon as a shard returned a short read or memory usage on the merging shard is too high. However, it should never stop unless at least one live row is in the merged result.	2016-12-22 13:35:04 +01:00
Avi Kivity	a1cafed370	storage_proxy: handle range scans of sparsely populated tables When murmur3_partitioner_ignore_msb_bits = 12 (which we'd like to be the default), a scan range can be split into a large number of subranges, each going to a separate shard. With the current implementation, subranges were queried sequentially, resulting in very long latency when the table was empty or nearly empty. Switch to an exponential retry mechanism, where the number of subranges queried doubles each time, dropping the latency from O(number of subranges) to O(log(number of subranges)). If, during an iteration of a retry, we read at most one range from each shard, then partial results are merged by concatenation. This optimizes for the dense(r) case, where few partial results are required. If, during an iteration of a retry, we need more than one range per shard, then we collapse all of a shard's ranges into just one range, and merge partial results by sorting decorated keys. This reduces the number of sstable read creations we need to make, and optimizes for the sparse table case, where we need many partial results, most of which are empty. We don't merge subranges that come from different partition ranges, because those need to be sorted in request order, not decorated key order. [tgrabiec: trivial conflicts] Message-Id: <20161220170532.25173-1-avi@scylladb.com>	2016-12-20 18:32:29 +01:00
Asias He	937f28d2f1	Convert to use dht::partition_range_vector and dht::token_range_vector	2016-12-19 14:08:50 +08:00
Asias He	e5485f3ea6	Get rid of query::partition_range Use dht::partition_range instead	2016-12-19 08:09:25 +08:00
Duarte Nunes	fee0b7fa48	query_result_merger: Limit rows This patch makes the row limit enforced by the storage_proxy layer. It adds a row limit to the query_result_merger, useful when merging results for concurrent queries. More importantly, it provides guarantees that upper layers may be relying on implicitly (e.g., the paging code). Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-15 11:00:36 +00:00
Duarte Nunes	efc986d548	mutation_query: to_data_query_result enforces row limit This patch changes mutation_query::to_data_query_result() so that it enforces the row limit alongside the partition limit and the per-partition limit. In the following patch, we'll enforce the row limit in an upper layer, but this lets us optimize the case where only one replica replies. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-15 10:56:40 +00:00
Duarte Nunes	c2072c7dc9	storage_proxy: Decrease limits when retrying command This patch changes a read_command's limits when retrying it, so that we don't ask for more rows than necessary. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-15 10:41:06 +00:00
Duarte Nunes	9572c19dc6	storage_proxy: Don't fetch superfluous partitions This patch ensures we keep track of how many partitions we've queried so we don't ask for more than the number we need. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-15 10:27:46 +00:00
Duarte Nunes	108011a839	query_result_merger: Limit partitions This patch adds a partition limit to the query_result_merger, useful when merging results for concurrent queries. This change also makes the partition limit enforced by the storage_proxy layer, no changes being needed by the upper layers, namely the Thrift interface. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-12-15 10:27:41 +00:00
Paweł Dziepak	4c69d7e2fe	storage_proxy: clean up after primary_key introduction primary_key was introduced as a replacement for std::pair<dht::decorated_key, std::optional<clustering_key>>. In order to simplify the patch introducing it, its fields were named 'first' and 'second'. This patch changes the names to something less useless, removes the old row_address alias and removes is_missing_rows() in favour of the primary_key::less_compare_clustering comparator. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-12-14 14:28:37 +00:00
Paweł Dziepak	3c173d87b5	storage_proxy: handle intentional short reads If the result is going to be too large the replica may decide to make it shorter and coordinator should handle this properly (i.e. do not retry). Moreover, coordinator could avoid some retries by setting the short_read flag itself. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-12-14 14:28:37 +00:00
Paweł Dziepak	dd67de7218	storage_proxy: make sure coordinator has complete data got_incomplete_information() ensures that the coordinator has received all required data from all replicas (see `77dbe3c12f` "storage_proxy: fix reconciliation with limits" for examples of when that may not be the case). However, this function is called only if the reconciled result has at least as many rows as the user asked for. This was correct when we had only a total row limit: if the result was shorter than that, either all replicas sent all the data they have or the coordinator will retry anyway. However, since then we got a partition limit and a per-partition row limit, and a request may be limited by one of these while still being below the total row limit. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>	2016-12-14 14:28:36 +00:00


378 Commits