scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 11:36:54 +00:00

Author	SHA1	Message	Date
Gleb Natapov	93f068bd44	storage_proxy: fix speculation target selection logic Current speculation target selection logic has several bugs in multi-dc setup. It may select a non local target for CL=LOCAL and it may select more than one target to speculate, one of which is non local. Examples: 1. Two datacenters: DC1 RF 2, DC2 RF 2 and read with LOCAL_QUORUM. In this scenario db::filter_for_query() will return both replicas from local DC and speculation target selection logic will pick one which will be in different DC. 2. Two datacenters: DC1 RF 2, DC2 RF 2 and read with LOCAL_ONE + RRD.DC_LOCAL In this scenario db::filter_for_query() will return all nodes in local DC and there will already be enough nodes to speculate, but current logic will add one node from non local dc as a speculation target. The patch below fixed both of those scenarios. Message-Id: <20161103154637.GS7766@scylladb.com>	2016-11-08 18:32:47 +01:00
Paweł Dziepak	a8308e2a8d	row_cache: dummy entry does not count as partition Since continuity flag introduction row cache contains a single dummy entry. cache_tracker knows nothing about it so that it doesn't appear in any of the metrics. However, cache destructor calls cache_tracker::on_erase() for every entry in the cache including the dummy one. This is incorrect since the tracker wasn't informed when the dummy entry was created. Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com> Message-Id: <1478608776-10363-1-git-send-email-pdziepak@scylladb.com>	2016-11-08 13:54:44 +01:00
Piotr Jastrzebski	50b41f7d1d	Fix row_cache_test partition_range passed to row_cache::make_reader has to be kept alive as long as the resulting reader is used. Otherwise weird things start to happen. This used to work purely by luck. When I started changing the row_cache implementation I ran into very weird behaviors for this test. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <2c9e337dbbcf35f4e1394cad043eda10b8c2bd4a.1478602876.git.piotr@scylladb.com>	2016-11-08 13:28:53 +01:00
Calle Wilund	473326d49a	api/column_family: Make mean row size return integral As (at least) per C3, these metrics are integral in origin. Adapt. (Other option would be to translate in jmx).	2016-11-08 12:22:04 +00:00
Calle Wilund	bd646a6755	repair (api): Add option handling (sort of) for nodetool default options	2016-11-08 12:22:04 +00:00
Calle Wilund	0181fc8159	api::cache_service: Add (dummy) calls for key&counter metrics	2016-11-08 12:22:04 +00:00
Calle Wilund	5eb54f9bc4	api::storage_service: c3 compat - make query keyspaces a trinary choice all, user or non-local strategy ones.	2016-11-08 12:22:04 +00:00
Calle Wilund	3b7a7dd383	api::failure_detector: c3 compat - add endpoint phi value query	2016-11-08 12:22:04 +00:00
Calle Wilund	218df55349	failure_detector: add accessor and api shortcut for arrival samples	2016-11-08 12:22:04 +00:00
Calle Wilund	f9836cd23b	api::endpoint_snitch: c3 compat - allow dc/rack query for broadcast	2016-11-08 12:22:04 +00:00
Calle Wilund	54ba06a8bf	api::column_family: Add calls/parameters for c3 compatibility	2016-11-08 12:22:04 +00:00
Amnon Heiman	c8082ccadb	API: fix a typo in storage_proxy This patch fixes a typo in the URL definition, causing the metric in the jmx not to find it. Fixes #1821 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Message-Id: <1478563869-20504-1-git-send-email-amnon@scylladb.com>	2016-11-08 11:09:21 +02:00
Amos Kong	95fe88c1d3	scripts/scylla_current_repo: use HTTP to access downloads.scylladb.com Https isn't available for downloads.scylladb.com, or we can access it by https://s3.amazonaws.com/downloads.scylladb.com/... Signed-off-by: Amos Kong <amos@scylladb.com> Message-Id: <d4b65e1724bbeb76c928790d5d3e95b91ee9db79.1478153034.git.amos@scylladb.com>	2016-11-08 11:03:50 +02:00
Avi Kivity	767cfb4fe9	storage_service: fix range wrapping in describe_ring even more Commit `8fca1887c2` ("storage_service: fix range wrapping in describe_ring") fixed incorrect range wrapping code for describe_ring, but fails when the number of endpoints for a token is greater than one, because the endpoints are stored in an unordered vector. Fix by comparing the endpoints in a way that ignores their order. Message-Id: <1478460826-15923-1-git-send-email-avi@scylladb.com>	2016-11-07 16:18:20 +01:00
Calle Wilund	11baf37ab5	commitlog: Prevent exceptions in stream::produce from being set twice Fixes #1775 stream lacks a check "is_open", which is a bummer. We have to both prevent exception propagation and add a flag of our own to make sure exceptions in producer code reaches consumer, and does not simply get lost in the reactor. Message-Id: <1478508817-18854-1-git-send-email-calle@scylladb.com>	2016-11-07 11:41:33 +01:00
Tomasz Grabiec	e6cc0a2e10	Merge branch '1766/v1' from duarten/scylla.git This patchset adds missing properties to the create_view_statement, such as whether the view is compact or the order of its clustering columns. Fixes #1766	2016-11-07 10:44:24 +01:00
Takuya ASADA	0f1ba1a3bb	dist/redhat: remove unused dependencies Seems like we mistakenly added unneeded packages for BuildRequires when we created .spec file, so remove them. Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <1478504761-15067-1-git-send-email-syuu@scylladb.com>	2016-11-07 09:48:50 +02:00
Paweł Dziepak	985d2f6d4a	Merge "Remove quadratic behavior from atomic sstable deletion" from Avi "The atomic sstable deletion provides exception safety at the cost of quadratic behavior in the number of sstables awaiting deletion. This causes high cpu utilization during startup. Change the code to avoid quadratic complexity, and add some unit tests. See #1812."	2016-11-04 15:48:04 +00:00
Avi Kivity	f75aceabc5	sstables: add unit tests for atomic deletion We simulate shards deleting sstables, but this is all happening on a single core, and no sstables are harmed during test execution.	2016-11-04 15:48:43 +02:00
Avi Kivity	f10b9906d8	sstables: move atomic deletion code to its own files This will simplify unit testing. We move generic code that depends only on seastar, so compile time should not increase too much.	2016-11-04 15:47:35 +02:00
Avi Kivity	9e85653c33	sstables: make atomic_deletion_manager more abstract Make the shard count and method of deleting sstables abstract, in order not to require all that machinery for unit tests.	2016-11-04 15:44:09 +02:00
Avi Kivity	e527da1e3c	sstables: wrap atomic deletion code in a class This makes it easier to abstract and unit-test.	2016-11-04 15:44:07 +02:00
Avi Kivity	a05837936a	sstables: remove quadratic behavior from atomic sstable deletions In order to ensure exception safety, the atomic sstable deletion code creates a copy of the list of sstables pending deletion, modifies that copy, and then replaces the original data with the copy. This guarantees that any exception does not change the data, since the assignment does not require allocation. However, it does result in quadratic behavior. During startup, all sstables are loaded on each shard, and each shard deletes sstables that do not have any partitions served by that shard; this results in almost all sstables being deleted from all shards, with all that work going to shard 0; the list grows to O(nr sstables), and there are O((nr sstables) * (nr shards)) operations to perform. Fix by replacing the copy-modify-assign method with an in-place update, but one that is designed to only commit changes after all allocations have been made; in addition, instead of using a list, use a hash table, removing another source of quadratic behavior. Fixes #1812 (the quadratic behavior part).	2016-11-04 15:42:44 +02:00
Avi Kivity	8fca1887c2	storage_service: fix range wrapping in describe_ring describe_ring() tries to re-wrap the ranges, but fails because the ranges are not sorted. Adjust the code not to rely on sorting. Message-Id: <1478198630-27483-1-git-send-email-avi@scylladb.com>	2016-11-04 10:48:14 +00:00
Paweł Dziepak	8afd9e52c7	Merge "Process range queries sequentially on shards" from Avi "Currently, partition range queries are processed in parallel on all shards. This is inefficient because we are likely to drop the results from all but one shard, assuming a well-populated column family. We are multiplying our work by a factor of smp::count. While this is worthwhile in its own right, it is really an excuse to sneak in the range/shard generator (patch 5), which is preliminary for a new sharding algorithm, dividing tokens among shards based on the middle-significant bits rather than the most-significant bits (which alias with vnodes). Fixes #1573."	2016-11-04 09:58:04 +00:00
Tomasz Grabiec	c1a7e2090e	Revert "database: change find_column_families signature so it returns a lw_shared_ptr" This reverts commit `f3528ede65`.	2016-11-04 10:48:21 +01:00
Tomasz Grabiec	3b5ccda70e	Revert "database: refactor code so apply_in_memory() is called only once" This reverts commit `3f825f593d`.	2016-11-04 10:48:18 +01:00
Tomasz Grabiec	6366eb5cf8	Revert "correctly calculate latencies for writes" This reverts commit `a382f10fc4`.	2016-11-04 10:48:02 +01:00
Tomasz Grabiec	a5ee87611a	Revert "database: when querying, move latency counter instead of copying" This reverts commit `8840a5a593`.	2016-11-04 10:47:58 +01:00
Tomasz Grabiec	f3c1ff78e6	Merge branch 'cql_read_write_counters-v4' from seastar-dev.git New CQL counters from Vlad.	2016-11-04 09:19:07 +01:00
Avi Kivity	b3299d5bc3	storage_proxy: simplify range queries Instead of asking a shard for cmd->partition_limit and cmd->row_limit, just ask it for the number of partitions and rows still needed to satisfy the query. This removes the need to trim the shard's result.	2016-11-03 19:10:20 +02:00
Avi Kivity	a668e575f6	storage_proxy: execute multi-partition query sequentially over shards Since every shard might cause the row_limit quota to be satisfied, every shard might be the last one we need. Hence it is better to process shards sequentially, stopping if the quota is reached or the range is exhausted. The original code tried to yield to reduce latency, but this is now unnecessary, as we're doing a lot less work per iteration (if it becomes necessary, we should do it on the replica shard, not the coordinating shard).	2016-11-03 19:10:20 +02:00
Avi Kivity	1d77e3a03a	partitioner: add unit tests for token_for_next_shard() i_partitioner::token_for_next_shard() is an inverse for i_partitioner::shard_of(), test that this is so.	2016-11-03 19:10:20 +02:00
Avi Kivity	7202b94183	dht: introduce a sharder for vectors of partition ranges Building on the single-range sharder, add a sharder for vectors of partition ranges. This helps with wrapped ranges, which are translated into a vector containing two shards.	2016-11-03 19:10:20 +02:00
Avi Kivity	43a2380899	dht: add a generator for shard/range pairs Divides a ring_position range into a sequence of shard/range pairs. This allows sequential iteration over shards in ring order. The current multi-partition query executes on all shards in parallel, but this is very wasteful, as most of the data will be thrown away if it is not included in the page. With the generator, we can switch to sequential execution.	2016-11-03 19:10:17 +02:00
Avi Kivity	1f88d103a8	partitioner: add i_partitioner::token_for_next_shard() When performing a range query, we want to iterate over shards, running the query on each shard in order until the query range is exhausted or we have the right number of rows. To be able to do this, introduce token_for_next_shard(), which allows us to determine the boundary between shards. It is a sort-of inverse to shard_of(), in that shard_of(token_for_next_shard(t)) == shard_of(t) + 1	2016-11-03 19:09:23 +02:00
Vlad Zolotarov	6c15dd967a	cql3::query_processor: make the collectd metrics registration nicer Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-11-03 11:49:20 -04:00
Vlad Zolotarov	36cc351ae1	cql3::query_processor: add a counter for BATCH CQL statements - Add a "batches" member to cql_stats. - Update it where appropriate. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-11-03 11:49:20 -04:00
Vlad Zolotarov	6e1d27bed1	cql3::query_processor: add a counter for a number of CQL modification requests ("writes") - Add inserts, updates, and deletes members to cql_stats. - Store cql_stats& in a modification_statement and increment the corresponding counter according to the value of a "type" field. - Store cql_stats& in a batch_statement and increment the statistics for each BATCH member. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-11-03 11:49:15 -04:00
Vlad Zolotarov	fa4e1db0cb	cql: add a counter for CQL read (SELECT) requests - Add a "reads" counter to a cql3::cql_stats struct. - Store a reference for a query_processor::_cql_stats in the select_statement object. - Increment a "reads" counter where needed. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-11-03 11:48:57 -04:00
Vlad Zolotarov	7606588267	cql3::query_processor: add cql_stats - Add cql_stats member. - Pass it to cql3::raw::parsed_statement::prepare() virtual method. Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>	2016-11-03 11:48:57 -04:00
Glauber Costa	8840a5a593	database: when querying, move latency counter instead of copying It is comprised of two time points. Let's move it instead of copying it. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <c7c155c77780e188bfbe05881c81ce86456016d5.1478111467.git.glauber@scylladb.com>	2016-11-03 13:27:31 +01:00
Glauber Costa	a382f10fc4	correctly calculate latencies for writes Right now we are calculating latencies only when we are about to add an item to the memtable. That's incorrect and misleading, for two reasons. First, it leaves the commitlog latencies out. But second, it is done after the memtable wall effect is applied, which means we are not counting throttle time in either the memtables or the commitlog. To do that, we'll start the latency_counter object as soon as possible and move it all the way to apply_in_memory(). That should span the entire write operation. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <4e424780d290fd5938046060df2b17e2b470b717.1478111467.git.glauber@scylladb.com>	2016-11-03 13:27:31 +01:00
Glauber Costa	3f825f593d	database: refactor code so apply_in_memory() is called only once There are two variants of apply_in_memory() being called in do_apply(): with and without the commitlog. There are two main differences. The first is that when the commitlog is involved, we need to wait for its future to complete before moving to apply_in_memory. That can easily be factored out by providing an always-ready future if we don't have the commitlog enabled, and waiting on that. The second is that the commitlog version can cause apply_in_memory to generate an exception if there is replay position reordering. However, there is no harm in appending the exception handler to both versions. In one of them it's an impossible exception, but that's fine. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <8cee0cad9b1930a057a24e095f0a655069ae8be2.1478111467.git.glauber@scylladb.com>	2016-11-03 13:27:31 +01:00
Glauber Costa	f3528ede65	database: change find_column_families signature so it returns a lw_shared_ptr There are places in which we need to use the column family object many times, with deferring points in between. Because the column family may have been destroyed in the deferring point, we need to go and find it again. If we use lw_shared_ptr, however, we'll be able to at least guarantee that the object will be alive. Some users will still need to check, if they want to guarantee that the column family wasn't removed. But others that only need to make sure we don't access an invalid object will be able to avoid the cost of re-finding it just fine. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <722bf49e158da77ff509372c2034e5707706e5bf.1478111467.git.glauber@scylladb.com>	2016-11-03 13:27:31 +01:00
Avi Kivity	6c45b0bae8	partitioner: make comparators public The public comparison operators depend on global_partitioner(), and are therefore less useful for tests.	2016-11-03 11:27:40 +02:00
Avi Kivity	6320181b97	partitioner: const correctness for comparators	2016-11-03 11:27:40 +02:00
Avi Kivity	470826d127	partitioner: change partitioners to have shard counts independent from smp::count Useful for testing.	2016-11-03 11:27:40 +02:00
Avi Kivity	75706c0a26	size_estimates_recorder: sort token range before rewrapping it Since size estimates are stored as wrapped ranges, we call compat::wrap() to convert from the now-standard unwrapped ranges back to wrapped ranges. However, compat::wrap() relies on the ranges being in sorted order, but our input is not. This leads to a crash as we find an unexpected empty token in the middle of the vector. Sort it so compat::wrap() works as expected. Fixes #1804. Message-Id: <1478161908-25051-1-git-send-email-avi@scylladb.com>	2016-11-03 09:43:41 +01:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
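
The range-query commits above (1f88d103a8, 43a2380899, a668e575f6, b3299d5bc3) describe splitting a token range into shard/range pairs and then querying shards one at a time until the row limit is met. The following is a minimal Python sketch of that idea, not the ScyllaDB implementation. It assumes a toy 64-bit unsigned token space in which the shard is chosen by the most-significant bits. The names `shard_ranges`, `query_sequentially`, and the `query_shard` callback are hypothetical and exist only for illustration.

```python
TOKEN_BITS = 64                 # assumption: toy unsigned 64-bit token space
TOKEN_SPACE = 1 << TOKEN_BITS


def shard_of(token, shard_count):
    """Map a token to a shard using its most-significant bits (toy model)."""
    return (token * shard_count) >> TOKEN_BITS


def token_for_next_shard(token, shard_count):
    """Return the smallest token owned by the shard after shard_of(token).

    Returns None when the token already belongs to the last shard.
    """
    next_shard = shard_of(token, shard_count) + 1
    if next_shard == shard_count:
        return None
    # Ceiling division: first t with t * shard_count >= next_shard * 2**64.
    return -(-(next_shard * TOKEN_SPACE) // shard_count)


def shard_ranges(start, end, shard_count):
    """Yield (shard, (lo, hi)) half-open pieces of [start, end) in ring order."""
    lo = start
    while lo < end:
        boundary = token_for_next_shard(lo, shard_count)
        hi = end if boundary is None else min(boundary, end)
        yield shard_of(lo, shard_count), (lo, hi)
        lo = hi


def query_sequentially(start, end, shard_count, row_limit, query_shard):
    """Query shards one by one, asking each only for the rows still needed.

    query_shard(shard, (lo, hi), remaining) is a hypothetical callback that
    returns at most `remaining` rows from the given piece of the range.
    """
    rows = []
    for shard, piece in shard_ranges(start, end, shard_count):
        remaining = row_limit - len(rows)
        if remaining <= 0:
            break               # quota satisfied: later shards are never asked
        rows.extend(query_shard(shard, piece, remaining))
    return rows
```

Because each shard is asked only for the rows still missing, the coordinator never has to trim a shard's result, and shards past the one that satisfies the limit do no work at all.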

11716 Commits