scylladb

Author	SHA1	Message	Date
Paweł Dziepak	8c3b7fea81	Merge "Introduce new API and converters from/to old mutation_reader" from Piotr "This changeset is the first step to flatten mutation_reader. Then it introduces new mutation_fragment types for partition header and end of partition. Using those a new flat_mutation_reader is defined. Finally it introduces converters between new flat_mutation_reader and old mutation_reader." * 'haaawk/flattened_mutation_reader_v12' of github.com:scylladb/seastar-dev: Add tests for flat_mutation_reader Introduce conversion from flat_mutation_reader to mutation_reader Introduce conversion from mutation_reader to flat_mutation_reader Introduce flat_mutation_reader Extract FlattenedConsumer concept using GCC6_CONCEPT Introduce partition_end mutation_fragment Introduce a position for end of partition Introduce partition_start mutation_fragment Introduce FragmentConsumer Introduce a position for partition start streamed_mutation: Extract concepts using GCC6_CONCEPT macro	2017-10-16 12:14:23 +01:00
Duarte Nunes	2210d10552	gms/gossiper: Cleanup is_alive() Make it use get_endpoint_state_for_endpoint_ptr(), check if gossiper is enabled, mark it as const, and have some callers use it instead of open coding the logic. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-10-11 10:02:32 +01:00
Piotr Jastrzebski	2516b42752	Introduce partition_start mutation_fragment This type of mutation_fragment will be used in new mutation_reader to signal the beginning of the next partition. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2017-10-10 16:15:59 +02:00
Duarte Nunes	ceebbe14cc	gossiper: Avoid endpoint_state copies gossiper::get_endpoint_state_for_endpoint() returns a copy of endpoint_state, which we've seen can be very expensive. This patch adds a similar function which returns a pointer instead, and changes the call sites where using the pointer-returning variant is deemed safe (the pointer neither escapes the function, nor crosses any defer point). Fixes #764 Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2017-10-10 13:48:02 +01:00
Tomasz Grabiec	741ec61269	streaming: Fix streaming not streaming all ranges It skipped one sub-range in each 10-range batch, and tried to access the range vector using the end() iterator. Fixes sporadic failures of update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_node_1_test. Message-Id: <1505848902-16734-1-git-send-email-tgrabiec@scylladb.com>	2017-09-20 10:33:59 +03:00
Botond Dénes	a980ff6463	Use abort() instead of assert + throw in unreachable code Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <393c3730111dfe090c44d8fc2e31602956a7d008.1504022425.git.bdenes@scylladb.com>	2017-09-03 11:07:27 +03:00
Botond Dénes	d1209c548a	Fix -Wreturn-type warnings Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <99f7a006daaa78eb87720ac51c394093398bc868.1504013915.git.bdenes@scylladb.com>	2017-08-29 16:41:09 +03:00
Tomasz Grabiec	2ca99be27d	ring_position_view: Print token instead of token pointer Broken in `e989d65539`. Message-Id: <1503667158-7544-1-git-send-email-tgrabiec@scylladb.com>	2017-08-25 14:25:21 +01:00
Avi Kivity	81a33df25d	dht: reduce split_range_to_single_shard contiguous memory demand split_range_to_single_shard() returns a vector of size 4096, with each element (a partition_range) of size 100. The total of 400k can cause defragmentation if memory is fragmented. Fix by using a deque. Fixes #2707. Message-Id: <20170819141017.28287-1-avi@scylladb.com>	2017-08-21 14:25:45 +02:00
Duarte Nunes	ec75eac37d	ring_position_exponential_vector_sharder: Take ranges by rvalue Avoids some copies. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20170814093310.29200-1-duarte@scylladb.com>	2017-08-14 12:55:43 +03:00
Asias He	f239b11a84	storage_service: Use the new range_streamer interface for bootstrap So that bootstrap operation will now stream small ranges at a time and restream the failed ranges.	2017-08-07 16:31:47 +08:00
Asias He	6810031ba7	dht: Extend range_streamer interface After this patch and the following patches to use the new range_streamer interface, all the following cluster operations: - bootstrap - rebuild - decommission - removenode will use the same code to do the streaming. The range_streamer is now extended to support both fetching from and pushing to a peer node. Another big change is that the range_streamer will now stream fewer ranges at a time, so less data, per stream_plan, and range_streamer will remember which ranges failed to stream and can retry them later. The retry policy is very simple at the moment: it retries at most 5 times and sleeps 1 minute, 1.5^2 minutes, 1.5^3 minutes .... Later, we can introduce an API for the user to decide when to stop retrying and the retry interval. The benefits: - All the cluster operations share the same code to stream - We can know the operation progress, e.g., we can know the total number of ranges that need to be streamed and the number of ranges finished in bootstrap, decommission, etc. - All the cluster operations can survive a peer node going down during the operation, which usually takes a long time to complete, e.g., when adding a new node, currently if any of the existing nodes which stream data to the new node has an issue sending data to the new node, the whole bootstrap process will fail. After this patch, we can fix the problematic node and restart it, and the joining node will retry streaming from the node again. - We can fail streaming early, time out early and retry less, because all the operations that use streaming can survive the failure of a single stream_plan. It is not that important for now to make a single stream_plan successful. Note, another user of streaming, repair, is now using small stream_plans as well and can rerun the repair for the failed ranges too. This is one step closer to supporting resumable add/remove node operations.	2017-08-07 16:31:47 +08:00
Paweł Dziepak	68e57a742f	ring_position_comparator: drop unused overloads	2017-07-26 14:36:37 +01:00
Paweł Dziepak	fe7eba7f06	ring_position_comparator: accept sstables::decorated_key_view ring_position_comparator has overloads for comparing ring_positions as well as sstables::key_view. In the case of the latter it needs to compute the token of the key. However, the sstable layer could cache some tokens so let's allow the comparator callers to provide it directly.	2017-07-26 14:36:36 +01:00
Tomasz Grabiec	60678f0e8a	ring_position: Optimize construction from r-value references of decorated_key Message-Id: <1500650171-26291-1-git-send-email-tgrabiec@scylladb.com>	2017-07-24 10:25:14 +03:00
Asias He	d835cf2748	dht: Add selective_token_range_sharder It is like ring_position_range_sharder but it works with dht::token_range. This sharder will return the ranges belonging to a selected shard.	2017-07-04 18:46:19 +08:00
Tomasz Grabiec	e989d65539	dht: Make ring_position_view copyable dht::token needs to be stored as a pointer now and not a reference so that validity of old pointers is not impacted by in-place object construction which would occur in the copy-assignment operator. [1] says that old pointers can be used to access the new object only if the type "does not contain any non-static data member whose type is const-qualified or a reference type". [1] http://en.cppreference.com/w/cpp/language/lifetime#Storage_reuse	2017-06-24 18:06:11 +02:00
Tomasz Grabiec	81e7b561da	dht: Add ring_position min()/max()	2017-06-24 18:06:11 +02:00
Avi Kivity	f9f2f18145	dht: fix bad to_sstring() call to_sstring() is part of seastar, not the global namespace.	2017-06-22 17:51:27 +03:00
Avi Kivity	ebaeefa02b	Merge seastar upstream (seastar namespace) - introduced "seastarx.hh" header, which does a "using namespace seastar"; - 'net' namespace conflicts with seastar::net, renamed to 'netw'. - 'transport' namespace conflicts with seastar::transport, renamed to cql_transport. - "logger" global variables now conflict with logger global type, renamed to xlogger. - other minor changes	2017-05-21 12:26:15 +03:00
Calle Wilund	6ca07f16c1	scylla: fix compilation errors on gcc 5 Message-Id: <1495030581-2138-1-git-send-email-calle@scylladb.com>	2017-05-17 17:56:06 +03:00
Avi Kivity	68034604e1	dht: murmur3_partitioner: simplify moving to and from the zero-based token range	2017-05-17 13:50:30 +03:00
Avi Kivity	76f12a8842	dht: add split_range_to_single_shard() Intersects a shard's owning range with a ring position range, and return the sorted result.	2017-05-17 13:50:27 +03:00
Avi Kivity	a65e8bd215	dht: add a ring-position-range-vector variant of the exponential sharder The "exponentiality" is not carried over from one range to another, because we expect one or two ranges (two ranges result from a wrapped around thrift token range).	2017-05-17 13:18:52 +03:00
Avi Kivity	f671ac13b4	dht: add an exponential ring_position range sharder Like the regular sharder, the exponential sharder divides a range into subranges owned by individual shards. Unlike the regular sharder, it generates ever-increasing subranges, spanning more and more shards, and eventually returns several subranges per shard. To avoid using exponential cpu and memory, subranges belonging to a single shard are merged, and a flag is set to indicate the subranges are not ordered wrt. each other.	2017-05-17 13:18:49 +03:00
Avi Kivity	025c6b45b2	dht: extend i_partitioner::next_token_for_shard() Right now, next_token_for_shard() only allows iterating linearly in shard order. Add the ability to select a specific shard to skip to (in case we're only interested in a single shard), and to select larger ranges (so that exponential increases are not implemented by iteration).	2017-05-17 12:30:03 +03:00
Avi Kivity	7156ea8804	dht: make ring_position_range_sharder more independent of global_partitioner Useful for testing.	2017-05-17 12:30:03 +03:00
Avi Kivity	302fec8293	dht: make i_partitioner::name() const	2017-05-17 12:30:03 +03:00
Avi Kivity	f462c4327e	dht: make i_partitioner keep track of the number of shards it was configured with Useful for testing classes layered on top of the partitioner (the sharders).	2017-05-17 12:30:03 +03:00
Avi Kivity	04b16ae8ec	dht: fix partitioner initialization for tests The partitioners now depend on smp::count to be initialized correctly, but smp::count isn't available at static initialization time. The scylla executable isn't affected because it calls set_global_partitioner() after smp::count has been initialized. Fix by deferring initialization to the first global_partitioner() call.	2017-05-17 12:30:03 +03:00
Tomasz Grabiec	7db83fa3fe	sstables: index_reader: Optimize advancing to extreme positions	2017-04-20 10:54:38 +02:00
Tomasz Grabiec	c7b9c5dfd3	dht: ring_position_view: Add key getter	2017-04-20 10:54:38 +02:00
Tomasz Grabiec	5b71e0b9ab	dht: ring_position_view: Add constructor and factory from ring_position_view	2017-04-20 10:54:38 +02:00
Avi Kivity	af118ab52b	murmur3_partitioner: fix build on clang Don't know what the root cause is, but the fix is harmless.	2017-04-17 23:03:15 +03:00
Avi Kivity	c05f60387b	i_partitioner: remove unused function Found by clang.	2017-04-17 23:03:15 +03:00
Avi Kivity	a496ec7f5b	byte_ordered_partitioner: fix bad operator precedence Found by clang.	2017-04-17 23:03:15 +03:00
Tomasz Grabiec	d4b6e430ed	dht: Introduce ring_position_view	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	55a7cceef5	dht: Move comparison logic from ring_position::tri_compare() to ring_position_comparator It will soon define common ordering for many objects, not just ring_position.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	65a8920b25	dht: Make min/max tokens capturable by reference So that they can be later used in views.	2017-03-28 18:10:39 +02:00
Avi Kivity	54b8acdd9f	dht: add hashing and comparison helpers to dht::decorated_key An std::hash specialization, and an equality comparator.	2017-01-20 11:24:14 +02:00
Avi Kivity	141048e0e5	dht: improve token hash function For a small token, we can just return it, since it already is a hash. We hash large tokens using murmur3, which is supposedly a good hash.	2017-01-20 11:24:14 +02:00
Avi Kivity	8686a59ea5	dht: use nonwrapping_ranges in ring_position_range_sharder It was the observation that ring_position_range_sharder doesn't support wrapping ranges that started the nonwrapping_range madness, but that class still has some leftover wrapping ranges. Close the circle by removing them. Message-Id: <20161123153113.8944-1-avi@scylladb.com>	2016-12-22 14:40:30 +01:00
Avi Kivity	a1cafed370	storage_proxy: handle range scans of sparsely populated tables When murmur3_partitioner_ignore_msb_bits = 12 (which we'd like to be the default), a scan range can be split into a large number of subranges, each going to a separate shard. With the current implementation, subranges were queried sequentially, resulting in very long latency when the table was empty or nearly empty. Switch to an exponential retry mechanism, where the number of subranges queried doubles each time, dropping the latency from O(number of subranges) to O(log(number of subranges)). If, during an iteration of a retry, we read at most one range from each shard, then partial results are merged by concatenation. This optimizes for the dense(r) case, where few partial results are required. If, during an iteration of a retry, we need more than one range per shard, then we collapse all of a shard's ranges into just one range, and merge partial results by sorting decorated keys. This reduces the number of sstable read creations we need to make, and optimizes for the sparse table case, where we need many partial results, most of which are empty. We don't merge subranges that come from different partition ranges, because those need to be sorted in request order, not decorated key order. [tgrabiec: trivial conflicts] Message-Id: <20161220170532.25173-1-avi@scylladb.com>	2016-12-20 18:32:29 +01:00
Asias He	937f28d2f1	Convert to use dht::partition_range_vector and dht::token_range_vector	2016-12-19 14:08:50 +08:00
Asias He	7a446986fa	dht: Introduce dht::partition_range_vector and dht::token_range_vector std::vector<dht::partition_range> and std::vector<dht::token_range> are used in a lot of places, introduce dht::partition_range_vector and dht::token_range_vector as the alias.	2016-12-19 08:09:28 +08:00
Asias He	85034c1b57	Convert to use dht::partition_range	2016-12-19 08:04:30 +08:00
Asias He	d1178fa299	Convert to use dht::token_range	2016-12-19 08:04:29 +08:00
Asias He	1f06eedb58	dht: Rename token_range to token_range_endpoints It is a helper class used in storage_service only. Rename it so we can use it for the real dht::token_range.	2016-12-19 08:04:29 +08:00
Asias He	264b6ee69e	dht: Introduce dht::token_range and dht::partition_range nonwrapping_range<ring_position> and nonwrapping_range<token> are used in many places. Let's make an alias for them to make it less verbose. Also there is a query::partition_range in query-request.hh which is the alias of nonwrapping_range<ring_position>. query::partition_range is used in places not related to query at all. Let's unify the usage project wide.	2016-12-19 08:04:29 +08:00
Paweł Dziepak	b86a826baf	dht: describe split_range[s]_to_shards() guarantees We are going to require these functions to return sorted and disjoint ranges. They already do so (provided that the input ranges are sorted and disjoint), but if the guarantee is not explicitly stated it may disappear some day.	2016-12-15 13:07:32 +00:00

199 Commits