scylladb

Author	SHA1	Message	Date
Avi Kivity	af118ab52b	murmur3_partitioner: fix build on clang Don't know what the root cause is, but the fix is harmless.	2017-04-17 23:03:15 +03:00
Avi Kivity	c05f60387b	i_partitioner: remove unused function Found by clang.	2017-04-17 23:03:15 +03:00
Avi Kivity	a496ec7f5b	byte_ordered_partitioner: fix bad operator precedence Found by clang.	2017-04-17 23:03:15 +03:00
Tomasz Grabiec	d4b6e430ed	dht: Introduce ring_position_view	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	55a7cceef5	dht: Move comparison logic from ring_position::tri_compare() to ring_position_comparator It will soon define common ordering for many objects, not just ring_position.	2017-03-28 18:10:39 +02:00
Tomasz Grabiec	65a8920b25	dht: Make min/max tokens capturable by reference So that they can be later used in views.	2017-03-28 18:10:39 +02:00
Avi Kivity	54b8acdd9f	dht: add hashing and comparison helpers to dht::decorated_key An std::hash specialization, and an equality comparator.	2017-01-20 11:24:14 +02:00
Avi Kivity	141048e0e5	dht: improve token hash function For a small token, we can just return it, since it already is a hash. We hash large tokens using murmur3, which is supposedly a good hash.	2017-01-20 11:24:14 +02:00
Avi Kivity	8686a59ea5	dht: use nonwrapping_ranges in ring_position_range_sharder It was the observation that ring_position_range_sharder doesn't support wrapping ranges that started the nonwrapping_range madness, but that class still has some leftover wrapping ranges. Close the circle by removing them. Message-Id: <20161123153113.8944-1-avi@scylladb.com>	2016-12-22 14:40:30 +01:00
Avi Kivity	a1cafed370	storage_proxy: handle range scans of sparsely populated tables When murmur3_partitioner_ignore_msb_bits = 12 (which we'd like to be the default), a scan range can be split into a large number of subranges, each going to a separate shard. With the current implementation, subranges were queried sequentially, resulting in very long latency when the table was empty or nearly empty. Switch to an exponential retry mechanism, where the number of subranges queried doubles each time, dropping the latency from O(number of subranges) to O(log(number of subranges)). If, during an iteration of a retry, we read at most one range from each shard, then partial results are merged by concatenation. This optimizes for the dense(r) case, where few partial results are required. If, during an iteration of a retry, we need more than one range per shard, then we collapse all of a shard's ranges into just one range, and merge partial results by sorting decorated keys. This reduces the number of sstable read creations we need to make, and optimizes for the sparse table case, where we need many partial results, most of which are empty. We don't merge subranges that come from different partition ranges, because those need to be sorted in request order, not decorated key order. [tgrabiec: trivial conflicts] Message-Id: <20161220170532.25173-1-avi@scylladb.com>	2016-12-20 18:32:29 +01:00
Asias He	937f28d2f1	Convert to use dht::partition_range_vector and dht::token_range_vector	2016-12-19 14:08:50 +08:00
Asias He	7a446986fa	dht: Introduce dht::partition_range_vector and dht::token_range_vector std::vector<dht::partition_range> and std::vector<dht::token_range> are used in a lot of places, introduce dht::partition_range_vector and dht::token_range_vector as the alias.	2016-12-19 08:09:28 +08:00
Asias He	85034c1b57	Convert to use dht::partition_range	2016-12-19 08:04:30 +08:00
Asias He	d1178fa299	Convert to use dht::token_range	2016-12-19 08:04:29 +08:00
Asias He	1f06eedb58	dht: Rename token_range to token_range_endpoints It is a helper class used in storage_service only. Rename it so we can use it for the real dht::token_range.	2016-12-19 08:04:29 +08:00
Asias He	264b6ee69e	dht: Introduce dht::token_range and dht::partition_range nonwrapping_range<ring_position> and nonwrapping_range<token> are used in many places. Let's make an alias for them to make it less verbose. Also there is a query::partition_range in query-request.hh which is the alias of nonwrapping_range<ring_position>. query::partition_range is used in places not related to query at all. Let's unify the usage project wide.	2016-12-19 08:04:29 +08:00
Paweł Dziepak	b86a826baf	dht: describe split_range[s]_to_shards() guarantees We are going to require these functions to return sorted and disjoint ranges. They already do so (provided that the input ranges are sorted and disjoint), but if the guarantee is not explicitly stated it may disappear some day.	2016-12-15 13:07:32 +00:00
Asias He	463cc4fbde	dht: Introduce split_ranges_to_shards Split ranges into shard ranges map with ring_position_range_sharder helper.	2016-12-12 09:04:21 +08:00
Asias He	044c4ff44c	dht: Introduce split_range_to_shards Split a range into shard ranges map with ring_position_range_sharder helper.	2016-12-12 09:04:21 +08:00
Duarte Nunes	ada2f1092e	dht: Make i_partitioner::tri_compare pure virtual This patch makes the i_partitioner::tri_compare() function pure virtual as it is overridden by all partitioners. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20161211172037.16496-1-duarte@scylladb.com>	2016-12-11 19:29:37 +02:00
Duarte Nunes	bb66b051ed	dht: Make i_partitioner::tri_compare memory safe This patch fixes a typo in i_partitioner::tri_compare() where we were using std::max instead of std::min, thus avoiding accessing random memory and getting random results. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20161211165043.17816-1-duarte@scylladb.com>	2016-12-11 18:58:10 +02:00
Avi Kivity	28857e42e7	Merge " Virtualize size_estimates system table" from Duarte "We currently write the size_estimates system table for every schema on a periodic basis, currently set to 5 minutes, which can interfere with an ongoing workload. This patchset virtualizes it such that queries are intercepted and we calculate the results on the fly, only for the ranges the caller is interested in. Fixes #1616" * 'virtual-estimates/v4' of github.com:duarten/scylla: size_estimates_virtual_reader: Add unit test db: Delete size_estimates_recorder size_estimates: Add virtual reader column_family: Add support for virtual readers storage_service: get_local_tokens() returns a future nonwrapping_range: Add slice() function range: Find a sequence's lower and upper bounds system_keyspace: Build mutations for size estimates size_estimates: Store the token range as bytes range_estimates: Add schema murmur3_partitioner: Convert maximum_token to sstring	2016-11-28 10:12:59 +02:00
Avi Kivity	07d5a20bae	Wire up sharding ignore msb parameter to configuration We might have used a fancy map<sstring, any> to pass the parameters, but that's overkill for now.	2016-11-22 22:40:47 +02:00
Avi Kivity	8b1d689de8	partitioner: add ignore_msb parameters to byte ordered and random partitioners Ignored; doesn't make sense on byte ordered, and random is deprecated.	2016-11-22 21:56:42 +02:00
Avi Kivity	af16c0fac4	murmur3_partitioner: shard on the middle token bits, not most significant bits Sharding on the most significant token bits aliases with the vnode mechanism, which also uses the most significant bits; this requires a huge number of vnodes to achieve good sharding. This patch teaches the murmur3 partitioner to ignore the most significant N bits when calculating a token's shard, so we use token bits which still have some entropy. In effect, this changes the token range layout from shard 0 shard 1 ... shard S-1 to shard 0 shard 1 ... shard S-1 shard 0 shard 1 ... shard S-1 ... shard 0 shard 1 ... shard S-1 Where the number of repetitions of the block is 2^(ignored msb bits). For compatibility, the default is zero ignored bits, matching the pre-patch state, until we wire things up.	2016-11-22 21:56:42 +02:00
Duarte Nunes	01815ecd24	murmur3_partitioner: Convert maximum_token to sstring This patch ensures we can convert the maximum_token to an sstring. For Cassandra, the minimum and maximum tokens have the same representation. So, we use the string representation of the minimum token for the maximum_token. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-11-21 10:56:32 +00:00
Duarte Nunes	66f6a367a4	ring_position_range_sharder: Avoid copying eagerly Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20161104115632.15974-1-duarte@scylladb.com>	2016-11-13 11:42:23 +02:00
Avi Kivity	7202b94183	dht: introduce a sharder for vectors of partition ranges Building on the single-range sharder, add a sharder for vectors of partition ranges. This helps with wrapped ranges, which are translated into a vector containing two shards.	2016-11-03 19:10:20 +02:00
Avi Kivity	43a2380899	dht: add a generator for shard/range pairs Divides a ring_position range into a sequence of shard/range pairs. This allows sequential iteration over shards in ring order. The current multi-partition query executes on all shards in parallel, but this is very wasteful, as most of the data will be thrown away if it is not included in the page. With the generator, we can switch to sequential execution.	2016-11-03 19:10:17 +02:00
Avi Kivity	1f88d103a8	partitioner: add i_partitioner::token_for_next_shard() When performing a range query, we want to iterate over shards, running the query on each shard in order until the query range is exhausted or we have the right number of rows. To be able to do this, introduce token_for_next_shard(), which allows us to determine the boundary between shards. It is a sort-of inverse to shard_of(), in that shard_of(token_for_next_shard(t)) == shard_of(t) + 1	2016-11-03 19:09:23 +02:00
Avi Kivity	6c45b0bae8	partitioner: make comparators public The public comparison operators depend on global_partitioner(), and are therefore less useful for tests.	2016-11-03 11:27:40 +02:00
Avi Kivity	6320181b97	partitioner: const correctness for comparators	2016-11-03 11:27:40 +02:00
Avi Kivity	470826d127	partitioner: change partitioners to have shard counts independent from smp::count Useful for testing.	2016-11-03 11:27:40 +02:00
Avi Kivity	a35136533d	Convert ring_position and token ranges to be nonwrapping Wrapping ranges are a pain, so we are moving wrap handling to the edges. Since cql can't generate wrapping ranges, this means thrift and the ring maintenance code; also range->ring transformations need to merge the first and last ranges. Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>	2016-11-02 21:04:11 +02:00
Duarte Nunes	862f51cddf	partitioner: Parse token from bytes This patch adds the from_bytes() function to the i_partitioner class, whose purpose is to parse a particular token and explicitly handle the case when the minimum token is specified. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2016-09-30 11:17:02 +00:00
Avi Kivity	4fcebd4ca6	random_partitioner: fix overflow in shard_of() uint128_t will overflow if smp::count > 2. Replace with a larger type. Message-Id: <1471188765-30142-1-git-send-email-avi@scylladb.com>	2016-08-15 09:41:54 +03:00
Asias He	2f4cd86809	random_partitioner: Implement random_partitioner Cassandra 1.x clusters often use RandomPartitioner. Supporting RandomPartitioner will allow easier migration to Scylla. Tests are added to make sure Scylla generates the same token as Cassandra does for the same partition key. Fixes #1438 Message-Id: <3bc8b7f06fad16d59aaaa96e2827198ce74214c6.1469166766.git.asias@scylladb.com>	2016-07-24 16:25:25 +03:00
Duarte Nunes	aaa76d58ba	query: Move to_partition_range to dht namespace This patch moves to_partition_range, from the query namespace to the dht namespace, where it is a more natural fit. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <1468498060-19251-1-git-send-email-duarte@scylladb.com>	2016-07-15 10:41:52 +02:00
Asias He	f4389349e4	config: Enable partitioner option Enable --partitioner option so that the user can choose a partitioner other than the default Murmur3Partitioner. Currently, only Murmur3Partitioner and ByteOrderedPartitioner are supported. When a non-supported partitioner is specified, an error will be propagated to the user.	2016-07-08 17:44:55 +08:00
Asias He	9c27b5c46e	byte_ordered_partitioner: Implement missing describe_ownership and midpoint In order to support ByteOrderedPartitioner, we need to implement the missing describe_ownership and midpoint functions in the byte_ordered_partitioner class. As a starter, this patch uses a simple node token distance based method to calculate ownership. C* uses a complicated key samples based method. We can switch to what C* does later. Tests are added to tests/partitioner_test.cc. Fixes #1378	2016-07-08 17:44:55 +08:00
Asias He	f6a2672be0	storage_service: Modify log to match config option of scylla We currently log as follows: May 9 00:09:13 node3.nl scylla[2546]: [shard 0] storage_service - This node was decommissioned and will not rejoin the ring unless cassandra.override_decommission=true has been set,or all existing data is removed and the node is bootstrapped again However, the user should use override_decommission:true instead of cassandra.override_decommission:true in scylla.yaml where the cassandra prefix is stripped. Fixes #1240 Message-Id: <b0c9424c6922431ad049ab49391771e07ca6fbde.1467079190.git.asias@scylladb.com>	2016-07-04 10:47:49 +02:00
Piotr Jastrzebski	27575a0528	Fix previous_entry_is_continuous Rename it to check_previous_entry. Remove unnecessary test. Make sure ring_position always has working relation_to_keys method. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com> Message-Id: <6bc790d492ba9b5c302a50218f3e26b924f657d0.1467101754.git.piotr@scylladb.com>	2016-06-28 10:27:08 +02:00
Asias He	ee0585cee9	dht: Add default constructor for token It is needed to put token into a boost interval_map in the following patch.	2016-05-17 17:32:15 +08:00
Pekka Enberg	38a54df863	Fix pre-ScyllaDB copyright statements People keep tripping over the old copyrights and copy-pasting them to new files. Search and replace "Cloudius Systems" with "ScyllaDB". Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>	2016-04-08 08:12:47 +03:00
Gleb Natapov	775cc93880	remove unused range and token serializers	2016-02-02 12:15:49 +02:00
Asias He	bdd6a69af7	streaming: Drop unused parameters - int connections_per_host Scylla does not create connections per stream_session, instead it uses rpc, thus connections_per_host is not relevant to scylla. - bool keep_ss_table_level - int repaired_at Scylla does not stream sstable files. They are not relevant to scylla.	2016-01-25 11:38:13 +08:00
Gleb Natapov	043d132ba9	Remove no longer used serializers.	2016-01-24 12:45:41 +02:00
Gleb Natapov	49ce2b83df	Add ring_position constructor needed by serializer.	2016-01-24 12:45:41 +02:00
Asias He	89b79d44de	streaming: Get rid of the _connecting_ parameter messaging_service will use private ip address automatically to connect a peer node if possible. There is no need for the upper level like streaming to worry about it. Dropping it simplifies things a bit.	2015-12-31 11:25:08 +01:00
Nadav Har'El	f0b27671a2	murmur3 partitioner: remove outdated comment, and code Since commit `16596385ee`, long_token() is already checking t.is_minimum(), so the comment which explains why it does not (for performance) is no longer relevant. And we no longer need to check t._kind before calling long_token (the check we do here is the same as is_minimum). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2015-12-30 10:01:29 +02:00
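
The layout change described in commit af16c0fac4 (sharding on the middle token bits) can be illustrated with a small sketch. This is not the code from the patch: the function name `shard_of_sketch`, the assumption that the token has already been mapped to an unsigned 64-bit value, and the multiply-and-shift mapping are all illustrative choices made here.

```cpp
#include <cstdint>

// Hypothetical sketch, not the patch's implementation.
// Maps an unsigned 64-bit token to one of `shards` shards after discarding
// the `ignore_msb` most significant bits (assumed to be < 64).
unsigned shard_of_sketch(std::uint64_t token, unsigned shards, unsigned ignore_msb) {
    // Shift the ignored bits out; the remaining bits become the new top bits.
    token <<= ignore_msb;
    // Compute floor(token * shards / 2^64) without a 128-bit type by
    // splitting the token into 32-bit halves.
    std::uint64_t hi = token >> 32;
    std::uint64_t lo = token & 0xffffffffu;
    std::uint64_t s = shards;
    return static_cast<unsigned>((hi * s + ((lo * s) >> 32)) >> 32);
}
```

With 4 shards and 0 ignored bits, the token space splits into one block of four contiguous slices. With 1 ignored bit the same block appears twice, which matches the "2^(ignored msb bits) repetitions" statement in the commit message.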

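
The exponential retry described in commit a1cafed370 can be sketched as follows. This is a simplified model under stated assumptions, not the storage_proxy code: `scan_with_doubling`, `scan_result`, and the `query_subrange` callback are hypothetical names, and the subranges within a round are visited in a plain loop here, whereas a real system would presumably issue them concurrently.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>

// Hypothetical result type: rows collected and number of rounds issued.
struct scan_result {
    std::size_t rows;
    std::size_t rounds;
};

// Queries subranges 0..n_subranges-1 in rounds whose size doubles each time,
// stopping once `wanted_rows` rows are collected or the range is exhausted.
// `query_subrange(i)` is an assumed callback returning the row count of subrange i.
scan_result scan_with_doubling(std::size_t n_subranges, std::size_t wanted_rows,
                               const std::function<std::size_t(std::size_t)>& query_subrange) {
    scan_result r{0, 0};
    std::size_t next = 0;   // first subrange not yet queried
    std::size_t batch = 1;  // number of subranges in the current round
    while (next < n_subranges && r.rows < wanted_rows) {
        std::size_t end = std::min(n_subranges, next + batch);
        for (; next < end; ++next) {
            r.rows += query_subrange(next);
        }
        ++r.rounds;
        batch *= 2;  // double the work per round
    }
    return r;
}
```

For an empty table with 1024 subranges, this model needs 11 rounds instead of 1024 sequential queries, which is the O(log(number of subranges)) behaviour the commit message describes.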

166 Commits