Commit Graph

53 Commits

Author SHA1 Message Date
Avi Kivity
c05f60387b i_partitioner: remove unused function
Found by clang.
2017-04-17 23:03:15 +03:00
Tomasz Grabiec
d4b6e430ed dht: Introduce ring_position_view 2017-03-28 18:10:39 +02:00
Tomasz Grabiec
55a7cceef5 dht: Move comparison logic from ring_position::tri_compare() to ring_position_comparator
It will soon define common ordering for many objects, not just
ring_position.
2017-03-28 18:10:39 +02:00
Tomasz Grabiec
65a8920b25 dht: Make min/max tokens capturable by reference
So that they can be later used in views.
2017-03-28 18:10:39 +02:00
Avi Kivity
141048e0e5 dht: improve token hash function
For a small token, we can just return it, since it already is a hash.
We hash large tokens using murmur3, which is supposedly a good hash.
2017-01-20 11:24:14 +02:00
Avi Kivity
a1cafed370 storage_proxy: handle range scans of sparsely populated tables
When murmur3_partitioner_ignore_msb_bits = 12 (which we'd like to be the
default), a scan range can be split into a large number of subranges, each
going to a separate shard.  With the current implementation, subranges were
queried sequentially, resulting in very long latency when the table was empty
or nearly empty.

Switch to an exponential retry mechanism, where the number of subranges
queried doubles each time, dropping the latency from O(number of subranges)
to O(log(number of subranges)).

If, during an iteration of a retry, we read at most one range
from each shard, then partial results are merged by concatentation.  This
optimizes for the dense(r) case, where few partial results are required.

If, during an iteration of a retry, we need more than one range per
shard, then we collapse all of a shard's ranges into just one range,
and merge partial results by sorting decorated keys.  This reduces
the number of sstable read creations we need to make, and optimizes for
the sparse table case, where we need many partial results, most of which
are empty.

We don't merge subranges that come from different partition ranges,
because those need to be sorted in request order, not decorated key order.

[tgrabiec: trivial conflicts]

Message-Id: <20161220170532.25173-1-avi@scylladb.com>
2016-12-20 18:32:29 +01:00
Asias He
937f28d2f1 Convert to use dht::partition_range_vector and dht::token_range_vector 2016-12-19 14:08:50 +08:00
Asias He
85034c1b57 Convert to use dht::partition_range 2016-12-19 08:04:30 +08:00
Asias He
d1178fa299 Convert to use dht::token_range 2016-12-19 08:04:29 +08:00
Asias He
463cc4fbde dht: Introduce split_ranges_to_shards
Split a ranges into shard ranges map with ring_position_range_sharder
helper.
2016-12-12 09:04:21 +08:00
Asias He
044c4ff44c dht: Introduce split_range_to_shards
Split a range into shard ranges map with ring_position_range_sharder
helper.
2016-12-12 09:04:21 +08:00
Duarte Nunes
ada2f1092e dht: Make i_partitioner::tri_compare pure virtual
This patch makes the i_partitioner::tri_compare() function pure
virtual as it is overridden by all partitioners.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20161211172037.16496-1-duarte@scylladb.com>
2016-12-11 19:29:37 +02:00
Duarte Nunes
bb66b051ed dht: Make i_partitioner::tri_compare memory safe
This patch fixes a typo in i_partitioner::tri_compare() where we were
using std::max instead of std::min, thus avoiding accessing random
memory and getting random results.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20161211165043.17816-1-duarte@scylladb.com>
2016-12-11 18:58:10 +02:00
Avi Kivity
07d5a20bae Wire up sharding ignore msb parameter to configuration
We might have used a fancy map<sstring, any> to pass the parameters, but
that's overkill for now.
2016-11-22 22:40:47 +02:00
Duarte Nunes
66f6a367a4 ring_position_range_sharder: Avoid copying eagerly
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20161104115632.15974-1-duarte@scylladb.com>
2016-11-13 11:42:23 +02:00
Avi Kivity
7202b94183 dht: introduce a sharder for vectors of partition ranges
Building on the single-range sharder, add a sharder for vectors of
partition ranges.  This helps with wrapped ranges, which are translated
into a vector containing two shards.
2016-11-03 19:10:20 +02:00
Avi Kivity
43a2380899 dht: add a generator for shard/range pairs
Divides a ring_position range into a sequence of shard/range pairs.  This
allows sequential iteration over shards in ring order.

The current multi-partition query executes on all shards in parallel, but
this is very wasteful, as most of the data will be thrown away if it is not
included in the page.  With the generator, we can switch to sequential
execution.
2016-11-03 19:10:17 +02:00
Avi Kivity
6320181b97 partitioner: const correctness for comparators 2016-11-03 11:27:40 +02:00
Avi Kivity
a35136533d Convert ring_position and token ranges to be nonwrapping
Wrapping ranges are a pain, so we are moving wrap handling to the edges.

Since cql can't generate wrapping ranges, this means thrift and the ring
maintenance code; also range->ring transformations need to merge the first
and last ranges.

Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>
2016-11-02 21:04:11 +02:00
Duarte Nunes
aaa76d58ba query: Move to_partition_range to dht namespace
This patch moves to_partition_range, from the query namespace
to the dht namespace, where it is a more natural fit.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1468498060-19251-1-git-send-email-duarte@scylladb.com>
2016-07-15 10:41:52 +02:00
Asias He
f4389349e4 config: Enable partitioner option
Enable --partitioner option so that user can choose partitioner other
than the default Murmur3Partitioner. Currently, only Murmur3Partitioner
and ByteOrderedPartitioner are supported. When non-supported partitioner
is specifed, error will be propogated to user.
2016-07-08 17:44:55 +08:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Gleb Natapov
775cc93880 remove unused range and token serializers 2016-02-02 12:15:49 +02:00
Gleb Natapov
043d132ba9 Remove no longer used serializers. 2016-01-24 12:45:41 +02:00
Asias He
6259dd73b6 token: Print token using the partitioner defined method
Make nodetool ring output align with c* output.
2015-10-29 15:53:47 +02:00
Raphael S. Carvalho
936575efd2 dht: introduce comparator for token
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-10-15 19:34:50 -03:00
Avi Kivity
d5cf0fb2b1 Add license notices 2015-09-20 10:43:39 +03:00
Glauber Costa
e1968c389e dht: use tri_compare for token comparisons
Loading data from memory tends to be the most expensive part of the comparison
operations. Because we don't have a tri_compare function for tokens, we end up
having to do an equality test, which will load the token's data in memory, and
then, because all we know is that they are not equal, we need to do another
one.

Having two dereferences is harmful, and shows up in my simple benchmark. This
is because before writing to sstables, we must order the keys in decorated key
order, which is heavy on the comparisons.

The proposed change speeds up index write benchmark by 8.6%:

Before:
41458.14 +- 1.49 partitions / sec (30 runs)

After:
45020.81 +- 3.60 partitions / sec (30 runs)

Parameters:
--smp 6 --partitions 500000

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-12 09:23:42 -05:00
Tomasz Grabiec
593fa725e9 dht: Relax dependencies on bytes const&
In preparation to switching to a bytes container which is not "bytes"
switch to bytes_view.
2015-08-06 14:05:16 +02:00
Avi Kivity
f915ff1fcd dht: introduce i_partitioner::shard_of() and implement msb sharding
Make sharding partitioner-specific, since different partitioners interpret
the byte content differently.

Implement it by extracting the shard from the most significant bits, which
can be used to minimize cross shard traffic for range queries, and reduces
sstable sharing.
2015-08-03 20:17:40 +03:00
Tomasz Grabiec
e5feff5d71 dht: ring_position: Switch to total ordering
range::is_wrap_around() and range::contains() rely on total ordering
on values to work properly. Current ring_position_comparator was only
imposing a weak ordering (token positions equal to all key positions
with that token).

range::before() and range::after() can't work for weak ordering. If
the bound is exclusive, we don't know if user-provided token position
is inside or outside.

Also, is_wrap_around() can't properly detect wrap around in all
cases. Consider this case:

 (1) ]A; B]
 (2) [A; B]

For A = (tok1) and B = (tok1, key1), (1) is a wrap around and (2) is
not. Without total ordering between A and B, range::is_wrap_around() can't
tell that.

I think the simplest soution is to define a total ordering on
ring_position by making token positions positioned either before or
after all keys with that token.
2015-07-24 16:08:41 +02:00
Tomasz Grabiec
2e845140e2 dht: ring_position: Implement less_compare() and equals() using tri_compare() 2015-07-24 16:08:41 +02:00
Tomasz Grabiec
7b30e8fcff dht: ring_position: Move definitions out of line 2015-07-24 16:08:41 +02:00
Asias He
243dbd3bfd dht: Reuse token serializer in ring_position 2015-07-23 09:08:15 +08:00
Asias He
1761be4dc4 dht: Implement serializer interface for token
It is needed by query::range<token>.
2015-07-23 09:08:15 +08:00
Gleb Natapov
253ba71747 add comparison functions for ring_position
Two ring_positions are equal if tokens and keys are equal or tokens are
equal and one or both of them do not specify key. So ring_positions
without a key is a wildcard that equals any ring_positions with the same
token.
2015-07-15 12:41:31 +03:00
Gleb Natapov
49fb10d640 improve token printout
Add min/max token printout
2015-07-14 12:21:42 +03:00
Paweł Dziepak
351b113913 dht: allow configuration file to choose partitioner
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-07-12 15:14:53 +02:00
Tomasz Grabiec
d035c499b8 db: Move database::shard_of() to dht::shard_of() 2015-07-07 16:56:25 +02:00
Gleb Natapov
730170ff1a serialize data structures needed for read clustering 2015-07-01 13:36:28 +03:00
Tomasz Grabiec
ac333f04e2 dht: Make decorated_key comparable with ring_position 2015-06-25 18:45:12 +02:00
Tomasz Grabiec
9525464f74 Move query::ring_position to dht::ring_position 2015-06-25 18:45:12 +02:00
Avi Kivity
f8e2c13933 dht: rename midpoint() to midpoint_unsigned()
midpoint()'s algorithm only works for unsigned tokens, so rename it
accordingly.
2015-06-25 14:54:40 +03:00
Avi Kivity
d8946d07ed dht: fix token wraparound in midpoint()
midpoint(l, r) where l > r needs to wrap around the end of the ring.  Adjust
the midpoint() function to do this.

Note this is still broken for the murmur3 partitioner, since it doesn't treat
tokens as unsigned.
2015-06-25 13:55:46 +03:00
Asias He
9649414011 dht: Align token print
Before:
token=df 96 79 87 21 b2 ed 80
token=5c 98 e a0 4f 5e 28 6b

After:
token=df 96 79 87 21 b2 ed 80
token=5c 98 0e a0 4f 5e 28 6b
2015-06-04 17:12:10 +08:00
Glauber Costa
6a8049dce1 dht: add maximum_token
Analogous to minimum_token.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-05-21 16:27:35 -04:00
Tomasz Grabiec
aec740f895 db: Make decorated_key have ordering compatible with Origin 2015-04-30 12:02:39 +02:00
Tomasz Grabiec
71041eb0d6 dht: Implement operator<< for decorated_key 2015-04-24 18:01:01 +02:00
Tomasz Grabiec
841a13da93 dht: Implement operator!= for decorated_key 2015-04-24 18:01:01 +02:00
Gleb Natapov
02fb270fbe token operator<< 2015-04-19 10:15:14 +03:00