Commit Graph

76 Commits

Author SHA1 Message Date
Glauber Costa
229ce6cd85 dht: provide a from_sstring method
Only the partitioner knows how to convert a token to a sstring. Conversely,
only the partitioner can know how to convert it back.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-17 11:03:35 -07:00
Glauber Costa
5f807784bf dht: fix to_sstring methods to account for min tokens
Right now, we are converting the _data part of the token to a sstring, which
may later be stored somewhere - in a system sstable, for instance. Later on,
we will have to get it back, but the way the code currently stands, we will get
undefined results for min and max tokens, since they have the _data field
empty.

For murmur3, strictly speaking, the correct solution would be to change
long_token to account for that. However, when we compare values, we already do
kind comparisons explicitly. Handling min and max tokens in long_token would
only make that operation branchier, and therefore costlier. Since it is a very
common operation, we don't want that.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-17 10:23:19 -07:00
Glauber Costa
6fcbb3570e murmur3 partitioner: explicitly use int64_t instead of long
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-17 10:19:52 -07:00
Paweł Dziepak
d06e450616 dht: add i_paritioner::token_to_bytes()
This allows token::_data to be in a different representation
than the one expected by the token type.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-14 16:12:30 +02:00
Paweł Dziepak
faa588cb0a dht: murmur3_paritioner: implement get_token_validator()
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-08-14 15:58:39 +02:00
Glauber Costa
e1968c389e dht: use tri_compare for token comparisons
Loading data from memory tends to be the most expensive part of the comparison
operations. Because we don't have a tri_compare function for tokens, we end up
having to do an equality test, which will load the token's data in memory, and
then, because all we know is that they are not equal, we need to do another
one.

Having two dereferences is harmful, and shows up in my simple benchmark. This
is because before writing to sstables, we must order the keys in decorated key
order, which is heavy on the comparisons.

The proposed change speeds up index write benchmark by 8.6%:

Before:
41458.14 +- 1.49 partitions / sec (30 runs)

After:
45020.81 +- 3.60 partitions / sec (30 runs)

Parameters:
--smp 6 --partitions 500000

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-12 09:23:42 -05:00
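The three-way comparison described in e1968c389e can be sketched as follows. This is a minimal illustration, not the code from the commit: `toy_token`, its `kind` enum, and the helper names are hypothetical stand-ins, and the real dht::token stores its data differently. The point is that a single call reads each token once and answers less, equal, and greater together.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a token (an assumption for illustration only).
struct toy_token {
    enum class kind { before_all_keys, key, after_all_keys };
    kind k;
    int64_t value; // meaningful only when k == kind::key
};

// Returns a negative value, zero, or a positive value when a sorts
// before, equal to, or after b. Each token's data is read once.
inline int tri_compare(const toy_token& a, const toy_token& b) {
    if (a.k != b.k) {
        return a.k < b.k ? -1 : 1;
    }
    if (a.k != toy_token::kind::key) {
        return 0; // two minimum tokens or two maximum tokens
    }
    return a.value < b.value ? -1 : (a.value > b.value ? 1 : 0);
}

// The relational helpers reuse the single three-way result instead of
// running an equality test followed by a separate ordering test.
inline bool less_than(const toy_token& a, const toy_token& b) {
    return tri_compare(a, b) < 0;
}

inline bool equal_to(const toy_token& a, const toy_token& b) {
    return tri_compare(a, b) == 0;
}
```

A sort comparator built on `less_than` then costs one pass over the token data per comparison.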
Glauber Costa
3426b3ecc1 bootstrap tokens: get tokens from config file
Aside from being the obviously correct thing to do, not having this will force us
to manually adjust num_tokens when loading our sstables into Cassandra.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 11:10:56 -05:00
Glauber Costa
2678b0e606 dht: change get_bootstrap_tokens()'s signature
It needs to access the non-existent "DatabaseDescriptor". Do as we have been doing,
and just pass the database object instead.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-08-07 11:10:56 -05:00
Raphael S. Carvalho
915b05c956 dht: fix load of misaligned address
Error reported by debug mode when running the sstable test.
The solution is to use an unaligned cast.

dht/murmur3_partitioner.cc:67:25: runtime error: load of misaligned
address 0x6030000478fc for type 'const long int', which requires 8
byte alignment

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-08-06 18:28:38 +03:00
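The class of bug fixed in 915b05c956 can be illustrated with a small sketch. The commit itself uses an unaligned cast. The version below swaps in `std::memcpy`, a standard-library alternative that also avoids dereferencing a misaligned pointer. The function name `read_unaligned_int64` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads an int64_t from an address that may not be 8-byte aligned.
// Copying through memcpy avoids the undefined behavior of casting the
// pointer to const int64_t* and dereferencing it directly.
inline int64_t read_unaligned_int64(const void* p) {
    int64_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}
```

Compilers typically lower the fixed-size `memcpy` to a plain load on architectures that tolerate unaligned access, so this form is not expected to cost more than the cast.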
Tomasz Grabiec
c4acdb2068 db: Switch from bytes to managed_bytes for storing data
We need a container which can be used with compacting
allocators. "bytes" can't be used with compacting allocator because it
can't handle its external storage being moved.
2015-08-06 14:05:16 +02:00
Tomasz Grabiec
593fa725e9 dht: Relax dependencies on bytes const&
In preparation for switching to a bytes container other than "bytes",
switch to bytes_view.
2015-08-06 14:05:16 +02:00
Avi Kivity
f915ff1fcd dht: introduce i_partitioner::shard_of() and implement msb sharding
Make sharding partitioner-specific, since different partitioners interpret
the byte content differently.

Implement it by extracting the shard from the most significant bits, which
can be used to minimize cross shard traffic for range queries, and reduces
sstable sharing.
2015-08-03 20:17:40 +03:00
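One way to picture the most-significant-bits sharding from f915ff1fcd is the sketch below. It is an assumption-laden illustration, not the commit's implementation: it assumes the token has already been mapped to an unsigned 64-bit value in ring order, and `shard_of_msb` is a hypothetical name. Because the shard comes from the top bits, neighbouring tokens land on the same shard, so a contiguous token range touches few shards.

```cpp
#include <cassert>
#include <cstdint>

// Maps an unsigned 64-bit token to a shard in [0, shard_count) using
// only its 32 most significant bits. The result never decreases as the
// token grows, so each shard owns one contiguous slice of the ring.
inline unsigned shard_of_msb(uint64_t token_bits, unsigned shard_count) {
    assert(shard_count > 0);
    uint64_t top = token_bits >> 32;                  // most significant half
    return static_cast<unsigned>((top * shard_count) >> 32);
}
```

With 4 shards, tokens in the lowest quarter of the ring map to shard 0 and tokens in the highest quarter map to shard 3.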
Tomasz Grabiec
e5feff5d71 dht: ring_position: Switch to total ordering
range::is_wrap_around() and range::contains() rely on total ordering
on values to work properly. Current ring_position_comparator was only
imposing a weak ordering (token positions equal to all key positions
with that token).

range::before() and range::after() can't work for weak ordering. If
the bound is exclusive, we don't know if user-provided token position
is inside or outside.

Also, is_wrap_around() can't properly detect wrap around in all
cases. Consider this case:

 (1) ]A; B]
 (2) [A; B]

For A = (tok1) and B = (tok1, key1), (1) is a wrap around and (2) is
not. Without total ordering between A and B, range::is_wrap_around() can't
tell that.

I think the simplest solution is to define a total ordering on
ring_position by making token positions positioned either before or
after all keys with that token.
2015-07-24 16:08:41 +02:00
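The total ordering proposed in e5feff5d71 can be sketched as below. The struct layout and names (`toy_ring_position`, `bound`) are hypothetical, and tokens and keys are simplified to an integer and a string. A position without a key carries a bound that places it either before or after every key sharing its token, so no two distinct positions compare as equal.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical simplified ring position (an assumption for illustration).
struct toy_ring_position {
    int64_t token;
    bool has_key;
    std::string key; // used only when has_key is true
    int bound;       // used only when has_key is false:
                     // -1 = before all keys with this token, +1 = after
};

// Total order: by token first, then token-only positions sort strictly
// before or after all keyed positions with the same token.
inline int tri_compare(const toy_ring_position& a, const toy_ring_position& b) {
    if (a.token != b.token) {
        return a.token < b.token ? -1 : 1;
    }
    if (a.has_key && b.has_key) {
        int c = a.key.compare(b.key);
        return c < 0 ? -1 : (c > 0 ? 1 : 0);
    }
    if (!a.has_key && !b.has_key) {
        return (a.bound > b.bound) - (a.bound < b.bound);
    }
    if (!a.has_key) {
        return a.bound < 0 ? -1 : 1;
    }
    return b.bound < 0 ? 1 : -1;
}
```

With A = (tok1) placed before all keys and B = (tok1, key1), A compares strictly less than B, which is the information is_wrap_around() needs to tell the two ranges in the example apart.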
Tomasz Grabiec
2e845140e2 dht: ring_position: Implement less_compare() and equals() using tri_compare()
2015-07-24 16:08:41 +02:00
Tomasz Grabiec
7b30e8fcff dht: ring_position: Move definitions out of line
2015-07-24 16:08:41 +02:00
Asias He
243dbd3bfd dht: Reuse token serializer in ring_position
2015-07-23 09:08:15 +08:00
Asias He
1761be4dc4 dht: Implement serializer interface for token
It is needed by query::range<token>.
2015-07-23 09:08:15 +08:00
Tomasz Grabiec
6373987704 partition_range: Introduce is_wrap_around() helper
2015-07-22 13:13:38 +02:00
Tomasz Grabiec
cce2c648c5 Merge branch 'dev/gleb/for_urchin' from seastar-dev.git
Beginning of read implementation for range queries from Gleb.
2015-07-15 12:53:43 +03:00
Gleb Natapov
253ba71747 add comparison functions for ring_position
Two ring_positions are equal if their tokens and keys are equal, or if their
tokens are equal and one or both of them do not specify a key. So a
ring_position without a key is a wildcard that equals any ring_position with
the same token.
2015-07-15 12:41:31 +03:00
Tomasz Grabiec
0a4651cd28 dht: Add field getters to decorated_key
2015-07-14 19:57:37 +02:00
Gleb Natapov
49fb10d640 improve token printout
Add min/max token printout
2015-07-14 12:21:42 +03:00
Paweł Dziepak
351b113913 dht: allow configuration file to choose partitioner
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-07-12 15:14:53 +02:00
Paweł Dziepak
ede9886f50 dht: add byte_ordered_partitioner
Some of the tests in DTEST take advantage of the fact that
ByteOrderedPartitioner guarantees certain ordering of partition keys.

Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
2015-07-09 23:43:16 +02:00
Avi Kivity
dd29ac9593 Merge "cqlsh" from Glauber
System table work to make cqlsh connect.
2015-07-07 19:33:23 +03:00
Glauber Costa
bd13e3995b dht: implement method to convert a token to a string
We need to be able to do it so we can, among other things, create CQL
statements that include the current state of the tokens.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-07 11:38:22 -04:00
Glauber Costa
45905ec94d dht: change partitioner name to sstring
It is a better fit for things that are names, not blobs. We have a user that expects
a bytes parameter, but that is for no other reason than the fact that the field used
to be of bytes type.

Let's fix that, so that future users will be able to use sstrings.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-07-07 11:38:22 -04:00
Tomasz Grabiec
d035c499b8 db: Move database::shard_of() to dht::shard_of()
2015-07-07 16:56:25 +02:00
Gleb Natapov
730170ff1a serialize data structures needed for read clustering
2015-07-01 13:36:28 +03:00
Tomasz Grabiec
ac333f04e2 dht: Make decorated_key comparable with ring_position
2015-06-25 18:45:12 +02:00
Tomasz Grabiec
9525464f74 Move query::ring_position to dht::ring_position
2015-06-25 18:45:12 +02:00
Tomasz Grabiec
6515cb642b dht: Remove obsolete files
2015-06-25 18:45:12 +02:00
Avi Kivity
ba3fb44bf5 dht: implement more token relational operators
2015-06-25 15:25:37 +03:00
Avi Kivity
e4140a19ed dht: virtualize i_parititoner::midpoint() and implement for murmur3
2015-06-25 15:25:04 +03:00
Avi Kivity
f8e2c13933 dht: rename midpoint() to midpoint_unsigned()
midpoint()'s algorithm only works for unsigned tokens, so rename it
accordingly.
2015-06-25 14:54:40 +03:00
Avi Kivity
d8946d07ed dht: fix token wraparound in midpoint()
midpoint(l, r) where l > r needs to wrap around the end of the ring.  Adjust
the midpoint() function to do this.

Note this is still broken for the murmur3 partitioner, since it doesn't treat
tokens as unsigned.
2015-06-25 13:55:46 +03:00
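The wraparound behaviour described in d8946d07ed can be sketched for a ring of unsigned 64-bit tokens. This is a simplified illustration under that assumption, with the hypothetical name `midpoint_unsigned_sketch`. The actual function in the commit may operate on a different token representation. As the commit notes, the approach does not apply to signed murmur3 tokens as-is.

```cpp
#include <cassert>
#include <cstdint>

// Midpoint of the arc that runs clockwise from l to r on a ring of
// 2^64 unsigned tokens. When l > r the arc passes the end of the ring.
inline uint64_t midpoint_unsigned_sketch(uint64_t l, uint64_t r) {
    // Unsigned subtraction wraps modulo 2^64, so this is the clockwise
    // distance from l to r even when l > r.
    uint64_t distance = r - l;
    // The addition also wraps modulo 2^64, landing back on the ring.
    return l + distance / 2;
}
```

For l = 10 and r = 20 the result is 15. For l = 2^64 - 10 and r = 10 the arc has length 20 and crosses zero, so the midpoint is 0.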
Avi Kivity
0eb8d7384a Reduce partitioner's dependencies on sstables/*.hh
2015-06-22 19:00:55 +03:00
Shlomi Livne
954c697958 dht: Update name of Murmur3Partitioner to align with org.apache.cassandra.dht.Murmur3Partitioner
In sstables the partitioner name is stored for validation. To allow Origin
to process our files we need to comply with Origin's partitioner name, or
else Origin's SSTableReader::open fails on the partitioner comparison.
2015-06-16 12:20:04 +03:00
Asias He
9649414011 dht: Align token print
Before:
token=df 96 79 87 21 b2 ed 80
token=5c 98 e a0 4f 5e 28 6b

After:
token=df 96 79 87 21 b2 ed 80
token=5c 98 0e a0 4f 5e 28 6b
2015-06-04 17:12:10 +08:00
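The alignment fix shown in 9649414011 amounts to printing every byte as two zero-padded hex digits. A minimal sketch, with the hypothetical helper name `format_token_bytes` and a plain byte vector standing in for the token data:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Formats bytes as space-separated, two-digit lowercase hex, so a byte
// such as 0x0e prints as "0e" rather than "e" and the columns line up.
inline std::string format_token_bytes(const std::vector<uint8_t>& data) {
    std::string out;
    char buf[3]; // two hex digits plus the terminating NUL
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i != 0) {
            out += ' ';
        }
        std::snprintf(buf, sizeof(buf), "%02x", static_cast<unsigned>(data[i]));
        out += buf;
    }
    return out;
}
```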
Asias He
34b3d679ab dht: Do move in token constructor
2015-06-04 17:12:09 +08:00
Asias He
9c5cd2bca8 storage_service: Switch to use unordered_set for tokens
We do not care about the order of the tokens.

Also, in token_metadata, we use unordered_set for tokens as well, e.g.
update_normal_tokens. Unify the usage.
2015-06-04 17:12:09 +08:00
Raphael S. Carvalho
6ae4476427 sstables: collect validation metadata
Validation metadata stores the partitioner name and bloom filter chance.
Cassandra gets the partitioner name by getting an object of the class
itself and getting its canonical name.

Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
2015-06-02 10:32:12 +03:00
Asias He
8d7aff89e3 storage_service: Reduce default num_tokens to 3
The default value of 256 in Origin is too big for debugging. We can set it back
when storage_service is more mature.
2015-06-01 11:24:38 +08:00
Asias He
d026ca6852 token: Add constructor token(kind, bytes)
2015-06-01 11:24:38 +08:00
Asias He
aa4a23a248 dht: Virtualize get_random_token
murmur3_partitioner::get_random_token is implemented.
2015-05-27 13:06:33 +08:00
Asias He
a66f06ad44 dht: Specify override on a virtual function in a derived class
The murmur3_partitioner overrides functions in i_partitioner.
2015-05-27 13:06:33 +08:00
Asias He
ee1d79cd2b dht: Convert BootStrapper.java to C++
2015-05-27 13:06:33 +08:00
Asias He
1c9e88dcb9 dht: Import BootStrapper.java
2015-05-27 13:06:33 +08:00
Glauber Costa
6a8049dce1 dht: add maximum_token
Analogous to minimum_token.

Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
2015-05-21 16:27:35 -04:00
Tomasz Grabiec
96bbac8a57 dht: Make partitioner work on partition_key_view
2015-05-06 15:52:56 +02:00