Only the partitioner knows how to convert a token to a sstring. Conversely,
only the partitioner can know how to convert it back.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Right now, we are converting the _data part of the token to a sstring, which
may later be stored somewhere - in a system sstable, for instance. Later on,
we will have to get it back, but the way the code currently stands, we will get
undefined results for min and max tokens, since they have the _data field
empty.
For murmur3, strictly speaking, the correct solution would be to change
long_token to account for that. However, when we compare values, we already do
kind comparisons explicitly. Handling them in long_token as well would only
make that operation branchier and therefore costlier, which we don't want for
such a common operation.
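As a rough illustration of why the conversion belongs to the partitioner, the
sketch below uses a simplified, hypothetical token type (not the real
dht::token) with an explicit kind. The "min"/"max" spelling and the
toy_partitioner name are assumptions made only for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified stand-ins; not the real dht::token or partitioner.
enum class token_kind { before_all_keys, key, after_all_keys };

struct token {
    token_kind kind;
    int64_t value; // only meaningful when kind == token_kind::key
};

struct toy_partitioner {
    // The partitioner decides how each kind is spelled, so the min and max
    // tokens never depend on the (empty) data field.
    std::string to_string(const token& t) const {
        switch (t.kind) {
        case token_kind::before_all_keys: return "min"; // assumed encoding
        case token_kind::after_all_keys:  return "max"; // assumed encoding
        case token_kind::key:             return std::to_string(t.value);
        }
        assert(false && "unknown token kind");
        return {};
    }

    // Inverse of to_string(): only the partitioner knows the encoding.
    token from_string(const std::string& s) const {
        if (s == "min") return {token_kind::before_all_keys, 0};
        if (s == "max") return {token_kind::after_all_keys, 0};
        return {token_kind::key, static_cast<int64_t>(std::stoll(s))};
    }
};
```

Round-tripping a min or max token through this pair preserves its kind, which
is the property the data-only conversion lacks.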
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
This allows token::_data to be in a different representation
than the one expected by the token type.
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
Loading data from memory tends to be the most expensive part of the comparison
operations. Because we don't have a tri_compare function for tokens, we end up
having to do an equality test, which will load the token's data from memory, and
then, because all we know is that they are not equal, we need to do a second
comparison to order them.
Having two dereferences is harmful, and shows up in my simple benchmark. This
is because before writing to sstables, we must order the keys in decorated key
order, which is heavy on the comparisons.
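A minimal sketch of the idea, assuming a simplified token made of a kind and a
64-bit value (a hypothetical stand-in, not the real dht::token). A single
three-way comparison reads each token once, and the equality and ordering
operators are both derived from it.

```cpp
#include <cstdint>

// Hypothetical simplified token: the kind orders first, then the value.
enum class token_kind : int { before_all_keys = 0, key = 1, after_all_keys = 2 };

struct token {
    token_kind kind;
    int64_t value; // only meaningful when kind == token_kind::key
};

// Returns a negative value, zero, or a positive value.
inline int tri_compare(const token& a, const token& b) {
    if (a.kind != b.kind) {
        return a.kind < b.kind ? -1 : 1;
    }
    if (a.kind != token_kind::key) {
        return 0; // min == min, max == max
    }
    return (a.value > b.value) - (a.value < b.value);
}

// Both operators reuse the single comparison instead of testing equality
// first and ordering second.
inline bool operator==(const token& a, const token& b) { return tri_compare(a, b) == 0; }
inline bool operator<(const token& a, const token& b) { return tri_compare(a, b) < 0; }
```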
The proposed change speeds up index write benchmark by 8.6%:
Before:
41458.14 +- 1.49 partitions / sec (30 runs)
After:
45020.81 +- 3.60 partitions / sec (30 runs)
Parameters:
--smp 6 --partitions 500000
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Aside from being the obviously correct thing to do, not having this will force us
to manually adjust num_tokens when loading our sstables into Cassandra.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
It needs to access the non-existent "DatabaseDescriptor". Do as we have been doing,
and just pass the database object instead.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Error reported by debug mode when running the sstable test.
The solution is to use an unaligned cast.
dht/murmur3_partitioner.cc:67:25: runtime error: load of misaligned
address 0x6030000478fc for type 'const long int', which requires 8
byte alignment
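For illustration only, the same class of problem can be avoided portably with
std::memcpy. This sketch swaps in memcpy for the unaligned cast mentioned
above, and read_unaligned_i64 is a hypothetical helper name.

```cpp
#include <cstdint>
#include <cstring>

// Reads an int64_t from an address that may not be 8-byte aligned.
// std::memcpy has no alignment requirement on its source, so this avoids
// the misaligned load that a direct pointer cast and dereference performs.
inline int64_t read_unaligned_i64(const void* p) {
    int64_t v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}
```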
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
We need a container which can be used with compacting
allocators. "bytes" can't be used with a compacting allocator because it
can't handle its external storage being moved.
Make sharding partitioner-specific, since different partitioners interpret
the byte content differently.
Implement it by extracting the shard from the most significant bits, which
can be used to minimize cross shard traffic for range queries, and reduces
sstable sharing.
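One way to picture the scheme, as a sketch under stated assumptions: tokens are
signed 64-bit values, and the shard is taken from the top 32 bits after
flipping the sign bit. The shard_of name and signature here are hypothetical
and need not match the real partitioner code.

```cpp
#include <cassert>
#include <cstdint>

// Maps a signed 64-bit token to a shard using its most significant bits.
inline unsigned shard_of(int64_t token, unsigned shard_count) {
    assert(shard_count > 0);
    // Flip the sign bit so that order is preserved in unsigned space:
    // INT64_MIN maps to 0 and INT64_MAX maps to UINT64_MAX.
    uint64_t biased = static_cast<uint64_t>(token) ^ (uint64_t(1) << 63);
    // Keep the top 32 bits and scale them into [0, shard_count).
    uint64_t top = biased >> 32;
    return static_cast<unsigned>((top * shard_count) >> 32);
}
```

Because the mapping is monotonic in the token, each shard owns one contiguous
slice of the ring, so a narrow token range usually touches a single shard.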
range::is_wrap_around() and range::contains() rely on total ordering
on values to work properly. Current ring_position_comparator was only
imposing a weak ordering (token positions equal to all key positions
with that token).
range::before() and range::after() can't work for weak ordering. If
the bound is exclusive, we don't know if user-provided token position
is inside or outside.
Also, is_wrap_around() can't properly detect wrap around in all
cases. Consider this case:
(1) ]A; B]
(2) [A; B]
For A = (tok1) and B = (tok1, key1), (1) is a wrap around and (2) is
not. Without total ordering between A and B, range::is_wrap_around() can't
tell that.
I think the simplest solution is to define a total ordering on
ring_position by making token positions positioned either before or
after all keys with that token.
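A minimal sketch of such an ordering, assuming a simplified ring_position with
an integer token and a string key (a hypothetical stand-in, not the real
dht::ring_position). A position without a key carries a bound that places it
before (-1) or after (+1) every key with the same token.

```cpp
#include <cstdint>
#include <string>

// Hypothetical simplified ring_position.
struct ring_position {
    int64_t token;
    bool has_key;
    std::string key; // meaningful only when has_key is true
    int bound;       // -1 or +1; meaningful only when has_key is false
};

// Where a position sits relative to the keys sharing its token:
// -1 before all keys, 0 at a key, +1 after all keys.
inline int relation_to_keys(const ring_position& p) {
    return p.has_key ? 0 : p.bound;
}

// Total order: by token first, then by key or by relation to the keys.
inline int tri_compare(const ring_position& a, const ring_position& b) {
    if (a.token != b.token) {
        return a.token < b.token ? -1 : 1;
    }
    if (a.has_key && b.has_key) {
        int c = a.key.compare(b.key);
        return (c > 0) - (c < 0);
    }
    int ra = relation_to_keys(a);
    int rb = relation_to_keys(b);
    return (ra > rb) - (ra < rb);
}
```

With this ordering a token-only position compares strictly before or strictly
after (tok1, key1), never equal to it.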
Two ring_positions are equal if tokens and keys are equal or tokens are
equal and one or both of them do not specify a key. So a ring_position
without a key is a wildcard that equals any ring_position with the same
token.
Some of the tests in DTEST take advantage of the fact that
ByteOrderedPartitioner guarantees certain ordering of partition keys.
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>
We need to be able to do it so we can, among other things, create CQL
statements that include the current state of the tokens.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
It is a better fit for things that are names, not blobs. We have a user that expects
a bytes parameter, but that is for no other reason than the fact that the field used
to be of bytes type.
Let's fix that, so future users will be able to use sstrings.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
midpoint(l, r) where l > r needs to wrap around the end of the ring. Adjust
the midpoint() function to do this.
Note this is still broken for the murmur3 partitioner, since it doesn't treat
tokens as unsigned.
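The wrap-around case can be sketched as follows, assuming tokens are plain
unsigned 64-bit integers on a ring of 2^64 positions. This is exactly the
assumption that does not hold for murmur3 as noted above, and ring_midpoint is
a hypothetical name.

```cpp
#include <cstdint>

// Midpoint of the arc that starts at l and runs forward to r.
inline uint64_t ring_midpoint(uint64_t l, uint64_t r) {
    // Unsigned subtraction wraps modulo 2^64, so this is the forward
    // distance from l to r even when l > r.
    uint64_t distance = r - l;
    // The addition wraps as well, carrying the result past the ring end.
    return l + distance / 2;
}
```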
In sstables the partitioner name is stored for validation. To allow Origin
to process our files we need to comply with Origin's partitioner name, or
else Origin's SSTableReader::open fails on the partitioner comparison.
We do not care about the order of the tokens.
Also, in token_metadata, we use unordered_set for tokens as well, e.g.
update_normal_tokens. Unify the usage.
Validation metadata stores partitioner name and bloom filter chance.
Cassandra gets the partitioner name by getting an object of the class
itself and getting its canonical name.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>