Ideally we would like tokens to be trivially destructible, so that we
can easily dispose of giant vectors holding them. While that is hard to
do with our current infrastructure, we can introduce a token_view, which
holds a bytes_view instead of the real data, which makes it trivially
destructible.
The comparators are then changed to take a token_view, and an implicit
conversion from token is provided so that tokens can still be compared.
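
A minimal sketch of the idea, not the actual patch. It assumes simplified
stand-ins: bytes and bytes_view are modelled with std::string and
std::string_view, and token_kind, token_less() and the field names are
hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical stand-ins for the real bytes / bytes_view types.
using bytes = std::string;
using bytes_view = std::string_view;

enum class token_kind { before_all_keys, key, after_all_keys };

// Owning token: holds a heap-backed buffer, so it needs a real destructor.
struct token {
    token_kind kind;
    bytes data;
};

// Non-owning view: only a kind plus a pointer/length pair.
struct token_view {
    token_kind kind;
    bytes_view data;

    token_view(token_kind k, bytes_view d) : kind(k), data(d) {}
    // Implicit conversion, so call sites can keep passing tokens.
    token_view(const token& t) : kind(t.kind), data(t.data) {}
};

static_assert(!std::is_trivially_destructible_v<token>);
static_assert(std::is_trivially_destructible_v<token_view>);

// The comparator takes views; tokens convert implicitly at the call site.
inline bool token_less(token_view a, token_view b) {
    if (a.kind != b.kind) {
        return a.kind < b.kind;
    }
    return a.kind == token_kind::key && a.data < b.data;
}
```

A view must not outlive the token it was created from, since it only
points into that token's buffer.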
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Right now, next_token_for_shard() only allows iterating linearly in shard
order. Add the ability to select a specific shard to skip to (in case we're
only interested in a single shard), and to select larger ranges (so that
exponential increases are not implemented by iteration).
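
The arithmetic can be sketched as follows. This is an illustration under
loud assumptions, not the real implementation: tokens are plain 32-bit
integers, the token space is cut into fixed-width ranges handed out to
shards round-robin, and the sharder struct and its members are
hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical model: range i covers tokens [i * range_width,
// (i + 1) * range_width) and belongs to shard i % shard_count.
// Assumes range_width > 0, shard_count > 0, and values of shard_count and
// spans small enough that the 64-bit arithmetic below cannot overflow.
struct sharder {
    uint32_t range_width;
    unsigned shard_count;

    unsigned shard_of(uint32_t t) const {
        return unsigned((t / range_width) % shard_count);
    }

    // First token after t that starts a range owned by `shard`.
    // With spans > 1, skip spans - 1 further ranges of that shard.
    // Returns nullopt when the result would fall outside the token space.
    std::optional<uint32_t> token_for_next_shard(uint32_t t, unsigned shard,
                                                 unsigned spans = 1) const {
        uint64_t n = shard_count;
        uint64_t range = t / range_width;
        uint64_t cur = range % n;
        // Ranges to advance to reach the next one owned by `shard`:
        // a value in 1..n, computed directly instead of by iteration.
        uint64_t delta = (shard + n - cur - 1) % n + 1;
        uint64_t target = range + delta + uint64_t(spans - 1) * n;
        if (target > UINT32_MAX / range_width) {
            return std::nullopt;
        }
        return uint32_t(target * range_width);
    }
};
```

With range_width = 100 and four shards, token 250 lies in shard 2; asking
for shard 3 yields 300, and asking for shard 2 again yields 600, the start
of that shard's next range.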
When performing a range query, we want to iterate over shards, running the
query on each shard in order until the query range is exhausted or we have
the right number of rows.
To be able to do this, introduce token_for_next_shard(), which allows us
to determine the boundary between shards.
It is a sort-of inverse to shard_of(), in that
shard_of(token_for_next_shard(t)) == shard_of(t) + 1
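
The relationship can be illustrated with a small model. This is a sketch
under stated assumptions, not the partitioner's code: tokens are unsigned
32-bit integers, each shard owns one contiguous slice of the token space,
and both function signatures are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

using token = uint32_t;  // assumption: a simplified token representation

// Map a token to a shard by scaling it into [0, shard_count).
// Assumes shard_count > 0.
inline unsigned shard_of(token t, unsigned shard_count) {
    return unsigned((uint64_t(t) * shard_count) >> 32);
}

// Smallest token owned by the shard after the one owning t, or nullopt
// if t already belongs to the last shard.
inline std::optional<token> token_for_next_shard(token t,
                                                 unsigned shard_count) {
    uint64_t n = shard_count;
    uint64_t next = uint64_t(shard_of(t, shard_count)) + 1;
    if (next == n) {
        return std::nullopt;
    }
    // ceil(next * 2^32 / n): the first token whose shard is `next`.
    return token(((next << 32) + n - 1) / n);
}
```

A range query can then call token_for_next_shard() repeatedly to cut the
query range at each shard boundary until it receives nullopt or has
enough rows.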
This patch adds the from_bytes() function to the i_partitioner class,
whose purpose is to parse a particular token and explicitly handle the
case when the minimum token is specified.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Only the partitioner knows how to convert a token to a sstring. Conversely,
only the partitioner can know how to convert it back.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Right now, we are converting the _data part of the token to a sstring, which
may later be stored somewhere, in a system sstable for instance. Later on,
we will have to get it back, but the way the code currently stands, we will
get undefined results for min and max tokens, since their _data field is
empty.
For murmur3, strictly speaking, the correct solution would be to change
long_token to account for that. However, when we compare values, we already do
compare kinds explicitly. Adding kind checks to long_token as well would
only make that operation branchier and therefore costlier, which we don't
want for such a common operation.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Loading data from memory tends to be the most expensive part of the comparison
operations. Because we don't have a tri_compare function for tokens, we end up
having to do an equality test, which will load the token's data from memory,
and then, because all we know is that they are not equal, we need to do
another comparison to find out which one is smaller.
Having two dereferences is harmful, and shows up in my simple benchmark. This
is because before writing to sstables, we must order the keys in decorated key
order, which is heavy on the comparisons.
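
A minimal sketch of a three-way comparison, assuming simplified types:
token, decorated_key, token_kind and key_less() here are hypothetical
stand-ins, with std::string in place of the real byte buffers.

```cpp
#include <cassert>
#include <string>

enum class token_kind { before_all_keys, key, after_all_keys };

struct token {
    token_kind kind;
    std::string data;
};

// Returns <0, 0 or >0. The token data is walked once, instead of once
// for an equality test and again for a less-than test.
inline int tri_compare(const token& a, const token& b) {
    if (a.kind != b.kind) {
        return a.kind < b.kind ? -1 : 1;
    }
    if (a.kind != token_kind::key) {
        return 0;  // min and max tokens carry no data
    }
    return a.data.compare(b.data);
}

struct decorated_key {
    token tok;
    std::string key;
};

// Decorated key order: by token first, then by key for equal tokens.
inline int tri_compare(const decorated_key& a, const decorated_key& b) {
    int c = tri_compare(a.tok, b.tok);
    if (c != 0) {
        return c;
    }
    return a.key.compare(b.key);
}

inline bool key_less(const decorated_key& a, const decorated_key& b) {
    return tri_compare(a, b) < 0;
}
```

A sort over decorated keys can use key_less() as its ordering predicate,
so each comparison touches the token data a single time.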
The proposed change speeds up the index write benchmark by 8.6%:
Before:
41458.14 +- 1.49 partitions / sec (30 runs)
After:
45020.81 +- 3.60 partitions / sec (30 runs)
Parameters:
--smp 6 --partitions 500000
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Make sharding partitioner-specific, since different partitioners interpret
the byte content differently.
Implement it by extracting the shard from the most significant bits. This
can be used to minimize cross-shard traffic for range queries, and it
reduces sstable sharing.
Some of the tests in DTEST take advantage of the fact that
ByteOrderedPartitioner guarantees certain ordering of partition keys.
Signed-off-by: Paweł Dziepak <pdziepak@cloudius-systems.com>