193 Commits

Author SHA1 Message Date
Botond Dénes
d1209c548a Fix -Wreturn-type warnings
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <99f7a006daaa78eb87720ac51c394093398bc868.1504013915.git.bdenes@scylladb.com>
2017-08-29 16:41:09 +03:00
Tomasz Grabiec
2ca99be27d ring_position_view: Print token instead of token pointer
Broken in e989d65539.
Message-Id: <1503667158-7544-1-git-send-email-tgrabiec@scylladb.com>
2017-08-25 14:25:21 +01:00
Avi Kivity
81a33df25d dht: reduce split_range_to_single_shard contiguous memory demand
split_range_to_single_shard() returns a vector of size 4096, with
each element (a partition_range) of size 100. The total of 400k can
cause defragmentation if memory is fragmented.

Fix by using a deque.
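
The arithmetic behind this change can be sketched as follows. This is only an illustration: `fake_range`, `contiguous_bytes_for_vector` and `make_ranges` are hypothetical names, and the 100-byte element is a stand-in for the partition_range size quoted above. A std::vector needs all of its elements in one contiguous allocation, while a std::deque allocates storage in smaller blocks.

```cpp
#include <cstddef>
#include <deque>

// Hypothetical stand-in for a partition_range of roughly 100 bytes.
struct fake_range {
    char payload[100];
};

// A std::vector of n elements needs one contiguous allocation of this size.
std::size_t contiguous_bytes_for_vector(std::size_t n) {
    return n * sizeof(fake_range);
}

// Building the result in a std::deque avoids the single large allocation,
// because a deque grows by adding separately allocated blocks.
std::deque<fake_range> make_ranges(std::size_t n) {
    std::deque<fake_range> out;
    for (std::size_t i = 0; i < n; ++i) {
        out.emplace_back();
    }
    return out;
}
```

With 4096 elements of 100 bytes each, the vector variant would request 409600 contiguous bytes, which matches the "400k" figure in the message.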

Fixes #2707.
Message-Id: <20170819141017.28287-1-avi@scylladb.com>
2017-08-21 14:25:45 +02:00
Duarte Nunes
ec75eac37d ring_position_exponential_vector_sharder: Take ranges by rvalue
Avoids some copies.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20170814093310.29200-1-duarte@scylladb.com>
2017-08-14 12:55:43 +03:00
Asias He
f239b11a84 storage_service: Use the new range_streamer interface for bootstrap
So that the bootstrap operation now streams a few small ranges at a time and
restreams the failed ranges.
2017-08-07 16:31:47 +08:00
Asias He
6810031ba7 dht: Extend range_streamer interface
After this patch and the following patches to use the new
range_streamer interface, all the following cluster operations:

- bootstrap
- rebuild
- decommission
- removenode

will use the same code to do the streaming.

The range_streamer is now extended to support both fetching from and pushing
to a peer node. Another big change is that the range_streamer now streams
fewer ranges at a time, and so less data, per stream_plan. The range_streamer
also remembers which ranges failed to stream and can retry them later.

The retry policy is very simple at the moment: it retries at most 5 times
and sleeps 1 minute, 1.5^2 minutes, 1.5^3 minutes, ...
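
A retry schedule of this kind could look roughly like the sketch below. It is not the code from this patch: `max_retries`, `should_retry` and `retry_delay_seconds` are hypothetical names, and the schedule assumes a plain geometric progression of 1.5^k minutes for retry k (starting at k = 0), which may differ from the exact sequence the patch uses.

```cpp
// Assumption: at most 5 retries, as stated in the commit message.
constexpr unsigned max_retries = 5;

// Returns true while another retry is still allowed.
bool should_retry(unsigned attempt) {
    return attempt < max_retries;
}

// Assumption: the delay before retry `attempt` is 1.5^attempt minutes,
// returned here in seconds.
double retry_delay_seconds(unsigned attempt) {
    double minutes = 1.0;
    for (unsigned i = 0; i < attempt; ++i) {
        minutes *= 1.5;
    }
    return minutes * 60.0;
}
```

Under these assumptions the first retry waits 60 seconds and the retry with index 2 waits 135 seconds.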

Later, we can introduce an API that lets the user decide when to stop
retrying and what the retry interval is.

The benefits:

- All the cluster operations share the same code for streaming.
- We can track operation progress, e.g., the total number of
  ranges that need to be streamed and the number of ranges finished in
  bootstrap, decommission, etc.
- All the cluster operations can survive a peer node going down during the
  operation, which usually takes a long time to complete. E.g., when adding
  a new node, currently if any of the existing nodes that stream data to
  the new node has an issue sending data, the whole bootstrap
  process fails. After this patch, we can fix the problematic node
  and restart it, and the joining node will retry streaming from that
  node.
- We can fail streaming early, time out early, and retry less, because
  all the operations that use streaming can survive the failure of a single
  stream_plan. It is no longer that important to make a single
  stream_plan succeed. Note that another user of streaming, repair, now
  uses small stream_plans as well and can rerun the repair for the
  failed ranges too.

This is one step closer to supporting resumable add/remove node
operations.
2017-08-07 16:31:47 +08:00
Paweł Dziepak
68e57a742f ring_position_comparator: drop unused overloads
2017-07-26 14:36:37 +01:00
Paweł Dziepak
fe7eba7f06 ring_position_comparator: accept sstables::decorated_key_view
ring_position_comparator has overloads for comparing ring_positions as
well as sstables::key_view. In the case of the latter it needs to
compute the token of the key. However, the sstable layer could cache
some tokens so let's allow the comparator callers to provide it
directly.
2017-07-26 14:36:36 +01:00
Tomasz Grabiec
60678f0e8a ring_position: Optimize construction from r-value references of decorated_key
Message-Id: <1500650171-26291-1-git-send-email-tgrabiec@scylladb.com>
2017-07-24 10:25:14 +03:00
Asias He
d835cf2748 dht: Add selective_token_range_sharder
It is like ring_position_range_sharder but it works with
dht::token_range. This sharder returns the ranges belonging to a
selected shard.
2017-07-04 18:46:19 +08:00
Tomasz Grabiec
e989d65539 dht: Make ring_position_view copyable
dht::token needs to be stored as a pointer now and not a reference so
that validity of old pointers is not impacted by in-place object
construction which would occur in the copy-assignment operator.

[1] says that old pointers can be used to access the new object only
if the type "does not contain any non-static data member whose type is
const-qualified or a reference type".

[1] http://en.cppreference.com/w/cpp/language/lifetime#Storage_reuse
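
The difference between the two member kinds can be illustrated with a small sketch. The types below are hypothetical stand-ins, not the real dht::token or ring_position_view; they only show that a class with a reference member loses its implicit copy-assignment operator, while the pointer version stays copyable.

```cpp
#include <type_traits>

// Hypothetical stand-in for dht::token.
struct token {
    long value;
};

// A view holding a reference: the implicit copy-assignment operator is deleted.
struct view_by_reference {
    const token& t;
};

// A view holding a pointer: copy-assignment simply rebinds the pointer.
struct view_by_pointer {
    const token* t;
};

static_assert(!std::is_copy_assignable<view_by_reference>::value,
              "a reference member disables implicit copy assignment");
static_assert(std::is_copy_assignable<view_by_pointer>::value,
              "a pointer member keeps the view copy-assignable");

// Reads the token through the pointer-based view.
long read_token(view_by_pointer v) {
    return v.t->value;
}
```

Assigning one `view_by_pointer` to another makes it refer to the other token, with no in-place reconstruction of the object required.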
2017-06-24 18:06:11 +02:00
Tomasz Grabiec
81e7b561da dht: Add ring_position min()/max()
2017-06-24 18:06:11 +02:00
Avi Kivity
f9f2f18145 dht: fix bad to_sstring() call
to_sstring() is part of seastar, not the global namespace.
2017-06-22 17:51:27 +03:00
Avi Kivity
ebaeefa02b Merge seastar upstream (seastar namespace)
 - introduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Calle Wilund
6ca07f16c1 scylla: fix compilation errors on gcc 5
Message-Id: <1495030581-2138-1-git-send-email-calle@scylladb.com>
2017-05-17 17:56:06 +03:00
Avi Kivity
68034604e1 dht: murmur3_partitioner: simplify moving to and from the zero-based token range
2017-05-17 13:50:30 +03:00
Avi Kivity
76f12a8842 dht: add split_range_to_single_shard()
Intersects a shard's owning range with a ring position range, and returns
the sorted result.
2017-05-17 13:50:27 +03:00
Avi Kivity
a65e8bd215 dht: add a ring-position-range-vector variant of the exponential sharder
The "exponentiality" is not carried over from one range to another, because
we expect one or two ranges (two ranges result from a wrapped around thrift
token range).
2017-05-17 13:18:52 +03:00
Avi Kivity
f671ac13b4 dht: add an exponential ring_position range sharder
Like the regular sharder, the exponential sharder divides a range into
subranges owned by individual shards.  Unlike the regular sharder, it
generates ever-increasing subranges, spanning more and more shards, and
eventually returns several subranges per shard.  To avoid using
exponential cpu and memory, subranges belonging to a single shard are merged,
and a flag is set to indicate the subranges are not ordered wrt. each other.
2017-05-17 13:18:49 +03:00
Avi Kivity
025c6b45b2 dht: extend i_partitioner::next_token_for_shard()
Right now, next_token_for_shard() only allows iterating linearly in shard
order.  Add the ability to select a specific shard to skip to (in case we're
only interested in a single shard), and to select larger ranges (so that
exponential increases are not implemented by iteration).
2017-05-17 12:30:03 +03:00
Avi Kivity
7156ea8804 dht: make ring_position_range_sharder more independent of global_partitioner
Useful for testing.
2017-05-17 12:30:03 +03:00
Avi Kivity
302fec8293 dht: make i_partitioner::name() const
2017-05-17 12:30:03 +03:00
Avi Kivity
f462c4327e dht: make i_partitioner keep track of the number of shards it was configured with
Useful for testing classes layered on top of the partitioner (the sharders).
2017-05-17 12:30:03 +03:00
Avi Kivity
04b16ae8ec dht: fix partitioner initialization for tests
The partitioners now depend on smp::count to be initialized correctly,
but smp::count isn't available at static initialization time.

The scylla executable isn't affected because it calls set_global_partitioner()
after smp::count has been initialized.

Fix by deferring initialization to the first global_partitioner() call.
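
The deferral can be sketched with a function-local static, which C++ constructs on the first call rather than at static initialization time. This is an illustration only: `g_shard_count` is a hypothetical stand-in for smp::count, `partitioner` is a stub type, and the real patch may implement the deferral differently.

```cpp
// Hypothetical stand-in for smp::count, set later during startup.
unsigned g_shard_count = 0;

// Stub partitioner that captures the shard count at construction.
struct partitioner {
    unsigned shards;
    explicit partitioner(unsigned s) : shards(s) {}
};

// The instance is constructed on the first call, not at static
// initialization time, so it sees the shard count configured by then.
partitioner& global_partitioner() {
    static partitioner instance(g_shard_count);
    return instance;
}
```

As long as the shard count is set before the first `global_partitioner()` call, the partitioner observes the configured value; later changes to the count do not affect the already constructed instance.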
2017-05-17 12:30:03 +03:00
Tomasz Grabiec
7db83fa3fe sstables: index_reader: Optimize advancing to extreme positions
2017-04-20 10:54:38 +02:00
Tomasz Grabiec
c7b9c5dfd3 dht: ring_position_view: Add key getter
2017-04-20 10:54:38 +02:00
Tomasz Grabiec
5b71e0b9ab dht: ring_position_view: Add constructor and factory from ring_position_view
2017-04-20 10:54:38 +02:00
Avi Kivity
af118ab52b murmur3_partitioner: fix build on clang
Don't know what the root cause is, but the fix is harmless.
2017-04-17 23:03:15 +03:00
Avi Kivity
c05f60387b i_partitioner: remove unused function
Found by clang.
2017-04-17 23:03:15 +03:00
Avi Kivity
a496ec7f5b byte_ordered_partitioner: fix bad operator precedence
Found by clang.
2017-04-17 23:03:15 +03:00
Tomasz Grabiec
d4b6e430ed dht: Introduce ring_position_view
2017-03-28 18:10:39 +02:00
Tomasz Grabiec
55a7cceef5 dht: Move comparison logic from ring_position::tri_compare() to ring_position_comparator
It will soon define common ordering for many objects, not just
ring_position.
2017-03-28 18:10:39 +02:00
Tomasz Grabiec
65a8920b25 dht: Make min/max tokens capturable by reference
So that they can be later used in views.
2017-03-28 18:10:39 +02:00
Avi Kivity
54b8acdd9f dht: add hashing and comparison helpers to dht::decorated_key
An std::hash specialization, and an equality comparator.
2017-01-20 11:24:14 +02:00
Avi Kivity
141048e0e5 dht: improve token hash function
For a small token, we can just return it, since it already is a hash.
We hash large tokens using murmur3, which is supposedly a good hash.
2017-01-20 11:24:14 +02:00
Avi Kivity
8686a59ea5 dht: use nonwrapping_ranges in ring_position_range_sharder
It was the observation that ring_position_range_sharder doesn't support
wrapping ranges that started the nonwrapping_range madness, but that
class still has some leftover wrapping ranges.  Close the circle by
removing them.
Message-Id: <20161123153113.8944-1-avi@scylladb.com>
2016-12-22 14:40:30 +01:00
Avi Kivity
a1cafed370 storage_proxy: handle range scans of sparsely populated tables
When murmur3_partitioner_ignore_msb_bits = 12 (which we'd like to be the
default), a scan range can be split into a large number of subranges, each
going to a separate shard.  With the current implementation, subranges were
queried sequentially, resulting in very long latency when the table was empty
or nearly empty.

Switch to an exponential retry mechanism, where the number of subranges
queried doubles each time, dropping the latency from O(number of subranges)
to O(log(number of subranges)).
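
The latency claim can be checked with a small counting sketch. It is not the storage_proxy code: `rounds_needed` is a hypothetical helper that assumes the first round queries one subrange and each later round queries twice as many as the previous one, continuing until every subrange has been queried.

```cpp
#include <cstddef>

// Counts the rounds needed to cover `subranges` subranges when round k
// queries 2^k of them (1, 2, 4, ...). Assumes no round finds enough
// data to stop early, i.e. the empty-table worst case.
std::size_t rounds_needed(std::size_t subranges) {
    std::size_t rounds = 0;
    std::size_t batch = 1;
    std::size_t done = 0;
    while (done < subranges) {
        done += batch;
        batch *= 2;
        ++rounds;
    }
    return rounds;
}
```

Under these assumptions, 4096 subranges are covered in 13 rounds, compared with 4096 rounds when they are queried one at a time.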

If, during an iteration of a retry, we read at most one range
from each shard, then partial results are merged by concatenation.  This
optimizes for the dense(r) case, where few partial results are required.

If, during an iteration of a retry, we need more than one range per
shard, then we collapse all of a shard's ranges into just one range,
and merge partial results by sorting decorated keys.  This reduces
the number of sstable read creations we need to make, and optimizes for
the sparse table case, where we need many partial results, most of which
are empty.

We don't merge subranges that come from different partition ranges,
because those need to be sorted in request order, not decorated key order.

[tgrabiec: trivial conflicts]

Message-Id: <20161220170532.25173-1-avi@scylladb.com>
2016-12-20 18:32:29 +01:00
Asias He
937f28d2f1 Convert to use dht::partition_range_vector and dht::token_range_vector
2016-12-19 14:08:50 +08:00
Asias He
7a446986fa dht: Introduce dht::partition_range_vector and dht::token_range_vector
std::vector<dht::partition_range> and std::vector<dht::token_range> are
used in a lot of places, introduce dht::partition_range_vector and
dht::token_range_vector as the alias.
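
In outline, the aliases might be declared as below. The `sketch` namespace and the stub `token`, `ring_position` and `nonwrapping_range` types are placeholders so that the example compiles on its own; only the alias names and what they stand for are taken from this log.

```cpp
#include <type_traits>
#include <vector>

namespace sketch {

// Stub types standing in for the real dht classes.
struct token {};
struct ring_position {};

template <typename T>
struct nonwrapping_range {
    T start;
    T end;
};

// Aliases for the range types.
using token_range = nonwrapping_range<token>;
using partition_range = nonwrapping_range<ring_position>;

// Aliases for the frequently used vectors of ranges.
using token_range_vector = std::vector<token_range>;
using partition_range_vector = std::vector<partition_range>;

} // namespace sketch
```

Because these are plain aliases, code that still spells out the long form keeps compiling and refers to exactly the same types.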
2016-12-19 08:09:28 +08:00
Asias He
85034c1b57 Convert to use dht::partition_range
2016-12-19 08:04:30 +08:00
Asias He
d1178fa299 Convert to use dht::token_range
2016-12-19 08:04:29 +08:00
Asias He
1f06eedb58 dht: Rename token_range to token_range_endpoints
It is a helper class used in storage_service only. Rename it so we can
use it for the real dht::token_range.
2016-12-19 08:04:29 +08:00
Asias He
264b6ee69e dht: Introduce dht::token_range and dht::partition_range
nonwrapping_range<ring_position> and nonwrapping_range<token> are used
in many places. Let's make an alias for them to make it less verbose.

Also there is a query::partition_range in query-request.hh which is the alias of
nonwrapping_range<ring_position>. query::partition_range is used in
places not related to query at all. Let's unify the usage project wide.
2016-12-19 08:04:29 +08:00
Paweł Dziepak
b86a826baf dht: describe split_range[s]_to_shards() guarantees
We are going to require these functions to return sorted and disjoint
ranges. They already do so (provided that the input ranges are sorted
and disjoint), but if the guarantee is not explicitly stated it may
disappear some day.
2016-12-15 13:07:32 +00:00
Asias He
463cc4fbde dht: Introduce split_ranges_to_shards
Split a set of ranges into a map of per-shard ranges using the
ring_position_range_sharder helper.
2016-12-12 09:04:21 +08:00
Asias He
044c4ff44c dht: Introduce split_range_to_shards
Split a range into a map of per-shard ranges using the
ring_position_range_sharder helper.
2016-12-12 09:04:21 +08:00
Duarte Nunes
ada2f1092e dht: Make i_partitioner::tri_compare pure virtual
This patch makes the i_partitioner::tri_compare() function pure
virtual as it is overridden by all partitioners.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20161211172037.16496-1-duarte@scylladb.com>
2016-12-11 19:29:37 +02:00
Duarte Nunes
bb66b051ed dht: Make i_partitioner::tri_compare memory safe
This patch fixes a typo in i_partitioner::tri_compare() where we were
using std::max instead of std::min, thus avoiding accessing random
memory and getting random results.
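
The class of bug described here can be illustrated with a generic byte-wise three-way comparison. This is a sketch, not the i_partitioner code: `tri_compare_bytes` is a hypothetical function over std::string. The point is that the compared prefix must be limited to the shorter of the two lengths; using std::max there would read past the end of the shorter buffer.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Three-way comparison of two byte strings: returns -1, 0 or 1.
int tri_compare_bytes(const std::string& a, const std::string& b) {
    // Compare only the common prefix; std::min keeps the read in bounds.
    const std::size_t common = std::min(a.size(), b.size());
    const int r = common ? std::memcmp(a.data(), b.data(), common) : 0;
    if (r != 0) {
        return r < 0 ? -1 : 1;
    }
    // Equal prefixes: the shorter string sorts first.
    if (a.size() == b.size()) {
        return 0;
    }
    return a.size() < b.size() ? -1 : 1;
}
```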

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20161211165043.17816-1-duarte@scylladb.com>
2016-12-11 18:58:10 +02:00
Avi Kivity
28857e42e7 Merge " Virtualize size_estimates system table" from Duarte
"We currently write the size_estimates system table for every schema
on a periodic basis, currently set to 5 minutes, which can interfere
with an ongoing workload.

This patchset virtualizes it such that queries are intercepted and we
calculate the results on the fly, only for the ranges the caller is interested in.

Fixes #1616"

* 'virtual-estimates/v4' of github.com:duarten/scylla:
  size_estimates_virtual_reader: Add unit test
  db: Delete size_estimates_recorder
  size_estimates: Add virtual reader
  column_family: Add support for virtual readers
  storage_service: get_local_tokens() returns a future
  nonwrapping_range: Add slice() function
  range: Find a sequence's lower and upper bounds
  system_keyspace: Build mutations for size estimates
  size_estimates: Store the token range as bytes
  range_estimates: Add schema
  murmur3_partitioner: Convert maximum_token to sstring
2016-11-28 10:12:59 +02:00
Avi Kivity
07d5a20bae Wire up sharding ignore msb parameter to configuration
We might have used a fancy map<sstring, any> to pass the parameters, but
that's overkill for now.
2016-11-22 22:40:47 +02:00