Commit Graph

124 Commits

Author SHA1 Message Date
Paweł Dziepak
cfde2ad5b4 storage_proxy: make mutate() an execution stage 2017-03-09 09:27:43 +00:00
Paweł Dziepak
00b42c477f storage_proxy: count counter updates for which the node was a leader 2017-03-02 09:05:12 +00:00
Paweł Dziepak
277501f42f db: propagate tracing state for counter writes 2017-03-02 09:05:10 +00:00
Paweł Dziepak
25173f8095 db: propagate timeout for counter writes 2017-03-02 09:05:10 +00:00
Paweł Dziepak
426345e1d4 storage_proxy: avoid excessive mutation freezes 2017-03-01 16:33:36 +00:00
Calle Wilund
0a4edca756 counters/cql: allow wormholing actual counter values (with shards) via cql
Adds yet another magic function "SCYLLA_COUNTER_SHARD_LIST", indicating that
argument value, which must be a list of tuples <int, UUID, long, long>,
should be inserted as an actual counter value, not update.

This of course to allow counters to be read from sstable loader.

Note that we also need to allow timestamps for counter mutations,
as well as convince the counter code itself to treat the data as
already baked. So ugly wormhole galore.

v2:
* Changed flag names
* More explicit wormholing, bypassing normal counter path, to
  avoid read-before-write etc
* throw exceptions on unhandled shard types in marshalling
v3:
* Added counter id ordering check
* Added batch statement check for mixing normal and raw counter updates
Message-Id: <1487683665-23426-2-git-send-email-calle@scylladb.com>
2017-02-22 09:19:46 +00:00
Nadav Har'El
f2fd81ece0 materialized views: function to send a mutation to endpoint
Add a function for sending one mutation to one remote replica owning
this mutation. This is needed for materialized views, where each
base replica sends each view mutation to one particular view replica.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2017-02-06 13:36:45 +01:00
Gleb Natapov
3c372525ed storage_proxy: use storage_proxy clock instead of explicit lowres_clock
Merge commit 45b6070832 used butchered version of storage_proxy
patch to adjust to rpc timer change instead the one I've sent. This
patch fixes the differences.

Message-Id: <20170206095237.GA7691@scylladb.com>
2017-02-06 12:51:36 +02:00
Paweł Dziepak
1e8814f5ce storage_proxy: support counter updates 2017-02-02 10:35:14 +00:00
Paweł Dziepak
c14c6b753b storage_proxy: add get_live_endpoints() 2017-02-02 10:35:14 +00:00
Amnon Heiman
45b6070832 Merge seastar upstream
* seastar 397685c...c1dbd89 (13):
  > lowres_clock: drop cache-line alignment for _timer
  > net/packet: add missing include
  > Merge "Adding histogram and description support" from Amnon
  > reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&'
  > Set the option '--server' of tests/tcp_sctp_client to be required
  > core/memory: Remove superfluous assignment
  > core/memory: Remove dead code
  > core/reactor: Use logger instead of cerr
  > fix inverted logic in overprovision parameter
  > rpc: fix timeout checking condition
  > rpc: use lowres_clock instead of high resolution one
  > semaphore: make semaphore's clock configurable
  > rpc: detect timedout outgoing packets earlier

Includes treewide change to accomodate rpc changing its timeout clock
to lowres_clock.

Includes fixup from Amnon:

collectd api should use the metrics getters

As part of a preperation of the change in the metrics layer, this change
the way the collectd api uses the metrics value to use the getters
instead of calling the member directly.

This will be important when the internal implementation will changed
from union to variant.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>
2017-02-01 14:39:08 +02:00
Gleb Natapov
64660397fc storage_proxy: move operation type information from counter's name to a label
Makes it much more flexible to view the data in various ways in Graphana.

Message-Id: <20170126102746.GL11469@scylladb.com>
2017-01-26 12:38:29 +02:00
Gleb Natapov
ccee01f352 storage_proxy: put datacenter name into a label instead of counter's name
Having datacenter name as a label makes it possible to create Prometheus board for the counters.

Message-Id: <20170124132051.GX11469@scylladb.com>
2017-01-24 15:27:34 +02:00
Gleb Natapov
76aed548e3 storage_proxy: add replica side counters for data read
Message-Id: <20170112085907.GN11469@scylladb.com>
2017-01-12 11:41:04 +02:00
Paweł Dziepak
1a52569f7d storage_proxy: pass maximum result size to replicas
We may want to change the default individual result size limit in the
future. If it is provided by the coordinator and not hardcoded in the
replicas this can be done without causing data query digest mismatches
or wasteful mutation query results.
2016-12-22 17:16:23 +01:00
Asias He
937f28d2f1 Convert to use dht::partition_range_vector and dht::token_range_vector 2016-12-19 14:08:50 +08:00
Asias He
e5485f3ea6 Get rid of query::partition_range
Use dht::partition_range instead
2016-12-19 08:09:25 +08:00
Duarte Nunes
c2072c7dc9 storage_proxy: Decrease limits when retrying command
This patch changes a read_command's limits when retrying it, so that
we don't ask for more rows than necessary.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 10:41:06 +00:00
Duarte Nunes
9572c19dc6 storage_proxy: Don't fetch superfluous partitions
This patch ensures we keep track of how many partitions we've queried
so we don't ask for more than the number we need.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-12-15 10:27:46 +00:00
Gleb Natapov
a05516f14c storage_proxy: wire up range_slice_timeouts, range_slice_unavailables and read_unavailables counters
Message-Id: <20161206105154.GL1866@scylladb.com>
2016-12-08 11:42:52 +02:00
Vlad Zolotarov
e5e7ac1bd4 service::storage_proxy: rework the collectd counters registration
Use the new seastar's metrics_registration framework:
   - Change the registration syntax.
   - Add a long description for each counter.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-12-01 22:38:09 -05:00
Vlad Zolotarov
3bf12e4ffc service/storage_proxy: regroup collectd statistics
Instead of putting all statistics under the same "storage_proxy" category
separate them into 2 groups according to where the corresponding counters
are updated:
   - "storage_proxy_replica"
   - "storage_proxy_coordinator"

Fixes #1763

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-12-01 22:27:47 -05:00
Tomasz Grabiec
ba3779802f storage_proxy: Propagate timeout to local writes 2016-11-29 16:40:59 +01:00
Tomasz Grabiec
6d195a1538 storage_proxy: Use shared ownership for abstract_write_response_handler 2016-11-29 16:40:58 +01:00
Tomasz Grabiec
5805330d98 storage_proxy: Add counter for all alive write handlers
Currently the counter uses _response_handlers.size(), but after later
patches we may have an active (timed out) write with no response
handler, so count live instances instead.
2016-11-29 16:40:58 +01:00
Tomasz Grabiec
11c5f4ab50 storage_proxy: Add counters for throttled writes 2016-11-15 17:18:25 +01:00
Avi Kivity
a35136533d Convert ring_position and token ranges to be nonwrapping
Wrapping ranges are a pain, so we are moving wrap handling to the edges.

Since cql can't generate wrapping ranges, this means thrift and the ring
maintenance code; also range->ring transformations need to merge the first
and last ranges.

Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>
2016-11-02 21:04:11 +02:00
Avi Kivity
63f053e9b7 storage_proxy: fix mutation reordering with wrapping ranges
If we have a range query involving a wrapping range (i.e., from thrift),
and mutations from both halves of the result are involved, then
we will return the results in the wrong order (and potentially the wrong
partitions) since we order by token, so the results from the second half
of the wrapping range end up before the first.

Fix by splitting the two queries, and merging the second half with lower
priority compared to the first half.

Note: this will be fixed in a better way once we have the sharding iterator,
as then we can query sequentially.

Fixes #1761.
Message-Id: <1476262693-30162-1-git-send-email-avi@scylladb.com>
2016-10-12 15:59:16 +02:00
Duarte Nunes
f4cf2f2aef tracing: Make trace_state_ptr argument required
This patch makes the optional trace_state_ptr arguments introduced in
previous patches mandatory where possible. Functions which are called
internally don't have a trace context, so for those we keep the
argument's default value for convenience.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-09-01 12:04:32 +02:00
Duarte Nunes
46b86ff801 storage_proxy: Pass along trace_state for queries
This patch changes the storage_proxy so it passed along a
trace_state_ptr to the layers below, when querying locally or
receiving a remote query request.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2016-09-01 12:04:32 +02:00
Glauber Costa
4310635bae move estimated histogram to utils
Nothing sstable-specific in it, really.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-08-31 15:13:23 -04:00
Glauber Costa
ffc2131c51 decouple estimated_histogram from sstables
There is nothing really that fundamentally ties the estimated histogram to
sstables. This patch gets rid of the few incidental ties. They are:

 - the namespace name, which is now moved to utils. Users inside sstables/
   now need to add a namespace prefix, while the ones outside have to change
   it to the right one
 - sstables::merge, which has a very non-descriptive name to begin with, is
   changed to a more descriptive name that can live inside utils/
 - the disk_types.hh include has to be removed - but it had no reason to be
   here in the first place.

Todo, is to actually move the file outside sstables/. That is done in a separate
step for clarity.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-08-31 15:13:23 -04:00
Duarte Nunes
39e0fb1260 storage_proxy: Support multiple partition ranges
This patch adds the ability to query multiple partition ranges. This
is needed since 55f2cf1626, where we
started unwrapping partition ranges in Thrift.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <1472474594-15368-1-git-send-email-duarte@scylladb.com>
2016-08-30 17:43:40 +03:00
Piotr Jastrzebski
f212a6cfcb Fix after free access bug in storage proxy
Due to speculative reads we can't guarantee that all
fibers started by storage_proxy::query will be finished
by the time the method returns a result.

We need to make sure that no parameter passed to this
method ever changes.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <31952e323e599905814b7f378aafdf779f7072b8.1471005642.git.piotr@scylladb.com>
2016-08-12 16:34:43 +02:00
Vlad Zolotarov
b36b69c1d6 service::storage_proxy: remove a default value for a tracing::trace_state_ptr parameter
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:59 +03:00
Vlad Zolotarov
baa6496816 service::storage_proxy: READ instrumentation: store trace state object in abstract_read_executor
Having a trace_state_ptr in the storage_proxy level is needed to trace code bits in this level.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:59 +03:00
Vlad Zolotarov
962bddf8fe transport: CQL tracing: instrument a BATCH command
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
4c16df9e4c service: instrument MUTATE flow with tracing
Store the trace state in the abstract_write_response_handler.
Instrument send_mutation RPC to receive an additional
rpc::optional parameter that will contain optional<trace_info>
value.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Paweł Dziepak
32a5de7a1f db: handle receiving fragmented mutations
If mutations are fragmented during streaming a special care must be
taken so that isolation guarantees are not broken.

Mutations received with flag "fragmented" set are applied to a memtable
that is used only by that particular streaming task and the sstables
created by flushing such memtables are not made visible until the task
is complte. Also, in case the streaming fails all data is dropped.

This means that fragmented mutations cannot benefit from coalescing of
writes from multiple streaming plans, hence separate way of handling
them so that there is no loss of performance for small partitions.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
4031c0ed8f streaming: pass plan_id to column family for apply and flush
plan_id is needed to keep track of the origin of mutations so that if
they are fragmented all fragments are made visible at the same time,
when that particular streaming plan_id completes.

Basically, each streaming plan that sends big (fragmented) mutations is
going to have its own memtables and a list of sstables which will get
flushed and made visible when that plan completes (or dropped if it
fails).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
579de26e95 storage_proxy: drop make_local_reader()
This code was used only by its unit test.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Gleb Natapov
e089166cfa storage_proxy: wait only for expected CL when writing back data during read repair
When read repair writes diffs back to replicas it is enough to wait
for requested CL to guaranty read monotonicity. This patch makes read
repair write reuse regular mutate functionality which already tracks
CL status. This is done by changing write response handler to not hold
mutation directly, but instead hold a container that, depending on
whether
this is read repair write or regular one, can provide different mutation
per destination.

Message-Id: <20160613124727.GL1096@scylladb.com>
2016-06-13 19:01:51 +03:00
Avi Kivity
3f6ecb9f28 Merge "cancel cross DC read repair if non matching data was recently modified" from Gleb 2016-05-29 15:58:55 +03:00
Gleb Natapov
2efbccc901 storage_proxy: do only local read repair if non matching data was recently modified
When read/write to a partition happens in parallel reader may detect
digest mismatch that may potentially cause cross DC read repair attempt,
but the repair is not really needed, so added latency is not justified.

This patch tries to prevent such parallel access from causing heavy
cross DC repair operation buy checking a timestamp of most resent
modification. If the modification happens less then "write timeout"
seconds ago the patch assumes that the read operation raced with write
one and cancel cross DC repair, but only if CL is LOCAL_*.
2016-05-29 15:26:51 +03:00
Gleb Natapov
12cf60c302 messaging_service: add timestemp of last modification to READ_DIGEST verb return value 2016-05-24 13:27:34 +03:00
Amnon Heiman
64e0c8cd1b storage_proxy: Change histogram to
timed_rate_moving_average_and_histogram

As part of moving the derived statistic in to scylla, this replaces the
histogram object in the storage_proxy to
timed_rate_moving_average_and_histogram. and the read, write and range
counters where replaced by rate_moving_average.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:52:16 +03:00
Gleb Natapov
3039e4c7de storage_proxy: stop range query with limit after the limit is reached 2016-05-02 15:10:15 +03:00
Vlad Zolotarov
9bf8253412 storage_proxy: add read requests split counters
Add split (local Nodes, external Nodes aggregated per Nodes' DCs) counters
for the following read categories:
   - data reads
   - digest reads
   - mutation data reads

Each category is added attempts, completions and errors metrics.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:28:19 +03:00
Vlad Zolotarov
cbcbdc3b4a storage_proxy: add split counters for writes
Added split metrics for operations on a local Node and on external
Nodes aggregated per Nodes' DCs.

Added separate split counters for:
    - total writes attempts/errors
    - read repair write attempts (there is no easy way to separate errors
      at the moment)

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:28:15 +03:00
Vlad Zolotarov
c92654b281 storage_proxy: add counters for received and forwarded mutations
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:27:29 +03:00