Commit Graph

90 Commits

Author SHA1 Message Date
Vlad Zolotarov
b36b69c1d6 service::storage_proxy: remove a default value for a tracing::trace_state_ptr parameter
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:59 +03:00
Vlad Zolotarov
baa6496816 service::storage_proxy: READ instrumentation: store trace state object in abstract_read_executor
Having a trace_state_ptr at the storage_proxy level is needed to trace code at this level.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:59 +03:00
Vlad Zolotarov
962bddf8fe transport: CQL tracing: instrument a BATCH command
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Vlad Zolotarov
4c16df9e4c service: instrument MUTATE flow with tracing
Store the trace state in the abstract_write_response_handler.
Instrument send_mutation RPC to receive an additional
rpc::optional parameter that will contain optional<trace_info>
value.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-07-19 18:21:58 +03:00
Paweł Dziepak
32a5de7a1f db: handle receiving fragmented mutations
If mutations are fragmented during streaming, special care must be
taken so that isolation guarantees are not broken.

Mutations received with flag "fragmented" set are applied to a memtable
that is used only by that particular streaming task and the sstables
created by flushing such memtables are not made visible until the task
is complete. Also, if the streaming fails, all data is dropped.

This means that fragmented mutations cannot benefit from coalescing of
writes from multiple streaming plans, hence the separate way of handling
them, so that there is no loss of performance for small partitions.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
Paweł Dziepak
4031c0ed8f streaming: pass plan_id to column family for apply and flush
plan_id is needed to keep track of the origin of mutations so that if
they are fragmented all fragments are made visible at the same time,
when that particular streaming plan_id completes.

Basically, each streaming plan that sends big (fragmented) mutations is
going to have its own memtables and a list of sstables which will get
flushed and made visible when that plan completes (or dropped if it
fails).

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-07-07 12:18:35 +01:00
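The per-plan visibility rule described in 4031c0ed8f and 32a5de7a1f can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `streaming_staging`, its methods, integer plan ids and string "fragments" are hypothetical stand-ins, not the column family code touched by these commits.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: plan ids are plain integers, fragments are strings.
class streaming_staging {
    // Data staged per streaming plan; not visible to readers yet.
    std::map<std::uint64_t, std::vector<std::string>> _pending;
    // Data that readers are allowed to see.
    std::vector<std::string> _visible;
public:
    void apply_fragment(std::uint64_t plan_id, std::string fragment) {
        // An empty fragment is treated as a caller bug in this sketch.
        assert(!fragment.empty());
        _pending[plan_id].push_back(std::move(fragment));
    }
    // Plan completed: publish all of its fragments at the same time.
    void complete(std::uint64_t plan_id) {
        auto it = _pending.find(plan_id);
        if (it == _pending.end()) {
            return;
        }
        for (auto& f : it->second) {
            _visible.push_back(std::move(f));
        }
        _pending.erase(it);
    }
    // Plan failed: drop everything it staged.
    void fail(std::uint64_t plan_id) { _pending.erase(plan_id); }

    const std::vector<std::string>& visible() const { return _visible; }
};
```

In this sketch, fragments applied under one plan id stay invisible until `complete()` is called for that id, and `fail()` discards them without touching data staged by other plans.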
Paweł Dziepak
579de26e95 storage_proxy: drop make_local_reader()
This code was used only by its unit test.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-06-20 21:29:49 +01:00
Gleb Natapov
e089166cfa storage_proxy: wait only for expected CL when writing back data during read repair
When read repair writes diffs back to replicas, it is enough to wait
for the requested CL to guarantee read monotonicity. This patch makes the
read repair write reuse the regular mutate functionality, which already
tracks CL status. This is done by changing the write response handler to
not hold a mutation directly, but instead hold a container that, depending
on whether this is a read repair write or a regular one, can provide a
different mutation per destination.

Message-Id: <20160613124727.GL1096@scylladb.com>
2016-06-13 19:01:51 +03:00
Avi Kivity
3f6ecb9f28 Merge "cancel cross DC read repair if non matching data was recently modified" from Gleb 2016-05-29 15:58:55 +03:00
Gleb Natapov
2efbccc901 storage_proxy: do only local read repair if non matching data was recently modified
When a read and a write to a partition happen in parallel, the reader may
detect a digest mismatch that could trigger a cross DC read repair attempt,
but the repair is not really needed, so the added latency is not justified.

This patch tries to prevent such parallel access from causing a heavy
cross DC repair operation by checking the timestamp of the most recent
modification. If the modification happened less than "write timeout"
seconds ago, the patch assumes that the read operation raced with a write
and cancels the cross DC repair, but only if CL is LOCAL_*.
2016-05-29 15:26:51 +03:00
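The decision rule from 2efbccc901 can be written down as a small function. This is a hedged sketch, not the storage_proxy implementation: the enums, `choose_repair_scope()` and the use of `std::chrono::steady_clock` for the modification time are all assumptions made for illustration.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical names throughout; this is not the storage_proxy code.
enum class consistency_level { ONE, QUORUM, ALL, LOCAL_ONE, LOCAL_QUORUM };
enum class repair_scope { local_dc_only, cross_dc };

using clock_type = std::chrono::steady_clock;

inline bool is_datacenter_local(consistency_level cl) {
    return cl == consistency_level::LOCAL_ONE
        || cl == consistency_level::LOCAL_QUORUM;
}

// Decide how far a read repair should reach after a digest mismatch.
inline repair_scope choose_repair_scope(consistency_level cl,
                                        clock_type::time_point last_modified,
                                        clock_type::time_point now,
                                        std::chrono::milliseconds write_timeout) {
    assert(write_timeout.count() >= 0);
    // A modification younger than the write timeout is assumed to be a
    // write that raced with this read.
    const bool recently_modified = (now - last_modified) < write_timeout;
    if (is_datacenter_local(cl) && recently_modified) {
        return repair_scope::local_dc_only;
    }
    return repair_scope::cross_dc;
}
```

With a 2 second write timeout, a LOCAL_QUORUM read that sees a modification from 500 ms ago would stay local in this sketch, while the same read at QUORUM, or against data modified 5 seconds ago, would still repair across DCs.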
Gleb Natapov
12cf60c302 messaging_service: add timestamp of last modification to READ_DIGEST verb return value 2016-05-24 13:27:34 +03:00
Amnon Heiman
64e0c8cd1b storage_proxy: Change histogram to timed_rate_moving_average_and_histogram

As part of moving the derived statistics into scylla, this replaces the
histogram object in the storage_proxy with
timed_rate_moving_average_and_histogram, and the read, write and range
counters were replaced by rate_moving_average.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2016-05-17 11:52:16 +03:00
Gleb Natapov
3039e4c7de storage_proxy: stop range query with limit after the limit is reached 2016-05-02 15:10:15 +03:00
Vlad Zolotarov
9bf8253412 storage_proxy: add read requests split counters
Add split (local Nodes, external Nodes aggregated per Nodes' DCs) counters
for the following read categories:
   - data reads
   - digest reads
   - mutation data reads

Each category gets attempts, completions and errors metrics.

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:28:19 +03:00
Vlad Zolotarov
cbcbdc3b4a storage_proxy: add split counters for writes
Added split metrics for operations on a local Node and on external
Nodes aggregated per Nodes' DCs.

Added separate split counters for:
    - total writes attempts/errors
    - read repair write attempts (there is no easy way to separate errors
      at the moment)

Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:28:15 +03:00
Vlad Zolotarov
c92654b281 storage_proxy: add counters for received and forwarded mutations
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
2016-04-21 11:27:29 +03:00
Paweł Dziepak
d53354947c storage_proxy: mark hint_to_dead_endpoints() noexcept
Hints are currently unimplemented but there is code depending on the
fact that hint_to_dead_endpoints() doesn't throw.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-04-12 00:06:10 +01:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Glauber Costa
5fa866223d streaming: add incoming streaming mutations to a different sstable
Keeping the mutations coming from the streaming process as mutations like any
other has a number of advantages - and that's why we do it.

However, this makes it impossible for Seastar's I/O scheduler to differentiate
between incoming requests from clients and those arriving from peers
in the streaming process.

As a result, if the streaming mutations consume a significant fraction of the
total mutations, and we happen to be using the disk at its limits, we are in no
position to provide any guarantees - defeating the whole purpose of the
scheduler.

To fix that, we'll keep a separate set of memtables that will contain
only streaming mutations. We don't have to do it this way, but doing so
makes life a lot easier. In particular, to write an SSTable, our API requires
(because the filter requires) that a good estimate of the number of partitions
be provided in advance. The partitions also need to be sorted.

We could write mutations directly to disk, but the above conditions couldn't be
met without significant effort. In particular, because mutations can be
arriving from multiple peer nodes, we can't really sort them without keeping a
staging area anyway.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-03-23 09:13:00 -04:00
Gleb Natapov
5076f4878b main: Defer storage proxy RPC verb registration after commitlog replay
Message-Id: <20160315071229.GM6117@scylladb.com>
2016-03-15 09:18:12 +02:00
Paweł Dziepak
82d2a2dccb specify whether query::result, result_digest or both are needed
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2016-03-11 18:27:13 +00:00
Gleb Natapov
f242c6395c storage_proxy: add counter for retried reads
Message-Id: <20160309130453.GF2253@scylladb.com>
2016-03-09 14:09:42 +01:00
Gleb Natapov
32e9f1ecd4 Fix read_timeouts storage_proxy counter
Read timeouts are currently not counted. This patch fixes that.

Message-Id: <20160228133315.GN6705@scylladb.com>
2016-02-28 15:34:42 +02:00
Glauber Costa
f6cfb04d61 add a priority class to mutation readers
SSTables already have a priority argument wired to their read path. However,
most of our reads do not call that interface directly, but employ the services
of a mutation reader instead.

Some of those readers will be used to read through a mutation_source, and those
have to be patched as well.

Right now, whenever we need to pass a class, we pass Seastar's default priority
class.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-01-25 15:20:38 -05:00
Tomasz Grabiec
4e5a52d6fa db: Make read interface schema version aware
The intent is to make data returned by queries always conform to a
single schema version, which is requested by the client. For CQL
queries, for example, we want to use the same schema which was used to
compile the query. The other node expects to receive data conforming
to the requested schema.

Interface on shard level accepts schema_ptr, across nodes we use
table_schema_version UUID. To transfer schema_ptr across shards, we
use global_schema_ptr.

Because schema is identified with UUID across nodes, requestors must
be prepared for being queried for the definition of the schema. They
must hold a live schema_ptr around the request. This guarantees that
schema_registry will always know about the requested version. This is
not an issue because for queries the requestor needs to hold on to the
schema anyway to be able to interpret the results. But care must be
taken to always use the same schema version for making the request and
parsing the results.

Schema requesting across nodes is currently stubbed (throws runtime
exception).
2016-01-11 10:34:52 +01:00
Tomasz Grabiec
036974e19b Make mutation interfaces support multiple versions
Schema is tracked in memtable and cache per-entry. Entries are
upgraded lazily on access. Incoming mutations are upgraded to table's
current schema on given shard.

Mutating nodes need to keep schema_ptr alive in case schema version is
requested by target node.
2016-01-11 10:34:51 +01:00
Avi Kivity
f3980f1fad Merge seastar upstream
* seastar 51154f7...8b2171e (9):
  > memcached: avoid a collision of an expiration with time_point(-1).
  > tutorial: minor spelling corrections etc.
  > tutorial: expand semaphores section
  > Merge "Use steady_clock where monotonic clock is required" from Vlad
  > Merge "TLS fixes + RPC adaption" from Calle
  > do_with() optimization
  > tutorial: explain limiting parallelism using semaphores
  > submit_io: change pending flushes criteria
  > apps: remove defunct apps/seastar

Adjust code to use steady_clock instead of high_resolution_clock.
2015-12-27 14:40:20 +02:00
Pekka Enberg
9604d55a44 Merge "Add unit test for get_restricted_ranges()" from Tomek 2015-12-17 09:14:30 +02:00
Tomasz Grabiec
e445e4785c storage_proxy: Extract get_restricted_ranges() as a free function
To make it directly testable.
2015-12-16 13:09:01 +01:00
Gleb Natapov
de63b3a824 storage_proxy: provide timeout for send_mutation verb
Providing a timeout for the send_mutation verb allows rpc to drop packets that
sit in the outgoing queue for too long.
2015-12-16 10:13:46 +02:00
Gleb Natapov
fe4bc741f4 storage_proxy: throttle mutations based on ongoing background activity
With a consistency level less than ALL, mutation processing can move to the
background (meaning the client was answered, but there is still work to
do on behalf of the request). If the background request completion rate
is lower than the incoming request rate, background requests will accumulate
and eventually exhaust all memory resources. This patch aims to prevent
this situation by monitoring how much memory all current background
requests take and, when some threshold is passed, no longer moving
requests to the background (by not replying to a client until either memory
consumption moves below the threshold or the request is fully completed).

There are two main points where each background mutation consumes memory:
holding the frozen mutation until the operation is complete (in order to hint
it if it does not) and on the rpc queue to each replica, where it sits until
it's sent out on the wire. The patch accounts for both of those separately
and limits the former to 10% of total memory and the latter to 6M.
Why 6M? The best answer I can give is why not :) But on a more serious
note, the number should be small enough so that all the data can be
sent out in a reasonable amount of time, and one shard is not capable of
achieving even close to full bandwidth, so empirical evidence shows 6M
to be a good number.
2015-12-16 10:13:46 +02:00
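The two-budget accounting described in fe4bc741f4 might look roughly like the sketch below. It is illustrative only: `background_write_throttle` and its methods are hypothetical names, and "6M" is read here as 6 MiB, which is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch; names and structure are illustrative only.
class background_write_throttle {
    std::size_t _max_held_bytes;      // budget for mutations held until completion
    std::size_t _max_rpc_queue_bytes; // budget for data queued towards replicas
    std::size_t _held_bytes = 0;
    std::size_t _rpc_queue_bytes = 0;
public:
    explicit background_write_throttle(std::size_t total_memory)
        : _max_held_bytes(total_memory / 10)        // 10% of total memory
        , _max_rpc_queue_bytes(6 * 1024 * 1024) {}  // "6M", assumed to mean 6 MiB

    void on_write_started(std::size_t mutation_bytes, std::size_t queued_bytes) {
        _held_bytes += mutation_bytes;
        _rpc_queue_bytes += queued_bytes;
    }
    void on_write_finished(std::size_t mutation_bytes, std::size_t queued_bytes) {
        // Callers must release exactly what they accounted for earlier.
        assert(mutation_bytes <= _held_bytes && queued_bytes <= _rpc_queue_bytes);
        _held_bytes -= mutation_bytes;
        _rpc_queue_bytes -= queued_bytes;
    }
    // True when the client may be answered before the write fully completes.
    bool may_move_to_background() const {
        return _held_bytes <= _max_held_bytes
            && _rpc_queue_bytes <= _max_rpc_queue_bytes;
    }
};
```

While `may_move_to_background()` returns false, a caller following the commit message would delay the reply to the client until either budget drops back under its threshold or the request completes on all replicas.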
Gleb Natapov
e43ae7521f storage_proxy: unfuturize send_to_live_endpoints()
send_to_live_endpoints() is never waited upon; it does its job in the
background. This patch formalizes that by changing the return value to void
and also refactoring the code so that the frozen_mutation shared pointer is
not held longer than it should be: currently it is held until send_mutation()
completes, but since send_mutation() does not use frozen_mutation
asynchronously this is not necessary.
2015-12-15 15:40:36 +02:00
Gleb Natapov
cf95c3f681 storage_proxy: introduce unique_response_handler object to prevent write request leaks
If something bad happens between write request handler creation and
request execution, the request handler has to be destroyed. Currently the
code tries to do that explicitly in all places where a request may be
abandoned, but it misses some (at least one). This patch replaces this
by introducing a unique_response_handler object that will remove the handler
automatically if the request is not executed for some reason.
2015-11-30 17:41:27 +02:00
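The idea behind unique_response_handler in cf95c3f681 is the usual RAII guard pattern. The sketch below shows that pattern under stated assumptions: `handler_registry` and `unique_handler_guard` are hypothetical stand-ins, and the real class in storage_proxy may differ in interface and ownership details.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the proxy's table of pending write handlers.
class handler_registry {
    std::unordered_map<std::uint64_t, std::string> _handlers;
    std::uint64_t _next_id = 1;
public:
    std::uint64_t add(std::string description) {
        auto id = _next_id++;
        assert(id != 0); // 0 is reserved for "nothing owned"
        _handlers.emplace(id, std::move(description));
        return id;
    }
    void remove(std::uint64_t id) { _handlers.erase(id); }
    std::size_t size() const { return _handlers.size(); }
};

// RAII guard: removes the handler on destruction unless release() was called.
class unique_handler_guard {
    handler_registry* _registry;
    std::uint64_t _id; // 0 means "nothing owned"
public:
    unique_handler_guard(handler_registry& r, std::uint64_t id)
        : _registry(&r), _id(id) {}
    unique_handler_guard(const unique_handler_guard&) = delete;
    unique_handler_guard& operator=(const unique_handler_guard&) = delete;
    unique_handler_guard(unique_handler_guard&& o) noexcept
        : _registry(o._registry), _id(std::exchange(o._id, 0)) {}
    ~unique_handler_guard() {
        if (_id != 0) {
            _registry->remove(_id);
        }
    }
    // Hand ownership over to the code that actually executes the request.
    std::uint64_t release() { return std::exchange(_id, 0); }
};
```

Any early return or exception between handler creation and `release()` destroys the guard and removes the handler, so no abandonment path has to remember to clean up explicitly.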
Tomasz Grabiec
3a402db1be storage_proxy: Remove dead signature 2015-11-25 16:57:03 +02:00
Amnon Heiman
b6034572dc storage_proxy: Add read repair statistics
This adds the read repair statistics to the storage_proxy stats and updates
the implementation to increment the counters.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2015-11-02 16:16:40 +02:00
Gleb Natapov
9af55b3a7b storage_proxy: add ongoing read statistics
Add statistics for ongoing reads and ongoing background reads. A read is
a background one if it was acknowledged, but there is still work to do to
complete it.
2015-11-02 15:02:13 +02:00
Gleb Natapov
5067d027ba storage_proxy: export statistics via collectd 2015-11-02 15:01:16 +02:00
Gleb Natapov
287cf894a0 storage_proxy: count background mutation writes
Count how many writes are running in the background. A write is a
background write if it was acknowledged, but there is still work to do to
complete it.
2015-11-02 15:01:12 +02:00
Gleb Natapov
59d8f9f392 storage_proxy: move proxy pointer to write response handler
We will need it later there.
2015-11-02 15:00:47 +02:00
Gleb Natapov
9381ad0741 storage_proxy: initialize _stats 2015-11-02 15:00:47 +02:00
Gleb Natapov
ac5f92db70 storage_proxy: clean up local_dc checking
The only place local_dc is checked during mutation sending is in
send_to_live_endpoints(), but the current code passes it there through several
function call layers. Simplify the code by getting local_dc directly where
it is used.
2015-10-28 16:10:18 +02:00
Gleb Natapov
58154333e8 storage_proxy: send out mutation diffs to each destination 2015-10-27 14:58:35 +02:00
Amnon Heiman
7b8c557f30 storage_service: Add estimated histogram for read, write and range
This patch adds an estimated histogram for read, write and range to the
proxy_service stats.

Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
2015-10-22 18:54:45 +03:00
Gleb Natapov
19770268be storage_proxy: use common write code to write batch log mutations.
Reworks write code further so it can be used to write batch log
mutations.
2015-10-14 17:12:57 +03:00
Gleb Natapov
db49a196da storage_proxy: remove code duplication between logged and unlogged batches
Currently the logged batch code duplicates most of the unlogged batch
logic. This patch reworks the unlogged batch code in such a way that
it can be reused.
2015-10-14 17:12:57 +03:00
Calle Wilund
d0864be20f storage_proxy: Implement "truncate_blocking" 2015-09-30 09:09:43 +02:00
Avi Kivity
d5cf0fb2b1 Add license notices 2015-09-20 10:43:39 +03:00
Gleb Natapov
031f6e1aeb storage_proxy: do not capture storage_proxy reference in rpc callback
Callback may be called on different cpus so shared pointer cannot be
captured.
2015-09-08 09:55:23 +02:00
Gleb Natapov
41f16159b3 storage_proxy: track reference to storage_proxy during mutate/query operations
This patch makes sure that storage_proxy cannot be deleted while
mutate/query operation is in progress.
2015-09-07 14:46:13 +02:00
Gleb Natapov
cf10416786 Implement new_read_repair_decision() function. 2015-08-23 15:26:48 +03:00