Timeouts are a global property. However, for tables in keyspaces like
the system keyspace, we don't want to uphold that timeout; in fact, we
want no timeout there at all.
We already apply such configuration for requests waiting in the sstable
read queue: system keyspace requests won't be removed from it. However, the
storage proxy will insert its own timeouts in those requests, causing
them to fail.
This patch changes the storage proxy read layer so that the timeout is
applied based on the column family configuration, which is in turn
inherited from the keyspace configuration. This matches our usual
way of passing db parameters down.
In terms of implementation, we can either move the timeout inside the
abstract read executor or keep it external. The former is a bit cleaner,
but the latter has the nice property that all generated executors will
share the exact same timeout point. In this patch, we chose the latter.
We are also careful to propagate the timeout information to the replica.
So even if we are talking about the local replica, when we add the
request to the concurrency queue, we will do it in accordance with the
timeout specified by the storage proxy layer.
After this patch, Scylla is able to start just fine with very low
timeouts--since read timeouts in the system keyspace are now ignored.
Fixes #2462
Implementation notes, and general comments about open discussion in 2462:
* Because we are not bypassing the timeout, just setting it high enough,
I consider the concerns about the batchlog moot: if we fail for any
other reason, that will be propagated. As a last resort, because the timeout
is per-CF, we could do what we do for the dirty memory manager and
move the batchlog alone to a different timeout setting.
* Storage proxy likes specifying its timeouts as a time_point, whereas
when we get low enough to deal with the read_concurrency_config,
we are talking about deltas. So at some point we need to convert time_points
to durations. We do that in the database query functions.
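The time_point to duration conversion mentioned above can be sketched as follows. This is not the code in the database query functions; `time_left` is a hypothetical helper, and `std::chrono::steady_clock` is an assumed stand-in for Scylla's clock. A deadline already in the past yields zero, and `time_point::max()` maps to `duration::max()` so that "no timeout" survives the conversion instead of becoming a huge but finite delta.

```cpp
#include <cassert>
#include <chrono>

// Assumption: a standard clock stands in for Seastar's clocks.
using clock_type = std::chrono::steady_clock;

// Hypothetical helper: turn an absolute deadline into the relative
// timeout that a lower layer (such as a concurrency queue) expects.
clock_type::duration time_left(clock_type::time_point deadline,
                               clock_type::time_point now) {
    if (deadline == clock_type::time_point::max()) {
        return clock_type::duration::max();  // preserve "no timeout"
    }
    if (deadline <= now) {
        return clock_type::duration::zero();  // already expired
    }
    return deadline - now;
}
```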
v2:
- use per-request instead of per-table timeouts.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
In commit 1f4f71e619, a
stdx::optional<std::vector<sstring>> parameter was added to storage_proxy's
constructor. However, this parameter was not made optional, and
tests/cql_test_env.cc failed to compile because it didn't provide this
parameter.
This patch makes this parameter optional (if missing, it's like an empty
stdx::optional) so cql_test_env.cc compiles.
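The fix pattern is a defaulted trailing parameter. A minimal sketch, assuming C++17 `std::optional` and `std::string` in place of `stdx::optional` and `sstring`; the `proxy` class, its `shard_count` argument and the `names` parameter are hypothetical stand-ins, not storage_proxy's real constructor signature. Callers that omit the argument get an empty optional, exactly as if they had passed one explicitly.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a class whose constructor gained a new parameter.
class proxy {
public:
    // Defaulting the new trailing parameter keeps existing call sites
    // (such as a test environment) compiling unchanged.
    explicit proxy(int shard_count,
                   std::optional<std::vector<std::string>> names = std::nullopt)
        : _shard_count(shard_count), _names(std::move(names)) {}

    bool has_names() const { return _names.has_value(); }
    int shard_count() const { return _shard_count; }

private:
    int _shard_count;
    std::optional<std::vector<std::string>> _names;
};
```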
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20171218132121.18782-1-nyh@scylladb.com>
Adds yet another magic function, "SCYLLA_COUNTER_SHARD_LIST", indicating that
its argument value, which must be a list of tuples <int, UUID, long, long>,
should be inserted as an actual counter value, not as an update.
This is, of course, to allow counters to be imported by the sstable loader.
Note that we also need to allow timestamps for counter mutations,
as well as convince the counter code itself to treat the data as
already baked. So ugly wormhole galore.
v2:
* Changed flag names
* More explicit wormholing, bypassing the normal counter path to
avoid read-before-write, etc.
* Throw exceptions on unhandled shard types in marshalling
v3:
* Added counter id ordering check
* Added batch statement check for mixing normal and raw counter updates
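The v3 "counter id ordering check" might look roughly like the sketch below. It assumes that the check requires counter ids to be strictly ascending (sorted and unique), and it assumes the tuple fields mean shard type, counter id, value and logical clock; neither is stated above. The `uuid`, `counter_shard` and `validate_shard_order` names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <tuple>
#include <vector>

// Hypothetical 128-bit id, ordered lexicographically by (msb, lsb).
struct uuid {
    std::uint64_t msb, lsb;
    bool operator<(const uuid& o) const {
        return std::tie(msb, lsb) < std::tie(o.msb, o.lsb);
    }
};

// Hypothetical element of the <int, UUID, long, long> list.
// Assumed field meanings: shard type, counter id, value, logical clock.
struct counter_shard {
    int type;
    uuid id;
    std::int64_t value;
    std::int64_t clock;
};

// Reject lists whose counter ids are not strictly ascending: finds the first
// adjacent pair where the second id is not greater than the first.
void validate_shard_order(const std::vector<counter_shard>& shards) {
    auto it = std::adjacent_find(shards.begin(), shards.end(),
        [](const counter_shard& a, const counter_shard& b) { return !(a.id < b.id); });
    if (it != shards.end()) {
        throw std::invalid_argument("counter shard ids must be strictly ascending");
    }
}
```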
Message-Id: <1487683665-23426-2-git-send-email-calle@scylladb.com>
Add a function for sending one mutation to one remote replica owning
this mutation. This is needed for materialized views, where each
base replica sends each view mutation to one particular view replica.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Merge commit 45b6070832 used a butchered version of the storage_proxy
patch adjusting to the rpc timer change, instead of the one I sent. This
patch fixes the differences.
Message-Id: <20170206095237.GA7691@scylladb.com>
* seastar 397685c...c1dbd89 (13):
> lowres_clock: drop cache-line alignment for _timer
> net/packet: add missing include
> Merge "Adding histogram and description support" from Amnon
> reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&'
> Set the option '--server' of tests/tcp_sctp_client to be required
> core/memory: Remove superfluous assignment
> core/memory: Remove dead code
> core/reactor: Use logger instead of cerr
> fix inverted logic in overprovision parameter
> rpc: fix timeout checking condition
> rpc: use lowres_clock instead of high resolution one
> semaphore: make semaphore's clock configurable
> rpc: detect timedout outgoing packets earlier
Includes treewide change to accommodate rpc changing its timeout clock
to lowres_clock.
Includes fixup from Amnon:
collectd api should use the metrics getters
As part of preparation for the change in the metrics layer, this changes
the way the collectd API reads metric values: it now uses the getters
instead of accessing the members directly.
This will be important when the internal implementation changes
from union to variant.
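The benefit of going through getters can be shown with a small sketch. `metric_value` and `as_double` are hypothetical types, not Seastar's metrics classes. Once callers only use `is_counter()`, `counter()` and `gauge()`, the storage behind them can be a `std::variant` (as here) or a union without any call site changing.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>

// Hypothetical metric value; storage is private and reached only via getters.
class metric_value {
public:
    static metric_value from_counter(std::uint64_t v) { return metric_value(v); }
    static metric_value from_gauge(double v) { return metric_value(v); }

    bool is_counter() const { return std::holds_alternative<std::uint64_t>(_v); }
    std::uint64_t counter() const { return std::get<std::uint64_t>(_v); }
    double gauge() const { return std::get<double>(_v); }

private:
    explicit metric_value(std::uint64_t v) : _v(v) {}
    explicit metric_value(double v) : _v(v) {}
    std::variant<std::uint64_t, double> _v;  // could have been a union
};

// A consumer such as an exporter depends only on the getters.
double as_double(const metric_value& m) {
    return m.is_counter() ? static_cast<double>(m.counter()) : m.gauge();
}
```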
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>
We may want to change the default individual result size limit in the
future. If it is provided by the coordinator and not hardcoded in the
replicas, this can be done without causing data query digest mismatches
or wasteful mutation query results.
This patch changes a read_command's limits when retrying it, so that
we don't ask for more rows than necessary.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch ensures we keep track of how many partitions we've queried
so we don't ask for more than the number we need.
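Shrinking the limits on retry, for both rows and partitions, can be sketched like this. `read_limits` and `remaining_limits` are hypothetical simplifications of a read_command's limits. The counts already received are subtracted and clamped at zero, so the retried command never asks for more than is still needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical subset of a read command's limits.
struct read_limits {
    std::uint32_t row_limit;
    std::uint32_t partition_limit;
};

// Limits for the next attempt: what is still needed after the rows and
// partitions already received, clamped so unsigned subtraction can't wrap.
read_limits remaining_limits(const read_limits& original,
                             std::uint32_t rows_received,
                             std::uint32_t partitions_received) {
    return read_limits{
        original.row_limit - std::min(rows_received, original.row_limit),
        original.partition_limit - std::min(partitions_received, original.partition_limit),
    };
}
```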
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Use the new seastar's metrics_registration framework:
- Change the registration syntax.
- Add a long description for each counter.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Instead of putting all statistics under the same "storage_proxy" category,
separate them into two groups according to where the corresponding counters
are updated:
- "storage_proxy_replica"
- "storage_proxy_coordinator"
Fixes #1763
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Currently, the counter uses _response_handlers.size(), but after later
patches we may have an active (timed-out) write with no response
handler, so count live instances instead.
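Counting live instances, rather than map entries, can be sketched with a constructor/destructor pair. `write_handler` is a hypothetical class, and the process-wide static counter is a simplification (a per-shard counter would be more natural under Seastar). The count stays correct even when a timed-out handler has already been removed from the response-handler map but is still alive.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical handler: every constructed object is counted until destroyed,
// whether or not it is still registered in a response-handler map.
class write_handler {
public:
    write_handler() { ++_live; }
    write_handler(const write_handler&) { ++_live; }
    write_handler& operator=(const write_handler&) = default;
    ~write_handler() { --_live; }

    static std::size_t live() { return _live; }

private:
    static inline std::size_t _live = 0;  // C++17 inline static member
};
```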
Wrapping ranges are a pain, so we are moving wrap handling to the edges.
Since cql can't generate wrapping ranges, the edges are thrift and the ring
maintenance code; also, range->ring transformations need to merge the first
and last ranges.
Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>
If we have a range query involving a wrapping range (i.e., from thrift),
and mutations from both halves of the result are involved, then
we will return the results in the wrong order (and potentially the wrong
partitions) since we order by token, so the results from the second half
of the wrapping range end up before the first.
Fix by splitting the two queries, and merging the second half with lower
priority compared to the first half.
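The split can be sketched as follows, assuming inclusive `int64_t` tokens where a range wraps when `start > end`. `token_range`, `unwrap` and `query_in_ring_order` are hypothetical, and "merging with lower priority" is simplified here to appending the second half's results after the first half's. Sorting everything by token instead would put the second half (the low tokens) first, which is the bug described above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using token = std::int64_t;

// Hypothetical inclusive token range; it wraps around the ring when start > end.
struct token_range {
    token start;
    token end;
    bool wraps() const { return start > end; }
};

// Split a wrapping range into its two non-wrapping halves, in ring order.
std::vector<token_range> unwrap(const token_range& r) {
    if (!r.wraps()) {
        return {r};
    }
    return {token_range{r.start, std::numeric_limits<token>::max()},
            token_range{std::numeric_limits<token>::min(), r.end}};
}

// Query each half separately; results of the second half come strictly after
// those of the first. `data` is assumed to be sorted by token.
std::vector<token> query_in_ring_order(const token_range& r, const std::vector<token>& data) {
    std::vector<token> out;
    for (const auto& half : unwrap(r)) {
        for (token t : data) {
            if (t >= half.start && t <= half.end) {
                out.push_back(t);
            }
        }
    }
    return out;
}
```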
Note: this will be fixed in a better way once we have the sharding iterator,
as then we can query sequentially.
Fixes#1761.
Message-Id: <1476262693-30162-1-git-send-email-avi@scylladb.com>
This patch makes the optional trace_state_ptr arguments introduced in
previous patches mandatory where possible. Functions which are called
internally don't have a trace context, so for those we keep the
argument's default value for convenience.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch changes the storage_proxy so it passes along a
trace_state_ptr to the layers below when querying locally or
receiving a remote query request.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
There is nothing really that fundamentally ties the estimated histogram to
sstables. This patch gets rid of the few incidental ties. They are:
- the namespace name, which is now moved to utils. Users inside sstables/
now need to add a namespace prefix, while the ones outside have to change
it to the right one
- sstables::merge, which has a very non-descriptive name to begin with, is
changed to a more descriptive name that can live inside utils/
- the disk_types.hh include has to be removed, but it had no reason to be
there in the first place.
The remaining TODO is to actually move the file outside sstables/. That is done in a separate
step for clarity.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Having a trace_state_ptr at the storage_proxy level is needed to trace code at this level.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
Store the trace state in the abstract_write_response_handler.
Instrument the send_mutation RPC to receive an additional
rpc::optional parameter that will contain an optional<trace_info>
value.
Signed-off-by: Vlad Zolotarov <vladz@cloudius-systems.com>
If mutations are fragmented during streaming, special care must be
taken so that isolation guarantees are not broken.
Mutations received with the "fragmented" flag set are applied to a memtable
that is used only by that particular streaming task, and the sstables
created by flushing such memtables are not made visible until the task
is complete. Also, if the streaming fails, all data is dropped.
This means that fragmented mutations cannot benefit from coalescing of
writes from multiple streaming plans, hence the separate handling path,
so that there is no loss of performance for small partitions.
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
plan_id is needed to keep track of the origin of mutations so that if
they are fragmented all fragments are made visible at the same time,
when that particular streaming plan_id completes.
Basically, each streaming plan that sends big (fragmented) mutations is
going to have its own memtables and a list of sstables which will get
flushed and made visible when that plan completes (or dropped if it
fails).
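The per-plan visibility rule can be modelled with a small sketch. `streaming_table` is hypothetical, `std::string` stands in for the plan_id UUID and for sstables, and the per-plan memtable is left out, so only the flush results are tracked. sstables flushed for a plan stay pending until that plan completes, at which point all of them become visible together; if the plan fails, they are dropped.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using plan_id = std::string;       // stand-in for the real UUID type
using sstable_name = std::string;  // stand-in for an sstable handle

// Hypothetical model of per-plan pending sstables for fragmented mutations.
class streaming_table {
public:
    // An sstable flushed from a plan's private memtable stays pending.
    void flush_fragmented(const plan_id& plan, sstable_name sst) {
        _pending[plan].push_back(std::move(sst));
    }
    // On completion, every sstable of the plan becomes visible at once.
    void plan_completed(const plan_id& plan) {
        auto it = _pending.find(plan);
        if (it == _pending.end()) {
            return;
        }
        for (auto& sst : it->second) {
            _visible.push_back(std::move(sst));
        }
        _pending.erase(it);
    }
    // On failure, everything the plan produced is dropped.
    void plan_failed(const plan_id& plan) { _pending.erase(plan); }

    const std::vector<sstable_name>& visible() const { return _visible; }

private:
    std::map<plan_id, std::vector<sstable_name>> _pending;
    std::vector<sstable_name> _visible;
};
```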
Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>