This commit extracts metrics related to writes from stats structure,
so it can be easily replaced later, e.g. for materialized view metrics.
References #3385
References #3416
This commit initializes and enables hinted handoff for materialized
views, even if HH is not explicitly turned on in config.
User writes still use hinted handoff only if it is explicitly enabled,
while materialized views are allowed to use it unconditionally
in order to store failed replica updates somewhere.
Fixes#3383
"
This patchset implements separate timeouts for range queries, and lays
the foundations for separate timeouts for other query types.
While the feature in itself is worthy, the real motivation is to have
the timeouts decided by the caller, instead of storage_proxy. This in
turn is required to disentangle each layer behaving differently
depending on whether the query is internal or not; instead, the goal
is to have each caller declare its needs in terms of consistency level
and timeouts, and have the lower layers implement its requirements
instead of making their own decisions.
Fixes#3013.
Tests: unit (release)
"
* tag '3013/v1.1' of https://github.com/avikivity/scylla:
storage_proxy: remove default_query_timeout()
storage_proxy: don't use default timeouts
query_options: augment with timeout_config
thrift: configure thrift transport and handler with a timeout_config
transport: configure native transport with a timeout_config
cql3: define and populate timeout_config_selector
timeout_config: introduce timeout configuration
When node is decommissioned/removed it will drain all its hints and all
remote nodes that have hints to it will drain their hints to this node.
What "drain" means? - The node that "drains" hints to a specific
destination will ignore failures and will continue sending hints till the end
of the current segment, erase it and move to the next one till there are
no more segments left.
After all hints are drained the corresponding hints directory is removed.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Make the read-repair decision on the first page of a paged-query and use
it for all the remaining pages. This helps querier-cache hit-rates as
reads to nodes will be sent consistently throught the query.
As yet more parameters and return-values are about to be added to all
storage_proxy::query_* methods we need a way that scales better than
changing the signatures every time. To this end we aggregate all
non-mandatory query parameters into `coordinator_query_options` and all
return values into `coordinator_query_result`.
This way new fields can be simply added to the respective structs while
the signatures of the methods themselves and their client code can
remain unchanged.
Propagate the preferred_replicas to db::filter_for_query() and consider
them when selecting the endpoints. The algoritm for selecting the
endpoints is as follows:
* Compute the intersection of the endpoint candidates and the
preferred endpoints.
* If this yields a set of endpoints that already satisfies the CL
requirements use this set.
* Otherwise select the remaining endpoints according to the
load-balancing strategy, just like before.
preferred_replicas are added to the parameters and last_replicas are
added to the return type. The preferred replicas will be used as a hint
for the selection of the replicas to send the read requests to. The last
replicas (returned) are the replicas actually selected for the read.
This will allow queries to consistently hit the same replicas for each
page thus reusing readers created on these replicas.
For convenience a query() overload is provided that doesn't take or
return the preferred and last replicas.
This patch only adds the parameters and propagates them down to
query_singular() and query_partition_key_range(). The code to actually
use these preferred-replicas will be added in later patches.
This reason for separating this is to reduce noise and improve
reviewability for those functional changes later.
While not strictly needed, specify which algorithm to use when request
a digest from a remote node. This is more flexible than relying on a
cluster wide feature, although that's what we'll do in subsequent
patches. It also makes the verb more consistent with the data request.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Introduce class result_options to carry result options through the
request pipeline, which at this point mean the result type and the
digest algorithm. This class allows us to encapsulate the concrete
digest algorithm to use.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Timeouts are a global property. However, for tables in keyspaces like
the system keyspace, we don't want to uphold that timeout--in fact, we
wan't no timeout there at all.
We already apply such configuration for requests waiting in the queued
sstable queue: system keyspace requests won't be removed. However, the
storage proxy will insert its own timeouts in those requests, causing
them to fail.
This patch changes the storage proxy read layer so that the timeout is
applied based on the column family configuration, which is in turn
inherited from the keyspace configuration. This matches our usual
way of passing db parameters down.
In terms of implementation, we can either move the timeout inside the
abstract read executor or keep it external. The former is a bit cleaner,
the the latter has the nice property that all executors generated will
share the exact same timeout point. In this patch, we chose the latter.
We are also careful to propagate the timeout information to the replica.
So even if we are talking about the local replica, when we add the
request to the concurrency queue, we will do it in accordance with the
timeout specified by the storage proxy layer.
After this patch, Scylla is able to start just fine with very low
timeouts--since read timeouts in the system keyspace are now ignored.
Fixes#2462
Implementation notes, and general comments about open discussion in 2462:
* Because we are not bypassing the timeout, just setting it high enough,
I consider the concerns about the batchlog moot: if we fail for any
other reason that will be propagated. Last case, because the timeout
is per-CF, we could do what we do for the dirty memory manager and
move the batchlog alone to use a different timeout setting.
* Storage proxy likes specifying its timeouts as a time_point, whereas
when we get low enough as to deal with the read_concurrency_config,
we are talking about deltas. So at some point we need to convert time_points
to durations. We do that in the database query functions.
v2:
- use per-request instead of per-table timeouts.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
In commit 1f4f71e619, an
stdx::optional<std::vector<sstring>> parameter was added to storage_proxy's
constructor. However, this parameter was not made optional, and
tests/cql_test_env.cc failed to compile because it didn't provide this
parameter.
This patch makes this parameter optional (if missing, it's like an empty
stdx::optional) so cql_test_env.cc compiles.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20171218132121.18782-1-nyh@scylladb.com>
Adds yet another magic function "SCYLLA_COUNTER_SHARD_LIST", indicating that
argument value, which must be a list of tuples <int, UUID, long, long>,
should be inserted as an actual counter value, not update.
This of course to allow counters to be read from sstable loader.
Note that we also need to allow timestamps for counter mutations,
as well as convince the counter code itself to treat the data as
already baked. So ugly wormhole galore.
v2:
* Changed flag names
* More explicit wormholing, bypassing normal counter path, to
avoid read-before-write etc
* throw exceptions on unhandled shard types in marshalling
v3:
* Added counter id ordering check
* Added batch statement check for mixing normal and raw counter updates
Message-Id: <1487683665-23426-2-git-send-email-calle@scylladb.com>
Add a function for sending one mutation to one remote replica owning
this mutation. This is needed for materialized views, where each
base replica sends each view mutation to one particular view replica.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Merge commit 45b6070832 used butchered version of storage_proxy
patch to adjust to rpc timer change instead the one I've sent. This
patch fixes the differences.
Message-Id: <20170206095237.GA7691@scylladb.com>
* seastar 397685c...c1dbd89 (13):
> lowres_clock: drop cache-line alignment for _timer
> net/packet: add missing include
> Merge "Adding histogram and description support" from Amnon
> reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&'
> Set the option '--server' of tests/tcp_sctp_client to be required
> core/memory: Remove superfluous assignment
> core/memory: Remove dead code
> core/reactor: Use logger instead of cerr
> fix inverted logic in overprovision parameter
> rpc: fix timeout checking condition
> rpc: use lowres_clock instead of high resolution one
> semaphore: make semaphore's clock configurable
> rpc: detect timedout outgoing packets earlier
Includes treewide change to accomodate rpc changing its timeout clock
to lowres_clock.
Includes fixup from Amnon:
collectd api should use the metrics getters
As part of a preperation of the change in the metrics layer, this change
the way the collectd api uses the metrics value to use the getters
instead of calling the member directly.
This will be important when the internal implementation will changed
from union to variant.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>
We may want to change the default individual result size limit in the
future. If it is provided by the coordinator and not hardcoded in the
replicas this can be done without causing data query digest mismatches
or wasteful mutation query results.
This patch changes a read_command's limits when retrying it, so that
we don't ask for more rows than necessary.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch ensures we keep track of how many partitions we've queried
so we don't ask for more than the number we need.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>