Do not recalculate too much ranges in advance, it requires large
allocation and usually means that a consumer of the interface is going
to do to much work in parallel.
Fixes: #3767
get_restricted_ranges() function gets query provided key ranges
and divides them on vnode boundaries. It iterates over all ranges and
calculates all vnodes, but all its users are usually interested in only
one vnode since most likely it will be enough to populate a page. If it
will be not enough they will ask for more. This patch introduces new
interface instead of the function that allows to generate vnode ranges
on demand instead of precalculating all of them.
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.
Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.
Scylla now requires GCC 8 to compile.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
This patch introduces the view_update_backlog_broker class, which is
responsible for periodically updating the local gossip state with the
current node's view update backlog. It also registers to updates from
other nodes, and updates the local coordinator's view of their view
update backlogs.
We consider the view update backlog received from a peer through the
mutation_done verb to be always fresh, but we consider the one received
through gossip to be fresh only if it has a higher timestamp than what
we currently have recorded.
This is because a node only updates its gossip state periodically, and
also because a node can transitively receive gossip state about a third
node with outdated information.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
In subsequent patches, replicas will reply to the coordinator with
their view update backlog. Before introducing changes to the
messaging_service, prepare the storage_proxy to receive and store
those backlogs.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
The local view update backlog is the max backlog out of the relative
memory backlog size and the relative hints backlog size.
We leverage the db::view::node_update_backlog class so we can send the
max backlog out of the node's shards.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"
Working on database.hh or any header that is included in database.hh
(of which there is a lot), is a major pain as each change involves the
recompilation of half of our compilation units.
Reduce the impact by removing the `#include "database.hh"` directive
from as many header files as possible. Many headers can make do with
just some forward declarations and don't need to include the entire
headers. I also found some headers that included database.hh without
actually needing it.
Results
Before:
$ touch database.hh
$ ninja build/release/scylla
[1/154] CXX build/release/gen/cql3/CqlParser.o
After:
$ touch database.hh
$ ninja build/release/scylla
[1/107] CXX build/release/gen/cql3/CqlParser.o
"
* 'reduce-dependencies-on-database-hh/v2' of https://github.com/denesb/scylla:
treewide: remove include database.hh from headers where possible
database_fwd.hh: add keyspace fwd declaration
service/client_state: de-inline set_keyspace()
Move cache_temperature into its own header
Many headers don't really need to include database.hh, the include can
be replaced by forward declarations and/or including the actually needed
headers directly. Some headers don't need this include at all.
Each header was verified to be compilable on its own after the change,
by including it into an empty `.cc` file and compiling it. `.cc` files
that used to get `database.hh` through headers that no longer include it
were changed to include it themselves.
Embedding the expire timer for a write response in the
abstract_write_response_handler simplifies the code as it allows
removing the rh_entry type.
It will also make the timeout easily accessible inside the handler,
for future patches.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20181213111818.39983-1-duarte@scylladb.com>
db::config is a global class; changes in any module can cause changes
in db::config. Therefore, it is a cause of needless recompilation.
Remove some of these dependencies by having consumers of db::config
declare an intermediate config struct that is contains only
configuration of interest to them, and have their caller fill it out
(in the case of auth, it already followed this scheme and the patchset
only moves the translation function).
In addition, some outright pointless inclusions of db/config.hh are
removed.
The result is somewhat shorter compile times, and fewer needless
recompiles.
* https://github.com/avikivity/scylla unconfig-1/v1:
config: remove inclusions of db/config.hh from header files
repair: remove unneeded config.hh inclusion
batchlog_manager: remove dependency on db::config
auth: remove permissions_cache dependency on db::config
auth: remove auth::service dependency on db::config
auth: remove unneeded db/config.hh includes
Instead, distribute those inclusions to .cc files that require them. This
reduces rebuilds when config.hh changes, and makes it easier to locate files
that need config disaggregation.
rh_entry address is captured inside timeout's callback lambda, so the
structure should not be moved after it is created. Change the code to
create rh_entry in-place instead of moving it into the map.
Fixes#3972.
Message-Id: <20181206164043.GN25283@scylladb.com>
* seastar d59fcef...b924495 (2):
> build: Fix protobuf generation rules
> Merge "Restructure files" from Jesse
Includes fixup patch from Jesse:
"
Update Seastar `#include`s to reflect restructure
All Seastar header files are now prefixed with "seastar" and the
configure script reflects the new locations of files.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <5d22d964a7735696fb6bb7606ed88f35dde31413.1542731639.git.jhaberku@scylladb.com>
"
When messaging_service is started we may immediately receive a mutation
from another node (e.g. in the MV update context). If hinted handoff is not
ready to store hints at that point we may fail some of MV updates.
We are going to resolve this by start()ing hints::managers before we
start messaging_service and blocking hints replaying until all relevant
objects are initialized.
Refs #3828
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
"
This series changes hinted handoff to work with `frozen_mutation`s
instead of naked `mutation`s. Instead of unfreezing a mutation from
the commitlog entry and then freezing it again for sending, now we'll
just keep the read, frozen mutation.
Tests: unit(release)
"
* 'hh-manager-cleanup/v1' of https://github.com/duarten/scylla:
db/hints/manager: Use frozen_mutation instead of mutation
db/hints/manager: Use database::find_schema()
db/commitlog/commitlog_entry: Allow moving the contained mutation
service/storage_proxy: send_to_endpoint overload accepting frozen_mutation
service/storage_proxy: Build a shared_mutation from a frozen_mutation
service/storage_proxy: Lift frozen_mutation_and_schema
service/storage_proxy: Allow non-const ranges in mutate_prepare()
write_stats is referenced from write handler which is available in
send_to_live_endpoints already. No need to pass it down.
Message-Id: <20181009133017.GA14449@scylladb.com>
Add an overload to send_to_endpoint() which accepts a frozen_mutation.
The motivation is to allow better accounting of pending view updates,
but this change also allows some callers to avoid unfreezing already
frozen mutations.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Lift frozen_mutation_and_schema to frozen_mutation.hh, since other
subsystems using frozen_mutations will likely want to pass it around
together with the schema.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Right now, storage_proxy's mutate_stage violates isolation by running
in a plain execution_stage without a scheduling_group. This means do_mutate()
will run under the main scheduling_group, at least until we reach the database
apply execution stage, which is correct.
Fix by moving to an inheriting execution stage; this works because the
messaging service will tell RPC to set the correct execution stage for us. We
could explicitly specify statement_scheduling_group, but inheriting the
scheduling group allows us to have multiple statment scheduling groups, later.
Require a timeout parameter for storage_proxy::mutate_begin() and
all its callers (all the way to thrift and cql modification_statement
and batch_statement).
This should fix spurious debug-mode test failures, where overcommit
and general debug slowness result in the default timeouts being
exceeded. Since the tests use infinite timeouts, they should not
time out any more.
Tests: unit (release), with an extra patch that aborts
when a non-infinite timeout is detected.
Message-Id: <20180707204424.17116-1-avi@scylladb.com>
Initializing write response id to the same value on each reboot may
cause stale id to be taken for active one if node restarts after
sending only a couple of write request and before receiving replies.
On next reboot it will start assigning id's from the same value and
receiving old replies will confuse it. Mitigate this by assigning
initial id to wall clock value in milliseconds. It will not solve the
problem completely, but will mitigate it.
This commit extracts metrics related to writes from stats structure,
so it can be easily replaced later, e.g. for materialized view metrics.
References #3385
References #3416
This commit initializes and enables hinted handoff for materialized
views, even if HH is not explicitly turned on in config.
User writes still use hinted handoff only if it is explicitly enabled,
while materialized views are allowed to use it unconditionally
in order to store failed replica updates somewhere.
Fixes#3383
"
This patchset implements separate timeouts for range queries, and lays
the foundations for separate timeouts for other query types.
While the feature in itself is worthy, the real motivation is to have
the timeouts decided by the caller, instead of storage_proxy. This in
turn is required to disentangle each layer behaving differently
depending on whether the query is internal or not; instead, the goal
is to have each caller declare its needs in terms of consistency level
and timeouts, and have the lower layers implement its requirements
instead of making their own decisions.
Fixes#3013.
Tests: unit (release)
"
* tag '3013/v1.1' of https://github.com/avikivity/scylla:
storage_proxy: remove default_query_timeout()
storage_proxy: don't use default timeouts
query_options: augment with timeout_config
thrift: configure thrift transport and handler with a timeout_config
transport: configure native transport with a timeout_config
cql3: define and populate timeout_config_selector
timeout_config: introduce timeout configuration
When node is decommissioned/removed it will drain all its hints and all
remote nodes that have hints to it will drain their hints to this node.
What "drain" means? - The node that "drains" hints to a specific
destination will ignore failures and will continue sending hints till the end
of the current segment, erase it and move to the next one till there are
no more segments left.
After all hints are drained the corresponding hints directory is removed.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Make the read-repair decision on the first page of a paged-query and use
it for all the remaining pages. This helps querier-cache hit-rates as
reads to nodes will be sent consistently throught the query.
As yet more parameters and return-values are about to be added to all
storage_proxy::query_* methods we need a way that scales better than
changing the signatures every time. To this end we aggregate all
non-mandatory query parameters into `coordinator_query_options` and all
return values into `coordinator_query_result`.
This way new fields can be simply added to the respective structs while
the signatures of the methods themselves and their client code can
remain unchanged.
Propagate the preferred_replicas to db::filter_for_query() and consider
them when selecting the endpoints. The algoritm for selecting the
endpoints is as follows:
* Compute the intersection of the endpoint candidates and the
preferred endpoints.
* If this yields a set of endpoints that already satisfies the CL
requirements use this set.
* Otherwise select the remaining endpoints according to the
load-balancing strategy, just like before.
preferred_replicas are added to the parameters and last_replicas are
added to the return type. The preferred replicas will be used as a hint
for the selection of the replicas to send the read requests to. The last
replicas (returned) are the replicas actually selected for the read.
This will allow queries to consistently hit the same replicas for each
page thus reusing readers created on these replicas.
For convenience a query() overload is provided that doesn't take or
return the preferred and last replicas.
This patch only adds the parameters and propagates them down to
query_singular() and query_partition_key_range(). The code to actually
use these preferred-replicas will be added in later patches.
This reason for separating this is to reduce noise and improve
reviewability for those functional changes later.