"
This patchset makes all users of query_processor specify their timeouts
explicitly, in preparation for the removal of
cql_statement::execute_internal() (whose main function was to override
timeouts).
"
* tag 'cql-explicit-timeouts/v1' of https://github.com/avikivity/scylla:
query_processor: require clients to specify timeout configuration
query_processor: un-default consistency level in make_internal_options
This commit adapts view_stats structure so it can be passed
to storage_proxy as write stats. Thanks to that, mv replica updates
will not interfere with user write metrics. As a side effect it also
provides more stats to replica view updates.
Closes#3385Closes#3416
This commit adds statistics to row_locker class. Metrics are
independendly counted for all lock types: row<->partition and
exclusive<->shared.
Metrics gathered:
- total acquisitions
- operations that wait on the lock
- histogram of the time spent on waiting on this type of lock
References #3385
References #3416
This commit introduces view statistics:
- updates pushed to local/remote replicas
- updates failed to be pushed to local/remote replicas
Metrics are kept on per-table basis, i.e. updates_pushed_remote
shows the number of total updates (mutations) pushed to all paired
mv replicas that this particular table has.
Every single update is taken into consideration, so if view update
requires removing a row from one view and adding a row to another,
it will be counted as 2 updates.
References #3385
References #3416
Fixes#3446
Previously, only shutdown-synced objects where actually closed,
which is wrong.
This introduces yet another queue, processed together with the
deletion objects, which ensures we explicitly close all objects
that have been discarded.
Message-Id: <20180521140456.32100-1-calle@scylladb.com>
We are currently moving the pointer we acquired to the segment inside
the lambda in which we'll handle the cycle.
The problem is, we also use that same pointer inside the exception
handler. If an exception happens we'll access it and we'll crash.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20180518125820.10726-1-glauber@scylladb.com>
Remove implicit timeouts and replace with caller-specified timeouts.
This allows removing the ambiguity about what timeout a statement is
executed with, and allows removing cql_statement::execute_internal(),
which mostly overrode timeouts and consistency levels.
Timeout selection is now as follows:
query_processor::*_internal: infinite timeout, CL=ONE
query_processor::process(), execute(): user-specified consisistency level and timeout
All callers were adjusted to specify an infinite timeout. This can be
further adjusted later to use the "other" timeout for DCL and the
read or write timeout (as needed) for authentication in the normal
query path.
Note that infinite timeouts don't mean that the query will hang; as
soon as the failure detector decides that the node is down, RPC
responses will termiante with a failure and the query will fail.
"
This patchset implements separate timeouts for range queries, and lays
the foundations for separate timeouts for other query types.
While the feature in itself is worthy, the real motivation is to have
the timeouts decided by the caller, instead of storage_proxy. This in
turn is required to disentangle each layer behaving differently
depending on whether the query is internal or not; instead, the goal
is to have each caller declare its needs in terms of consistency level
and timeouts, and have the lower layers implement its requirements
instead of making their own decisions.
Fixes#3013.
Tests: unit (release)
"
* tag '3013/v1.1' of https://github.com/avikivity/scylla:
storage_proxy: remove default_query_timeout()
storage_proxy: don't use default timeouts
query_options: augment with timeout_config
thrift: configure thrift transport and handler with a timeout_config
transport: configure native transport with a timeout_config
cql3: define and populate timeout_config_selector
timeout_config: introduce timeout configuration
The switch to the new in-memory representation will require a larger
parts of the logic be aware of the type of the values they are dealing
with. In most cases it is not a significant burden for the users.
When node is decommissioned/removed it will drain all its hints and all
remote nodes that have hints to it will drain their hints to this node.
What "drain" means? - The node that "drains" hints to a specific
destination will ignore failures and will continue sending hints till the end
of the current segment, erase it and move to the next one till there are
no more segments left.
After all hints are drained the corresponding hints directory is removed.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Returning a future with an exception from end_point_manager::stop()
is practically useless because the best the caller can do is to log
it and continue as if it didn't happen because it has other things
to shut down.
Therefore in order to simplify the caller we will log the exception
if it happens and will always return a non-exceptional future.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
This ensures we respect the write timeout set by the client when
applying base writes, in case a writes takes too long to acquire the
row lock for the read-before-write phase of a materialized view
update.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180507132755.8751-1-duarte@scylladb.com>
This commit makes database, sstables and tests aware
of which large_partition_handler they use.
Proper large_partition_handler is retrievable from config information
and is based on existing compaction_large_partition_warning_threshold_mb
entry. Right now CQL TABLE variant of large_partition_handler is used
in the database.
Tests use a NOP version of large_partition_handler, which does not
depend on CQL queries at all.
This commit introduces large_partition_handler class, which can be used
to take additional action when large partitions are written.
It comes with two implementations:
* NOP, used in tests, which does nothing on large partition
update/delete
* CQL TABLE, which inserts/deletes information on particular sstable
to system.large_partitions table, in order to be retrievable from
cqlsh later.
References #3292
This commit adds a system.large_partitions table, which can be used
to trace largest partitions of a cluster.
Schema: (
keyspace_name text,
table_name text,
sstable_name text,
partition_size bigint,
key text,
compaction_time timestamp,
PRIMARY KEY((keyspace_name, table_name), sstable_name, partition_size, key)
) WITH CLUSTERING ORDER BY (partition_size DESC);
References #3292
The db/index directory contains just a few lines of code that exists
there for historical reasons. It's confusing that we have both db/index
and index/ directory related to secondary-indexing.
This patch moves what little is still in db/index/ to index/. In the
future we should probably get rid of the "secondary_index" class we had
there, but for now, let's at least not have a whole new directory for it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180501101246.21143-1-nyh@scylladb.com>
We had another case-sensitivity bug in materialized views, where if
a case-sensitive (quoted) column name was listed explicitly on "SELECT"
(instead of implicitly, e.g., in "SELECT *") the column name was
incorrectly folded to lower-case and inserts would fail.
This patch fixes the code, where a "SELECT" statement was built using
the desired column names, but column names that needed quoting were
not being quoted. The bug was in a helper function build_select_statement()
which took column name strings and failed to quote them. We clean up this
function to take column definitions instead of strings - and take care
of the quoting itself. It also needs to quote the table's name in the
select statement being built.
Fixes#3391.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180429221857.6248-6-nyh@scylladb.com>
After upgrade from 1.7 to 2.0, nodes will record a per-table schema
version which matches that on 1.7 to support the rolling upgrade. Any
later schema change (after the upgrade is done) will drop this record
from affected tables so that the per-table schema version is
recalculated. If nodes perform a schema pull (they detect schema
mismatch), then the merge will affect all tables and will wipe the
per-table schema version record from all tables, even if their schema
did not change. If then only some nodes get restarted, the restarted
nodes will load tables with the new (recalculated) per-table schema
version, while not restarted nodes will still use the 1.7 per-table
schema version. Until all nodes are restarted, writes or reads between
nodes from different groups will involve a needless exchange of schema
definition.
This will manifest in logs with repeated messages indicating schema
merge with no effect, triggered by writes:
database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f
database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f
database - Schema version changed to 85ab46cd-771d-36c9-bc37-db6d61bfa31f
The sync will be performed if the receiving shard forgets the foreign
version, which happens if it doesn't process any request referencing
it for more than 1 second.
This may impact latency of writes and reads.
The fix is to treat schema changes which drop the 1.7 per-table schema
version marker as an alter, which will switch in-memory data
structures to use the new per-table schema version immediately,
without the need for a restart.
Fixes#3394
Tests:
- dtest: schema_test.py, schema_management_test.py
- reproduced and validated the fix with run_upgrade_tests.sh from git@github.com:tgrabiec/scylla-dtest.git
- unit (release)
Message-Id: <1524764211-12868-1-git-send-email-tgrabiec@scylladb.com>
Fixes#3187
Requires seastar "inet_address: Add constructor and conversion function
from/to IPv4"
Implements support IPv6 for CQL inet data. The actual data stored will
now vary between 4 and 16 bytes. gms::inet_address has been augumented
to interop with seastar::inet_address, though of course actually trying
to use an Ipv6 address there or in any of its tables with throw badly.
Tests assuming ipv4 changed. Storing a ipv4_address should be
transparent, as it now "widens". However, since all ipv4 is
inet_address, but not vice versa, there is no implicit overloading on
the read paths. I.e. tests and system_keyspace (where we read ip
addresses from tables explicitly) are modified to use the proper type.
Message-Id: <20180424161817.26316-1-calle@scylladb.com>
When a view's PK only contains the columns that form the base's PK,
then the liveness of a particular view row is determined not only by
the base row's marker, but also by the selected and, more importantly,
unselected columns.
This patch ensures that unselected columns are considered as much as
possible, even though some limitations will still exist. In
particular, we need to represent multiple timestamps (from all the
unselected columns), but have only mechanisms to record a single
timestamp.
We also have some issues when dealing with selected column, and the
way we currently delete them. Consider the following:
create table cf (p int, c int, a int, b int, primary key (p, c))
create materialized view vcf as select a, b
from cf where p is not null and c is not null
primary key (p, c)
1) update cf using timestamp 10 set a = 1 where p = 1 and c = 1
2) delete a from cf using timestamp 11 where p = 1 and c = 1
3) update cf using timestamp 1 set a = 2 where p = 1 and c = 1
After 1), the MV should include a row with row marker @ ts10,
p = 1, c = 1, a = 1. After 2), this row should be removed.
At 3), we should add a row with row marker @ ts1, p = 1, c = 1, a = 1,
with a lower timestamp. This means that the delete should not
insert a row tombstone with timestamp @ 11, as we do now but it should
just delete the view's row marker (which exists) with ts1.
Refs #3362Fixes#3140Fixes#3361
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
When views contain a primary key column that is not part of the base
table primary key, that column determines whether the row is live or
not. We need to ensure that when that cell is dead, and thus the
derived row marker, either by normal deletion of by TTL, so is the
rest of the row.
This patch introduces the idea of shawdowing row marker. We map the
status of the regular base column in the view's PK to the view row's
marker. If this marker is dead, so is that cell in the base table, and
so should the view row become. To enforce that, a view row's dead
marker shadows the whole row if that view includes a base regular
column in its PK.
Fixes#3360
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
When a view's PK only contains the columns that form the base's PK,
then the liveness of a particular view row is determined not only by
the base row's marker, but also by the selected and, more importantly,
unselected columns. When calculating the view's row marker we need
to access those unselected columns, so we can't avoid the
read-before-write as we were doing.
Refs #3362
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
When a view's PK only contains the columns that form the base's PK,
then the liveness of a particular view row is determined not only by
the base row's marker, but also by the selected and, more importantly,
unselected columns. So, process base updates to columns unselected by
any of its views.
Refs #3362
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Not adding the partition tombstone to the current list of tombstones
may cause updates to be incorrectly generated.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Instead of lazily-initializing the regular base column in the view's
PK field, explicitly initialize it. This will be used by future
patches that don't have access to the schema when wanting to obtain
that column.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
"
Pass sstable version to parse, write and describe_type methods to make it possible to handle different versions.
For now serialization header from 3.x format is ignored.
Tests: units (release)
"
* 'haaawk/sstables3/loading_v3' of ssh://github.com/scylladb/seastar-dev:
Add test for loading the whole sstable
Add test for loading statistics
Add support for 3_x stats metadata
Pass sstable version to describe_type
Pass sstable version to write methods
metadata_type: add Serialization type
Pass sstable_version_types to parse methods
Add test for reading filter
Add test for read_summary
sstables 3.x: Add test for reading TOC
sstable: Make component_map version dependent
sstable::component_type: add operator<<
Extract sstable::component_type to separete header
Remove unused sstable::get_shared_components
sstable_version_types: add mc version
Since storage_proxy provides access to the entire cluster, a local shard
reference is sufficient. Adjust query_processor to store a reference to
just the local shard, rather than a seastar::sharded<storage_proxy> and
adjust callers.
This simplifies the code a little.
Message-Id: <20180415142656.25370-3-avi@scylladb.com>
"
Fixes to the view building process, discovered from field experience.
Tests: dtest(materialized_view_tests.py, smp=2)
"
* 'views/view-build-fixes/v1' of https://github.com/duarten/scylla:
db/view: Start view building after schema agreement
db/system_keyspace: scylla_views_builds_in_progress writes are user mem
db/view: Require configuration option to enable view building
Empty partition keys are not supported on normal tables - they cannot
be inserted or queried (surprisingly, the rules for composite
partition keys are different: all components are then allowed to be
empty). However, the (non-composite) partition key of a view could end
up being empty if that column is: a base table regular column, a
base table clustering key column, or a base table partition key column,
part of a composite key.
Fixes#3262
Refs CASSANDRA-14345
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20180403122244.10626-1-duarte@scylladb.com>
If a base table or view has been dropped in one node, but another
one hasn't yet learned about it, it starts the view build process
immediately on boot, possibly calculating unneeded view updates and
causing errors at the view replica, if that replica has already
processed the schema changes. We should thus wait for schema
agreement, even if the node is a seed.
Fixes#3328
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Treat writes to scylla_views_builds_in_progress as user memory, as the
number of writes is dependent on the amount of user data on views
(times the number of views, divided by the view building batch size).
Fixes#3325
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
View building, enabled by default, can contain or expose issues that
prevent the node from starting. In those cases, it is necessary to
disable view building such that the node can be submitted to
maintenance operations.
Fixes#3329
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
While building with -O1, I saw that the linker could not find
the vtable for named_value<log_level>. Rather than fixing up the
includes (and likely lengthening build time), fix by defining
the class as an extern template, preventing it from being
instantiated at the call site.
Message-Id: <20180401150235.13451-1-avi@scylladb.com>
Intended for use by unit tests, this patch allows synchronizing with
the end of a build for a particular view.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
This patch adds the missing view building code to the eponymous class.
We consume from the reader associated with each base table until all
its views are built. If the reader reaches the end and there are
incomplete views, then a view was added while others were being built.
In such cases, we restart the reader to the beginning of the current
token, but not to the beginning of the token range, when the view is
added. Then, when we exhaust the reader, we simply create a new one
for the whole token range, and resume building the pending views.
We aim to be resource-conscious. On a given shard, at any given moment,
we consume at most from one reader. We also strive for fairness, in that
each build step inserts entries for the views of a different base. Each
build step reads and generates updates for batch_size rows. We lack a
controller, which could potentially allow us to go faster (to execute
multiple steps at the same time, or consume more rows per batch), and
also which would apply backpressure, so we could, for example, delay
executing a build step.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
The view_builder now uses the migration_manager to subscribe to schema
change events, and update its bookkeeping accordingly. We prefer this
to having the database call into the view_builder, as that would
create a cyclic dependency.
We serialize changes to the views of a particular base table, such
that schema changes do not interfere with the upcoming view building
code.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>