28001 Commits

Author SHA1 Message Date
Avi Kivity
562e68835b cql3: expr, user types: convert user type literals to expressions
Convert the user_types::literal raw to a new expression type
usertype_constructor. I used "usertype" to convey that it is a
((user type) constructor), not a (user (type constructor)).
2021-08-26 15:26:35 +03:00
Avi Kivity
4d7e00d0f8 cql3: selection: make selectable.hh not include expr/expression.hh
We have this dependency now:

   column_identifier -> selectable -> expression

and want to introduce this:

   expression -> user types -> column_identifier

This leads to a loop, since expression is not (yet) forward
declarable.

Fix by moving any mention of expression from selectable.hh to a new
header selection-expr.hh.

database.cc lost access to timeout_config, so adjust its includes
to regain it.
2021-08-26 15:19:14 +03:00
Avi Kivity
9d6bc7eae6 cql3: sets, user types: move user types raw functions around
Move them closer to prepare related functions for modification.
2021-08-26 15:15:59 +03:00
Avi Kivity
06bca067f8 cql3: expr, sets, maps: convert set and map literals to collection_constructor
Add set and map styles to collection_constructor. Maps are implemented as
collection_constructor{tuple_constructor{key, value}...}. This saves
having a new expression type, and reduces the effort to implement
recursive descent evaluation for this omitted expression type.
2021-08-26 15:13:37 +03:00
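The map encoding described above can be sketched in a few lines. This is a simplified illustration, not the actual cql3::expr definitions: the `expression` variant, `make_map`, `join` and `to_string` are hypothetical stand-ins, and only the names `collection_constructor`, `tuple_constructor` and `untyped_constant` come from the commit messages. It shows that a map needs no node type of its own, so a recursive walk only has to handle tuples and collections.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

struct expression;  // defined below; std::vector may hold an incomplete type (C++17)

struct untyped_constant { std::string text; };

// A tuple built from subexpressions, e.g. (a, b).
struct tuple_constructor { std::vector<expression> elements; };

// One node type for lists, sets and maps; `style` records which one it is.
struct collection_constructor {
    enum class style_type { list, set, map };
    style_type style;
    std::vector<expression> elements;
};

// Hypothetical, heavily reduced expression type.
struct expression {
    std::variant<untyped_constant, tuple_constructor, collection_constructor> v;
};

// A map literal is a collection whose elements are (key, value) tuples.
expression make_map(std::vector<std::pair<expression, expression>> entries) {
    collection_constructor cc{collection_constructor::style_type::map, {}};
    for (auto& [key, value] : entries) {
        tuple_constructor kv;
        kv.elements.push_back(std::move(key));
        kv.elements.push_back(std::move(value));
        cc.elements.push_back(expression{std::move(kv)});
    }
    return expression{std::move(cc)};
}

std::string to_string(const expression& e);

// Render a list of subexpressions separated by ", ".
std::string join(const std::vector<expression>& es) {
    std::string out;
    for (std::size_t i = 0; i < es.size(); ++i) {
        if (i != 0) {
            out += ", ";
        }
        out += to_string(es[i]);
    }
    return out;
}

// Recursive descent: maps reuse the tuple and collection cases.
std::string to_string(const expression& e) {
    return std::visit([](const auto& node) -> std::string {
        using T = std::decay_t<decltype(node)>;
        if constexpr (std::is_same_v<T, untyped_constant>) {
            return node.text;
        } else if constexpr (std::is_same_v<T, tuple_constructor>) {
            return "(" + join(node.elements) + ")";
        } else {
            static const char* names[] = {"list", "set", "map"};
            return std::string(names[static_cast<int>(node.style)]) +
                   "{" + join(node.elements) + "}";
        }
    }, e.v);
}
```

Rendering a two-entry map built with `make_map` yields `map{(a, 1), (b, 2)}`; the textual form is arbitrary and chosen only to make the nesting visible.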
Avi Kivity
658cd47d21 cql3: sets, maps, expr: move set and map raw functions around
Move them closer to prepare related functions for modification. Since
sets and maps share some implementation details in the grammar, they
are moved and converted as a unit.
2021-08-26 15:13:07 +03:00
Avi Kivity
d2ab7fc26d cql3: expr, lists: convert lists::literal to new collection_constructor
Introduce a collection_constructor (similar to C++'s std::initializer_list)
to hold subexpressions being gathered into a list. Since sets, maps, and
lists construction share some attributes (all elements must be of the
same type) collection_constructor will be used for all of them, so it
also holds an enum. I used "style" for the enum since it's a weak
attribute - an empty set is also an empty map. I chose collection_constructor
rather than plain 'collection' to highlight that it's not the only way
to get a collection (selecting a collection column is another, as an
example) and to hint at what it does - construct a collection from
more primitive elements.
2021-08-26 15:10:41 +03:00
Avi Kivity
4defb42c86 cql3: lists, expr: move list raw functions around
Move them closer to prepare related functions for modification.
2021-08-26 15:08:14 +03:00
Avi Kivity
5e448e4a2a cql3: tuples, expr: convert tuples::literal to expr::tuple_constructor
Introduce tuple_constructor (not a literal, since (?, ?) and (column_value,
column_value) are not literals) to represent a tuple constructed from
subexpressions. In the future we can replace column_value_tuple
with tuple_constructor(column_value, column_value, ...), but this is
not done now.

I chose the name 'tuple_constructor' since other expressions can represent
tuples (e.g. my_tuple_column, :bind_variable_of_tuple_type,
func_returning_tuple()). It also explains what the expression does.
2021-08-26 15:07:15 +03:00
Avi Kivity
41c532f19c cql3: expr, tuples: deinline and move tuple raw functions
Move them closer to prepare functions for modification.
2021-08-26 15:04:21 +03:00
Avi Kivity
2c42a65db1 cql3: expr, constants: convert constants::literal to untyped_constant
Introduce a new expression untyped_constant that corresponds to
constants::literal, which is removed. untyped_constant is rather
ugly in that it won't exist post-prepare. We should probably instead
replace it with typed constants that use the widest possible type
(decimal and varint), and select a narrower type during the prepare
phase when we perform type inference. The conversion itself is
straightforward.
2021-08-26 15:03:07 +03:00
Avi Kivity
4d9bde561a cql3: constants: move constants::literal implementation around
Move it closer to prepare functions for modification.
2021-08-26 15:01:06 +03:00
Avi Kivity
838bfbd3e0 cql3: expr, abstract_marker: convert to expressions
Convert the four forms of abstract_marker to expr::bind_variable (the
name was chosen since variable is the role of the thing, while "marker"
refers more to the grammar). Having four variants is unnecessary, but
this patch doesn't do anything about that.
2021-08-26 15:01:04 +03:00
Avi Kivity
218f4d87f8 cql3: column_condition: relax types around abstract_marker::in_raw
We can only convert expressions to term::raw, not the subclass
abstract_marker::in_raw, so relax the types. They will all be converted
to expressions. Relaxing types isn't good, but the structure is enforced
now by the grammar (and dynamically using variant casts), and in the future
by a typecheck pass (which will allow us to remove the many variations
of markers).
2021-08-26 14:55:17 +03:00
Avi Kivity
6dcc43d227 cql3: tuple markers: deinline and rearrange
Move raw methods near to the other prepare-related functions.
2021-08-26 14:54:15 +03:00
Avi Kivity
35db2b34e4 cql3: abstract_marker, term_expr: rearrange raw abstract marker implementation
Move raw methods near to the other prepare-related functions.
2021-08-26 14:53:58 +03:00
Avi Kivity
aba205917d cql3: expr, constants: convert cql3::constants::null_literal to new cql3::expr::null
Introduce cql3::expr::null and use it to represent null_literal, which is
removed.
2021-08-26 14:49:46 +03:00
Avi Kivity
5b42cbf9e0 cql3: expr, constants: deinline null_literal
Deinline null_literal methods and place them near the other prepare-related
functions.
2021-08-26 14:45:56 +03:00
Avi Kivity
51f62d5953 cql3: constants: extricate cql3::constants::null_literal::null_value from null_literal
null_literal (which is in the term::raw domain) will be converted to an
expression, so unnest the nested class null_value (which is in the term
domain and is not converted now).
2021-08-26 14:44:21 +03:00
Avi Kivity
10e08dc87e cql3: term::raw, expr: convert type casts to expressions
We reuse the expr::cast type that was previously used for selectables.
When preparing, subexpressions are converted to term::raw; this will
be removed later.
2021-08-26 14:42:55 +03:00
Avi Kivity
6f8b6aef17 cql3: type_cast: deinline some methods
These methods will be converted to the expression variant, and
it's impossible to do this while inlined due to #include cycles. In
any case, deinlining is better.

Since there is no type_cast.cc, and since they'll become part of
expr_term call chain soon, they're moved there, even though it seems
odd for this patch. It's a waste to create type_cast.cc just for those
three functions.
2021-08-26 14:41:38 +03:00
Avi Kivity
3d30c161e4 cql3: expr: prepare expr::cast for unprepared types
The cast expression has two operands: the subexpression to cast and the
type to cast to. Since prepared and unprepared expressions are the
same type, we don't have to do anything, but prepared and unprepared
types are different. So add a variant to be able to support both.

The reason the selectable->expression transformation did not need to
do this is that casts in a selector cannot accept a user defined type.
Note those casts also have different syntax and different execution,
so we'll have to choose whether to unify the two semantics, or whether
to keep them separate. This patch does not force anything (but does hint
at unification by not including any discriminant beyond the type's
rawness). The string representation matches the part of the grammar
it was derived from (or conversion back to CQL will yield wrong
results).
2021-08-26 14:39:33 +03:00
Avi Kivity
b76395a410 cql3: expr, functions: move raw function calls to expressions
Remove cql3::functions::function_call::raw and replace it with
cql3::expr::function_call, which already existed from the selector
migration to expressions. The virtual functions implementing term::raw
are made free functions and remain in place, to ease migration and
review.

Note that preparing becomes more complicated as it needs to
account for anonymous functions, which were not representable
in the previous structure (and still cannot be created by the
parser for the term::raw path).

The parser now wraps all its arguments with the term::raw->expr
bridge, since that's what expr::function_call expects, and in
turn wraps the function call with an expr->term::raw bridge, since
that's what the rest of the parser expects. These will disappear
when the migration completes.
2021-08-26 14:38:16 +03:00
Avi Kivity
0d24af7775 cql3: expr, term::raw: add conversions between the two types
Add a way to convert between the old world and the new, and back. Note
that instead of blindly wrapping, we unwrap if we received a wrapped
object.
2021-08-26 14:35:46 +03:00
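The two bridges and the unwrapping rule can be sketched as follows. This is an illustration under stated assumptions rather than the real code: `term_raw` stands in for cql3::term::raw, `legacy_literal` and `bind_variable` are placeholder node types, and `as_expression` / `as_term_raw` are hypothetical names for the conversions. Only `term_raw_expression` is named in the commit messages.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Stand-in for the old-world interface (cql3::term::raw).
struct term_raw {
    virtual ~term_raw() = default;
};
using term_raw_ptr = std::shared_ptr<term_raw>;

// Placeholder for a term::raw subclass that has not been converted yet.
struct legacy_literal : term_raw {
    std::string text;
    explicit legacy_literal(std::string t) : text(std::move(t)) {}
};

// Placeholder for a native expression node.
struct bind_variable { std::string name; };

// Reverse bridge: an expression may hold a not-yet-converted term::raw.
struct expression {
    std::variant<bind_variable, term_raw_ptr> v;
};

// Forward bridge: a term::raw that holds an expression.
struct term_raw_expression : term_raw {
    expression expr;
    explicit term_raw_expression(expression e) : expr(std::move(e)) {}
};

// Old world -> new world. Unwrap if the object is already a wrapped expression.
expression as_expression(const term_raw_ptr& raw) {
    if (auto wrapped = std::dynamic_pointer_cast<term_raw_expression>(raw)) {
        return wrapped->expr;
    }
    return expression{raw};
}

// New world -> old world. Unwrap if the expression is just a wrapped term::raw.
term_raw_ptr as_term_raw(const expression& e) {
    if (const auto* raw = std::get_if<term_raw_ptr>(&e.v)) {
        return *raw;
    }
    return std::make_shared<term_raw_expression>(e);
}
```

Because each direction checks for the other direction's wrapper first, a round trip returns the original object instead of stacking wrapper layers.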
Avi Kivity
a5031dd5bf cql3: expr, term::raw: add reverse bridge
Since expressions can nest, and since we won't convert everything at once,
add a way to store a term::raw as an expression. We can now have a
term::raw that is internally an expression, and an expression that is
implemented as term::raw.
2021-08-26 14:32:04 +03:00
Avi Kivity
725065b066 cql3: term::raw, expr: add bridge between term::raw and expressions
A term_raw_expression is a term::raw that holds an expression. It will
be used to incrementally convert the source base to expressions, while
still exposing the result to the common interface of shared_ptr<term::raw>.
2021-08-26 14:14:18 +03:00
Avi Kivity
9a158cd7b5 cql3: eliminate multi_column_raw
Now that the signatures of term::raw::prepare and multi_column_raw::prepare
are identical, we can eliminate multi_column_raw, replacing it with
term::raw where needed. In some cases we delete it from the inheritance chain
since we reach term::raw via a different base class.

Note that a dynamic_cast<> is eliminated, so we compensate for the addition
of runtime checks in the previous patch by the deletion of runtime checks
in this patch.
2021-08-26 14:11:42 +03:00
Avi Kivity
660be97028 cql3: term::raw, multi_column_raw: unify prepare() signatures
In order to replace the term::raw hierarchy with expressions,
we need to unify the signatures of term::raw::prepare() and
term::multi_column_raw::prepare(). This is because we'll only have
one expression type to represent both single values and tuples
(although different subexpression types may be used).

The difference in the two prepare() signatures is the
`receiver` parameter - which is a (type, name) pair used
to perform type inference on the expression being prepared,
with the name used to report errors. In a perfect world, this
would just be an expression - a tuple or a singular expression
as the case requires. But we don't have the needed expression
infrastructure yet - general tuples or name-annotated expressions.

Resolve the problem by introducing a variant for the single-value
and tuple. This is more or less creating a mini-expression type
used just for this. Once our expression type grows the needed
capabilities, it can replace this type.

Note that for some cases, this replaces compile-time checks by
runtime checks (which should never trigger). In other cases
the classes really needed both interfaces, so the new variant
is a better fit.
2021-08-26 14:11:42 +03:00
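A minimal sketch of the receiver variant, assuming a reduced `column_spec` in place of the real (type, name) pair. The names `column_spec`, `receiver`, `expect_single` and `expect_tuple` are hypothetical and not taken from the patch. It illustrates the trade-off mentioned above: a mismatch that two separate signatures would reject at compile time becomes a runtime check.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Stand-in for the (type, name) pair used for type inference and error reporting.
struct column_spec {
    std::string name;
    std::string type;
};

// Either one receiver (single value) or several (tuple).
using receiver = std::variant<column_spec, std::vector<column_spec>>;

// A single-value expression needs exactly one receiver; checked at runtime.
const column_spec& expect_single(const receiver& r) {
    if (const auto* spec = std::get_if<column_spec>(&r)) {
        return *spec;
    }
    throw std::invalid_argument("single-value expression prepared against a tuple receiver");
}

// A tuple expression needs a list of receivers; checked at runtime.
const std::vector<column_spec>& expect_tuple(const receiver& r) {
    if (const auto* specs = std::get_if<std::vector<column_spec>>(&r)) {
        return *specs;
    }
    throw std::invalid_argument("tuple expression prepared against a single-value receiver");
}
```

With one `prepare()` signature taking such a variant, each implementation picks the alternative it expects and throws on the other one, which should never happen for input accepted by the grammar.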
Avi Kivity
acf8da2bce Merge "flat_mutation_reader: keep timeout in permit" from Benny
"
This series moves the timeout parameter, that is passed to most
f_m_r methods, into the reader_permit.  This eliminates
the need to pass the timeout around, as it's taken
from the permit when needed.

The permit timeout is updated in certain cases
when the permit/reader is paused and retrieved
later on for reuse.

Following are perf_simple_query results showing ~1%
reduction in insns/op and corresponding increase in tps.

$ build/release/test/perf/perf_simple_query -c 1 --operations-per-shard 1000000 --task-quota-ms 10

Before:
102500.38 tps ( 75.1 allocs/op,  12.1 tasks/op,   45620 insns/op)

After:
103957.53 tps ( 75.1 allocs/op,  12.1 tasks/op,   45372 insns/op)

Test: unit(dev)
DTest:
    repair_additional_test.py:RepairAdditionalTest.repair_abort_test (release)
    materialized_views_test.py:TestMaterializedViews.remove_node_during_mv_insert_3_nodes_test (release)
    materialized_views_test.py:InterruptBuildProcess.interrupt_build_process_with_resharding_half_to_max_test (release)
    migration_test.py:TTLWithMigrate.big_table_with_ttls_test (release)
"

* tag 'reader_permit-timeout-v6' of github.com:bhalevy/scylla:
  flat_mutation_reader: get rid of timeout parameter
  reader_concurrency_semaphore: use permit timeout for admission
  reader_concurrency_semaphore: adjust reactivated reader timeout
  multishard_mutation_query: create_reader: validate saved reader permit
  repair: row_level: read_mutation_fragment: set reader timeout
  flat_mutation_reader: maybe_timed_out: use permit timeout
  test: sstable_datafile_test: add sstable_reader_with_timeout
  reader_permit: add timeout member
2021-08-25 17:51:10 +03:00
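The shape of the change can be sketched as below. This is a simplified model, not the Scylla classes: `timeout_clock`, `no_timeout`, the reduced `reader_permit`, the `reader` class and its `timed_out` method are stand-ins chosen for illustration. The point is that the deadline is stored once in the permit and looked up when needed, so reader methods no longer take a timeout argument.

```cpp
#include <cassert>
#include <chrono>

using timeout_clock = std::chrono::steady_clock;

// "No timeout" is modelled as the largest representable time point.
constexpr timeout_clock::time_point no_timeout = timeout_clock::time_point::max();

// Reduced permit: it only carries the deadline.
class reader_permit {
    timeout_clock::time_point _timeout = no_timeout;
public:
    timeout_clock::time_point timeout() const { return _timeout; }
    // Can be updated, e.g. when a paused reader is picked up for reuse.
    void set_timeout(timeout_clock::time_point t) { _timeout = t; }
};

// Reduced reader: no method takes a timeout parameter.
class reader {
    reader_permit& _permit;
public:
    explicit reader(reader_permit& permit) : _permit(permit) {}

    // The deadline comes from the permit instead of from an argument.
    bool timed_out(timeout_clock::time_point now) const {
        return now >= _permit.timeout();
    }
};
```

`now` is passed in explicitly here only to keep the sketch deterministic.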
Raphael S. Carvalho
a4053dbb72 repair: Postpone data segregation to off-strategy compaction
With data segregation on repair, thousands of sstables are potentially
added to maintenance set which causes high latency due to stalls.

That's because N*M sstables are created by a repair,
	where N = # of ranges
	and M = # of segregations

For TWCS, M = # of windows.

Assuming N = 768 and M = 20, ~15k sstables end up in the sstable set.

To fix this problem, let's avoid performing data segregation in repair,
as offstrategy will already perform the segregation anyway.

So from now on, only N non-overlapping sstables will be added to set.
Read amplification isn't affected because a query will only touch one
sstable in maintenance set.
When offstrategy starts, it will pick all sstables from set and
compact them in a single step while performing data segregation,
so data is properly laid out before integrated into the main set.

tests:
	- sstable_compaction_test.twcs_reshape_with_disjoint_set_test
	- mode(dev)
	- manual test using repair-based bootstrap

Fixes #9199.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210824185043.76475-1-raphaelsc@scylladb.com>
2021-08-25 15:31:38 +03:00
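The sstable count from the message above, worked as a tiny function. `sstables_added` is a hypothetical helper written only for this arithmetic; it is not part of the repair code.

```cpp
#include <cassert>
#include <cstdint>

// Number of sstables one repair adds to the maintenance set.
// ranges = N, segregations = M (for TWCS, the number of windows).
constexpr std::uint64_t sstables_added(std::uint64_t ranges,
                                       std::uint64_t segregations,
                                       bool segregate_in_repair) {
    // With segregation in repair every range is split M ways; without it,
    // each range produces a single sstable.
    return segregate_in_repair ? ranges * segregations : ranges;
}
```

For N = 768 and M = 20 this gives 768 * 20 = 15360 sstables with segregation in repair, and 768 once segregation is left to off-strategy compaction.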
Pavel Emelyanov
b012040a76 mutation: Keep range tombstone in tree when consuming
Current code std::move()-s the range tombstone into consumer thus
moving the tombstone's linkage to the containing list as well. As
a result the original range tombstone itself leaks as it leaves
the tree and cannot be reached on .clear(). Another danger is that
the iterator pointing to the tombstone becomes invalid while it's
then ++-ed to advance to the next entry.

The immediate fix is to keep the tombstone linked to the list while
moving.

fixes: #9207

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210825100834.3216-1-xemul@scylladb.com>
2021-08-25 13:25:18 +03:00
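The hazard can be reproduced with a hand-rolled intrusive hook. This is a sketch under assumptions: `list_hook`, `tombstone_entry`, `link_after` and `consume_payload` are hypothetical and much simpler than the real range tombstone containers, and the "safe" variant only illustrates the idea of leaving the linkage alone, not the actual patch.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Minimal intrusive hook. Moving it transfers the list linkage to the new object.
struct list_hook {
    list_hook* prev = nullptr;
    list_hook* next = nullptr;

    list_hook() = default;
    list_hook(list_hook&& other) noexcept : prev(other.prev), next(other.next) {
        if (prev) {
            prev->next = this;
        }
        if (next) {
            next->prev = this;
        }
        other.prev = nullptr;
        other.next = nullptr;
    }

    bool is_linked() const { return prev != nullptr; }
};

// Entry that embeds the hook, as an intrusive container element would.
struct tombstone_entry : list_hook {
    std::string payload;
    explicit tombstone_entry(std::string p) : payload(std::move(p)) {}
    // Hazardous when the entry is linked: moves the hook along with the payload.
    tombstone_entry(tombstone_entry&& other) = default;
};

// Insert `node` right after `pos`.
void link_after(list_hook& pos, list_hook& node) {
    node.prev = &pos;
    node.next = pos.next;
    if (pos.next) {
        pos.next->prev = &node;
    }
    pos.next = &node;
}

// Safe variant: hand over only the payload and leave the hook in the list.
std::string consume_payload(tombstone_entry& e) {
    return std::move(e.payload);
}
```

After a whole-entry move the container points at the moved-to object while the original is unlinked, so a later clear of the container can no longer reach the original, and an iterator that still refers to it can no longer be advanced safely.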
Botond Dénes
6df77e350a mutation_fragment{_v2}: MutationFragmentConsumer: allow for abstract consumer
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20210825083244.436274-1-bdenes@scylladb.com>
2021-08-25 13:12:41 +03:00
Avi Kivity
993f824cfd Merge "raft: implement linearisable reads on a follower" from Gleb and Kostja
"
This series implements section 6.4 of the Raft PhD. It allows
linearisable reads on a follower, bypassing the raft log entirely. After
this series, server::read_barrier can be executed on a follower as well
as on a leader, and once it completes, the local user's state machine
state can be accessed directly.
"

* 'raft-read-v9' of github.com:scylladb/scylla-dev:
  raft: test: add read_barrier test to replication_test
  raft: test: add read_barrier tests to fsm_test
  raft: make read_barrier work on a follower as well as on a leader
  raft: add a function to wait for an index to be applied
  raft: (server) add a helper to wait through uncertainty period
  raft: make fsm::current_leader() public
  raft: add hasher for raft::internal::tagged_uint64
  serialize: add serialized for std::monostate
  raft: fix indentation in applier_fiber
2021-08-25 13:11:35 +03:00
Gleb Natapov
3ff6f76cef raft: test: add read_barrier test to replication_test
2021-08-25 08:57:13 +03:00
Gleb Natapov
ad2c2abcb8 raft: test: add read_barrier tests to fsm_test
2021-08-25 08:57:13 +03:00
Gleb Natapov
03a266d73b raft: make read_barrier work on a follower as well as on a leader
This patch implements a Raft extension that allows performing
linearisable reads by accessing the local state machine. The extension
is described in section 6.4 of the PhD. To sum it up: to perform a read
barrier, a follower needs to ask the leader for the last committed index
that it knows about. The leader must make sure that it is still the
leader before answering, by communicating with a quorum. When the
follower gets the index back, it waits for it to be applied, and by that
completes the read_barrier invocation.

The patch adds three new RPCs: read_barrier, read_barrier_reply and
execute_read_barrier_on_leader. The last one is the one a follower uses
to ask a leader about a safe index it can read. The first two are used
by a leader to communicate with a quorum.
2021-08-25 08:57:13 +03:00
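The decision logic of the read barrier can be sketched synchronously, leaving out the RPC layer and all asynchrony. Everything here is a simplified model: `leader_state`, `follower_state`, `read_barrier_done` and the `acks` parameter are hypothetical, and only the name `execute_read_barrier_on_leader` is borrowed from the RPC mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

using index_t = std::uint64_t;

struct leader_state {
    index_t commit_idx;        // last committed index known to the leader
    std::size_t cluster_size;  // number of voting members, leader included
};

// Leader side: hand out the read index only if a quorum has confirmed that
// this node is still the leader. `acks` counts the leader itself.
std::optional<index_t> execute_read_barrier_on_leader(const leader_state& l,
                                                      std::size_t acks) {
    if (acks * 2 > l.cluster_size) {
        return l.commit_idx;
    }
    return std::nullopt;
}

struct follower_state {
    index_t applied_idx;  // last index applied to the local state machine
};

// Follower side: the barrier completes once the read index has been applied
// locally; after that the local state machine can be read directly.
bool read_barrier_done(const follower_state& f, index_t read_idx) {
    return f.applied_idx >= read_idx;
}
```

In a three-node cluster the leader needs two acknowledgements (its own plus one follower) before it answers, and a follower whose applied index is behind the returned index has to keep waiting.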
Gleb Natapov
73af7edc78 raft: add a function to wait for an index to be applied
2021-08-25 08:19:25 +03:00
Konstantin Osipov
0429196e06 raft: (server) add a helper to wait through uncertainty period
Add a helper to be able to wait until a Raft cluster
leader is elected. It can be used to avoid sleeps
when it's necessary to forward a request to the leader,
but the leader is yet unknown.
2021-08-25 08:19:25 +03:00
Gleb Natapov
376785042f raft: make fsm::current_leader() public
A later patch will call it from the server class.
2021-08-25 08:19:25 +03:00
Gleb Natapov
273f753815 raft: add hasher for raft::internal::tagged_uint64
Needed in order to use tagged_uint64 as a key in an unordered map.
2021-08-25 08:19:25 +03:00
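What such a hasher typically looks like, as a sketch. The `tagged_uint64` template below is a simplified stand-in for raft::internal::tagged_uint64 (the real class layout is not shown in this log), and `term_tag` / `term_t` are hypothetical names used for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>

// Simplified stand-in: a uint64 made distinct per Tag so that, for example,
// terms and indexes cannot be mixed up by accident.
template <typename Tag>
struct tagged_uint64 {
    std::uint64_t value;
    bool operator==(const tagged_uint64& o) const { return value == o.value; }
};

// std::hash specialization: delegate to the hash of the underlying integer.
namespace std {
template <typename Tag>
struct hash<tagged_uint64<Tag>> {
    size_t operator()(const tagged_uint64<Tag>& v) const noexcept {
        return hash<uint64_t>{}(v.value);
    }
};
}  // namespace std

struct term_tag {};
using term_t = tagged_uint64<term_tag>;
```

With the specialization in place, `std::unordered_map<term_t, std::string>` works without passing a custom hasher type.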
Gleb Natapov
4851d64c68 serialize: add serialized for std::monostate
2021-08-25 08:19:25 +03:00
Gleb Natapov
bd0fd579cf raft: fix indentation in applier_fiber
2021-08-25 08:19:25 +03:00
Nadav Har'El
cf06b7cd40 test/alternator: correct some typos in comments
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210729125317.1610573-1-nyh@scylladb.com>
2021-08-24 19:43:29 +03:00
Avi Kivity
4a42b69ba8 Merge "raft: testing: many nodes test" from Alejo
"
Factor out replication test, make it work with different clocks, add
some features, and add a many nodes test with steady_clock. Also
refactor common test helper.

Many nodes test passes for release and dev and normal tick of 100ms for
up to 1000 servers. For debug mode it's much fewer due to lack of
optimizations so it's only tested for smaller numbers.

Tests: unit ({dev}), unit ({debug}), unit ({release})
"

* 'raft-many-22-v12' of https://github.com/alecco/scylla: (21 commits)
  raft: candidate timeout proportional to cluster size
  raft: testing: many nodes test
  raft: replication test: remove unused tick_all
  raft: replication test: delays
  raft: replication test: packet drop rpc helper
  raft: replication test: connectivity configuration
  raft: replication test: rpc network map in raft_cluster
  raft: replication test: use minimum granularity
  raft: replication test: minor: rename local to int ids
  raft: replication test: fix restart_tickers when partitioning
  raft: replication test: partition ranges
  raft: replication test: isolate one server
  raft: replication test: move objects out of header
  raft: replication test: make dummy command const
  raft: replication test: template clock type
  raft: replication test: tick delta inside raft_cluster
  raft: replication test: style - member initializer
  raft: replication test: move common code out
  raft: testing: refactor helper
  raft: log election stages
  ...
2021-08-24 17:05:05 +03:00
Benny Halevy
4476800493 flat_mutation_reader: get rid of timeout parameter
Now that the timeout is taken from the reader_permit.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
4e3dcfd7d6 reader_concurrency_semaphore: use permit timeout for admission
Now that the timeout is stored in the reader
permit use it for admission rather than a timeout
parameter.

Note that evictable_reader::next_partition
currently passes db::no_timeout to
resume_or_create_reader, which propagated to
maybe_wait_readmission, but it seems to be
an oversight of the f_m_r api that doesn't
pass a timeout to next_partition().

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
9b0b13c450 reader_concurrency_semaphore: adjust reactivated reader timeout
Update the reader's timeout where needed
after unregistering inactive_read.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
605a1e6943 multishard_mutation_query: create_reader: validate saved reader permit
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:51 +03:00
Benny Halevy
eeab5f77d9 repair: row_level: read_mutation_fragment: set reader timeout
The timeout needs to be propagated to the reader's permit.
Reset it to db::no_timeout in repair_reader::pause().

Warn if set_timeout asks to change the timeout too far into the
past (100ms).  It is possible that it will be passed a
past timeout from the rpc path, where the message timeout
is applied (as duration) over the local lowres_clock time
and parallel read_data messages that share the query may end
up having close, but different timeout values.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 16:30:40 +03:00
Benny Halevy
f25aabf1b2 flat_mutation_reader: maybe_timed_out: use permit timeout
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 14:29:44 +03:00
Benny Halevy
46fb7fe68e test: sstable_datafile_test: add sstable_reader_with_timeout
Verify that the sstable reader (for the highest supported version)
times out properly.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-08-24 14:29:44 +03:00