scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-05 06:23:03 +00:00

Author	SHA1	Message	Date
Avi Kivity	562e68835b	cql3: expr, user types: convert user type literals to expressions Convert the user_types::literal raw to a new expression type usertype_constructor. I used "usertype" to convey that it is a ((user type) constructor), not a (user (type constructor)).	2021-08-26 15:26:35 +03:00
Avi Kivity	4d7e00d0f8	cql3: selection: make selectable.hh not include expr/expression.hh We have this dependency now: column_identifier -> selectable -> expression and want to introduce this: expression -> user types -> column_identifier This leads to a loop, since expression is not (yet) forward declarable. Fix by moving any mention of expression from selectable.hh to a new header selection-expr.hh. database.cc lost access to timeout_config, so adjust its includes to regain it.	2021-08-26 15:19:14 +03:00
Avi Kivity	9d6bc7eae6	cql3: sets, user types: move user types raw functions around Move them closer to prepare related functions for modification.	2021-08-26 15:15:59 +03:00
Avi Kivity	06bca067f8	cql3: expr, sets, maps: convert set and map literals to collection_constructor Add set and map styles to collection_constructor. Maps are implemented as collection_constructor{tuple_constructor{key, value}...}. This saves having a new expression type, and reduces the effort to implement recursive descent evaluation for this omitted expression type.	2021-08-26 15:13:37 +03:00
Avi Kivity	658cd47d21	cql3: sets, maps, expr: move set and map raw functions around Move them closer to prepare related functions for modification. Since sets and maps share some implementation details in the grammar, they are moved and converted as a unit.	2021-08-26 15:13:07 +03:00
Avi Kivity	d2ab7fc26d	cql3: expr, lists: convert lists::literal to new collection_constructor Introduce a collection_constructor (similar to C++'s std::initializer_list) to hold subexpressions being gathered into a list. Since sets, maps, and lists construction share some attributes (all elements must be of the same type) collection_constructor will be used for all of them, so it also holds an enum. I used "style" for the enum since it's a weak attribute - an empty set is also an empty map. I chose collection_constructor rather than plain 'collection' to highlight that it's not the only way to get a collection (selecting a collection column is another, as an example) and to hint at what it does - construct a collection from more primitive elements.	2021-08-26 15:10:41 +03:00
Avi Kivity	4defb42c86	cql3: lists, expr: move list raw functions around Move them closer to prepare related functions for modification.	2021-08-26 15:08:14 +03:00
Avi Kivity	5e448e4a2a	cql3: tuples, expr: convert tuples::literal to expr::tuple_constructor Introduce tuple_constructor (not a literal, since (?, ?) and (column_value, column_value) are not literals) to represent a tuple constructed from subexpressions. In the future we can replace column_value_tuple with tuple_constructor(column_value, column_value, ...), but this is not done now. I chose the name 'tuple_constructor' since other expressions can represent tuples (e.g. my_tuple_column, :bind_variable_of_tuple_type, func_returning_tuple()). It also explains what the expression does.	2021-08-26 15:07:15 +03:00
Avi Kivity	41c532f19c	cql3: expr, tuples: deinline and move tuple raw functions Move them closer to prepare functions for modification.	2021-08-26 15:04:21 +03:00
Avi Kivity	2c42a65db1	cql3: expr, constants: convert constants::literal to untyped_constant Introduce a new expression untyped_constant that corresponds to constants::literal, which is removed. untyped_constant is rather ugly in that it won't exist post-prepare. We should probably instead replace it with typed constants that use the widest possible type (decimal and varint), and select a narrower type during the prepare phase when we perform type inference. The conversion itself is straightforward.	2021-08-26 15:03:07 +03:00
Avi Kivity	4d9bde561a	cql3: constants: move constants::literal implementation around Move it closer to prepare functions for modification.	2021-08-26 15:01:06 +03:00
Avi Kivity	838bfbd3e0	cql3: expr, abstract_marker: convert to expressions Convert the four forms of abstract_marker to expr::bind_variable (the name was chosen since variable is the role of the thing, while "marker" refers more to the grammar). Having four variants is unnecessary, but this patch doesn't do anything about that.	2021-08-26 15:01:04 +03:00
Avi Kivity	218f4d87f8	cql3: column_condition: relax types around abstract_marker::in_raw We can only convert expressions to term::raw, not the subclass abstract_marker::in_raw, so relax the types. They will all be converted to expressions. Relaxing types isn't good, but the structure is enforced now by the grammar (and dynamically using variant casts), and in the future by a typecheck pass (which will allow us to remove the many variations of markers).	2021-08-26 14:55:17 +03:00
Avi Kivity	6dcc43d227	cql3: tuple markers: deinline and rearrange Move raw methods near to the other prepare-related functions.	2021-08-26 14:54:15 +03:00
Avi Kivity	35db2b34e4	cql3: abstract_marker, term_expr: rearrange raw abstract marker implementation Move raw methods near to the other prepare-related functions.	2021-08-26 14:53:58 +03:00
Avi Kivity	aba205917d	cql3: expr, constants: convert cql3::constants::null_literal to new cql3::expr::null Introduce cql3::expr::null and use it to represent null_literal, which is removed.	2021-08-26 14:49:46 +03:00
Avi Kivity	5b42cbf9e0	cql3: expr, constants: deinline null_literal Deinline null_literal methods and place them near the other prepare-related functions.	2021-08-26 14:45:56 +03:00
Avi Kivity	51f62d5953	cql3: constants: extricate cql3::constants::null_literal::null_value from null_literal null_literal (which is in the term::raw domain) will be converted to an expression, so unnest the nested class null_value (which is in the term domain and is not converted now).	2021-08-26 14:44:21 +03:00
Avi Kivity	10e08dc87e	cql3: term::raw, expr: convert type casts to expressions We reuse the expr::cast type that was previously used for selectables. When preparing, subexpressions are converted to term::raw; this will be removed later.	2021-08-26 14:42:55 +03:00
Avi Kivity	6f8b6aef17	cql3: type_cast: deinline some methods These methods will be converted to the expression variant, and it's impossible to do this while inlined due to #include cycles. In any case, deinlining is better. Since there is no type_cast.cc, and since they'll become part of expr_term call chain soon, they're moved there, even though it seems odd for this patch. It's a waste to create type_cast.cc just for those three functions.	2021-08-26 14:41:38 +03:00
Avi Kivity	3d30c161e4	cql3: expr: prepare expr::cast for unprepared types The cast expression has two operands: the subexpression to cast and the type to cast to. Since prepared and unprepared expressions are the same type, we don't have to do anything, but prepared and unprepared types are different. So add a variant to be able to support both. The reason the selectable->expression transformation did not need to do this is that casts in a selector cannot accept a user defined type. Note those casts also have different syntax and different execution, so we'll have to choose whether to unify the two semantics, or whether to keep them separate. This patch does not force anything (but does hint at unification by not including any discriminant beyond the type's rawness). The string representation matches the part of the grammar it was derived from (or conversion back to CQL will yield wrong results).	2021-08-26 14:39:33 +03:00
Avi Kivity	b76395a410	cql3: expr, functions: move raw function calls to expressions Remove cql3::functions::function_call::raw and replace it with cql3::expr::function_call, which already existed from the selector migration to expressions. The virtual functions implementing term::raw are made free functions and remain in place, to ease migration and review. Note that preparing becomes more complicated as it needs to account for anonymous functions, which were not representable in the previous structure (and still cannot be created by the parser for the term::raw path). The parser now wraps all its arguments with the term::raw->expr bridge, since that's what expr::function_call expects, and in turn wraps the function call with an expr->term::raw bridge, since that's what the rest of the parser expects. These will disappear when the migration completes.	2021-08-26 14:38:16 +03:00
Avi Kivity	0d24af7775	cql3: expr, term::raw: add conversions between the two types Add a way to convert between the old world and the new, and back. Note that instead of blindly wrapping, we unwrap if we received a wrapped object.	2021-08-26 14:35:46 +03:00
Avi Kivity	a5031dd5bf	cql3: expr, term::raw: add reverse bridge Since expressions can nest, and since we won't convert everything at once, add a way to store a term::raw as an expression. We can now have a term::raw that is internally an expression, and an expression that is implemented as term::raw.	2021-08-26 14:32:04 +03:00
Avi Kivity	725065b066	cql3: term::raw, expr: add bridge between term::raw and expressions A term_raw_expression is a term::raw that holds an expression. It will be used to incrementally convert the source base to expressions, while still exposing the result to the common interface of shared_ptr<term::raw>.	2021-08-26 14:14:18 +03:00
Avi Kivity	9a158cd7b5	cql3: eliminate multi_column_raw Now that the signatures of term::raw::prepare and multi_column_raw::prepare are identical, we can eliminate multi_column_raw, replacing it with term::raw where needed. In some cases we delete it from the inheritance chain since we reach term::raw via a different base class. Note that a dynamic_cast<> is eliminated, so we compensate for the addition of runtime checks in the previous patch by the deletion of runtime checks in this patch.	2021-08-26 14:11:42 +03:00
Avi Kivity	660be97028	cql3: term::raw, multi_column_raw: unify prepare() signatures In order to replace the term::raw hierarchy with expressions, we need to unify the signatures of term::raw::prepare() and term::multi_column_raw::prepare(). This is because we'll only have one expression type to represent both single values and tuples (although different subexpression types may be used). The difference in the two prepare() signatures is the `receiver` parameter - which is a (type, name) pair used to perform type inference on the expression being prepared, with the name used to report errors. In a perfect world, this would just be an expression - a tuple or a singular expression as the case requires. But we don't have the needed expression infrastructure yet - general tuples or name-annotated expressions. Resolve the problem by introducing a variant for the single-value and tuple. This is more or less creating a mini-expression type used just for this. Once our expression type grows the needed capabilities, it can replace this type. Note that for some cases, this replaces compile-time checks by runtime checks (which should never trigger). In other cases the classes really needed both interfaces, so the new variant is a better fit.	2021-08-26 14:11:42 +03:00
Avi Kivity	acf8da2bce	Merge "flat_mutation_reader: keep timeout in permit" from Benny " This series moves the timeout parameter, that is passed to most f_m_r methods, into the reader_permit. This eliminates the need to pass the timeout around, as it's taken from the permit when needed. The permit timeout is updated in certain cases when the permit/reader is paused and retrieved later on for reuse. Following are perf_simple_query results showing ~1% reduction in insns/op and corresponding increase in tps. $ build/release/test/perf/perf_simple_query -c 1 --operations-per-shard 1000000 --task-quota-ms 10 Before: 102500.38 tps ( 75.1 allocs/op, 12.1 tasks/op, 45620 insns/op) After: 103957.53 tps ( 75.1 allocs/op, 12.1 tasks/op, 45372 insns/op) Test: unit(dev) DTest: repair_additional_test.py:RepairAdditionalTest.repair_abort_test (release) materialized_views_test.py:TestMaterializedViews.remove_node_during_mv_insert_3_nodes_test (release) materialized_views_test.py:InterruptBuildProcess.interrupt_build_process_with_resharding_half_to_max_test (release) migration_test.py:TTLWithMigrate.big_table_with_ttls_test (release) " * tag 'reader_permit-timeout-v6' of github.com:bhalevy/scylla: flat_mutation_reader: get rid of timeout parameter reader_concurrency_semaphore: use permit timeout for admission reader_concurrency_semaphore: adjust reactivated reader timeout multishard_mutation_query: create_reader: validate saved reader permit repair: row_level: read_mutation_fragment: set reader timeout flat_mutation_reader: maybe_timed_out: use permit timeout test: sstable_datafile_test: add sstable_reader_with_timeout reader_permit: add timeout member	2021-08-25 17:51:10 +03:00
Raphael S. Carvalho	a4053dbb72	repair: Postpone data segregation to off-strategy compaction With data segregation on repair, thousands of sstables are potentially added to the maintenance set, which causes high latency due to stalls. That's because N*M sstables are created by a repair, where N = # of ranges and M = # of segregations. For TWCS, M = # of windows. Assuming N = 768 and M = 20, ~15k sstables end up in the sstable set. To fix this problem, let's avoid performing data segregation in repair, as offstrategy will already perform the segregation anyway. So from now on, only N non-overlapping sstables will be added to the set. Read amplification isn't affected because a query will only touch one sstable in the maintenance set. When offstrategy starts, it will pick all sstables from the set and compact them in a single step while performing data segregation, so data is properly laid out before being integrated into the main set. tests: - sstable_compaction_test.twcs_reshape_with_disjoint_set_test - mode(dev) - manual test using repair-based bootstrap Fixes #9199. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210824185043.76475-1-raphaelsc@scylladb.com>	2021-08-25 15:31:38 +03:00
Pavel Emelyanov	b012040a76	mutation: Keep range tombstone in tree when consuming The current code std::move()-s the range tombstone into the consumer, thus moving the tombstone's linkage to the containing list as well. As a result, the original range tombstone itself leaks, as it leaves the tree and cannot be reached on .clear(). Another danger is that the iterator pointing to the tombstone becomes invalid while it's then ++-ed to advance to the next entry. The immediate fix is to keep the tombstone linked to the list while moving. fixes: #9207 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210825100834.3216-1-xemul@scylladb.com>	2021-08-25 13:25:18 +03:00
Botond Dénes	6df77e350a	mutation_fragment{_v2}: MutationFragmentConsumer: allow for abstract consumer Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210825083244.436274-1-bdenes@scylladb.com>	2021-08-25 13:12:41 +03:00
Avi Kivity	993f824cfd	Merge "raft: implement linearisable reads on a follower" from Gleb and Kostja " This series implements section 6.4 of the Raft PhD. It allows linearisable reads on a follower, bypassing the Raft log entirely. After this series, server::read_barrier can be executed on a follower as well as on the leader, and after it completes, the local user's state machine state can be accessed directly. " * 'raft-read-v9' of github.com:scylladb/scylla-dev: raft: test: add read_barrier test to replication_test raft: test: add read_barrier tests to fsm_test raft: make read_barrier work on a follower as well as on a leader raft: add a function to wait for an index to be applied raft: (server) add a helper to wait through uncertainty period raft: make fsm::current_leader() public raft: add hasher for raft::internal::tagged_uint64 serialize: add serialized for std::monostate raft: fix indentation in applier_fiber	2021-08-25 13:11:35 +03:00
Gleb Natapov	3ff6f76cef	raft: test: add read_barrier test to replication_test	2021-08-25 08:57:13 +03:00
Gleb Natapov	ad2c2abcb8	raft: test: add read_barrier tests to fsm_test	2021-08-25 08:57:13 +03:00
Gleb Natapov	03a266d73b	raft: make read_barrier work on a follower as well as on a leader This patch implements a Raft extension that allows linearisable reads to be performed by accessing the local state machine. The extension is described in section 6.4 of the Raft PhD. To sum it up, to perform a read barrier, a follower needs to ask the leader for the last committed index that it knows about. The leader must make sure that it is still the leader before answering, by communicating with a quorum. When the follower gets the index back, it waits for it to be applied, and that completes the read_barrier invocation. The patch adds three new RPCs: read_barrier, read_barrier_reply and execute_read_barrier_on_leader. The last one is the one a follower uses to ask the leader for a safe index it can read at. The first two are used by the leader to communicate with a quorum.	2021-08-25 08:57:13 +03:00
Gleb Natapov	73af7edc78	raft: add a function to wait for an index to be applied	2021-08-25 08:19:25 +03:00
Konstantin Osipov	0429196e06	raft: (server) add a helper to wait through uncertainty period Add a helper to be able to wait until a Raft cluster leader is elected. It can be used to avoid sleeps when it's necessary to forward a request to the leader, but the leader is yet unknown.	2021-08-25 08:19:25 +03:00
Gleb Natapov	376785042f	raft: make fsm::current_leader() public Later patch will call it from server class.	2021-08-25 08:19:25 +03:00
Gleb Natapov	273f753815	raft: add hasher for raft::internal::tagged_uint64 Need it to be able to use tagged_uint64 as a key in an unordered map.	2021-08-25 08:19:25 +03:00
Gleb Natapov	4851d64c68	serialize: add serialized for std::monostate	2021-08-25 08:19:25 +03:00
Gleb Natapov	bd0fd579cf	raft: fix indentation in applier_fiber	2021-08-25 08:19:25 +03:00
Nadav Har'El	cf06b7cd40	test/alternator: correct some typos in comments Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210729125317.1610573-1-nyh@scylladb.com>	2021-08-24 19:43:29 +03:00
Avi Kivity	4a42b69ba8	Merge "raft: testing: many nodes test" from Alejo " Factor out replication test, make it work with different clocks, add some features, and add a many nodes test with steady_clock. Also refactor common test helper. Many nodes test passes for release and dev and normal tick of 100ms for up to 1000 servers. For debug mode it's much fewer due to lack of optimizations so it's only tested for smaller numbers. Tests: unit ({dev}), unit ({debug}), unit ({release}) " * 'raft-many-22-v12' of https://github.com/alecco/scylla: (21 commits) raft: candidate timeout proportional to cluster size raft: testing: many nodes test raft: replication test: remove unused tick_all raft: replication test: delays raft: replication test: packet drop rpc helper raft: replication test: connectivity configuration raft: replication test: rpc network map in raft_cluster raft: replication test: use minimum granularity raft: replication test: minor: rename local to int ids raft: replication test: fix restart_tickers when partitioning raft: replication test: partition ranges raft: replication test: isolate one server raft: replication test: move objects out of header raft: replication test: make dummy command const raft: replication test: template clock type raft: replication test: tick delta inside raft_cluster raft: replication test: style - member initializer raft: replication test: move common code out raft: testing: refactor helper raft: log election stages ...	2021-08-24 17:05:05 +03:00
Benny Halevy	4476800493	flat_mutation_reader: get rid of timeout parameter Now that the timeout is taken from the reader_permit. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Benny Halevy	4e3dcfd7d6	reader_concurrency_semaphore: use permit timeout for admission Now that the timeout is stored in the reader permit use it for admission rather than a timeout parameter. Note that evictable_reader::next_partition currently passes db::no_timeout to resume_or_create_reader, which propagated to maybe_wait_readmission, but it seems to be an oversight of the f_m_r api that doesn't pass a timeout to next_partition(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Benny Halevy	9b0b13c450	reader_concurrency_semaphore: adjust reactivated reader timeout Update the reader's timeout where needed after unregistering inactive_read. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Benny Halevy	605a1e6943	multishard_mutation_query: create_reader: validate saved reader permit Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Benny Halevy	eeab5f77d9	repair: row_level: read_mutation_fragment: set reader timeout The timeout needs to be propagated to the reader's permit. Reset it to db::no_timeout in repair_reader::pause(). Warn if set_timeout asks to change the timeout too far into the past (100ms). It is possible that it will be passed a past timeout from the rpc path, where the message timeout is applied (as duration) over the local lowres_clock time and parallel read_data messages that share the query may end up having close, but different timeout values.	Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:40 +03:00
Benny Halevy	f25aabf1b2	flat_mutation_reader: maybe_timed_out: use permit timeout Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 14:29:44 +03:00
Benny Halevy	46fb7fe68e	test: sstable_datafile_test: add sstable_reader_with_timeout Verify that the sstable reader (for the highest supported version) times out properly. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 14:29:44 +03:00
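
The sstable-count arithmetic in commit a4053dbb72 (repair: Postpone data segregation to off-strategy compaction) can be checked with a small sketch. This only reproduces the N*M reasoning from the commit message; the function name and its boolean flag are hypothetical, chosen for illustration, and are not part of ScyllaDB.

```python
def sstables_added_by_repair(num_ranges: int, num_segregations: int,
                             segregate_in_repair: bool) -> int:
    """Hypothetical helper: sstables one repair adds to the maintenance set.

    With segregation done during repair, each of the N ranges is split
    into M segregations (for TWCS, M is the number of windows), giving N*M.
    Without it, only the N non-overlapping sstables are added.
    """
    if segregate_in_repair:
        return num_ranges * num_segregations
    return num_ranges


# Figures taken from the commit message: N = 768 ranges, M = 20 windows.
N, M = 768, 20
before = sstables_added_by_repair(N, M, segregate_in_repair=True)
after = sstables_added_by_repair(N, M, segregate_in_repair=False)
print(before, after)  # 15360 768
```

The first figure matches the "~15k sstables" quoted in the commit message; the second is the count once segregation is left to off-strategy compaction.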


28001 Commits