scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 02:50:33 +00:00

Author	SHA1	Message	Date
Kefu Chai	5fa459bd1a	treewide: do not include unused header since #13452, we switched most of the caller sites from std::regex to boost::regex. in this change, all occurrences of `#include <regex>` are dropped unless std::regex is used in the same source file. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13765	2023-05-07 19:01:29 +03:00
Kefu Chai	c76486c508	build: only apply -Wno-parentheses-equality to ANTLR generated sources it turns out the only places where we have compiler warnings of -Wparentheses-equality are in the source code generated by ANTLR. strictly speaking, this is valid C++ code, just not quite readable from the hygienic point of view. so let's enable this warning in the source tree, but only disable it when compiling the sources generated by ANTLR. please note, this warning option is supported by both GCC and Clang, so no need to test if it is supported. for a sample of the warnings, see: ``` /home/kefu/dev/scylladb/build/cmake/cql3/CqlLexer.cpp:21752:38: error: equality comparison with extraneous parentheses [-Werror,-Wparentheses-equality] if ( (LA4_0 == '$')) ~~~~~~^~~~~~ /home/kefu/dev/scylladb/build/cmake/cql3/CqlLexer.cpp:21752:38: note: remove extraneous parentheses around the comparison to silence this warning if ( (LA4_0 == '$')) ~ ^ ~ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-05-04 11:16:27 +08:00
Nadav Har'El	57ffbcbb22	cql3: fix spurious token names in syntax error messages We have known for a long time (see issue #1703) that the quality of our CQL "syntax error" messages leaves a lot to be desired, especially when compared to Cassandra. This patch doesn't yet bring us great error messages with great context - doing this isn't easy and it appears that Antlr3's C++ runtime isn't as good as the Java one in this regard - but this patch at least fixes garbage printed in some error messages. Specifically, when the parser can deduce that a specific token is missing, it used to print line 1:83 missing ')' at '<missing ' After this patch we get rid of the meaningless string '<missing ': line 1:83 : Missing ')' Also, when the parser deduced that a specific token was unneeded, it used to print: line 1:83 extraneous input ')' expecting <invalid> Now we got rid of this silly "<invalid>" and write just: line 1:83 : Unexpected ')' Refs #1703. I haven't yet marked that issue "fixed" because I think a complete fix would also require printing the entire misparsed line and the point of the parse failure. Scylla still prints a generic "Syntax Error" in most cases now, and although the character number (83 in the above example) can help, it's much more useful to see the actual failed statement and where character 83 is. Unfortunately some tests enshrine buggy error messages and had to be fixed. Other tests enshrined strange text for a generic unexplained error message, which used to say " : syntax error..." (note the two spaces and ellipsis) and after this patch is " : Syntax error". So these tests are changed. Another message, "no viable alternative at input" is deliberately kept unchanged by this patch so as not to break many more tests which enshrined this message. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13731	2023-05-02 11:23:58 +03:00
Wojciech Mitros	b18c21147f	cql: check if the keyspace is system when altering permissions Currently, when altering permissions on a functions resource, we only check if it's a builtin function and not if it's all functions in the "system" keyspace, which contains all builtin functions. This patch adds a check of whether the function resource keyspace is "system". This check actually covers both "single function" and "all functions in keyspace" cases, so the additional check for single functions is removed. Closes #13596	2023-05-02 10:13:59 +03:00
Kefu Chai	108f20c684	cql3: capture reference to temporary value by value `data_dictionary::database::find_keyspace()` returns a temporary object, and `data_dictionary::keyspace::user_types()` returns a reference pointing to a member of this temporary object. so we cannot use the reference after the expression is evaluated. in this change, we capture the return value of `find_keyspace()` using universal reference, and keep the return value of `user_types()` with a reference, to ensure that we can use it later. this change silences the warning from GCC-13, like: ``` /home/kefu/dev/scylladb/cql3/statements/authorization_statement.cc:68:21: error: possibly dangling reference to a temporary [-Werror=dangling-reference] 68 \| const auto& utm = qp.db().find_keyspace(*keyspace).user_types(); \| ^~~ ``` Fixes #13725 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13726	2023-05-01 22:41:41 +03:00
Kefu Chai	0a3a254284	cql3: do not capture reference to temporary value `data_dictionary::database::find_column_family()` returns a temporary value, and `data_dictionary::table::get_index_manager()` returns a reference into this temporary value, so we cannot capture this reference and use it after the expression is evaluated. in this change, we keep the return value of `find_column_family()` by value, to extend the lifetime of the object referenced by the return value of `get_index_manager()`. this should address the warning from GCC-13, like: ``` /home/kefu/dev/scylladb/cql3/restrictions/statement_restrictions.cc:519:15: error: possibly dangling reference to a temporary [-Werror=dangling-reference] 519 \| auto& sim = db.find_column_family(_schema).get_index_manager(); \| ^~~ ``` Fixes #13727 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13728	2023-05-01 22:39:48 +03:00
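The pattern behind the two GCC-13 `-Wdangling-reference` fixes above can be sketched in isolation. This is a minimal illustration only: `database`, `table` and `index_manager` below are hypothetical stand-ins with the same shape as the accessors named in the commit messages, not the real `data_dictionary` classes.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins; NOT the real Scylla classes.
struct index_manager {
    std::string name;
};

class table {
    index_manager _im;
public:
    explicit table(std::string n) : _im{std::move(n)} {}
    // Returns a reference into *this, so it is valid only while the table lives.
    const index_manager& get_index_manager() const { return _im; }
};

struct database {
    // Returns a temporary (by value), like the accessor described in the commit.
    table find_column_family(const std::string& name) const { return table{name}; }
};

std::string index_manager_name(const database& db, const std::string& cf) {
    // Problematic form (what the warning flags):
    //   const auto& sim = db.find_column_family(cf).get_index_manager();
    // The temporary table is destroyed at the end of that statement,
    // leaving `sim` dangling.
    auto t = db.find_column_family(cf);       // keep the returned object alive by value
    const auto& sim = t.get_index_manager();  // valid for as long as `t` is in scope
    return sim.name;
}
```

Binding the temporary with `auto&&` instead of `auto` would also extend its lifetime to the end of the enclosing scope, which appears to be the approach the `find_keyspace()` commit describes.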
Nadav Har'El	1cefb662cd	Merge 'cql3/expr: remove expr::token' from Jan Ciołek Let's remove `expr::token` and replace all of its functionality with `expr::function_call`. `expr::token` is a struct whose job is to represent a partition key token. The idea is that when the user types in `token(p1, p2) < 1234`, this will be internally represented as an expression which uses `expr::token` to represent the `token(p1, p2)` part. The situation with `expr::token` is a bit complicated. On the one hand it's supposed to represent the partition token, but sometimes it's also assumed that it can represent a generic call to the `token()` function, for example `token(1, 2, 3)` could be a `function_call`, but it could also be `expr::token`. The query planning code assumes that each occurrence of expr::token represents the partition token without checking the arguments. Because of this allowing `token(1, 2, 3)` to be represented as `expr::token` is dangerous - the query planning might think that it is `token(p1, p2, p3)` and plan the query based on this, which would be wrong. Currently `expr::token` is created only in one specific case. When the parser detects that the user typed in a restriction which has a call to `token` on the LHS it generates `expr::token`. In all other cases it generates an `expr::function_call`. Even when the `function_call` represents a valid partition token, it stays a `function_call`. During preparation there is no check to see if a `function_call` to `token` could be turned into `expr::token`. This is a bit inconsistent - sometimes `token(p1, p2, p3)` is represented as `expr::token` and the query planner handles that, but sometimes it might be represented as `function_call`, which the query planner doesn't handle. There is also a problem because there's a lot of code duplication between a `function_call` and `expr::token`. All of the evaluation and preparation is the same for `expr::token` as it's for a `function_call` to the token function. 
Currently it's impossible to evaluate `expr::token` and preparation has some flaws, but implementing it would basically consist of copy-pasting the corresponding code from token `function_call`. One more aspect is multi-table queries. With `expr::token` we turn a call to the `token()` function into a struct that is schema-specific. What happens when a single expression is used to make queries to multiple tables? The schema is different, so something that is represented as `expr::token` for one schema would be represented as `function_call` in the context of a different schema. Translating expressions to different tables would require careful manipulation to convert `expr::token` to `function_call` and vice versa. This could cause trouble for index queries. Overall I think it would be best to remove `expr::token`. Although having a clear marker for the partition token is sometimes nice for query planning, in my opinion the pros are outweighed by the cons. I'm a big fan of having a single way to represent things, having two separate representations of the same thing without clear boundaries between them causes trouble. Instead of having both `expr::token` and `function_call` we can just have the `function_call` and check if it represents a partition token when needed. 
Refs: #12906 Refs: #12677 Closes: #12905 Closes #13480 * github.com:scylladb/scylladb: cql3: remove expr::token cql3: keep a schema in visitor for extract_clustering_prefix_restrictions cql3: keep a schema inside the visitor for extract_partition_range cql3/prepare_expr: make get_lhs_receiver handle any function_call cql3/expr: properly print token function_call expr_test: use unresolved_identifier when creating token cql3/expr: split possible_lhs_values into column and token variants cql3/expr: fix error message in possible_lhs_values cql3: expr: reimplement is_satisfied_by() in terms of evaluate() cql3/expr: add a schema argument to expr::replace_token cql3/expr: add a comment for expr::has_partition_token cql3/expr: add a schema argument to expr::has_token cql3: use statement_restrictions::has_token_restrictions() wherever possible cql3/expr: add expr::is_partition_token_for_schema cql3/expr: add expr::is_token_function cql3/expr: implement preparing function_call without a receiver cql3/functions: make column family argument optional in functions::get cql3/expr: make it possible to prepare expr::constant cql3/expr: implement test_assignment for column_value cql3/expr: implement test_assignment for expr::constant	2023-04-30 15:31:35 +03:00
Jan Ciolek	be8ef63bf5	cql3: remove expr::token Let's remove expr::token and replace all of its functionality with expr::function_call. expr::token is a struct whose job is to represent a partition key token. The idea is that when the user types in `token(p1, p2) < 1234`, this will be internally represented as an expression which uses expr::token to represent the `token(p1, p2)` part. The situation with expr::token is a bit complicated. On the one hand it's supposed to represent the partition token, but sometimes it's also assumed that it can represent a generic call to the token() function, for example `token(1, 2, 3)` could be a function_call, but it could also be expr::token. The query planning code assumes that each occurrence of expr::token represents the partition token without checking the arguments. Because of this allowing `token(1, 2, 3)` to be represented as expr::token is dangerous - the query planning might think that it is `token(p1, p2, p3)` and plan the query based on this, which would be wrong. Currently expr::token is created only in one specific case. When the parser detects that the user typed in a restriction which has a call to `token` on the LHS it generates expr::token. In all other cases it generates an `expr::function_call`. Even when the `function_call` represents a valid partition token, it stays a `function_call`. During preparation there is no check to see if a `function_call` to `token` could be turned into `expr::token`. This is a bit inconsistent - sometimes `token(p1, p2, p3)` is represented as `expr::token` and the query planner handles that, but sometimes it might be represented as `function_call`, which the query planner doesn't handle. There is also a problem because there's a lot of duplication between a `function_call` and `expr::token`. All of the evaluation and preparation is the same for `expr::token` as it's for a `function_call` to the token function. 
Currently it's impossible to evaluate `expr::token` and preparation has some flaws, but implementing it would basically consist of copy-pasting the corresponding code from token `function_call`. One more aspect is multi-table queries. With `expr::token` we turn a call to the `token()` function into a struct that is schema-specific. What happens when a single expression is used to make queries to multiple tables? The schema is different, so something that is represented as `expr::token` for one schema would be represented as `function_call` in the context of a different schema. Translating expressions to different tables would require careful manipulation to convert `expr::token` to `function_call` and vice versa. This could cause trouble for index queries. Overall I think it would be best to remove expr::token. Although having a clear marker for the partition token is sometimes nice for query planning, in my opinion the pros are outweighed by the cons. I'm a big fan of having a single way to represent things, having two separate representations of the same thing without clear boundaries between them causes trouble. Instead of having expr::token and function_call we can just have the function_call and check if it represents a partition token when needed. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:11:31 +02:00
Jan Ciolek	6e0ae59c5a	cql3: keep a schema in visitor for extract_clustering_prefix_restrictions The schema will be needed once we remove expr::token and switch to using expr::is_partition_token_for_schema, which requires a schema argument. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:11:31 +02:00
Jan Ciolek	551135e83f	cql3: keep a schema inside the visitor for extract_partition_range The schema will be needed once we remove expr::token and switch to using expr::is_partition_token_for_schema, which requires a schema argument. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:11:30 +02:00
Jan Ciolek	16bc1c930f	cql3/prepare_expr: make get_lhs_receiver handle any function_call get_lhs_receiver looks at the prepared LHS of a binary operator and creates a receiver corresponding to this LHS expression. This receiver is later used to prepare the RHS of the binary operator. It's able to handle a few expression types - the ones that are currently allowed to be on the LHS. One of those types is `expr::token`, to handle restrictions like `token(p1, p2) = 3`. Soon token will be replaced by `expr::function_call`, so the function will need to handle `function_calls` to the token function. Although we expect there to be only calls to the `token()` function, as other functions are not allowed on the LHS, it can be made generic over all function calls, which will help in future grammar extensions. The function calls that it can currently get are calls to the token function, but they're not validated yet, so it could also be something like `token(pk, pk, ck)`. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:04:53 +02:00
Jan Ciolek	d3a958490e	cql3/expr: properly print token function_call Printing for function_call is a bit strange. When printing an unprepared function it prints the name and then the arguments. For a prepared function it prints <anonymous function> as the name and then the arguments. Prepared functions have a name() method, but printing doesn't use it, maybe not all functions have a valid name(?). The token() function will soon be represented as a function_call and it should be printable in a user-readable way. Let's add an if which prints `token(arg1, arg2)` instead of `<anonymous function>(arg1, arg2)` when printing a call to the token function. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:04:53 +02:00
Jan Ciolek	096efc2f38	cql3/expr: split possible_lhs_values into column and token variants The possible_lhs_values takes an expression and a column and finds all possible values for the column that make the expression true. Apart from finding column values it's also capable of finding all matching values for the partition key token. When a nullptr column is passed, possible_lhs_values switches into token values mode and finds all values for the token. This interface isn't ideal. It's confusing to pass a nullptr column when one wants to find values for the token. It would be better to have a flag, or just have a separate function. Additionally in the future expr::token will be removed and we will use expr::is_partition_token_for_schema to find all occurrences of the partition token. expr::is_partition_token_for_schema takes a schema as an argument, which possible_lhs_values doesn't have, so it would have to be extended to get the schema from somewhere. To fix these two problems let's split possible_lhs_values into two functions - one that finds possible values for a column, which doesn't require a schema, and one that finds possible values for the partition token and requires a schema: value_set possible_column_values(const column_definition* col, const expression& e, const query_options& options); value_set possible_partition_token_values(const expression& e, const query_options& options, const schema& table_schema); This will make the interface cleaner and enable smooth transition once expr::token is removed. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:04:53 +02:00
Jan Ciolek	f2e5f654f2	cql3/expr: fix error message in possible_lhs_values In possible_lhs_values there was a message talking about is_satisfied_by. It looks like a badly copy-pasted message. Change it to possible_lhs_values as it should be. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:04:52 +02:00
Avi Kivity	dc3c28516d	cql3: expr: reimplement is_satisfied_by() in terms of evaluate() It calls evaluate() internally anyway. There's a scary if () in there talking about tokens, but everything appears to work.	2023-04-29 13:04:52 +02:00
Jan Ciolek	ad5c931102	cql3/expr: add a schema argument to expr::replace_token Just like has_token, replace_token will use expr::is_partition_token_for_schema to find all instances of the partition token to replace. Let's prepare for this change by adding a schema argument to the function before making the big change. It's unused at the moment, but having a separate commit should make it easier to review. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:04:52 +02:00
Jan Ciolek	d50db32d14	cql3/expr: add a comment for expr::has_partition_token Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:04:52 +02:00
Jan Ciolek	18879aad6f	cql3/expr: add a schema argument to expr::has_token In the future expr::token will be removed and checking whether there is a partition token inside an expression will be done using expr::is_partition_token_for_schema. This function takes a schema as an argument, so all functions that will call it also need to get the schema from somewhere. Right now it's an unused argument, but in the future it will be used. Adding it in a separate commit makes it easier to review. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:04:52 +02:00
Jan Ciolek	90b3b85bd0	cql3: use statement_restrictions::has_token_restrictions() wherever possible The statement_restrictions class has a method called has_token_restriction(). This method checks whether the partition key restrictions contain expr::token. Let's use this function in all applicable places instead of manually calling has_token(). In the future has_token() will have an additional schema argument, so eliminating calls to has_token() will simplify the transition. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:04:52 +02:00
Jan Ciolek	7af010095e	cql3/expr: add expr::is_partition_token_for_schema Add a function to check whether the expression represents a partition token - that is a call to the token function with consecutive partition key columns as the arguments. For example for `token(p1, p2, p3)` this function would return `true`, but for `token(1, 2, 3)` or `token(p3, p2, p1)` the result would be `false`. The function has a schema argument because a schema is required to get the list of partition columns that should be passed as arguments to token(). Maybe it would be possible to infer the schema from the information given earlier during prepare_expression, but it would be complicated and a bit dangerous to do this. Sometimes we operate on multiple tables and the schema is needed to differentiate between them - a token() call can represent the base table's partition token, but for an index table this is just a normal function call, not the partition token. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:04:51 +02:00
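The distinction between "a call to `token()`" and "the partition token for a given schema" can be sketched as follows. This is a deliberately simplified model, not the real `cql3::expr` code: the `function_call` and `schema` structs here are hypothetical, and arguments are modelled as plain strings (a column name such as `p1`, or a literal such as `1`).

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the expression and schema types.
struct function_call {
    std::string name;
    std::vector<std::string> args;  // column names or literals, as text
};

struct schema {
    std::vector<std::string> partition_key_columns;  // in declaration order
};

// Convenience helper for building token(...) calls in examples.
function_call token_of(std::vector<std::string> args) {
    return function_call{"token", std::move(args)};
}

// True for any call to token(), whatever its arguments are.
bool is_token_function(const function_call& fc) {
    return fc.name == "token";
}

// True only when the arguments are exactly the schema's partition key
// columns, in order. The schema parameter is what lets the same expression
// be a partition token for one table and a plain function call for another.
bool is_partition_token_for_schema(const function_call& fc, const schema& s) {
    return is_token_function(fc) && fc.args == s.partition_key_columns;
}
```

With a base table whose partition key is `(p1, p2, p3)`, `token(p1, p2, p3)` matches, while `token(1, 2, 3)` and `token(p3, p2, p1)` do not, and the same `token(p1, p2, p3)` call does not match a table with a different partition key.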
Jan Ciolek	694d9298aa	cql3/expr: add expr::is_token_function Add a function that can be used to check whether a given expression represents a call to the token() function. Note that a call to token() doesn't mean that the expression represents a partition token - it could be something like token(1, 2, 3), just a normal function_call. The code for checking has been taken from functions::get. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:04:51 +02:00
Jan Ciolek	f7cac10fe0	cql3/expr: implement preparing function_call without a receiver Currently trying to do prepare_expression(function_call) with a nullptr receiver fails. It should be possible to prepare function calls without a known receiver. When the user types in: `token(1, 2, 3)` the code should be able to figure out that they are looking for a function with name `token`, which takes 3 integers as arguments. In order to support that we need to prepare all arguments that can be prepared before attempting to find a function. Prepared expressions have a known type, which helps to find the right function for the given arguments. Additionally the current code for finding a function requires all arguments to be assignment_testable, which requires preparing some expression types, e.g. column_values. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:04:51 +02:00
Jan Ciolek	15ed83adbc	cql3/functions: make column family argument optional in functions::get The method `functions::get` is used to get the `functions::function` object of the CQL function called using `expr::function_call`. Until now `functions::get` required the caller to pass both the keyspace and the column family. The keyspace argument is always needed, as every CQL function belongs to some keyspace, but the column family isn't used in most cases. The only case where having the column family is really required is the `token()` function. Each variant of the `token()` function belongs to some table, as the arguments to the function are the consecutive partition key columns. Let's make the column family argument optional. In most cases the function will work without information about column family. In case of the `token()` function there's gonna be a check and it will throw an exception if the argument is nullopt. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-29 13:00:01 +02:00
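The "optional column family" idea in the commit above can be illustrated with `std::optional`. The sketch below is an assumption-laden toy: `resolve_function` and `lookup_throws` are hypothetical names, and the real `functions::get` has a different signature and returns a function object rather than a string.

```cpp
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical lookup: returns a textual description of the resolved function.
// The keyspace is always required; the column family is needed only by token().
std::string resolve_function(const std::string& keyspace,
                             const std::string& name,
                             const std::optional<std::string>& column_family = std::nullopt) {
    if (name == "token") {
        // token() is defined per table, so it cannot be resolved without one.
        if (!column_family) {
            throw std::invalid_argument("token() requires a column family");
        }
        return "token for " + keyspace + "." + *column_family;
    }
    // Every other function is resolved from the keyspace and name alone.
    return keyspace + "." + name;
}

// Small helper reporting whether a lookup is rejected.
bool lookup_throws(const std::string& name, const std::optional<std::string>& cf) {
    try {
        resolve_function("ks", name, cf);
        return false;
    } catch (const std::invalid_argument&) {
        return true;
    }
}
```

Callers that never deal with `token()` can omit the last argument entirely; only the `token()` path pays for the extra check.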
Jan Ciolek	b3d05f3525	cql3/expr: make it possible to prepare expr::constant try_prepare_expression(constant) used to throw an error when trying to prepare expr::constant. It would be useful to be able to do this and it's not hard to implement. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-28 14:34:59 +02:00
Jan Ciolek	bf36cde29a	cql3/expr: implement test_assignment for column_value Make it possible to do test_assignment for column_values. It's implemented using the generic expression assignment testing function. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-28 14:34:59 +02:00
Jan Ciolek	fd174bda60	cql3/expr: implement test_assignment for expr::constant test_assignment checks whether a value of some type can be assigned to a value of different type. There is no implementation of test_assignment for expr::constant, but I would like to have one. Currently there is a custom implementation of test_assignment for each type of expression, but generally each of them boils down to checking: ``` type1->is_value_compatible_with(type2) ``` Instead of implementing another type-specific function I added expression_test_assignment and used it to implement test_assignment for constant. Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>	2023-04-28 14:34:56 +02:00
Kefu Chai	91f22b0e81	cql3/stats: use zero-initialization use {} instead of {0ul} for zero initialization. as `_query_cnt` is a multi-dimensional array, each element in `_query_cnt` is yet another array. so we cannot initialize it with a `{0ul}`. but to zero-initialize this array, we can just use `{}`, as per https://en.cppreference.com/w/cpp/language/zero_initialization > If T is array type, each element is zero-initialized. so this should recursively zero-initialize all arrays in `_query_cnt`. this change should silence following warning: stats.hh:88:60: error: suggest braces around initialization of subobject [-Werror,-Wmissing-braces] [statements::statement_type::MAX_VALUE + 1] = {0ul}; ^~~ { } Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-28 16:59:29 +08:00
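A small sketch of the `{}` initializer on a nested array member, in the spirit of the change above. The struct name and the dimensions are made up for illustration; the real `_query_cnt` member has different extents.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical counters struct; dimensions are illustrative only.
struct query_stats {
    static constexpr std::size_t sources = 2;
    static constexpr std::size_t statement_types = 4;

    // `= {}` value-initializes the outer array, which zero-initializes every
    // inner array and every counter. `= {0ul}` would instead try to initialize
    // the first *inner array* from a scalar, which is what -Wmissing-braces
    // complains about.
    std::uint64_t query_cnt[sources][statement_types] = {};

    // Sum of all counters, used here only to observe the initial state.
    std::uint64_t total() const {
        std::uint64_t sum = 0;
        for (const auto& row : query_cnt) {
            for (std::uint64_t v : row) {
                sum += v;
            }
        }
        return sum;
    }
};
```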
Kamil Braun	30cc07b40d	Merge 'Introduce tablets' from Tomasz Grabiec This PR introduces an experimental feature called "tablets". Tablets are a way to distribute data in the cluster, which is an alternative to the current vnode-based replication. Vnode-based replication strategy tries to evenly distribute the global token space shared by all tables among nodes and shards. With tablets, the aim is to start from a different side. Divide resources of replica-shard into tablets, with a goal of having a fixed target tablet size, and then assign those tablets to serve fragments of tables (also called tablets). This will allow us to balance the load in a more flexible manner, by moving individual tablets around. Also, unlike with vnode ranges, tablet replicas live on a particular shard on a given node, which will allow us to bind raft groups to tablets. Those goals are not yet achieved with this PR, but it lays the ground for this. Things achieved in this PR: - You can start a cluster and create a keyspace whose tables will use tablet-based replication. This is done by setting `initial_tablets` option: ``` CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3, 'initial_tablets': 8}; ``` All tables created in such a keyspace will be tablet-based. Tablet-based replication is a trait, not a separate replication strategy. Tablets don't change the spirit of replication strategy, it just alters the way in which data ownership is managed. In theory, we could use it for other strategies as well like EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy is augmented to support tablets. 
- You can create and drop tablet-based tables (no DDL language changes) - DML / DQL work with tablet-based tables Replicas for tablet-based tables are chosen from tablet metadata instead of token metadata Things which are not yet implemented: - handling of views, indexes, CDC created on tablet-based tables - sharding is done using the old method, it ignores the shard allocated in tablet metadata - node operations (topology changes, repair, rebuild) are not handling tablet-based tables - not integrated with compaction groups - tablet allocator piggy-backs on tokens to choose replicas. Eventually we want to allocate based on current load, not statically Closes #13387 * github.com:scylladb/scylladb: test: topology: Introduce test_tablets.py raft: Introduce 'raft_server_force_snapshot' error injection locator: network_topology_strategy: Support tablet replication service: Introduce tablet_allocator locator: Introduce tablet_aware_replication_strategy locator: Extract maybe_remove_node_being_replaced() dht: token_metadata: Introduce get_my_id() migration_manager: Send tablet metadata as part of schema pull storage_service: Load tablet metadata when reloading topology state storage_service: Load tablet metadata on boot and from group0 changes db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata() migration_notifier: Introduce before_drop_keyspace() migration_manager: Make prepare_keyspace_drop_announcement() return a future<> test: perf: Introduce perf-tablets test: Introduce tablets_test test: lib: Do not override table id in create_table() utils, tablets: Introduce external_memory_usage() db: tablets: Add printers db: tablets: Add persistence layer dht: Use last_token_of_compaction_group() in split_token_range_msb() locator: Introduce tablet_metadata dht: Introduce first_token() dht: Introduce next_token() storage_proxy: Improve trace-level logging locator: token_metadata: Fix confusing comment on ring_range() 
dht, storage_proxy: Abstract token space splitting Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries" db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms() db: Introduce get_non_local_vnode_based_strategy_keyspaces() service: storage_proxy: Avoid copying keyspace name in write handler locator: Introduce per-table replication strategy treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type locator: Introduce effective_replication_map locator: Rename effective_replication_map to vnode_effective_replication_map locator: effective_replication_map: Abstract get_pending_endpoints() db: Propagate feature_service to abstract_replication_strategy::validate_options() db: config: Introduce experimental "TABLETS" feature db: Log replication strategy for debugging purposes db: Log full exception on error in do_parse_schema_tables() db: keyspace: Remove non-const replication strategy getter config: Reformat	2023-04-27 09:40:18 +02:00
Kefu Chai	f5b05cf981	treewide: use defaulted operator!=() and operator==() in C++20, the compiler generates operator!=() if the corresponding operator==() is already defined, the language now understands that the comparison is symmetric in the new standard. fortunately, our operator!=() is always equivalent to `! operator==()`, this matches the behavior of the default generated operator!=(). so, in this change, all `operator!=` are removed. in addition to the defaulted operator!=, C++20 also brings to us the defaulted operator==() -- it is able to generate the operator==() as a member-wise comparison of the members in declaration order. under some circumstances, this is exactly what we need. so, in this change, if the operator==() is also implemented as a lexicographical comparison of all member variables of the class/struct in question, it is implemented using the default generated one by removing its body and marking the function as `default`. moreover, if the class happens to have other comparison operators which are implemented using lexicographical comparison, the default generated `operator<=>` is used in place of the defaulted `operator==`. sometimes, we fail to mark the operator== with the `const` specifier, in this change, to fulfil the need of C++ standard, and to be more correct, the `const` specifier is added. also, to generate the defaulted operator==, the operand should be `const class_name&`, but it is not always the case, in the class of `version`, we use `version` as the parameter type, to fulfill the need of the C++ standard, the parameter type is changed to `const version&` instead. this does not change the semantic of the comparison operator. and is a more idiomatic way to pass non-trivial struct as function parameters. please note, because in C++20, both operator== and operator<=> are symmetric, some of the operators in `multiprecision` are removed. they are the symmetric form of another variant. 
if they were not removed, compiler would, for instance, find ambiguous overloaded operator '=='. this change is a cleanup to modernize the code base with C++20 features. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13687	2023-04-27 10:24:46 +03:00
Botond Dénes	5aaa30b267	Merge 'treewide: stop using std::rel_ops' from Kefu Chai std::rel_ops was deprecated in C++20, as C++20 provides a better solution for defining comparison operators. and all the use cases previously to be addressed by `using namespace std::rel_ops` have been addressed either by `operator<=>` or the default-generated `operator!=`. so, in this series, to avoid using deprecated facilities, let's drop all these `using namespace std::rel_ops`. there are many more cases where we could either use `operator<=>` or the default-generated `operator!=` to simplify the implementation. but here, we care more about `std::rel_ops`; we will drop most (if not all) of the explicitly defined `operator!=` and other comparison operators later. Closes #13676 * github.com:scylladb/scylladb: treewide: do not use std::rel_ops dht: token: s/tri_compare/operator<=>/	2023-04-26 16:49:44 +03:00
Kefu Chai	951457a711	treewide: do not use std::rel_ops std::rel_ops was deprecated in C++20, as C++20 provides a better solution for defining comparison operators. and all the use cases previously to be addressed by `using namespace std::rel_ops` have been addressed either by `operator<=>` or the default-generated `operator!=`. so, in this change, to avoid using deprecated facilities, let's drop all these `using namespace std::rel_ops`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-04-26 14:09:58 +08:00
Kefu Chai	c8aa7295d4	cql3: drop unused function there are two variants of `query_processor::for_each_cql_result()`, both of them perform the pagination of results returned by a CQL statement. the one which accepts a function returning an instant value is not used now. so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13675	2023-04-26 08:43:22 +03:00
Kamil Braun	a29b8cd02b	Merge 'cql3: fix a few misformatted printouts of column names in error messages' from Nadav Har'El Fix a few cases where instead of printing column names in error messages, we printed weird stuff like ASCII codes or the address of the name. Fixes #13657 Closes #13658 * github.com:scylladb/scylladb: cql3: fix printing of column_specification::name in some error messages cql3: fix printing of column_definition::name in some error messages	2023-04-25 14:21:09 +02:00
Kefu Chai	f4016d3289	cql3: coroutinize query_processor::for_each_cql_result Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13621	2023-04-25 09:53:47 +02:00
Nadav Har'El	bd09dc308c	cql3: fix printing of column_specification::name in some error messages column_specification::name is a shared pointer, so it should be dereferenced before printing - because we want to print the name, not the pointer. Fix a few instances of this mistake in prepare_expr.cc. Other instances were already correct. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-04-25 10:46:56 +03:00
Nadav Har'El	4eabb3f429	cql3: fix printing of column_definition::name in some error messages Printing a column_definition::name() in an error message is wrong, because it is "bytes" and printed as hexadecimal ASCII codes :-( Some error messages in cql3/operation.cc incorrectly used name() and should be changed to name_as_text(), as was correctly done in a few other error messages in the same file. This patch also fixes a few places in the test/cql approval tests which "enshrined" the wrong behavior - printing things like 666c697374696e74 in error messages - and now need to be fixed for the right behavior. Fixes #13657 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-04-25 10:46:47 +03:00
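To make the symptom in the commit above concrete, here is a small sketch of the difference between printing a name as raw bytes and printing it as text. The `raw_bytes` alias and the `as_hex` / `as_text` helpers are hypothetical stand-ins written for this example; they are not the `name()` / `name_as_text()` implementations from the tree.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a column name stored as raw bytes.
using raw_bytes = std::vector<std::uint8_t>;

// Renders each byte as two lowercase hex digits, which is roughly
// what the faulty error messages ended up showing.
std::string as_hex(const raw_bytes& b) {
    static constexpr char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(b.size() * 2);
    for (std::uint8_t c : b) {
        out.push_back(digits[c >> 4]);
        out.push_back(digits[c & 0x0f]);
    }
    return out;
}

// Interprets the bytes as characters, which is what a user expects
// to read in an error message.
std::string as_text(const raw_bytes& b) {
    return std::string(b.begin(), b.end());
}
```

Applied to the bytes of the string `flistint`, `as_hex` yields `666c697374696e74`, the value quoted in the commit message, while `as_text` yields the readable name.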
Tomasz Grabiec	41e69836fd	db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()	2023-04-24 10:49:37 +02:00
Tomasz Grabiec	5b046043ea	migration_manager: Make prepare_keyspace_drop_announcement() return a future<> It will be extended with listener notification firing, which is an async operation.	2023-04-24 10:49:37 +02:00
Tomasz Grabiec	e4865bd4d1	dht, storage_proxy: Abstract token space splitting Currently, scans are splitting partition ranges around tokens. This will have to change with tablets, where we should split at tablet boundaries. This patch introduces token_range_splitter which abstracts this task. It is provided by effective_replication_map implementation.	2023-04-24 10:49:36 +02:00
Kefu Chai	ca6ebbd1f0	cql3, db: sstable: specialize fmt::formatter<function_name> this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print `function_name` without the help of `operator<<`. the corresponding `operator<<()` are dropped in this change, as all its callers are now using fmtlib for formatting. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13608	2023-04-21 10:07:28 +03:00
Kefu Chai	ecb5380638	treewide: s/boost::lexical_cast<std::string>/fmt::to_string()/ this change replaces all occurrences of `boost::lexical_cast<std::string>` in the source tree with `fmt::to_string()`, for a couple of reasons: * `boost::lexical_cast<std::string>` is longer than `fmt::to_string()`, so the latter is easier to parse and read. * `boost::lexical_cast<std::string>` creates a stringstream under the hood, so it can use the `operator<<` to stringify the given object. but stringstream is known to be less performant than fmtlib. * we are migrating to fmtlib based formatting, see #13245. so using `fmt::to_string()` helps us to remove yet another dependency on `operator<<`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13611	2023-04-21 09:43:53 +03:00
Botond Dénes	d828cfcb23	Merge 'db, cql3: functions: switch argument passing to std::span' from Avi Kivity Database functions currently receive their arguments as an std::vector. This is inflexible (for example, one cannot use small_vector to reduce allocations). This series adapts the function signature to accept parameters using std::span. Some changes in the keys interface are needed to support this. Lastly, one call site is migrated to small_vector. This is in support of changing selectors to use expressions. Closes #13581 * github.com:scylladb/scylladb: cql3: abstract_function_selector: use small_vector for argument buffer db, cql3: functions: pass function parameters as a span instead of a vector keys: change from_optional_exploded to accept a span instead of a vector	2023-04-21 06:49:07 +03:00
Avi Kivity	9fb5443f87	cql3: abstract_function_selector: use small_vector for argument buffer abstract_function_selector uses a preallocated vector to store the arguments to aggregate functions, to prevent an allocation for every row. Use small_vector to prevent an allocation per query, if the number of arguments happens to be small. This isn't expected to make a significant performance difference.	2023-04-19 20:42:25 +03:00
Avi Kivity	3e0aacc8b5	db, cql3: functions: pass function parameters as a span instead of a vector Spans are more flexible and can be constructed from any contiguous container (such as small_vector), or a subrange of such a container. This can save allocations, so change the signature to accept a span. Spans cannot be constructed from std::initializer_list, so one such call site is changed to construct a span directly from the single argument.	2023-04-19 20:38:55 +03:00
Nadav Har'El	81e0f5b581	cql3: allow SUM() aggregation to result in a NaN When floating-point data contains +Inf and -Inf, the sum is NaN. Our SUM() aggregation calculated this sum correctly, but then instead of returning it, complained that the sum overflowed by narrowing. This was a false positive: The sum() finalizer wanted to test that no precision was lost when casting the accumulator to the result type, so checked that the result before and after the cast are the same. But specifically for NaN, it is never equal to anything - not even to itself. This check is wrong for floating point, but moreover - isn't even necessary when the two types (accumulator type and result type) are identical so in this patch we skip it in this case. Note that in the current code, a different accumulator and result type is only used in the case of integer types; When accumulating floating point sums, the same type is used, so the broken check will be avoided. The test for this issue starts to pass with this patch, so the xfail tag is removed. Fixes #13551 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2023-04-19 09:31:41 +03:00
Kefu Chai	c580e30ec7	cql3: expr: return more accurate error message for invalidated token() args before this change, we just print out the addresses of the elements in `column_defs`, if the arguments passed to `token()` function are not valid. this is not quite helpful from the user's perspective, as the user would be more interested in the values. also, we could print a more accurate error message for each kind of error. in this change, following Cassandra 4.1's behavior, three cases are identified, and corresponding errors are returned respectively: * duplicated partition keys * wrong order of partition key * missing keys where, if the partition key order is wrong, instead of printing the keys specified by user, the correct order is printed in the error message to help the user correct the `token()` function. for better performance, the checks are performed only if the keys do not match, based on the assumption that the error handling path is not likely to be executed. tests are added accordingly. they were tested with Cassandra 4.1.1 as well. Fixes #13468 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13470	2023-04-14 11:46:18 +03:00
Botond Dénes	de402878e4	cql3: s/std::regex/boost::regex/ The former is prone to producing stack-overflow as it uses recursion in its match implementation. The migration is entirely mechanical.	2023-04-06 09:50:32 -04:00
Pavel Emelyanov	7d6ab5c84d	code: Remove some headers from query_processor.hh The forward_service.hh and raft_group0_client.hh can be replaced with forward declarations. A few other files need their previously indirectly included headers back. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13384	2023-03-31 07:08:41 +03:00
Pavel Emelyanov	92318fdeae	Merge 'Initialize Wasm together with query_processor' from Wojciech Mitros The wasm engine is moved from replica::database to the query_processor. The wasm instance cache and compilation thread runner were already there, but now they're also initialized in the query_processor constructor. By moving the initialization to the constructor, we can now be certain that all wasm-related objects (wasm instance cache, compilation thread runner, and wasm engine, which was already passed in the constructor) are initialized when we try to use them because we have to use the query processor to access them anyway. The change is also motivated by the fact that we're planning to take Wasm UDFs out of experimental, after which they should stop getting special treatment. Closes #13311 * github.com:scylladb/scylladb: wasm: move wasm initialization to query_processor constructor wasm: return wasm instance cache as a reference instead of a pointer wasm: move wasm engine to query_processor	2023-03-30 14:30:23 +03:00
Nadav Har'El	59ab9aac44	Merge 'functions: reframe aggregate functions in terms of scalar functions' from Avi Kivity Currently, aggregate functions are implemented in a stateful manner. The accumulator is stored internally in an aggregate_function::aggregate, requiring each query to instantiate new instances (see aggregate_function_selector's constructor, and note how it's called from selector::new_instance()). This makes aggregates hard to use in expressions, since expressions are stateless (with state only provided to evaluate()). To facilitate migration towards stateless expressions, we define a stateless_aggregate_function (modeled after user-defined aggregates, which are already stateless). This new struct defines the aggregate in terms of three scalar functions: one to aggregate a new input into an accumulator (provided in the first parameter), one to finalize an accumulator into a result, and one to reduce two accumulators for parallelized aggregation. All existing native aggregate functions are converted to the new model, and the old interface is removed. This series does not yet convert selectors to expressions, but it does remove one of the obstacles. Performance evaluation: I created a table with a million ints on a single-node cluster, and ran the avg() function on them. I measured the number of instructions executed with `perf stat -p $(pgrep scylla) -e instructions` while the query was running. The query executed from cache, memtables were flushed beforehand. The instruction count per row increased from roughly 49k to roughly 52k, indicating 3k extra instructions per row. While 3k instructions to execute a function is huge, it is currently dwarfed by other overhead (and will be even less important in a cluster where CL>1 will cause non-coordinator code to run multiple times). 
Closes #13105 * github.com:scylladb/scylladb: cql3/selection, forward_service: use stateless_aggregate_function directly db: functions: fold stateless_aggregate_function_adapter into aggregate_function cql3: functions: simplify accumulator_for template cql3: functions: base user-defined aggregates on stateless aggregates cql3: functions: drop native_aggregate_function cql3: functions: reimplement count(column) statelessly cql3: functions: reimplement avg() statelessly cql3: functions: reimplement sum() statelessly cql3: functions: change wide accumulator type to varint cql3: functions: unreverse types for min/max cql3: functions: rename make_{min,max}_dynamic_function cql3: functions: reimplement min/max statelessly cql3: functions: reimplement count(*) statelessly cql3: functions: simplify creating native functions even more cql3: functions: add helpers for automating marshalling for scalar functions types: fix big_decimal constructor from literal 0 cql3: functions: add helper class for internal scalar functions db: functions: add stateless aggregate functions db, cql3: move scalar_function from cql3/functions to db/functions	2023-03-30 13:58:47 +03:00
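The three-function model described in the merge above (aggregate, finalize, reduce) can be sketched for an `avg()`-like aggregate as follows. Everything here is an assumption made for illustration: `avg_state`, `avg_aggregate`, `avg_reduce`, `avg_finalize` and `avg_fold` are hypothetical names, the state layout is invented, and the real `stateless_aggregate_function` struct in the tree is not reproduced.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical accumulator for avg(): running sum and row count.
// It is passed in and returned by value, so no state lives in the
// aggregate itself.
struct avg_state {
    std::int64_t sum = 0;
    std::int64_t count = 0;
};

// Scalar function 1: fold one new input into an accumulator.
avg_state avg_aggregate(avg_state acc, std::int64_t input) {
    return {acc.sum + input, acc.count + 1};
}

// Scalar function 2: merge two accumulators, for example the partial
// results of two shards in a parallelized aggregation.
avg_state avg_reduce(avg_state a, avg_state b) {
    return {a.sum + b.sum, a.count + b.count};
}

// Scalar function 3: turn an accumulator into the final result.
// Returning 0 for an empty input is an assumption of this sketch.
double avg_finalize(avg_state acc) {
    return acc.count == 0 ? 0.0
                          : static_cast<double>(acc.sum) / static_cast<double>(acc.count);
}

// Helper that drives the aggregate over a batch of rows.
avg_state avg_fold(const std::vector<std::int64_t>& rows) {
    avg_state acc;
    for (std::int64_t r : rows) {
        acc = avg_aggregate(acc, r);
    }
    return acc;
}
```

Because each function is pure, aggregating all rows in one pass and reducing two partial accumulators give the same result: `avg_finalize(avg_fold({1, 2, 3, 4}))` equals `avg_finalize(avg_reduce(avg_fold({1, 2}), avg_fold({3, 4})))`.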


3126 Commits