scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Gleb Natapov	4ffc39d885	cql3: Extend the scope of group0_guard during DDL statement execution Currently we hold group0_guard only during DDL statement's execute() function, but unfortunately some statements access underlying schema state also during check_access() and validate() calls which are called by the query_processor before it calls execute. We need to cover those calls with group0_guard as well and also move retry loop up. This patch does it by introducing new function to cql_statement class take_guard(). Schema altering statements return group0 guard while others do not return any guard. Query processor takes this guard at the beginning of a statement execution and retries if service::group0_concurrent_modification is thrown. The guard is passed to the execute in query_state structure. Fixes: #13942 Message-ID: <ZNsynXayKim2XAFr@scylladb.com>	2023-08-17 15:52:48 +03:00
Avi Kivity	d57a951d48	Revert "cql3: Extend the scope of group0_guard during DDL statement execution" This reverts commit `70b5360a73`. It generates a failure in group0_test .test_concurrent_group0_modifications in debug mode with about 4% probability. Fixes #15050	2023-08-15 00:26:45 +03:00
Gleb Natapov	70b5360a73	cql3: Extend the scope of group0_guard during DDL statement execution Currently we hold group0_guard only during DDL statement's execute() function, but unfortunately some statements access underlying schema state also during check_access() and validate() calls which are called by the query_processor before it calls execute. We need to cover those calls with group0_guard as well and also move retry loop up. This patch does it by introducing new function to cql_statement class take_guard(). Schema altering statements return group0 guard while others do not return any guard. Query processor takes this guard at the beginning of a statement execution and retries if service::group0_concurrent_modification is thrown. The guard is passed to the execute in query_state structure. Fixes: #13942 Message-ID: <ZNSWF/cHuvcd+g1t@scylladb.com>	2023-08-13 14:19:39 +03:00
Kefu Chai	565f5c7380	transport: correct format string when printing logging message we print the stream id in the logging messages, but in this case, we forgot to pass `stream` to `log::debug()`. but the placeholder for `stream` was added. if the underlying fmtlib actually formats the argument with the format string, it would throw. fortunately, we don't enable debug level logging often, guess that's why we haven't spotted this issue yet. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14620	2023-07-13 11:21:43 +03:00
Calle Wilund	20e9619bb1	transport: Try to do early, transport based auth if possible Bypassing the need for an AUTH message+response. I.e. do auth _without_ client having login specified.	2023-06-26 15:00:21 +00:00
Kefu Chai	c3d91f5190	tracing: drop trace(.., std::string&&) overload this change is a follow-up of `4f5fcb02fd`, the goal is to avoid the programming oversights like ```c++ trace(trace_ptr, "foo {} with {} but {} is {}"); ``` as `trace(const trace_state_ptr& p, const std::string& msg)` is a better match than the templated one, i.e., `trace(const trace_state_ptr& p, fmt::format_string<T...> fmt, T&&... args)`. so we cannot detect this with the compile-time format checking. so let's just drop this overload, and update its callers to use the other overload. The change was suggested by Avi. the example also came from him. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14188	2023-06-10 20:09:35 +03:00
Avi Kivity	26c8470f65	treewide: use #include <seastar/...> for seastar headers We treat Seastar as an external library, so fix the few places that didn't do so to use angle brackets. Closes #14037	2023-06-06 08:36:09 +03:00
Avi Kivity	42a1ced73b	cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt The expression system uses managed_bytes_opt for values, but result_set uses bytes_opt. This means that processing values from the result set in expressions requires a copy. Out of the two, managed_bytes_opt is the better choice, since it prevents large contiguous allocations for large blobs. So we switch result_set to use managed_bytes_opt. Users of the result_set API are adjusted. The db::function interface is not modified to limit churn; instead we convert the types on entry and exit. This will be adjusted in a following patch.	2023-05-07 17:17:36 +03:00
Kefu Chai	b76877fd99	transport: capture reference to temp value by value `current_scheduling_group()` returns a temporary value, and `name()` returns a reference, so we cannot capture the return value by reference, and use the reference after this expression is evaluated. this would cause undefined behavior. so let's just capture it by value. this change also silence following warning from GCC-13: ``` /home/kefu/dev/scylladb/transport/server.cc:204:11: error: possibly dangling reference to a temporary [-Werror=dangling-reference] 204 \| auto& cur_sg_name = current_scheduling_group().name(); \| ^~~~~~~~~~~ /home/kefu/dev/scylladb/transport/server.cc:204:56: note: the temporary was destroyed at the end of the full expression ‘seastar::current_scheduling_group().seastar::scheduling_group::name()’ 204 \| auto& cur_sg_name = current_scheduling_group().name(); \| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~ ``` Fixes #13719 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #13724	2023-05-01 22:40:36 +03:00
Kamil Braun	30cc07b40d	Merge 'Introduce tablets' from Tomasz Grabiec This PR introduces an experimental feature called "tablets". Tablets are a way to distribute data in the cluster, which is an alternative to the current vnode-based replication. Vnode-based replication strategy tries to evenly distribute the global token space shared by all tables among nodes and shards. With tablets, the aim is to start from a different side. Divide resources of replica-shard into tablets, with a goal of having a fixed target tablet size, and then assign those tablets to serve fragments of tables (also called tablets). This will allow us to balance the load in a more flexible manner, by moving individual tablets around. Also, unlike with vnode ranges, tablet replicas live on a particular shard on a given node, which will allow us to bind raft groups to tablets. Those goals are not yet achieved with this PR, but it lays the ground for this. Things achieved in this PR: - You can start a cluster and create a keyspace whose tables will use tablet-based replication. This is done by setting `initial_tablets` option: ``` CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy', 'replication_factor': 3, 'initial_tablets': 8}; ``` All tables created in such a keyspace will be tablet-based. Tablet-based replication is a trait, not a separate replication strategy. Tablets don't change the spirit of replication strategy, it just alters the way in which data ownership is managed. In theory, we could use it for other strategies as well like EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy is augmented to support tablets. - You can create and drop tablet-based tables (no DDL language changes) - DML / DQL work with tablet-based tables Replicas for tablet-based tables are chosen from tablet metadata instead of token metadata Things which are not yet implemented: - handling of views, indexes, CDC created on tablet-based tables - sharding is done using the old method, it ignores the shard allocated in tablet metadata - node operations (topology changes, repair, rebuild) are not handling tablet-based tables - not integrated with compaction groups - tablet allocator piggy-backs on tokens to choose replicas. Eventually we want to allocate based on current load, not statically Closes #13387 * github.com:scylladb/scylladb: test: topology: Introduce test_tablets.py raft: Introduce 'raft_server_force_snapshot' error injection locator: network_topology_strategy: Support tablet replication service: Introduce tablet_allocator locator: Introduce tablet_aware_replication_strategy locator: Extract maybe_remove_node_being_replaced() dht: token_metadata: Introduce get_my_id() migration_manager: Send tablet metadata as part of schema pull storage_service: Load tablet metadata when reloading topology state storage_service: Load tablet metadata on boot and from group0 changes db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata() migration_notifier: Introduce before_drop_keyspace() migration_manager: Make prepare_keyspace_drop_announcement() return a future<> test: perf: Introduce perf-tablets test: Introduce tablets_test test: lib: Do not override table id in create_table() utils, tablets: Introduce external_memory_usage() db: tablets: Add printers db: tablets: Add persistence layer dht: Use last_token_of_compaction_group() in split_token_range_msb() locator: Introduce tablet_metadata dht: Introduce first_token() dht: Introduce next_token() storage_proxy: Improve trace-level logging locator: token_metadata: Fix confusing comment on ring_range() dht, storage_proxy: Abstract token space splitting Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries" db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms() db: Introduce get_non_local_vnode_based_strategy_keyspaces() service: storage_proxy: Avoid copying keyspace name in write handler locator: Introduce per-table replication strategy treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type locator: Introduce effective_replication_map locator: Rename effective_replication_map to vnode_effective_replication_map locator: effective_replication_map: Abstract get_pending_endpoints() db: Propagate feature_service to abstract_replication_strategy::validate_options() db: config: Introduce experimental "TABLETS" feature db: Log replication strategy for debugging purposes db: Log full exception on error in do_parse_schema_tables() db: keyspace: Remove non-const replication strategy getter config: Reformat	2023-04-27 09:40:18 +02:00
Botond Dénes	3e92bcaa20	Merge 'utils: redesign reusable_buffer' from Michał Chojnowski Common compression libraries work on contiguous buffers. Contiguous buffers are a problem for the allocator. However, as long as they are short-lived, we can avoid the expensive allocations by reusing buffers across tasks. This idea is already applied to the compression of CQL frames, but with some deficiencies. `utils: redesign reusable_buffer` attempts to improve upon it in a few ways. See its commit message for an extended discussion. Compression buffer reuse also happens in the zstd SSTable compressor, but the implementation is misguided. Every `zstd_processor` instance reuses a buffer, but each instance has its own buffer. This is very bad, because a healthy database might have thousands of concurrent instances (because there is one for each sstable reader). Together, the buffers might require gigabytes of memory, and the reuse actually increases memory pressure significantly, instead of reducing it. `zstd: share buffers between compressor instances` aims to improve that by letting a single buffer be shared across all instances on a shard. Closes #13324 * github.com:scylladb/scylladb: zstd: share buffers between compressor instances utils: redesign reusable_buffer	2023-04-27 09:09:09 +03:00
Michał Chojnowski	bf26a8c467	utils: redesign reusable_buffer Large contiguous buffers put large pressure on the allocator and are a common source of reactor stalls. Therefore, Scylla avoids their use, replacing it with fragmented buffers whenever possible. However, the use of large contiguous buffers is impossible to avoid when dealing with some external libraries (i.e. some compression libraries, like LZ4). Fortunately, calls to external libraries are synchronous, so we can minimize the allocator impact by reusing a single buffer between calls. An implementation of such a reusable buffer has two conflicting goals: to allocate as rarely as possible, and to waste as little memory as possible. The bigger the buffer, the more likely that it will be able to handle future requests without reallocation, but also the memory memory it ties up. If request sizes are repetitive, the near-optimal solution is to simply resize the buffer up to match the biggest seen request, and never resize down. However, if we anticipate pathologically large requests, which are caused by an application/configuration bug and are never repeated again after they are fixed, we might want to resize down after such pathological requests stop, so that the memory they took isn't tied up forever. The current implementation of reusable buffers handles this by resizing down to 0 every 100'000 requests. This patch attempts to solve a few shortcomings of the current implementation. 1. Resizing to 0 is too aggressive. During regular operation, we will surely need to resize it back to the previous size again. If something is allocated in the hole left by the old buffer, this might cause a stall. We prefer to resize down only after pathological requests. 2. When resizing, the current implementation allocates the new buffer before freeing the old one. This increases allocator pressure for no reason. 3. When resizing up, the buffer is resized to exactly the requested size. That is, if the current size is 1MiB, following requests of 1MiB+1B and 1MiB+2B will both cause a resize. It's preferable to limit the set of possible sizes so that every reset doesn't tend to cause multiple resizes of almost the same size. The natural set of sizes is powers of 2, because that's what the underlying buddy allocator uses. No waste is caused by rounding up the allocation to a power of 2. 4. The interval of 100'000 uses is both too low and too arbitrary. This is up for discussion, but I think that it's preferable to base the dynamics of the buffer on time, rather than the number of uses. It's more predictable to humans. The implementation proposed in this patch addresses these as follows: 1. Instead of resizing down to 0, we resize to the biggest size seen in the last period. As long as at least one maximal (up to a power of 2) "normal" request appears each period, the buffer will never have to be resized. 2. The capacity of the buffer is always rounded up to the nearest power of 2. 3. The resize down period is no longer measured in number of requests but in real time. Additionally, since a shared buffer in asynchronous code is quite a footgun, some rudimentary refcounting is added to assert that only one reference to the buffer exists at a time, and that the buffer isn't downsized while a reference to it exists. Fixes #13437	2023-04-26 22:09:17 +02:00
Tomasz Grabiec	41e69836fd	db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()	2023-04-24 10:49:37 +02:00
Kefu Chai	ebf5e138e8	redis,thrift,transport: make timeout_config live-updateable * timeout_config - add `updated_timeout_config` which represents an always-updated options backed by `utils::updateable_value<>`. this class is used by servers which need to access the latest timeout related options. the existing `timeout_config` is more like a snapshot of the `updated_timeout_config`. it is used in the use case where we don't need to most updated options or we update the options manually on demand. * redis, thrift, transport: s/timeout_config/updated_timeout_config/ when appropriate. use the improved version of timeout_config where we need to have the access to the most-updated version of the timeout options. Fixes #10172 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:17:45 +08:00
Kefu Chai	1cc28679bc	transport: mark cql_server::timeout_config() const this function returns a const reference to member variable, so we can mark it with the `const` specifier for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:06:02 +08:00
Kefu Chai	d72ab78ffd	transport: drop unused member function since `cql_server::connection::timeout_config()` is used nowhere, let's just drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:06:02 +08:00
Kefu Chai	c642ca9e73	redis,thrift,transport: initialize _config with std::move(config) instead of copying the `config` parameter, move away from it. this change also prepares for a non-copyable config. if the class of `config` is not copyable, we will not be able to initialize the member variable by copying from the given `config` parameter. after the live-updateable config change, the `_config` member variable will contain instances of utils::observer<>, which is not copyable, but is move-constructable, hence in this change, we just move away from the give `config`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:06:02 +08:00
Kefu Chai	e0ac2eb770	redis,thrift,transport: pass config via sharded_parameter * pass config via sharded_parameter * initialize config using designated initializer this change paves the road to servers with live-updateable timeout options. before this change, the servers initialize a domain specific combo config, like `redis_server_config`, with the same instance of a timeout_config, and pass the combox config as a ctor parameter to construct each sharded service instance. but this design assumes the value semantic of the config class, say, it should be copyable. but if we want to use utils::updateable_value<> to get updated option values, we would have to postpone the instantiation of the config until the sharded service is about to be initialized. so, in this change, instead of taking a domain specific config created before hand, all services constructed with a `timeout_config` will take a `sharded_parameter()` for creating the config. also, take this opportunity to initialize the config using designated initializer. for two reasons: * less repeatings this way. we don't have to repeat the variable name of the config being initialized for each member variable. * prepare for some member variables which do not have a default constructor. this applies to the timeout_config's updater which will not have a default constructor, as it should be initialized by db::config and a reference to the timeout_config to be updated. we will update the `timeout_config` side in a follow-up commit. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-29 20:06:00 +08:00
Botond Dénes	19560419d2	Merge 'treewide: improve compatibility with gcc 13' from Avi Kivity An assortment of patches that reduce our incompatibilities with the upcoming gcc 13. Closes #13243 * github.com:scylladb/scylladb: transport: correctly format unknown opcode treewide: catch by reference test: raft: avoid confusing string compare utils, types, test: extract lexicographical compare utilities test: raft: fsm_test: disambiguate raft::configuration construction test: reader_concurrency_semaphore_test: handle all enum values repair: fix signed/unsigned compare repair: fix incorrect signed/unsigned compare treewide: avoid unused variables in if statements keys: disambiguate construction from initializer_list<bytes> cql3: expr: fix serialize_listlike() reference-to-temporary with gcc compaction: error on invalid scrub type treewide: prevent redefining names api: task_manager: fix signed/unsigned compare alternator: streams: fix signed/unsigned comparison test: fix some mismatched signed/unsigned comparisons	2023-03-24 15:16:05 +02:00
Vlad Zolotarov	f94bbc5b34	transport: add per-scheduling-group CQL opcode-specific metrics This patch extends a previous patch that added these metrics globally: - cql_requests_count - cql_request_bytes - cql_response_bytes This patch adds a "scheduling_group_name" label to these metrics and changes corresponding counters to be accounted on a per-scheduling-group level. As a bonus this patch also marks all 3 metrics as 'skip_when_empty'. Ref #13061 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20230321201412.3004845-1-vladz@scylladb.com>	2023-03-22 13:27:48 +02:00
Avi Kivity	19810cfc5e	transport: correctly format unknown opcode gcc allows an enum to contain values outside its members. For extra safety, as this can be user visible, format the unknown opcode and return it.	2023-03-21 15:43:00 +02:00
Avi Kivity	e75009cd49	treewide: catch by reference gcc rightly warns about capturing by value, so capture by reference.	2023-03-21 15:43:00 +02:00
Kefu Chai	21a7c439bb	build: cmake: find Snappy before using it Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-08 22:53:42 +08:00
Vlad Zolotarov	ae6724f155	transport: refactor CQL metrics This patch reorganizes and extends CQL related metrics. Before this patch we only had counters for specific CQL requests. However, many times we need to reason about the size of CQL queries: corresponding requests and response sizes. This patch adds corresponding metrics: - Arranges all 3 per-opcode statistics counters in a single struct. - Defines a vector of such structs for each CQL opcode. - Adjusts statistics updates accordingly - the code is much simpler now. - Removes old metrics that were accounting some CQL opcodes. - Adds new per-opcode metrics for requests number, request and response sizes: - New metrics are of a derived kind - rate() should be applied to them. - There are 3 new metrics names: - 'cql_requests_count' - 'cql_request_bytes' - 'cql_response_bytes' - New metrics have a per-opcode label - 'kind'. For example: A number of response bytes for an EXECUTE opcode on shard 0 looks as follows: scylla_transport_cql_response_bytes{kind="EXECUTE",shard="0"} Ref #13061 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Message-Id: <20230302154816.299721-1-vladz@scylladb.com>	2023-03-07 12:02:34 +02:00
Kefu Chai	563fbb2d11	build: cmake: extract more subsystem out into its own CMakeLists.txt namely, cdc, compaction, dht, gms, lang, locator, mutation_writer, raft, readers, replica, service, tools, tracing and transport. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-02 10:15:25 +08:00
Kefu Chai	412953fdd5	compress, transport: do not detect LZ4_compress_default() `LZ4_compress_default()` was introduced in liblz4 v1.7.3, despite that the release note (https://github.com/lz4/lz4/releases/tag/v1.7.3) of v1.7.3 didn't mention this. if we check the commit which added this API, we can find all releases including it: see ``` $ git tag --contains 1b17bf2ab8cf66dd2b740eca376e2d46f7ad7041 lz4-r130 r129 r130 r131 rc129v0 v1.7.3 v1.7.4 v1.7.4.2 v1.7.5 v1.8.0 v1.8.1 v1.8.1.2 v1.8.2 v1.8.3 v1.9.0 v1.9.1 v1.9.2 v1.9.3 v1.9.4 ``` and v1.7.3 was released in Nov 17, 2016. some popular distros releases also package new enough liblz4: - fedora 35 ships lz4-devel 1.9.3, - CentOS 7 ships lz4-devel 1.8.3 - debian 10 ships liblz4-dev 1.8.3 - ubuntu 18.04 ships liblz4-dev r131 so, in this change, we drop the support of liblz4 < 1.7.3 for better code readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #12971	2023-02-23 14:39:20 +02:00
Kefu Chai	0cb842797a	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:57:18 +02:00
Petr Gusev	3263523b54	transport server: fix "request size too large" handling Calling _read_buf.close() doesn't imply eof(), some data may have already been read into kernel or client buffers and will be returned next time read() is called. When the _server._max_request_size limit was exceeded and the _read_buf was closed, the process_request method finished and we started processing the next request in connection::process. The unread data from _read_buf was treated as the header of the next request frame, resulting in "Invalid or unsupported protocol version" error. The existing test_shed_too_large_request was adjusted. It was originally written with the assumption that the data of a large query would simply be dropped from the socket and the connection could be used to handle the next requests. This behaviour was changed in scylladb#8800, now the connection is closed on the Scylla side and can no longer be used. To check there are no errors in this case, we use Scylla metrics, getting them from the Scylla Prometheus API.	2023-02-08 00:07:08 +04:00
Petr Gusev	0904f98ebf	transport server: log failed requests with debug level These logs can be helpful for debugging, e.g. if an error was not handled correctly by the client driver, or another error occurred while handling it.	2023-02-08 00:07:08 +04:00
Petr Gusev	a4cf509c3d	transport server: fix unexpected server errors handling If request processing ended with an error, it is worth sending the error to the client through make_error/write_response. Previously in this case we just wrote a message to the log and didn't handle the client connection in any way. As a result, the only thing the client got in this case was timeout error. A new test_batch_with_error is added. It is quite difficult to reproduce error condition in a test, so we use error injection instead. Passing injection_key in the body of the request ensures that the exception will be thrown only for this test request and will not affect other requests that the driver may send in the background. Closes: scylladb#12104	2023-02-08 00:07:02 +04:00
Petr Gusev	bd80a449d5	transport server: log client errors with debug level Ideally, these errors should be transparently delivered to the client, but in practice, due to various flaws/bugs in scylla and/or the driver, they can be lost, which enormously complicates troubleshooting. const socket_address& get_remote_address() is needed for its convenient conversion to string, which includes ip and port.	2023-02-07 13:53:38 +04:00
Avi Kivity	0b418fa7cf	cql3, transport, tests: remove "unset" from value type system The CQL binary protocol introduced "unset" values in version 4 of the protocol. Unset values can be bound to variables, which cause certain CQL fragments to be skipped. For example, the fragment `SET a = :var` will not change the value of `a` if `:var` is bound to an unset value. Unsets, however, are very limited in where they can appear. They can only appear at the top-level of an expression, and any computation done with them is invalid. For example, `SET list_column = [3, :var]` is invalid if `:var` is bound to unset. This causes the code to be littered with checks for unset, and there are plenty of tests dedicated to catching unsets. However, a simpler way is possible - prevent the infiltration of unsets at the point of entry (when evaluating a bind variable expression), and introduce guards to check for the few cases where unsets are allowed. This is what this long patch does. It performs the following: (general) 1. unset is removed from the possible values of cql3::raw_value and cql3::raw_value_view. (external->cql3) 2. query_options is fortified with a vector of booleans, unset_bind_variable_vector, where each boolean corresponds to a bind variable index and is true when it is unset. 3. To avoid churn, two compatiblity structs are introduced: cql3::raw_value{,_view}_vector_with_unset, which can be constructed from a std::vector<raw_value{,_view/}>, which is what most callers have. They can also be constructed with explicit unset vectors, for the few cases they are needed. (cql3->variables) 4. query_options::get_value_at() now throws if the requested bind variable is unset. This replaces all the throwing checks in expression evaluation and statement execution, which are removed. 5. A new query_options::is_unset() is added for the users that can tolerate unset; though it is not used directly. 6. A new cql3::unset_operation_guard class guards against unsets. It accepts an expression, and can be queried whether an unset is present. Two conditions are checked: the expression must be a singleton bind variable, and at runtime it must be bound to an unset value. 7. The modification_statement operations are split into two, via two new subclasses of cql3::operation. cql3::operation_no_unset_support ignores unsets completely. cql3::operation_skip_if_unset checks if an operand is unset (luckily all operations have at most one operand that tolerates unset) and applies unset_operation_guard to it. 8. The various sites that accept expressions or operations are modified to check for should_skip_operation(). This are the loops around operations in update_statement and delete_statement, and the checks for unset in attributes (LIMIT and PER PARTITION LIMIT) (tests) 9. Many unset tests are removed. It's now impossible to enter an unset value into the expression evaluation machinery (there's just no unset value), so it's impossible to test for it. 10. Other unset tests now have to be invoked via bind variables, since there's no way to create an unset cql3::expr::constant. 11. Many tests have their exception message match strings relaxed. Since unsets are now checked very early, we don't know the context where they happen. It would be possible to reintroduce it (by adding a format string parameter to cql3::unset_operation_guard), but it seems not to be worth the effort. Usage of unsets is rare, and it is explicit (at least with the Python driver, an unset cannot be introduced by ommission). I tried as an alternative to wrap cql3::raw_value{,_view} (that doesn't recognize unsets) with cql3::maybe_unset_value (that does), but that caused huge amounts of churn, so I abandoned that in favor of the current approach. Closes #12517	2023-01-16 21:10:56 +02:00
Avi Kivity	7a8a442c1e	transport: drop some dead code around v1 and v2 protocols In `424dbf43f` ("transport: drop cql protocol versions 1 and 2"), we dropped support for protocols 1 and 2, but some code remains that checks for those versions. It is now dead code, so remove it. Closes #12497	2023-01-12 12:52:19 +02:00
Avi Kivity	2739ac66ed	treewide: drop cql_serialization_format Now that we don't accept cql protocol version 1 or 2, we can drop cql_serialization format everywhere, except when in the IDL (since it's part of the inter-node protocol). A few functions had duplicate versions, one with and one without a cql_serialization_format parameter. They are deduplicated. Care is taken that `partition_slice`, which communicates the cql_serialization_format across nodes, still presents a valid cql_serialization_format to other nodes when transmitting itself and rejects protocol 1 and 2 serialization\ format when receiving. The IDL is unchanged. One test checking the 16-bit serialization format is removed.	2023-01-03 19:54:13 +02:00
Avi Kivity	424dbf43f3	transport: drop cql protocol versions 1 and 2 Version 3 was introduced in 2014 (Cassandra 2.1) and was supported in the very first version of Scylla (`2a7da21481` "CQL binary protocol"). Cassandra 3.0 (2015) dropped protocols 1 and 2 as well. It's safe enough to drop it now, 9 years after introduction of v3 and 7 years after Cassandra stopped supporting it. Dropping it allows dropping cql_serialization_format, which causes quite a lot of pain, and is probably broken. This will be dropped in the following patch.	2023-01-03 19:47:49 +02:00
Eliran Sinvani	5a5514d052	cql server: Only parallelize relevant cql requests The cql server uses an execution stage to process and execute queries, however, processing stage is best utilized when having a recurrent flow that needs to be called repeatedly since it better utilizes the instruction cache. Up until now, every request was sent through the processing stage, but most requests are not meant to be executed repeatedly with high volume. This change processes and executes the data queries asynchronously, through an execution stage, and all of the rest are processed one by one, only continuing once the request has been done end to end. Tests: Unit tests in dev and debug. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com> Closes #12202	2022-12-05 21:06:58 +02:00
Avi Kivity	5dedf85288	transport: prevent signed and unsigned comparison This can lead to undefined behavior. Cast to unsigned, after we've verified the value is indeed positive.	2022-11-28 21:58:30 +02:00
Karol Baryła	1c2eef384d	transport/server.cc: Return correct size of decompressed lz4 buffer An incorrect size is returned from the function, which could lead to crashes or undefined behavior. Fix by erroring out in these cases. Fixes #11476	2022-09-07 10:58:23 +03:00
David Garcia	b85843b9cc	Fix broken links Fix broken links	2022-06-28 15:19:36 +01:00
Pavel Emelyanov	f3841c1b45	exceptions: Define operator<< for exception_code Otherwise cql_transport::additional_options_for_proto_ext() complains about inability to format the enum class value Introduced by `efc3953c` (transport: add rate_limit_error) Fmt version 8.1.1-5.fc35, fresher one must have it out of the box Fixes #10884 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20220627052703.32024-1-xemul@scylladb.com>	2022-06-27 14:49:58 +03:00
Piotr Dulikowski	efc3953c0a	transport: add rate_limit_error Adds a CQL protocol extension which introduces the rate_limit_error. The new error code will be used to indicate that the operation failed due to it exceeding the allowed per-partition rate limit. The error code is supposed to be returned only if the corresponding CQL extension is enabled by the client - if it's not enabled, then Config_error will be returned in its stead.	2022-06-22 20:07:58 +02:00
Avi Kivity	528ab5a502	treewide: change metric calls from make_derive to make_counter make_derive was recently deprecated in favor of make_counter, so make the change throughput the codebase. Closes #10564	2022-05-14 12:53:55 +02:00
Avi Kivity	5937b1fa23	treewide: remove empty comments in top-of-files After `fcb8d040` ("treewide: use Software Package Data Exchange (SPDX) license identifiers"), many dual-licensed files were left with empty comments on top. Remove them to avoid visual noise. Closes #10562	2022-05-13 07:11:58 +02:00
Juliusz Stasiewicz	603dd72f9e	CQL: Replace assert by exception on invalid auth opcode One user observed this assertion fail, but it's an extremely rare event. The root cause - interlacing of processing STARTUP and OPTIONS messages - is still there, but now it's harmless enough to leave it as is. Fixes #10487 Closes #10503	2022-05-08 11:33:58 +03:00
Avi Kivity	987e6533d2	transport: return correct error codes when downgrading v4 {WRITE,READ}_FAILURE to {WRITE,READ}_TIMEOUT Protocol v4 added WRITE_FAILURE and READ_FAILURE. When running under v3 we downgrade these exceptions to WRITE_TIMEOUT and READ_TIMEOUT (since the client won't understand the v4 errors), but we still send the new error codes. This causes the client to become confused. Fix by updating the error codes. A better fix is to move the error code from the constructor parameter list and hard-code it in the constructor, but that is left for a follow-up after this minimal fix. Fixes #5610. Closes #10362	2022-04-12 19:19:52 +03:00
Nadav Har'El	fa7a302130	cross-tree: split coordinator_result from exceptions.hh Recently, coordinator_result was introduced as an alternative for exceptions. It was placed in the main "exceptions/exceptions.hh" header, which virtually every single source file in Scylla includes. But unfortunately, it brings in some heavy header files and templates, leading to a lot of wasted build time - ClangBuildAnalyzer measured that we include exceptions.hh in 323 source files, taking almost two seconds each on average. In this patch, we split the coordinator_result feature into a separate header file, "exceptions/coordinator_result", and only the few places which need it include the header file. Unfortunately, some of these few places are themselves header, so the new header file ends up being included in 100 source files - but 100 is still much less than 323 and perhaps we can reduce this number 100 later. After this patch, the total Scylla object-file size is reduced by 6.5% (the object size is a proxy for build time, which I didn't directly measure). ClangBuildAnalyzer reports that now each of the 323 includes of exceptions.hh only takes 80ms, coordinator_result.hh is only included 100 times, and virtually all the cost to include it comes from Boost's result.hh (400ms per inclusion). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20220228204323.1427012-1-nyh@scylladb.com>	2022-03-02 10:12:57 +02:00
Pavel Emelyanov	de6c60c1c9	client_data: Sanitize connection_notifier Now the connection_notifier is all gone, only the client_data bits are left. To keep it consistent -- rename the files. Also, while at it, brush up the header dependencies and remove the not really used constexprs for client states. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-02-18 15:02:26 +03:00
Pavel Emelyanov	d63ba87266	transport: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-02-18 15:02:26 +03:00
Pavel Emelyanov	971c431a23	code: Remove old on-disk version of system.clients table This includes most of the connection_notifier stuff as well as the auxiliary code from system_keyspace.cc and a bunch of updating calls from the client state changing. Other than less code and less disk updates on clients connection paths, this removes one usage of the nasty global qctx thing. Since the system.clients goes away rename the system.clients_v here too so the table is always present out there. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-02-18 15:02:26 +03:00
Pavel Emelyanov	7bc697ec99	protocol_server: Add get_client_data call The call returns a chunked_vector with client_data's. For now only the native transport implements it, others return empty vector. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-02-18 14:25:08 +03:00

1 2 3 4 5 ...

540 Commits