scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-01 21:55:50 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	2cb18c2404	transport: Keep backreference from event_notifier The event_notifier is private server subclass that's created once per server to handle events from storage_service. The notifier needs gossiper that already sits on the server, and to get it the simplest way is to equip notifier with the server backreference. Since these two objects are in strict 1:1 relation this reference is safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-25 10:55:41 +03:00
Pavel Emelyanov	43951318c8	transport: Keep gossiper on server The gossiper is needed by the transport::event_notifier. There's already gossiper reference on the transport controller, but it's a local reference, because controller doesn't need more. This patch upgrades controller reference to sharded<> and propagates it further up to the server. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-25 10:54:45 +03:00
Botond Dénes	9ec55e054d	treewide: distinguish truncated frame errors We have two identical "Truncated frame" errors, at: * read_frame_size() in serialization_visitors.hh; * cql_server::connection::read_and_decompress_frame() in transport/server.cc; When such an exception is thrown, it is impossible to tell where was it thrown from and it doesn't have any further information contained in it (beyond the basic information it being thrown implies). This patch solves both problems: it makes the exception messages unique per location and it adds information about why it was thrown (the expected vs. real size of the frame). Ref: #9482 Closes #9520	2021-10-27 12:27:16 +02:00
Calle Wilund	940058d25a	transport::server: Handle nested exceptions in cql execution/query Fixes #9491 CQL server, when encountering a "general" exception (i.e. not thrown by cql error checks), reports a wire error with simply the what() part of exception. However, if we have nested exceptions, we will most likely lose info here (hello encryption). General exception case should unwind exception and give back full, concatenated message to avoid confusion. Closes #9492	2021-10-20 17:54:17 +03:00
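The unwinding described in the commit above can be illustrated with standard C++ nested exceptions. This is a minimal sketch, not Scylla's actual code: `full_message` and `nested_message_example` are hypothetical names, and the `": "` separator is an assumption.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper, not Scylla's actual code: joins the what() of an
// exception with the what() of every exception nested inside it.
std::string full_message(const std::exception& e) {
    std::string msg = e.what();
    try {
        // Rethrows the nested exception, if e carries one; otherwise no-op.
        std::rethrow_if_nested(e);
    } catch (const std::exception& inner) {
        msg += ": " + full_message(inner);
    } catch (...) {
        msg += ": <non-standard exception>";
    }
    return msg;
}

// Builds a two-level chain the way a lower layer (here an imaginary
// encryption step) might, then flattens it into one message.
std::string nested_message_example() {
    try {
        try {
            throw std::runtime_error("key provider unreachable");
        } catch (...) {
            std::throw_with_nested(std::runtime_error("encryption failed"));
        }
    } catch (const std::exception& e) {
        return full_message(e);
    }
    return {};
}
```

Reporting only `e.what()` here would yield "encryption failed" and drop the underlying cause, which is the information loss the commit message describes.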
Piotr Sarna	59bd25d1ea	transport: respond with overloaded exception during shedding This commit makes shedding always respond - with overloaded exception, instead of ignoring the request. Fixes #9442 Closes #9443	2021-10-07 15:38:40 +03:00
Avi Kivity	0876248c2b	Merge "cql3: cache function calls evaluation for non-deterministic functions" from Pavel S " `function_call` AST nodes are created for each function with side effects in a CQL query, i.e. non-deterministic functions (`uuid()`, `now()` and some others timeuuid-related). These nodes are evaluated either when a query itself is executed or query restrictions are computed (e.g. partition/clustering key ranges for LWT requests). We need to cache the calls since otherwise when handling a `bounce_to_shard` request for an LWT query, we can possibly enter an infinite bouncing loop (in case a function is used to calculate partition key ranges for a query), since the results can be different each time. Furthermore, we don't support bouncing more than one time. Returning `bounce_to_shard` message more than one time will result in a crash. Caching works only for LWT statements and only for the function calls that affect partition key range computation for the query. `variable_specifications` class is renamed to `prepare_context` and generalized to record information about each `function_call` AST node and modify them, as needed: * Check whether a given function call is a part of partition key statement restriction. * Assign ids for caching if above is true and the call is a part of an LWT statement. There is no need to include any kind of statement identifier in the cache key since `query_options` (which holds the cache) is limited to a single statement, anyway. Function calls are indexed by the order in which they appear within a statement while parsing. Note that `function_call::raw` AST nodes are not created for selection clauses of a SELECT statement hence they can accept only one of the following things as parameters: * Other function calls. * Literal values. * Parameter markers.
In other words, only parameters that can be immediately reduced to a byte buffer are allowed and we don't need to handle database inputs to non-pure functions separately since they are not possible in this context. Anyhow, we don't even have a single non-pure function that accepts arguments, so precautions are not needed at the moment. Add a test written in `cql-pytest` framework to verify that both prepared and unprepared lwt statements handle `bounce_to_shard` messages correctly in such scenario. Fixes: #8604 Tests: unit(dev, debug) NOTE: the patchset uses `query_options` as a container for cached values. This doesn't look clean and `service::query_state` seems to be a better place to store them. But it's not forwarded to most of the CQL code and would mean that a huge number of places would have to be amended. The series presents a trade-off to avoid forwarding `query_state` everywhere (but maybe it's the thing that needs to be done, nonetheless). " * 'lwt_bounce_to_shard_cached_fn_v6' of https://github.com/ManManson/scylla: cql-pytest: add a test for non-pure CQL functions cql3: cache function calls evaluation for non-deterministic functions cql3: rename `variable_specifications` to `prepare_context`	2021-07-30 14:21:11 +03:00
Pavel Solodovnikov	3b6adf3a62	cql3: cache function calls evaluation for non-deterministic functions And reuse these values when handling `bounce_to_shard` messages. Otherwise such a function (e.g. `uuid()`) can yield a different value when a statement is re-executed on the other shard. It can lead to an infinite number of `bounce_to_shard` messages sent in case the function value is used to calculate partition key ranges for the query. Which, in turn, will cause crashes since we don't support bouncing more than one time and the second hop will result in a crash. Caching works only for LWT statements and only for the function calls that affect partition key range computation for the query. `variable_specifications` class is renamed to `prepare_context` and generalized to record information about each `function_call` AST node and modify them, as needed: * Check whether a given function call is a part of partition key statement restriction. * Assign ids for caching if above is true and the call is a part of an LWT statement. There is no need to include any kind of statement identifier in the cache key since `query_options` (which holds the cache) is limited to a single statement, anyway. Note that `function_call::raw` AST nodes are not created for selection clauses of a SELECT statement hence they can accept only one of the following things as parameters: * Other function calls. * Literal values. * Parameter markers. In other words, only parameters that can be immediately reduced to a byte buffer are allowed and we don't need to handle database inputs to non-pure functions separately since they are not possible in this context. Anyhow, we don't even have a single non-pure function that accepts arguments, so precautions are not needed at the moment. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-07-30 01:22:39 +03:00
Pavel Emelyanov	c7b0b25494	transport, generic_server: Remove no longer used functionality After subscription management was moved onto controller level a bunch of code can be dropped: - passing migration notifier beyond controller - event_notifier's _stopped bit - event_notifier .stop() method - event_notifier empty constructor and destructor - generic_server's on_stop virtual method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:41:32 +03:00
Avi Kivity	9059514335	build, treewide: enable -Wpessimizing-move warning This warning prevents using std::move() where it can hurt - on an unnamed temporary or a named automatic variable being returned from a function. In both cases the value could be constructed directly in its final destination, but std::move() prevents it. Fix the handful of cases (all trivial), and enable the warning. Closes #8992	2021-07-08 17:52:34 +03:00
Pavel Emelyanov	990db016e9	transport: Untie transport and database Both controller and server only need database to get config from. Since controller creation only happens in main() code which has the config itself, we may remove database mentioning from transport. Previous attempt was not to carry the config down to the server level, but it stepped on an updateable_value landmine -- the u._v. isn't copyable cross-shard (despite the docs) and to properly initialize server's max_concurrent_requests we need the config's named_value member itself. The db::config that flies through the stack is const reference, but its named_values do not get copied along the way -- the updateable value accepts both references and const references to subscribe on. tests: start-stop in debug mode Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210607135656.18522-1-xemul@scylladb.com>	2021-06-09 20:04:12 +03:00
Pavel Solodovnikov	76bea23174	treewide: reduce header interdependencies Use forward declarations wherever possible. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Closes #8813	2021-06-07 15:58:35 +03:00
Piotr Sarna	fa29b79c20	transport: close connections when too large requests arrive Too large requests are currently handled by the CQL server by skipping them and sending back an error response. That's however wasteful and dangerous: bogus request sizes will force Scylla to potentially skip gigabytes of data - and skipping is done by simply reading from the socket, so it may result in gigabytes of bandwidth wasted. Even if the request size is not bogus, closing the connection forces users to adjust their request sizes, which should be done anyway. Originally, there was a bug in handling too large requests which only read their headers and then left the connection in a broken, undefined state, trying to interpret the rest of the large request as a next CQL header. It was later fixed to skip the request, but closing the connection is a safer thing to do. Fixes #8798 Closes #8800	2021-06-07 12:23:55 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Avi Kivity	100d6f4094	build: enable -Wunused-function Also drop a single violation in transport/server.cc. This helps prevent dead code from piling up. Three functions in row_cache_test that are not used in debug mode are moved near their user, and under the same ifdef, to avoid triggering the error. Closes #8767	2021-06-06 09:21:23 +03:00
Piotr Sarna	cb27ebe61d	transport: start shedding requests during potential overload This commit implements the following overload prevention heuristic: if the admission queue becomes full, a timer is armed for 50ms. If any of the ongoing requests finishes, the timer is disarmed, but if that doesn't happen, the server goes into shedding mode, which means that it reads new requests from the socket and immediately drops them until one of the ongoing requests finishes. This heuristic is not recommended for OLAP workloads, so it is applied only if the session declared itself as interactive (via service level's workload_type parameter).	2021-05-27 13:02:22 +02:00
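The timer logic in the commit above can be sketched as a small state machine. This is an illustration under stated assumptions, not the server's implementation: `shedding_tracker` and its method names are hypothetical, time is passed in explicitly instead of using a real timer, and the interactive-workload check is left out. Only the 50ms grace period comes from the commit message.

```cpp
#include <chrono>
#include <optional>

using shed_clock = std::chrono::steady_clock;

// Grace period from the commit message: 50ms after the queue fills up.
constexpr std::chrono::milliseconds shed_grace{50};

// Hypothetical sketch: tracks whether the server should be shedding.
class shedding_tracker {
    std::optional<shed_clock::time_point> _deadline;  // armed "timer"
    bool _shedding = false;
public:
    // The admission queue became full: arm the timer unless already armed.
    void on_queue_full(shed_clock::time_point now) {
        if (!_shedding && !_deadline) {
            _deadline = now + shed_grace;
        }
    }
    // Any ongoing request finished: disarm the timer and stop shedding.
    void on_request_finished() {
        _deadline.reset();
        _shedding = false;
    }
    // Called for each new request; true means "read it and drop it".
    bool should_shed(shed_clock::time_point now) {
        if (_deadline && now >= *_deadline) {
            _shedding = true;
            _deadline.reset();
        }
        return _shedding;
    }
};
```

Passing the current time as a parameter keeps the sketch deterministic; a real server would arm an actual timer on its event loop instead.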
Piotr Sarna	6da59b8a38	transport: add updating per-service-level params Per-service-level parameters (currently timeouts) are now updated when a new connection is established. The other connections which have the changed role are currently not immediately reloaded.	2021-05-10 12:39:41 +02:00
Piotr Sarna	e257ec11c0	treewide: remove service level controller from query state ... since it's accessible through its member, client state.	2021-05-10 11:48:14 +02:00
Piotr Sarna	d1f2e8b469	treewide: propagate service level to client state ... since it's going to be used to set up per-service-level timeouts.	2021-05-10 11:48:14 +02:00
Nadav Har'El	58e275e362	cross-tree: reduce dependency on db/config.hh and database.hh Every time db/config.hh is modified (e.g., to add a new configuration option), 110 source files need to be recompiled. Many of those 110 didn't really care about configuration options, and just got the dependency accidentally by including some other header file. In this patch, I remove the include of "db/config.hh" from all header files. It is only needed in source files - and header files only need forward declarations. In some cases, source files were missing certain includes which they got incidentally from db/config.hh, so I had to add these includes explicitly. After this patch, the number of source files that get recompiled after a change to db/config.hh goes down from 110 to 45. It also means that 65 source files now compile faster because they don't include db/config.hh and whatever it included. Additionally, this patch also eliminates a few unnecessary inclusions of database.hh in other header files, which can use a forward declaration or database_fwd.hh. Some of the source files including one of those header files relied on one of the many header files brought in by database.hh, so we need to include those explicitly. In view_update_generator.hh something interesting happened - it needs database.hh because of code in the header file, but only included database_fwd.hh, and the only reason this worked was that the files including view_update_generator.hh already happened to unnecessarily include database.hh. So we fix that too. Refs #1 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210505102111.955470-1-nyh@scylladb.com>	2021-05-05 13:23:00 +03:00
Avi Kivity	daeddda7cc	treewide: remove inclusions of storage_proxy.hh from headers storage_proxy.hh is huge and includes many headers itself, so remove its inclusions from headers and re-add smaller headers where needed (and storage_proxy.hh itself in source files that need it). Ref #1.	2021-04-20 21:23:00 +03:00
Pekka Enberg	16f262b852	transport, redis: Use generic server::listen() Let's pull up cql_server listen() to generic_server::server base class and convert redis_server to use it.	2021-04-13 14:13:24 +03:00
Pekka Enberg	6c619e4462	transport/server: Remove "redis_server" prefix from logging The logger itself has the name "redis_server" that appears in the logs.	2021-04-13 13:57:22 +03:00
Pekka Enberg	7ef3c60864	transport/server: Remove "cql_server" prefix from logging The logger itself has the name "cql_server" that appears in the logs.	2021-04-13 13:57:22 +03:00
Pekka Enberg	ac90a8ea50	transport, redis: Use generic server::do_accepts() The cql_server and redis_server share the same ancestor of do_accepts(). Let's pull up the cql_server version of do_accept() (that has more functionality) to generic_server::server and use it in the redis_server too.	2021-04-13 13:57:21 +03:00
Pekka Enberg	3689db26fc	transport, redis: Use generic server::process() Pull up the cql_server process() to base class and convert redis_server to use it. Please note that this fixes EPIPE and connection reset issue in the Redis server, which was fixed in the CQL server in commit `1a8630e6a` ("transport: silence "broken pipe" and "connection reset by peer" errors").	2021-04-13 13:56:45 +03:00
Pekka Enberg	66d6899727	transport: Move CQL specific error handling to handle_error() This moves the CQL specific error handling to handle_error() to make process() more generic in preparation for move to generic_server.	2021-04-13 13:56:45 +03:00
Pekka Enberg	ab339cfaf7	transport, redis: Move connection tracking to generic_server::server class The cql_server and redis_server classes have identical connection tracking code. Pull it up to the generic_server::server base class.	2021-04-13 13:56:45 +03:00
Pekka Enberg	19507bb7ea	transport, redis: Use generic connection::shutdown() This patch moves the duplicated connection::shutdown() method to a new generic_server::connection base class that is now inherited by cql_server and redis_server.	2021-04-13 13:56:44 +03:00
Piotr Sarna	26ee6aa1e9	transport: initialize query state with service level controller Query state should be aware of the service level controller in order to properly serve service-level-related CQL queries.	2021-04-12 16:31:27 +02:00
Pavel Emelyanov	f0a79574d4	memory_limiter: Use main-local instance everywhere The cql_server and alternator both need the limiter, so patch them to stop using storage service's one and use the main-local one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-03-18 11:28:45 +01:00
Pavel Emelyanov	c2f94fb527	cql_server: Remove semaphore getter fn from config The cql_server() needs to get the memory limiter semaphore from local storage service instance. To make this happen a callback is introduced on the config structure. The same can be achieved in a simpler manner -- by providing the local storage service instances directly. Actually, the storage service will be removed in further patches from this place, so this patch is mostly to get rid of the callback from the config. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-03-18 11:28:45 +01:00
Michał Chojnowski	4e35befcf2	treewide: get rid of incorrect reinterpret casts In some places we use the `reinterpret_cast<const net::packed<T>>(&x)` pattern to reinterpret memory. This is a violation of C++'s aliasing rules, which invokes undefined behaviour. The blessed way to correctly reinterpret memory is to copy it into a new object. Let's do that. Note: the reinterpret_cast way has no performance advantage. Compilers recognize the memory copy pattern and optimize it away.	2021-03-17 17:00:38 +01:00
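The "copy it into a new object" approach from the commit above can be sketched as follows. `read_unaligned` is a hypothetical helper name chosen for illustration, not necessarily what the patch uses.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper: reads a T from raw, possibly unaligned bytes by
// copying them into a fresh object. Unlike dereferencing a
// reinterpret_cast'ed pointer, this does not violate aliasing rules.
template <typename T>
T read_unaligned(const void* p) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy-based reads require a trivially copyable type");
    T value;
    std::memcpy(&value, p, sizeof(T));
    return value;
}

// Usage: read a 32-bit value stored at an odd offset inside a byte buffer.
inline std::uint32_t read_example() {
    unsigned char buf[8] = {};
    const std::uint32_t original = 0xdeadbeefu;
    std::memcpy(buf + 1, &original, sizeof(original));
    return read_unaligned<std::uint32_t>(buf + 1);  // yields 0xdeadbeef
}
```

As the commit message notes, compilers typically recognize this fixed-size `memcpy` pattern and turn it into a plain load, so the well-defined form should not cost anything.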
Piotr Sarna	8635094144	transport: return error on correct stream during size shedding When a request is shed due to being too large, its response was sent with stream id 0 instead of the stream id that matches the communication lane. That in turn confused the client, which is no longer the case.	2021-03-02 15:10:46 +01:00
Piotr Sarna	d6ea6937ee	transport: return error on correct stream during shedding When a request is shed due to exceeding the max number of concurrent requests, its response was sent with stream id 0 instead of the stream id that matches the communication lane. That in turn confused the client, which is no longer the case.	2021-03-02 15:10:46 +01:00
Piotr Sarna	4a24d7dca0	transport: skip the whole request if it is too large When a request is shed due to being too large, only the header was actually read, and the body was still stuck in the socket - and would be read in the next iteration, which would expect to actually read a new request header. Instead, the whole message is now skipped, so that a new request can be correctly read and parsed. Fixes #8193	2021-03-02 10:10:19 +01:00
Piotr Sarna	3eb7e768cb	transport: skip the whole request during shedding When a request is shed due to exceeding the number of max concurrent requests, only its header was actually read, and the body was still stuck in the socket - and would be read in the next iteration, which would expect to actually read a new request header. Instead, the whole message is now skipped, so that a new request can be correctly read and parsed. Refs #8193	2021-03-02 10:10:19 +01:00
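The fix in the two commits above comes down to always consuming the body that the header announces, even when the request is rejected. The sketch below works on an in-memory buffer rather than a socket and assumes the 9-byte CQL v4 frame header (version, flags, 2-byte stream, opcode, 4-byte body length). `parse_header` and `accepted_streams` are hypothetical names, and only the size limit is modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t cql_header_size = 9;

struct frame_header {
    std::uint8_t version;
    std::uint8_t flags;
    std::int16_t stream;
    std::uint8_t opcode;
    std::uint32_t length;  // body length, big-endian on the wire
};

// Decodes the fixed-size header; p must point at cql_header_size bytes.
frame_header parse_header(const std::uint8_t* p) {
    frame_header h;
    h.version = p[0];
    h.flags = p[1];
    h.stream = static_cast<std::int16_t>((p[2] << 8) | p[3]);
    h.opcode = p[4];
    h.length = (std::uint32_t(p[5]) << 24) | (std::uint32_t(p[6]) << 16) |
               (std::uint32_t(p[7]) << 8) | std::uint32_t(p[8]);
    return h;
}

// Returns the stream ids of accepted requests. Oversized requests are
// shed, but their bodies are still skipped so that the next iteration
// starts at a real header instead of in the middle of a body.
std::vector<std::int16_t> accepted_streams(const std::vector<std::uint8_t>& buf,
                                           std::uint32_t max_body) {
    std::vector<std::int16_t> accepted;
    std::size_t pos = 0;
    while (buf.size() - pos >= cql_header_size) {
        const frame_header h = parse_header(buf.data() + pos);
        pos += cql_header_size;
        if (buf.size() - pos < h.length) {
            break;  // truncated frame: stop parsing
        }
        if (h.length <= max_body) {
            accepted.push_back(h.stream);
        }
        pos += h.length;  // consume the body whether accepted or shed
    }
    return accepted;
}
```

Dropping the `pos += h.length` step for shed requests reproduces the original bug: the leftover body bytes get parsed as the next header.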
Piotr Sarna	c5214eb096	treewide: remove timeout config from query options Timeout config is now stored in each connection, so there's no point in tracking it inside each query as well. This patch removes timeout_config from query_options and follows by removing now unnecessary parameters of many functions and constructors.	2021-02-25 17:20:27 +01:00
Piotr Sarna	7ceafda70a	service: add timeout config to client state Future patches will use this per-connection timeout config to allow setting different timeouts for each session, based on roles.	2021-02-25 17:20:26 +01:00
Piotr Sarna	25f47561cb	transport: fix an outdated comment The comment mentions calling a lambda in-place, but the lambda is no longer there since 2019! Message-Id: <3903c84d5c151415409f28935e328b552dd548f8.1614155567.git.sarna@scylladb.com>	2021-02-24 11:14:01 +02:00
Pavel Emelyanov	8490c9ff6a	transport: Remove global storage service reference On start the transport controller keeps the storage service on server config's lambda just to let the server grab a database config option. The same can be achieved by passing the sharded database reference to sharded<server>::start, so that each server instance gets local database with config. As a nice side effect transport::server's config looks more like a config with simple values and without methods and/or lambdas on board. tests: unit(dev) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210205175611.13464-1-xemul@scylladb.com>	2021-02-08 12:58:49 +01:00
Nadav Har'El	702b1b97bf	cql: fix error return from execution of fromJson() and other functions As reproduced in cql-pytest/test_json.py and reported in issue #7911, failing fromJson() calls should return a FUNCTION_FAILURE error, but currently produce a generic SERVER_ERROR, which can lead the client to think the server experienced some unknown internal error and the query can be retried on another server. This patch adds a new cassandra_exception subclass that we were missing - function_execution_exception - properly formats this error message (as described in the CQL protocol documentation), and uses this exception in two cases: 1. Parse errors in fromJson()'s parameters are converted into a function_execution_exception. 2. Any exceptions during the execute() of a native_scalar_function_for function is converted into a function_execution_exception. In particular, fromJson() uses a native_scalar_function_for. Note, however, that functions which already took care to produce a specific Cassandra error, this error is passed through and not converted to a function_execution_exception. An example is the blobAsText() which can return an invalid_request error, so it is left as such and not converted. This also happens in Cassandra. All relevant tests in cql-pytest/test_json.py now pass, and are no longer marked xfail. This patch also includes a few more improvements to test_json.py. Fixes #7911 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210118140114.4149997-1-nyh@scylladb.com>	2021-01-21 15:21:13 +01:00
Kamil Braun	1a8630e6a7	transport: silence "broken pipe" and "connection reset by peer" errors The code would already silence broken pipe exceptions since it's expected when the other side closes the connection or when we shutdown the socket during Scylla shutdown, but the code wouldn't handle the following: 1. "Connection reset by peer" errors: these can also happen in the aforementioned two scenarios; the conditions that determine which of the two types of errors occur are unclear. 2. The scenarios would sometimes result in a `seastar::nested_exception`, mainly during shutdown. The errors could happen once when trying to send a response to a request (`_write_buf.write(...)/flush(...)`) and then again when trying to close the connection in a `finally` block. These nested exceptions were not silenced. The commit handles each of these cases. Closes #7907. Closes #7931	2021-01-19 10:30:17 +02:00
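The classification of "expected" disconnect errors described in the commit above could look roughly like this. This is a sketch using only the standard library: `is_benign_disconnect` is a hypothetical name, and the `seastar::nested_exception` handling mentioned in the commit is not modeled here.

```cpp
#include <exception>
#include <stdexcept>
#include <system_error>

// Hypothetical helper: returns true when the exception is a system error
// that merely says the peer went away (EPIPE or ECONNRESET), so the
// caller may skip logging it as a failure.
bool is_benign_disconnect(std::exception_ptr ep) {
    try {
        std::rethrow_exception(ep);
    } catch (const std::system_error& e) {
        return e.code() == std::errc::broken_pipe ||
               e.code() == std::errc::connection_reset;
    } catch (...) {
        return false;  // anything else is a real error
    }
}

// Usage: a "broken pipe" error is treated as benign.
inline bool benign_example() {
    auto ep = std::make_exception_ptr(
        std::system_error(std::make_error_code(std::errc::broken_pipe)));
    return is_benign_disconnect(ep);
}
```

Comparing against `std::errc` values matches on the portable error condition, so the check does not depend on which error category the I/O layer used when it built the `std::system_error`.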
Pekka Enberg	8d00c16feb	transport/server: Code cleanups Fix up some coding style issues spotted while reading the code: - Fix indentation to be 4 spaces - Remove superfluous semicolons Closes #7793	2020-12-14 12:48:05 +02:00
Piotr Wojtczak	3560acd311	cql_metrics: Add metrics for CQL errors This change adds tracking of all the CQL errors that can be raised in response to a CQL message from a client, as described in the CQL v4 protocol and with Scylla's CDC_WRITE_FAILUREs included. Fixes #5859 Closes #7604	2020-11-30 12:18:37 +02:00
Piotr Wojtczak	d9810ec8eb	cql_metrics: Add counters for CQL request messages This change adds metrics for counting request message types listed in the CQL v.4 spec under section 4.1 (https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v4.spec). To organize things properly, we introduce a new cql_server::transport_stats object type for aggregating the message and server statistics. Fixes #4888 Closes #7574	2020-11-11 20:00:17 +02:00
Juliusz Stasiewicz	0251cb9b31	transport: Update `connection_stage` in `system.clients`	2020-10-12 18:44:00 +02:00
Juliusz Stasiewicz	6abe1352ba	transport: Retrieve driver's name and version from STARTUP message	2020-10-12 18:37:19 +02:00
Juliusz Stasiewicz	d2d162ece3	transport: Notify `system.clients` about "protocol_version"	2020-10-12 18:32:00 +02:00
Juliusz Stasiewicz	acf0341e9b	transport: On successful authentication add `username` to system.clients The username becomes known in the course of resolving challenges from `PasswordAuthenticator`. That's why username is being set on successful authentication; until then all users are "anonymous". Meanwhile, `AllowAllAuthenticator` (the default) does not request username, so users logged with it will remain as "anonymous" in `system.clients`. Shuffling of code was necessary to unify existing infrastructure for INSERTing entries into `system.clients` with later UPDATEs.	2020-10-06 18:52:46 +02:00
Piotr Dulikowski	bfbf02a657	transport/config: fix cross-shard use of updateable_value Recently, the cql_server_config::max_concurrent_requests field was changed to be an updateable_value, so that it is updated when the corresponding option in Scylla's configuration is live-reloaded. Unfortunately, due to how cql_server is constructed, this caused cql_server instances on all shards to store an updateable_value which pointed to an updateable_value_source on shard 0. Unsynchronized cross-shard memory operations ensue. The fix changes the cql_server_config so that it holds a function which creates an updateable_value appropriate for the given shard. This pattern is similar to another, already existing option in the config: get_service_memory_limiter_semaphore. This fix can be reverted if updateable_value becomes safe to use across shards. Tests: unit(dev) Fixes: #7310	2020-10-01 14:10:56 +03:00

390 Commits