scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-30 05:07:05 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	c04ddc5aa9	transport: Use server gossiper in event notifier The notifier is automatic friend of server and can access its private fields without additional wrappers/decorations. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-25 10:56:05 +03:00
Pavel Emelyanov	2cb18c2404	transport: Keep backreference from event_notifier The event_notifier is private server subclass that's created once per server to handle events from storage_service. The notifier needs gossiper that already sits on the server, and to get it the simplest way is to equip notifier with the server backreference. Since these two objects are in strict 1:1 relation this reference is safe. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-25 10:55:41 +03:00
Pavel Emelyanov	43951318c8	transport: Keep gossiper on server The gossiper is needed by the transport::event_notifier. There's already gossiper reference on the transport controller, but it's a local reference, because controller doesn't need more. This patch upgrades controller reference to sharded<> and propagates it further up to the server. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-25 10:54:45 +03:00
Botond Dénes	a51529dd15	protocol_servers: strengthen guarantees of listen_addresses() In early versions of the series which proposed protocol servers, the interface had two methods answering pretty much the same question of whether the server is running or not: * listen_addresses(): empty list -> server not running * is_server_running() To reduce redundancy and to avoid possible inconsistencies between the two methods, `is_server_running()` was scrapped, but re-added by a follow-up patch because `listen_addresses()` proved to be unreliable as a source for whether the server is running or not. This patch restores the previous state of having only `listen_addresses()` with two additional changes: * rephrase the comment on `listen_addresses()` to make it clear that implementations must return empty list when the server is not running; * those implementations that have a reliable source of whether the server is running or not, use it to force-return an empty list when the server is not running Tests: dtest(nodetool_additional_test.py) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20211117062539.16932-1-bdenes@scylladb.com>	2021-11-19 11:09:09 +03:00
Benny Halevy	9d4262e264	protocol_server: add per-protocol is_server_running method Change `b0a2a9771f` broke the generic api implementation of is_native_transport_running that relied on the addresses list being empty after the server is stopped. To fix that, this change introduces a pure virtual method: protocol_server::is_server_running that can be implemented by each derived class. Test: unit(dev) DTest: nodetool_additional_test.py:TestNodetool.binary_test Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211114135248.588798-1-bhalevy@scylladb.com>	2021-11-14 16:01:31 +02:00
Avi Kivity	b0a2a9771f	Merge "Sanitize hostnames resolving on start" from Pavel E " On start scylla resolves several hostnames into addresses. Different places use different hostname selection logic, e.g. the API address can be the listen one if the dedicated option not set. Failure to resolve a hostname is reported with an exception that (sometimes) contains the hostname, but it doesn't look very convenient -- better to know the config option name. Also resolving of different hostnames has different decoration around, e.g. prometheus carries a main-local lambda just to nicely wrap the try/catch block. This set unifies this zoo and makes main() shorter and less hairy: 1. All failures to resolve a hostname are reported with an exception containing the relevant config option 2. The || operator for named_value's is introduced to make the option selection look as short as resolve(cfg->some_address() || cfg->another_address()) 3. All sanity checks are explicit and happen early in main 4. No dangling local variables carrying the cfg->...() value 5. Use resolved IP when logging a "... is listening on ..." message after a service start tests: unit(dev) " * 'br-ip-resolve-on-start' of https://github.com/xemul/scylla: main: Move fb-utilities initialization up the main code: Use utils::resolve instead of inet_address::lookup main: Remove unused variable main: Sanitize resolving of listen address main: Sanitize resolving of broadcast address main: Sanitize resolving of broadcast RPC address main: Sanitize resolving of API address main: Sanitize resolving of prometheus address utils: Introduce || operator for named_values db.config: Verbose address resolver helper main: Remove api-port and prometheus-port variables alternator: Resolve address with the help of inet_address redis, thrift: Remove unused captures	2021-11-09 09:15:40 +02:00
Pavel Emelyanov	2f9c21644b	code: Use utils::resolve instead of inet_address::lookup There are some users of the latter call left. They all suffer from the same problem -- the lack of verbosity on resolving errors. While at it also get rid of useless local variables that are only there to carry the cfg->...() option over. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-11-08 17:33:27 +03:00
Botond Dénes	134fa98ff4	transport: controller: implement the protocol_server interface	2021-11-05 15:42:41 +02:00
Avi Kivity	1aff7d19c2	treewide: replace seastar::fmt_print() with fmt::print() We shouldn't be using Seastar as a text formatting library; that's not its focus. Use fmt directly instead. fmt::print() doesn't return the output stream which is a minor inconvenience, but that's life. Closes #9556	2021-11-01 10:05:16 +02:00
Botond Dénes	9ec55e054d	treewide: distinguish truncated frame errors We have two identical "Truncated frame" errors, at: * read_frame_size() in serialization_visitors.hh; * cql_server::connection::read_and_decompress_frame() in transport/server.cc; When such an exception is thrown, it is impossible to tell where it was thrown from and it doesn't have any further information contained in it (beyond the basic information it being thrown implies). This patch solves both problems: it makes the exception messages unique per location and it adds information about why it was thrown (the expected vs. real size of the frame). Ref: #9482 Closes #9520	2021-10-27 12:27:16 +02:00
Calle Wilund	940058d25a	transport::server: Handle nested exceptions in cql execution/query Fixes #9491 CQL server, when encountering a "general" exception (i.e. not thrown by cql error checks), reports a wire error with simply the what() part of exception. However, if we have nested exceptions, we will most likely lose info here (hello encryption). General exception case should unwind exception and give back full, concatenated message to avoid confusion. Closes #9492	2021-10-20 17:54:17 +03:00
Piotr Sarna	59bd25d1ea	transport: respond with overloaded exception during shedding This commit makes shedding always respond - with overloaded exception, instead of ignoring the request. Fixes #9442 Closes #9443	2021-10-07 15:38:40 +03:00
Piotr Sarna	06f724857f	transport: remove unused map of stream_id->query states The map is never touched, so it only occupies precious space for each connection. Closes #9383	2021-09-26 13:41:58 +03:00
Avi Kivity	daf028210b	build: enable -Winconsistent-missing-override warning This warning can catch a virtual function that thinks it overrides another, but doesn't, because the two functions have different signatures. This isn't very likely since most of our virtual functions override pure virtuals, but it's still worth having. Enable the warning and fix numerous violations. Closes #9347	2021-09-15 12:55:54 +03:00
Avi Kivity	705f957425	Merge "Generalize TLS creds builder configuration" from Pavel E " There are 4 places out there that do the same steps parsing "client_|server_encryption_options" and configuring the seastar::tls::creds_builder with the values (messaging, redis, alternator and transport). Also to make redis and transport look slimmer main() cleans the client_encryption_options by ... parsing it too. This set introduces a (coroutinized) helper to configure the creds_builder with map<string, string> and removes the options beautification from main. tests: unit(dev), dtest.internode_ssl_test(dev) " * 'br-generalize-tls-creds-builder-configuration' of https://github.com/xemul/scylla: code: Generalize tls::credentials_builder configuration transport, redis: Do not assume fixed encryption options messaging: Move encryption options parsing to ms main: Open-code internode encryption misconfig warning main, config: Move options parsing helpers	2021-09-01 14:19:19 +03:00
Avi Kivity	22d2a815c9	transport: server.hh: trim unneeded cql3 includes query_processor.hh can be replaced with a forward declaration, and result-message headers, and values.hh is unneeded. Closes #9238	2021-08-23 18:09:22 +03:00
Pavel Emelyanov	e02b39ca3d	code: Generalize tls::credentials_builder configuration All the places in code that configure the mentioned creds builder from client_|server_encryption_options now do it the same way. This patch generalizes it all in the utils:: helper. The alternator code "ignores" require_client_auth and truststore keys, but it's easy to make the generalized helper be compatible. Also make the new helper coroutinized from the beginning. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-08-20 18:05:41 +03:00
Pavel Emelyanov	35209e7500	transport, redis: Do not assume fixed encryption options On start main() brushes up the client_encryption_options option so that any user of it sees it in some "clean" state and can avoid using get_or_default() to parse. This patch removes this assumption (and the cleaning code itself). Next patch will make use of it and relax the duplicated parsing complexity back. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-08-20 17:59:33 +03:00
Avi Kivity	0876248c2b	Merge "cql3: cache function calls evaluation for non-deterministic functions" from Pavel S " `function_call` AST nodes are created for each function with side effects in a CQL query, i.e. non-deterministic functions (`uuid()`, `now()` and some other timeuuid-related functions). These nodes are evaluated either when a query itself is executed or query restrictions are computed (e.g. partition/clustering key ranges for LWT requests). We need to cache the calls since otherwise when handling a `bounce_to_shard` request for an LWT query, we can possibly enter an infinite bouncing loop (in case a function is used to calculate partition key ranges for a query), since the results can be different each time. Furthermore, we don't support bouncing more than one time. Returning `bounce_to_shard` message more than one time will result in a crash. Caching works only for LWT statements and only for the function calls that affect partition key range computation for the query. `variable_specifications` class is renamed to `prepare_context` and generalized to record information about each `function_call` AST node and modify them, as needed: * Check whether a given function call is a part of partition key statement restriction. * Assign ids for caching if above is true and the call is a part of an LWT statement. There is no need to include any kind of statement identifier in the cache key since `query_options` (which holds the cache) is limited to a single statement, anyway. Function calls are indexed by the order in which they appear within a statement while parsing. Note that `function_call::raw` AST nodes are not created for selection clauses of a SELECT statement hence they can accept only one of the following things as parameters: * Other function calls. * Literal values. * Parameter markers. In other words, only parameters that can be immediately reduced to a byte buffer are allowed and we don't need to handle database inputs to non-pure functions separately since they are not possible in this context. Anyhow, we don't even have a single non-pure function that accepts arguments, so precautions are not needed at the moment. Add a test written in `cql-pytest` framework to verify that both prepared and unprepared lwt statements handle `bounce_to_shard` messages correctly in such scenario. Fixes: #8604 Tests: unit(dev, debug) NOTE: the patchset uses `query_options` as a container for cached values. This doesn't look clean and `service::query_state` seems to be a better place to store them. But it's not forwarded to most of the CQL code and would mean that a huge number of places would have to be amended. The series presents a trade-off to avoid forwarding `query_state` everywhere (but maybe it's the thing that needs to be done, nonetheless). " * 'lwt_bounce_to_shard_cached_fn_v6' of https://github.com/ManManson/scylla: cql-pytest: add a test for non-pure CQL functions cql3: cache function calls evaluation for non-deterministic functions cql3: rename `variable_specifications` to `prepare_context`	2021-07-30 14:21:11 +03:00
Pavel Solodovnikov	3b6adf3a62	cql3: cache function calls evaluation for non-deterministic functions And reuse these values when handling `bounce_to_shard` messages. Otherwise such a function (e.g. `uuid()`) can yield a different value when a statement is re-executed on the other shard. It can lead to an infinite number of `bounce_to_shard` messages sent in case the function value is used to calculate partition key ranges for the query. Which, in turn, will cause crashes since we don't support bouncing more than one time and the second hop will result in a crash. Caching works only for LWT statements and only for the function calls that affect partition key range computation for the query. `variable_specifications` class is renamed to `prepare_context` and generalized to record information about each `function_call` AST node and modify them, as needed: * Check whether a given function call is a part of partition key statement restriction. * Assign ids for caching if above is true and the call is a part of an LWT statement. There is no need to include any kind of statement identifier in the cache key since `query_options` (which holds the cache) is limited to a single statement, anyway. Note that `function_call::raw` AST nodes are not created for selection clauses of a SELECT statement hence they can accept only one of the following things as parameters: * Other function calls. * Literal values. * Parameter markers. In other words, only parameters that can be immediately reduced to a byte buffer are allowed and we don't need to handle database inputs to non-pure functions separately since they are not possible in this context. Anyhow, we don't even have a single non-pure function that accepts arguments, so precautions are not needed at the moment. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-07-30 01:22:39 +03:00
Pavel Emelyanov	b1bb00a95c	transport.controller: Brushup cql_server declarations The controller code sits in the cql_transport namespace and can omit its mentionings. Also the seastar::distributed<> is replaced with modern seastar::sharded<> while at it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:50:57 +03:00
Pavel Emelyanov	65b1bb8302	transport: Use local notifier to (un)subscribe server Now the controller has the lifecycle notifier reference and can stop using storage service to manage the subscription. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:48:58 +03:00
Pavel Emelyanov	5f99eeb35e	transport: Keep lifecycle notifier sharded reference It's needed to (un)subscribe server on it (next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:48:20 +03:00
Pavel Emelyanov	c7b0b25494	transport, generic_server: Remove no longer used functionality After subscription management was moved onto controller level a bunch of code can be dropped: - passing migration notifier beyond controller - event_notifier's _stopped bit - event_notifier .stop() method - event_notifier empty constructor and destructor - generic_server's on_stop virtual method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:41:32 +03:00
Pavel Emelyanov	1acef41626	transport: (Un)Subscribe cql_server::event_notifier from controller There's a migration notifier that's carried through cql_server _just_ to let event-notifier (un)subscribe on it. Also there's a call for global storage-service in there which will need to be replaced with yet another pass-through argument which is not great. It's easier to establish this subscription outside of cql_server like it's currently done for proxy and sl-manager. In case of cql_server the "outside" is the controller. This patch just moves the subscription management from cql_server to controller, next two patches will make more use of this change. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:37:23 +03:00
Avi Kivity	9059514335	build, treewide: enable -Wpessimizing-move warning This warning prevents using std::move() where it can hurt - on an unnamed temporary or a named automatic variable being returned from a function. In both cases the value could be constructed directly in its final destination, but std::move() prevents it. Fix the handful of cases (all trivial), and enable the warning. Closes #8992	2021-07-08 17:52:34 +03:00
Pavel Emelyanov	990db016e9	transport: Untie transport and database Both controller and server only need database to get config from. Since controller creation only happens in main() code which has the config itself, we may remove database mentioning from transport. Previous attempt was not to carry the config down to the server level, but it stepped on an updateable_value landmine -- the u._v. isn't copyable cross-shard (despite the docs) and to properly initialize server's max_concurrent_requests we need the config's named_value member itself. The db::config that flies through the stack is const reference, but its named_values do not get copied along the way -- the updateable value accepts both references and const references to subscribe on. tests: start-stop in debug mode Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Message-Id: <20210607135656.18522-1-xemul@scylladb.com>	2021-06-09 20:04:12 +03:00
Pavel Solodovnikov	76bea23174	treewide: reduce header interdependencies Use forward declarations wherever possible. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com> Closes #8813	2021-06-07 15:58:35 +03:00
Piotr Sarna	fa29b79c20	transport: close connections when too large requests arrive Too large requests are currently handled by the CQL server by skipping them and sending back an error response. That's however wasteful and dangerous: bogus request sizes will force Scylla to potentially skip gigabytes of data - and skipping is done by simply reading from the socket, so it may result in gigabytes of bandwidth wasted. Even if the request size is not bogus, closing the connection forces users to adjust their request sizes, which should be done anyway. Originally, there was a bug in handling too large requests which only read their headers and then left the connection in a broken, undefined state, trying to interpret the rest of the large request as a next CQL header. It was later fixed to skip the request, but closing the connection is a safer thing to do. Fixes #8798 Closes #8800	2021-06-07 12:23:55 +03:00
Avi Kivity	e6c5a63581	Merge "Fix several issues on transport stop" from Pavel E " There's a bunch of issues with starting and stopping of cql_server with the help of cql_controller. fixes: #8796 tests: manual(start + stop, start + exception on cql_set_state() ) unit not run, they don't mess with transport controller " * 'br-transport-stop-fixes' of https://github.com/xemul/scylla: transport/controller: Stop server on state change failure too transport/controller: Rollback server start on state change failure too transport/controller: Do not leave _server uninitialized transport/controller: Rework try-catch into defers	2021-06-07 11:41:36 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Avi Kivity	100d6f4094	build: enable -Wunused-function Also drop a single violation in transport/server.cc. This helps prevent dead code from piling up. Three functions in row_cache_test that are not used in debug mode are moved near their user, and under the same ifdef, to avoid triggering the error. Closes #8767	2021-06-06 09:21:23 +03:00
Pavel Emelyanov	76947c829e	transport/controller: Stop server on state change failure too If on stop the set_cql_state() throws the local sharded<cql_server> will be left not stopped and will fail the respective assertion on its destruction. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-04 16:53:21 +03:00
Pavel Emelyanov	f6ef148c76	transport/controller: Rollback server start on state change failure too If set_cql_state() throws the cserver remains started. If this happens on start before the controller stop defer action is scheduled the destruction of controller will fail on assertion that checks the _server must be stopped. Effectively this is the fix of #8796 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-04 16:50:51 +03:00
Pavel Emelyanov	6995e41e64	transport/controller: Do not leave _server uninitialized If an exception happens after sharded<cql_server>.start() the controller's _server pointer is left pointing to stopped sharded server. This makes it impossible to start the server again (via API) since the check for if (_server) will always be true. This is the continuation of the `ae4d5a60` fix. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-04 16:48:26 +03:00
Pavel Emelyanov	12220b74e8	transport/controller: Rework try-catch into defers This is to make further patching simpler. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-06-04 16:48:12 +03:00
Piotr Sarna	cb27ebe61d	transport: start shedding requests during potential overload This commit implements the following overload prevention heuristic: if the admission queue becomes full, a timer is armed for 50ms. If any of the ongoing requests finishes, the timer is disarmed, but if that doesn't happen, the server goes into shedding mode, which means that it reads new requests from the socket and immediately drops them until one of the ongoing requests finishes. This heuristic is not recommended for OLAP workloads, so it is applied only if the session declared itself as interactive (via service level's workload_type parameter).	2021-05-27 13:02:22 +02:00
Pavel Solodovnikov	b51b11f226	transport: remove extraneous `qos/service_level_controller` includes from headers Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-20 02:32:15 +03:00
Pavel Solodovnikov	fff7ef1fc2	treewide: reduce boost headers usage in scylla header files `dev-headers` target is also ensured to build successfully. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-20 01:33:18 +03:00
Piotr Sarna	6da59b8a38	transport: add updating per-service-level params Per-service-level parameters (currently timeouts) are now updated when a new connection is established. The other connections which have the changed role are currently not immediately reloaded.	2021-05-10 12:39:41 +02:00
Piotr Sarna	e257ec11c0	treewide: remove service level controller from query state ... since it's accessible through its member, client state.	2021-05-10 11:48:14 +02:00
Piotr Sarna	d1f2e8b469	treewide: propagate service level to client state ... since it's going to be used to set up per-service-level timeouts.	2021-05-10 11:48:14 +02:00
Nadav Har'El	58e275e362	cross-tree: reduce dependency on db/config.hh and database.hh Every time db/config.hh is modified (e.g., to add a new configuration option), 110 source files need to be recompiled. Many of those 110 didn't really care about configuration options, and just got the dependency accidentally by including some other header file. In this patch, I remove the include of "db/config.hh" from all header files. It is only needed in source files - and header files only need forward declarations. In some cases, source files were missing certain includes which they got incidentally from db/config.hh, so I had to add these includes explicitly. After this patch, the number of source files that get recompiled after a change to db/config.hh goes down from 110 to 45. It also means that 65 source files now compile faster because they don't include db/config.hh and whatever it included. Additionally, this patch also eliminates a few unnecessary inclusions of database.hh in other header files, which can use a forward declaration or database_fwd.hh. Some of the source files including one of those header files relied on one of the many header files brought in by database.hh, so we need to include those explicitly. In view_update_generator.hh something interesting happened - it needs database.hh because of code in the header file, but only included database_fwd.hh, and the only reason this worked was that the files including view_update_generator.hh already happened to unnecessarily include database.hh. So we fix that too. Refs #1 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Message-Id: <20210505102111.955470-1-nyh@scylladb.com>	2021-05-05 13:23:00 +03:00
Avi Kivity	daeddda7cc	treewide: remove inclusions of storage_proxy.hh from headers storage_proxy.hh is huge and includes many headers itself, so remove its inclusions from headers and re-add smaller headers where needed (and storage_proxy.hh itself in source files that need it). Ref #1.	2021-04-20 21:23:00 +03:00
Pekka Enberg	16f262b852	transport, redis: Use generic server::listen() Let's pull up cql_server listen() to generic_server::server base class and convert redis_server to use it.	2021-04-13 14:13:24 +03:00
Pekka Enberg	6c619e4462	transport/server: Remove "redis_server" prefix from logging The logger itself has the name "redis_server" that appears in the logs.	2021-04-13 13:57:22 +03:00
Pekka Enberg	7ef3c60864	transport/server: Remove "cql_server" prefix from logging The logger itself has the name "cql_server" that appears in the logs.	2021-04-13 13:57:22 +03:00
Pekka Enberg	ac90a8ea50	transport, redis: Use generic server::do_accepts() The cql_server and redis_server share the same ancestor of do_accepts(). Let's pull up the cql_server version of do_accept() (that has more functionality) to generic_server::server and use it in the redis_server too.	2021-04-13 13:57:21 +03:00
Pekka Enberg	3689db26fc	transport, redis: Use generic server::process() Pull up the cql_server process() to base class and convert redis_server to use it. Please note that this fixes EPIPE and connection reset issue in the Redis server, which was fixed in the CQL server in commit `1a8630e6a` ("transport: silence "broken pipe" and "connection reset by peer" errors").	2021-04-13 13:56:45 +03:00
Pekka Enberg	66d6899727	transport: Move CQL specific error handling to handle_error() This moves the CQL specific error handling to handle_error() to make process() more generic in preparation for move to generic_server.	2021-04-13 13:56:45 +03:00
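
Commit 940058d25a above describes unwinding nested exceptions so that the reported error carries the full, concatenated message instead of only the outermost what(). The general technique can be sketched in standard C++ as follows. This is a minimal illustration, not the code from that commit: the helper names full_message() and demo_nested_message(), the ": " separator, and the example error strings are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the Scylla tree): returns what() of the
// given exception followed by the what() of every exception nested in it.
std::string full_message(const std::exception& e) {
    std::string msg = e.what();
    try {
        // No-op unless e also derives from std::nested_exception and
        // actually holds a nested exception; otherwise rethrows it.
        std::rethrow_if_nested(e);
    } catch (const std::exception& inner) {
        msg += ": " + full_message(inner);
    } catch (...) {
        // The nested exception is not derived from std::exception.
        msg += ": <unknown nested exception>";
    }
    return msg;
}

// Builds a two-level nested exception and returns its concatenated message.
std::string demo_nested_message() {
    try {
        try {
            throw std::runtime_error("key lookup failed");
        } catch (...) {
            // Wraps the in-flight exception inside a new outer one.
            std::throw_with_nested(std::runtime_error("encryption error"));
        }
    } catch (const std::exception& e) {
        return full_message(e);
    }
    return {};
}
```

With this sketch, reporting only e.what() for the outer exception would yield "encryption error", while full_message(e) also includes the inner cause.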

482 Commits