scylladb

Author	SHA1	Message	Date
Nadav Har'El	ce0ee27422	generic_server: use utils::scoped_item_list A previous patch introduced utils::scoped_item_list, which maintains a list of items - such as a list of ongoing connections - automatically removing the item from the list when its handle is destroyed. The list can also be iterated "gently" (without risking stalls when the list is long). The implementation of this class was based on very similar code in generic_server.hh / generic_server.cc. So in this patch we change generic_server to use the new scoped_item_list, and drop its own copy of the duplicated logic of maintaining the list and iterating gently over it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-08-01 02:32:14 +03:00
Sergey Zolotukhin	ea311be12b	generic_server: Two-step connection shutdown. When shutting down in `generic_server`, connections are now closed in two steps. First, only the RX (receive) side is shut down. Then, after all ongoing requests have completed or a timeout has expired, the connections are fully closed. Fixes scylladb/scylladb#24481	2025-07-28 10:08:06 +02:00
Sergey Zolotukhin	7334bf36a4	transport: cosmetic change, remove extra blanks.	2025-07-28 10:08:06 +02:00
Sergey Zolotukhin	27b3d5b415	generic_server: replace empty destructor with `= default` This change improves code readability by explicitly marking the destructor as defaulted.	2025-07-28 10:08:05 +02:00
Sergey Zolotukhin	3848d10a8d	generic_server: add `shutdown_input` and `shutdown_output` functions to `connection` class. The functions are just wrappers for _fd.shutdown_input() and _fd.shutdown_output(), with added error reporting. Needed by later changes.	2025-07-28 10:08:05 +02:00
Marcin Maliszkiewicz	c6a25b9140	generic_server: fix connections semaphore config observer When temporary value returned by observer() is destructed it disconnects from updateable_value so the code immediately stops observing. To fix it we need to retain the observer in the class object.	2025-06-23 17:54:01 +02:00
Marcin Maliszkiewicz	1eb580973c	generic_server: make shutdown() return void It's always immediately ready so no need to return future<>.	2025-05-27 19:31:09 +02:00
Marcin Maliszkiewicz	f7e5adaca3	transport: generic_server: remove no longer used connection advertising code	2025-05-27 19:31:09 +02:00
Tomasz Grabiec	fadfbe8459	Merge 'transport: storage_proxy: release ERM when waiting for query timeout' from Andrzej Jackowski Before this change, if a read executor had just enough targets to achieve query's CL, and there was a connection drop (e.g. node failure), the read executor waited for the entire request timeout to give drivers time to execute a speculative read in the meantime. Such behavior doesn't work well when a very long query timeout (e.g. 1800s) is set, because the unfinished request blocks topology changes. This change implements a mechanism to throw a new read_failure_exception_with_timeout in the aforementioned scenario. The exception is caught by the CQL server, which conducts the waiting after ERM is released. The new exception inherits from read_failure_exception, because layers that don't catch the exception (such as mapreduce service) should handle the exception just like a regular read_failure. However, when the CQL server catches the exception, it returns read_timeout_exception to the client because after additional waiting such an error message is more appropriate (read_timeout_exception was also returned before this change was introduced). This change: - Rewrite cql_server::connection::process_request_one to use seastar::futurize_invoke and try_catch<> instead of utils::result_try - Add new read_failure_exception_with_timeout and throw it in storage_proxy - Add sleep in CQL server when the new exception is caught - Catch local exceptions in Mapreduce Service and convert them to std::runtime_error. - Add get_cql_exclusive to manager_client.py - Add test_long_query_timeout_erm No backport needed - minor issue fix.
Closes scylladb/scylladb#23156 * github.com:scylladb/scylladb: test: add test_long_query_timeout_erm test: add get_cql_exclusive to manager_client.py mapreduce: catch local read_failure_exception_with_timeout transport: storage_proxy: release ERM when waiting for query timeout transport: remove redundant references in process_request_one transport: fix the indentation in process_request_one transport: add futures in CQL server exception handling	2025-05-08 12:45:49 +02:00
Andrzej Jackowski	1fca994c7b	transport: storage_proxy: release ERM when waiting for query timeout Before this change, if a read executor had just enough targets to achieve query's CL, and there was a connection drop (e.g. node failure), the read executor waited for the entire request timeout to give drivers time to execute a speculative read in the meantime. Such behavior doesn't work well when a very long query timeout (e.g. 1800s) is set, because the unfinished request blocks topology changes. This change implements a mechanism to throw a new read_failure_exception_with_timeout in the aforementioned scenario. The exception is caught by the CQL server, which conducts the waiting after ERM is released. The new exception inherits from read_failure_exception, because layers that don't catch the exception (such as mapreduce service) should handle the exception just like a regular read_failure. However, when the CQL server catches the exception, it returns read_timeout_exception to the client because after additional waiting such an error message is more appropriate (read_timeout_exception was also returned before this change was introduced). This change: - Add new read_failure_exception_with_timeout exception - Add throw of read_failure_exception_with_timeout in storage_proxy - Add abort_source to CQL server, as well as to_stop() method for the correct abort handling - Add sleep in CQL server when the new exception is caught Refs #21831	2025-04-23 09:29:47 +02:00
Pavel Emelyanov	8b2cababb6	generic_server: Don't mess with db::config The db::config is top-level configuration of scylla, we generally try to avoid using it even in scylla components: each uses its own config initialized by the service creator out of the db::config itself. The generic_server is not an exception, all the more so, it already has its own config. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#23705	2025-04-16 17:02:30 +03:00
Benny Halevy	bc69bc3de7	generic_server: use named gate Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-04-12 11:28:48 +03:00
Marcin Maliszkiewicz	ce18909688	transport: move on_connection_close into connection destructor To make the code more robust by ensuring closing code is always executed.	2025-04-09 13:50:19 +02:00
Marcin Maliszkiewicz	599f4d312b	transport: add blocked and shed connection metrics This adds some visibility into connection storm mitigations added in following commits.	2025-04-09 10:49:18 +02:00
Marcin Maliszkiewicz	26518704ab	generic_server: throttle and shed incoming connections according to semaphore limit If uninitialized_connections_semaphore_cpu_concurrency (default 2) connections are already being processed, we start delaying the acceptance of new connections. Connections which are in network IO state are not counted towards this limit and they can go to cpu phase without blocking. So it can happen that we process more concurrent new connections but that's a necessary tradeoff to make progress during storm without implementing more advanced machinery (i.e. priority queue).	2025-04-09 10:48:51 +02:00
Marcin Maliszkiewicz	ed82bede39	generic_server: add semaphore for limiting new connections concurrency It will be used in following commits.	2025-04-09 10:30:58 +02:00
Marcin Maliszkiewicz	33122d3f93	generic_server: add config to the constructor	2025-04-09 10:30:58 +02:00
Marcin Maliszkiewicz	474e84199c	generic_server: add on_connection_ready handler This patch cleans the code a bit so that ready state is set in a single place. And adds handler which will allow adding logic when connection is made ready, this will be added in the following commits.	2025-04-09 10:30:58 +02:00
Calle Wilund	c59c87c233	generic_server: Allow sharing reloadability of certificates across shards Adds an optional callback to "listen", returning the shard local object instance. If provided, instead of creating a "full" reloadable certificate object, only do so on shard 0, and use callback to reload other shards "manually".	2025-01-27 16:16:23 +00:00
Piotr Dulikowski	6d90a933cd	transport/server: use scheduling group assigned to current user Now, when the user logs in and the connection becomes authenticated, the processing loop of the connection is switched to the scheduling group that corresponds to the service level assigned to the logged in user. The scheduling group is also updated when the service level assigned to this user changes. Starting from this commit, the scheduling groups managed by the service level controller are actually being used by user workload.	2025-01-02 07:13:34 +01:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Michał Jadwiszczak	fe67efda5b	Revert "generic_server: use async function in `for_each_gently()`" This reverts commit `324b3c43c0`. It isn't safe to do asynchronous calls in `for_each_gently`, as the connection may be disconnected while a call in callback preempts. Fixes scylladb/scylla#21801	2024-12-05 13:32:47 +01:00
Kefu Chai	6ead5a4696	treewide: move log.hh into utils/log.hh the log.hh under the root of the tree was created to keep backward compatibility when seastar was extracted into a separate library. so log.hh should belong to the `utils` directory, as it is based solely on seastar, and can be used by all subsystems. in this change, we move log.hh into utils/log.hh so that it is more modularized. and this also improves the readability, when one sees `#include "utils/log.hh"`, it is obvious that this source file needs the logging system, instead of its own log facility -- please note, we do have two other `log.hh` in the tree. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-22 06:54:46 +03:00
Laszlo Ersek	5a04743663	generic_server: convert connection tracking to seastar::gate If we call server::stop() right after "server" construction, it hangs: With the server never listening (never accepting connections and never serving connections), nothing ever calls server::maybe_stop(). Consequently, co_await _all_connections_stopped.get_future(); at the end of server::stop() deadlocks. Such a server::stop() call does occur in controller::do_start_server() [transport/controller.cc], when - cserver->start() (sharded<cql_server>::start()) constructs a "server"-derived object, - start_listening_on_tcp_sockets() throws an exception before reaching listen_on_all_shards() (for example because it fails to set up client encryption -- certificate file is inaccessible etc.), - the "deferred_action" cserver->stop().get(); is invoked during cleanup. (The cserver->stop() call exposing the connection tracking problem dates back to commit `ae4d5a60ca` ("transport::controller: Shut down distributed object on startup exception", 2020-11-25), and it's been triggerable through the above code path since commit `6b178f9a4a` ("transport/controller: split configuring sockets into separate functions", 2024-02-05).) Tracking live connections and connection acceptances seems like a good fit for "seastar::gate", so rewrite the tracking with that. "seastar::gate" can be closed (and the returned future can be waited for) without anyone ever having entered the gate. NOTE: this change makes it quite clear that neither server::stop() nor server::shutdown() must be called multiple times. The permitted sequences are: - server::shutdown() + server::stop() - or just server::stop(). Fixes #10305 Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-28 10:59:44 +02:00
Michał Jadwiszczak	324b3c43c0	generic_server: use async function in `for_each_gently()` In the following patch, we will add a method to update service level parameters for each CQL connection. To support this, this patch allows passing an async function as a parameter to the `for_each_gently()` method.	2024-08-08 10:42:09 +02:00
Mikołaj Grzebieluch	4cecda7ead	transport/controller: pass unix_domain_socket_permissions to generic_server::listen	2024-02-05 14:22:03 +01:00
Michał Jadwiszczak	0083ddd7a0	generic_server: use mutable reference in `for_each_gently` Make `generic_server::gentle_iterator` a mutable iterator to allow `for_each_gently` to make changes to the connections. Fixes: #16035 Closes scylladb/scylladb#16036	2023-11-14 14:25:22 +02:00
Pavel Emelyanov	4682c7f9a5	generic_server: Introduce shutdown() The method waits for listening sockets to stop listening and aborts the connected sockets, but doesn't wait for the established connections to finish processing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-11 17:37:48 +03:00
Pavel Emelyanov	6dcf653995	generic_server: Decouple server stopped from connection stopped The _stopped future resolves when all "sockets" stop -- listening and connected ones. Future patching will need to wait for listening sockets to stop separately from connected ones. Rename the `_stopped` to reflect what it is now while at it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-11 17:32:07 +03:00
Piotr Grabowski	63fa5ac915	generic_server.hh: add missing include Add missing include of "<list>" which caused compile errors on GCC: In file included from generic_server.cc:9: generic_server.hh:91:10: error: ‘list’ in namespace ‘std’ does not name a template type 91 \| std::list<gentle_iterator> _gentle_iterators; \| ^~~~ generic_server.hh:19:1: note: ‘std::list’ is defined in header ‘<list>’; did you forget to ‘#include <list>’? 18 \| #include <seastar/net/tls.hh> +++ \|+#include <list> 19 \| Note that there are some GCC compilation problems still left apart from this one. Closes #10328	2022-04-04 17:31:55 +03:00
Pavel Emelyanov	f035313b16	generic_server: Gentle iterator Add the ability to iterate over the list of connections in a "gentle" manner, i.e. -- preempting the loop when required. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-02-18 14:25:08 +03:00
Pavel Emelyanov	661c12066b	generic_server: Type alias For simpler future patching Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-02-18 14:25:07 +03:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Pavel Emelyanov	c7b0b25494	transport, generic_server: Remove no longer used functionality After subscription management was moved onto controller level a bunch of code can be dropped: - passing migration notifier beyond controller - event_notifier's _stopped bit - event_notifier .stop() method - event_notifier empty constructor and destructor - generic_server's on_stop virtual method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:41:32 +03:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Pekka Enberg	2b6438c044	generic_server: Rename "maybe_idle" to "maybe_stop"	2021-04-13 14:13:24 +03:00
Pekka Enberg	66276d6636	generic_server: API documentation for connection and server classes	2021-04-13 14:13:24 +03:00
Pekka Enberg	16f262b852	transport, redis: Use generic server::listen() Let's pull up cql_server listen() to generic_server::server base class and convert redis_server to use it.	2021-04-13 14:13:24 +03:00
Pekka Enberg	6c619e4462	transport/server: Remove "redis_server" prefix from logging The logger itself has the name "redis_server" that appears in the logs.	2021-04-13 13:57:22 +03:00
Pekka Enberg	ac90a8ea50	transport, redis: Use generic server::do_accepts() The cql_server and redis_server share the same ancestor of do_accepts(). Let's pull up the cql_server version of do_accept() (that has more functionality) to generic_server::server and use it in the redis_server too.	2021-04-13 13:57:21 +03:00
Pekka Enberg	3689db26fc	transport, redis: Use generic server::process() Pull up the cql_server process() to base class and convert redis_server to use it. Please note that this fixes EPIPE and connection reset issue in the Redis server, which was fixed in the CQL server in commit `1a8630e6a` ("transport: silence "broken pipe" and "connection reset by peer" errors").	2021-04-13 13:56:45 +03:00
Pekka Enberg	ab339cfaf7	transport, redis: Move connection tracking to generic_server::server class The cql_server and redis_server classes have identical connection tracking code. Pull it up to the generic_server::server base class.	2021-04-13 13:56:45 +03:00
Pekka Enberg	deac5b1810	transport, redis: Move _stopped and _connections_list to generic_server::server class The cql_server and redis_server both have the same "_stopped" and "_connections_list" member variables. Pull them up to the generic_server::server base class.	2021-04-13 13:56:45 +03:00
Pekka Enberg	1af73bec7b	transport, redis: Move total_connections to generic_server::server class Both cql_server and redis_server have the same "total_connections" member variable so pull that up to the generic_server::server base class.	2021-04-13 13:56:45 +03:00
Pekka Enberg	7b46c2da53	transport, redis: Use generic server::maybe_idle() The cql_server and redis_server classes have a maybe_idle() method, which sets the _all_connections_stopped promise if server wants to stop and can be stopped. Pull up the duplicated code to generic_server::server class.	2021-04-13 13:56:45 +03:00
Pekka Enberg	4664a55e05	transport, redis: Move list_base_hook<> inheritance to generic_server::connection Both cql_server::connection and redis_server::connection inherit boost::intrusive::list_base_hook<>, so let's pull up that to the generic_server::connection class that both inherit.	2021-04-13 13:56:45 +03:00
Pekka Enberg	19507bb7ea	transport, redis: Use generic connection::shutdown() This patch moves the duplicated connection::shutdown() method to a new generic_server::connection base class that is now inherited by cql_server and redis_server.	2021-04-13 13:56:44 +03:00

47 Commits