Commit Graph

47 Commits

Author SHA1 Message Date
Nadav Har'El
ce0ee27422 generic_server: use utils::scoped_item_list
A previous patch introduced utils::scoped_item_list, which maintains
a list of items - such as a list of ongoing connections - automatically
removing the item from the list when its handle is destroyed. The list
can also be iterated "gently" (without risking stalls when the list is
long).

The implementation of this class was based on very similar code in
generic_server.hh / generic_server.cc. So in this patch we change
generic_server use the new scoped_item_list, and drop its own copy
of the duplicated logic of maintaining the list and iterating gently
over it.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-08-01 02:32:14 +03:00
Sergey Zolotukhin
ea311be12b generic_server: Two-step connection shutdown.
When shutting down in `generic_server`, connections are now closed in two steps.
First, only the RX (receive) side is shut down. Then, after all ongoing requests
are completed, or a timeout happened the connections are fully closed.

Fixes scylladb/scylladb#24481
2025-07-28 10:08:06 +02:00
Sergey Zolotukhin
7334bf36a4 transport: consmetic change, remove extra blanks. 2025-07-28 10:08:06 +02:00
Sergey Zolotukhin
27b3d5b415 generic_server: replace empty destructor with = default
This change improves code readability by explicitly marking the destructor as defaulted.
2025-07-28 10:08:05 +02:00
Sergey Zolotukhin
3848d10a8d generic_server: add shutdown_input and shutdown_output functions to
`connection` class.

The functions are just wrappers for  _fd.shutdown_input() and  _fd.shutdown_output(), with added error reporting.
Needed by later changes.
2025-07-28 10:08:05 +02:00
Marcin Maliszkiewicz
c6a25b9140 generic_server: fix connections semaphore config observer
When temporary value returned by observer() is destructed it
disconnects from updateable_value so the code immediately stops
observing.

To fix it we need to retain the observer in the class object.
2025-06-23 17:54:01 +02:00
Marcin Maliszkiewicz
1eb580973c generic_server: make shutdown() return void
It's always immediately ready so no need to
return future<>.
2025-05-27 19:31:09 +02:00
Marcin Maliszkiewicz
f7e5adaca3 transport: generic_server: remove no longer used connection advertising code 2025-05-27 19:31:09 +02:00
Tomasz Grabiec
fadfbe8459 Merge 'transport: storage_proxy: release ERM when waiting for query timeout' from Andrzej Jackowski
Before this change, if a read executor had just enough targets to
achieve query's CL, and there was a connection drop (e.g. node failure),
the read executor waited for the entire request timeout to give drivers
time to execute a speculative read in a meantime. Such behavior don't
work well when a very long query timeout (e.g. 1800s) is set, because
the unfinished request blocks topology changes.

This change implements a mechanism to thrown a new
read_failure_exception_with_timeout in the aforementioned scenario.
The exception is caught by CQL server which conducts the waiting, after
ERM is released. The new exception inherits from read_failure_exception,
because layers that don't catch the exception (such as mapreduce
service) should handle the exception just a regular read_failure.
However, when CQL server catch the exception, it returns
read_timeout_exception to the client because after additional waiting
such an error message is more appropriate (read_timeout_exception was
also returned before this change was introduced).

This change:
- Rewrite cql_server::connection::process_request_one to use
  seastar::futurize_invoke and try_catch<> instead of utils::result_try
- Add new read_failure_exception_with_timeout and throws it in storage_proxy
- Add sleep in CQL server when the new exception is caught
- Catch local exceptions in Mapreduce Service and convert them
   to std::runtime_error.
- Add get_cql_exclusive to manager_client.py
- Add test_long_query_timeout_erm

No backport needed - minor issue fix.

Closes scylladb/scylladb#23156

* github.com:scylladb/scylladb:
  test: add test_long_query_timeout_erm
  test: add get_cql_exclusive to manager_client.py
  mapreduce: catch local read_failure_exception_with_timeout
  transport: storage_proxy: release ERM when waiting for query timeout
  transport: remove redundant references in process_request_one
  transport: fix the indentation in process_request_one
  transport: add futures in CQL server exception handling
2025-05-08 12:45:49 +02:00
Andrzej Jackowski
1fca994c7b transport: storage_proxy: release ERM when waiting for query timeout
Before this change, if a read executor had just enough targets to
achieve query's CL, and there was a connection drop (e.g. node failure),
the read executor waited for the entire request timeout to give drivers
time to execute a speculative read in a meantime. Such behavior don't
work well when a very long query timeout (e.g. 1800s) is set, because
the unfinished request blocks topology changes.

This change implements a mechanism to thrown a new
read_failure_exception_with_timeout in the aforementioned scenario.
The exception is caught by CQL server which conducts the waiting, after
ERM is released. The new exception inherits from read_failure_exception,
because layers that don't catch the exception (such as mapreduce
service) should handle the exception just a regular read_failure.
However, when CQL server catch the exception, it returns
read_timeout_exception to the client because after additional waiting
such an error message is more appropriate (read_timeout_exception was
also returned before this change was introduced).

This change:
 - Add new read_failure_exception_with_timeout exception
 - Add throw of read_failure_exception_with_timeout in storage_proxy
 - Add abort_source to CQL server, as well as to_stop() method for
   the correct abort handling
 - Add sleep in CQL server when the new exception is caught

Refs #21831
2025-04-23 09:29:47 +02:00
Pavel Emelyanov
8b2cababb6 generic_server: Don't mess with db::config
The db::config is top-level configuration of scylla, we generally try to
avoid using it even in scylla components: each uses its own config
initialized by the service creator out of the db::config itself. The
generic_server is not an exception, all the more so, it already has its
own config.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23705
2025-04-16 17:02:30 +03:00
Benny Halevy
bc69bc3de7 generic_server: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:28:48 +03:00
Marcin Maliszkiewicz
ce18909688 transport: move on_connection_close into connection destructor
To make the code more robust by ensuring closing code is always executed.
2025-04-09 13:50:19 +02:00
Marcin Maliszkiewicz
599f4d312b transport: add blocked and shed connection metrics
This adds some visibility into connection storm mitigations
added in following commits.
2025-04-09 10:49:18 +02:00
Marcin Maliszkiewicz
26518704ab generic_server: throttle and shed incoming connections according to semaphore limit
If we have uninitialized_connections_semaphore_cpu_concurrency (default
2) connections being processed we start delay accepting new connections.

Connections which are in network IO state are not counted towards this
limit and they can go to cpu phase without blocking. So it can happen
that we process more concurrent new connections but that's a necessary
tradeof to make progress during storm without implementing more advanced
machinery (i.e. priority queue).
2025-04-09 10:48:51 +02:00
Marcin Maliszkiewicz
ed82bede39 generic_server: add semaphore for limiting new connections concurrency
It will be used in following commits.
2025-04-09 10:30:58 +02:00
Marcin Maliszkiewicz
33122d3f93 generic_server: add config to the constructor 2025-04-09 10:30:58 +02:00
Marcin Maliszkiewicz
474e84199c generic_server: add on_connection_ready handler
This patch cleans the code a bit so that ready state is set in a single place.
And adds handler which will allow adding logic when connection is made
ready, this will be added in the following commits.
2025-04-09 10:30:58 +02:00
Calle Wilund
c59c87c233 generic_server: Allow sharing reloadability of certificates across shards
Adds an optional callback to "listen", returning the shard local object
instance. If provided, instead of creating a "full" reloadable cerificate
object, only do so on shard 0, and use callback to reload other shards
"manually".
2025-01-27 16:16:23 +00:00
Piotr Dulikowski
6d90a933cd transport/server: use scheduling group assigned to current user
Now, when the user logs in and the connection becomes authenticated, the
processing loop of the connection is switched to the scheduling group
that corresponds to the service level assigned to the logged in user.
The scheduling group is also updated when the service level assigned to
this user changes.

Starting from this commit, the scheduling groups managed by the service
level controller are actually being used by user workload.
2025-01-02 07:13:34 +01:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Michał Jadwiszczak
fe67efda5b Revert "generic_server: use async function in for_each_gently()"
This reverts commit 324b3c43c0.

It isn't safe to do asynchronous calls in `for_each_gently`, as the
connection may be disconnected while a call in callback preempts.

Fixes scylladb/scylla#21801
2024-12-05 13:32:47 +01:00
Kefu Chai
6ead5a4696 treewide: move log.hh into utils/log.hh
the log.hh under the root of the tree was created keep the backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to `utils` directory, as it is based solely
on seastar, and can be used all subsystems.

in this change, we move log.hh into utils/log.hh to that it is more
modularized. and this also improves the readability, when one see
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` in the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-10-22 06:54:46 +03:00
Laszlo Ersek
5a04743663 generic_server: convert connection tracking to seastar::gate
If we call server::stop() right after "server" construction, it hangs:

With the server never listening (never accepting connections and never
serving connections), nothing ever calls server::maybe_stop().
Consequently,

    co_await _all_connections_stopped.get_future();

at the end of server::stop() deadlocks.

Such a server::stop() call does occur in controller::do_start_server()
[transport/controller.cc], when

- cserver->start() (sharded<cql_server>::start()) constructs a
  "server"-derived object,

- start_listening_on_tcp_sockets() throws an exception before reaching
  listen_on_all_shards() (for example because it fails to set up client
  encryption -- certificate file is inaccessible etc.),

- the "deferred_action"

      cserver->stop().get();

  is invoked during cleanup.

(The cserver->stop() call exposing the connection tracking problem dates
back to commit ae4d5a60ca ("transport::controller: Shut down distributed
object on startup exception", 2020-11-25), and it's been triggerable
through the above code path since commit 6b178f9a4a
("transport/controller: split configuring sockets into separate
functions", 2024-02-05).)

Tracking live connections and connection acceptances seems like a good fit
for "seastar::gate", so rewrite the tracking with that. "seastar::gate"
can be closed (and the returned future can be waited for) without anyone
ever having entered the gate.

NOTE: this change makes it quite clear that neither server::stop() nor
server::shutdown() must be called multiple times. The permitted sequences
are:

- server::shutdown() + server::stop()

- or just server::stop().

Fixes #10305

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-28 10:59:44 +02:00
Michał Jadwiszczak
324b3c43c0 generic_server: use async function in for_each_gently()
In the following patch, we will add a method to update service levels
parameters for each cql connections.
To support this, this patch allows to pass async function as a parameter
to `for_each_gently()` method.
2024-08-08 10:42:09 +02:00
Mikołaj Grzebieluch
4cecda7ead transport/controller: pass unix_domain_socket_permissions to generic_server::listen 2024-02-05 14:22:03 +01:00
Michał Jadwiszczak
0083ddd7a0 generic_server: use mutable reference in for_each_gently
Make `generic_server::gentle_iterator` a mutable iterator to allow
`for_each_gently` to make changes to the connections.

Fixes: #16035

Closes scylladb/scylladb#16036
2023-11-14 14:25:22 +02:00
Pavel Emelyanov
4682c7f9a5 generic_server: Introduce shutdown()
The method waits for listening sockets to stop listening and aborts the
connected sockets, but doesn't wait for the established connections to
finish processing.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:37:48 +03:00
Pavel Emelyanov
6dcf653995 generic_server: Decouple server stopped from connection stopped
The _stopped future resolves when all "sockets" stop -- listening and
connected ones. Furure patching will need to wait for listening sockets
to stop separately from connected ones.

Rename the `_stopped` to reflect what it is now while at it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:32:07 +03:00
Piotr Grabowski
63fa5ac915 generic_server.hh: add missing include
Add missing include of "<list>" which caused compile errors on GCC:

In file included from generic_server.cc:9:
generic_server.hh:91:10: error: ‘list’ in namespace ‘std’ does not name a template type
   91 |     std::list<gentle_iterator> _gentle_iterators;
      |          ^~~~
generic_server.hh:19:1: note: ‘std::list’ is defined in header ‘<list>’; did you forget to ‘#include <list>’?
   18 | #include <seastar/net/tls.hh>
  +++ |+#include <list>
   19 |

Note that there are some GCC compilation problems still left apart from
this one.

Closes #10328
2022-04-04 17:31:55 +03:00
Pavel Emelyanov
f035313b16 generic_server: Gentle iterator
Add the ability to iterate over the list of connections in a "gentle"
manner, i.e. -- preempting the loop when required.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-02-18 14:25:08 +03:00
Pavel Emelyanov
661c12066b generic_server: Type alias
For simpler future patching

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-02-18 14:25:07 +03:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Pavel Emelyanov
c7b0b25494 transport, generic_server: Remove no longer used functionality
After subscription management was moved onto controller level
a bunch of code can be dropped:

- passing migration notifier beyond controller
- event_notifier's _stopped bit
- event_notifier .stop() method
- event_notifier empty constructor and destrictor
- generic_server's on_stop virtual method

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-22 18:41:32 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Pekka Enberg
2b6438c044 generic_server: Rename "maybe_idle" to "maybe_stop" 2021-04-13 14:13:24 +03:00
Pekka Enberg
66276d6636 generic_server: API documentation for connection and server classes 2021-04-13 14:13:24 +03:00
Pekka Enberg
16f262b852 transport, redis: Use generic server::listen()
Let's pull up cql_server listen() to generic_server::server base class and
convert redis_server to use it.
2021-04-13 14:13:24 +03:00
Pekka Enberg
6c619e4462 transport/server: Remove "redis_server" prefix from logging
The logger itself has the name "redis_server" that appears in the logs.
2021-04-13 13:57:22 +03:00
Pekka Enberg
ac90a8ea50 transport, redis: Use generic server::do_accepts()
The cql_server and redis_server share the same ancestor of do_accepts().
Let's pull up the cql_server version of do_accept() (that has more
functionality) to generic_server::server and use it in the redis_server
too.
2021-04-13 13:57:21 +03:00
Pekka Enberg
3689db26fc transport, redis: Use generic server::process()
Pull up the cql_server process() to base class and convert redis_server
to use it.

Please note that this fixes EPIPE and connection reset issue in the
Redis server, which was fixed in the CQL server in commit 1a8630e6a
("transport: silence "broken pipe" and "connection reset by peer"
errors").
2021-04-13 13:56:45 +03:00
Pekka Enberg
ab339cfaf7 transport, redis: Move connection tracking to generic_server::server class
The cql_server and redis_server classes have identical connection
tracking code. Pull it up to the generic_server::server base class.
2021-04-13 13:56:45 +03:00
Pekka Enberg
deac5b1810 transport, redis: Move _stopped and _connections_list to generic_server::server class
The cql_server and redis_server both have the same "_stopped" and
"_connections_list" member variables. Pull them up to the
generic_server::server base class.
2021-04-13 13:56:45 +03:00
Pekka Enberg
1af73bec7b transport, redis: Move total_connections to generic_server::server class
Both cql_server and redis_server have the same "total_connections"
member variable so pull that up to the generic_server::server base
class.
2021-04-13 13:56:45 +03:00
Pekka Enberg
7b46c2da53 transport, redis: Use generic server::maybe_idle()
The cql_server and redis_server classes have a maybe_idle() method,
which sets the _all_connections_stopped promise if server wants to stop
and can be stopped. Pull up the duplicated code to
generic_server::server class.
2021-04-13 13:56:45 +03:00
Pekka Enberg
4664a55e05 transport, redis: Move list_base_hook<> inheritance to generic_server::connection
Both cql_server::connection and redis_server::connection inherit
boost::intrusive::list_base_hook<>, so let's pull up that to the
generic_server::connection class that both inherit.
2021-04-13 13:56:45 +03:00
Pekka Enberg
19507bb7ea transport, redis: Use generic connection::shutdown()
This patch moves the duplicated connection::shutdown() method to to a
new generic_server::connection base class that is now inherited by
cql_server and redis_server.
2021-04-13 13:56:44 +03:00