Commit Graph

18 Commits

Author SHA1 Message Date
Nadav Har'El
7f89c8b3e3 alternator: clean error shutdown in case of TLS misconfigration
The way our boot-time service "controllers" are written, if a
controller's start_server() finds an error and throws, it cannot
the caller (main.cc) to call stop_server(), and must clean up
resources already created (e.g., sharded services) before returning
or risk crashes on assertion failures.

This patch fixes such a mistake in Alternator's initialization.
As noted in issue #10025, if the Alternator TLS configuration is
broken - especially the certificate or key files are missing -
Scylla would crash on an assertion failure, instead of reporting
the error as expected. Before this patch such a misconfiguration
will result in the unintelligible:

<alternator::server>::~sharded() [Service = alternator::server]:
Assertion `_instances.empty()' failed. Aborting on shard 0.

After this patch we get the right error message:

ERROR 2022-03-21 15:25:07,553 [shard 0] init - Startup failed:
std::_Nested_exception<std::runtime_error> (Failed to set up Alternator
TLS credentials): std::_Nested_exception<std::runtime_error> (Could not
read certificate file conf/scylla.crt): std::filesystem::__cxx11::
filesystem_error (error system:2, filesystem error: open failed:
No such file or directory [conf/scylla.crt])

Arguably this error message is a bit ugly, so I opened
https://github.com/scylladb/seastar/issues/1029, but at least it says
exactly what the error is.

Fixes #10025

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220321133323.3150939-1-nyh@scylladb.com>
2022-03-28 15:26:42 +03:00
Nadav Har'El
79776ff2ff alternator: fix error handling during Alternator startup
A recent restructuring of the startup of Alternator (and also other
protocol servers) led to incorrect error-handling behavior during
startup: If an error was detected on one of the shards of the sharded
service (in alternator/server.cc), the sharded service itself was never
stopped (in alternator/controller.cc), leading to an assertion failure
instead of the desired error message.

A common example of this problem is when the requested port for the
server was already taken (this was issue #9914).

So in this patch, exception handling is removed from server.cc - the
exception will propegate to the code in controller.cc, which will
properly stop the server (including the sharded services) before
returning.

Fixes #9914.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220130131709.1166716-1-nyh@scylladb.com>
2022-02-02 10:35:57 +01:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Botond Dénes
a51529dd15 protocol_servers: strengthen guarantees of listen_addresses()
In early versions of the series which proposed protocol servers, the
interface had two methods answering pretty much the same question of
whether the server is running or not:
* listen_addresses(): empty list -> server not running
* is_server_running()

To reduce redundancy and to avoid possible inconsistencies between the
two methods, `is_server_running()` was scrapped, but re-added by a
follow-up patch because `listen_addresses()` proved to be unreliable as
a source for whether the server is running or not.
This patch restores the previous state of having only
`listen_addresses()` with two additional changes:
* rephrase the comment on `listen_addresses()` to make it clear that
  implementations must return empty list when the server is not running;
* those implementations that have a reliable source of whether the
  server is running or not, use it to force-return an empty list when
  the server is not running

Tests: dtest(nodetool_additional_test.py)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211117062539.16932-1-bdenes@scylladb.com>
2021-11-19 11:09:09 +03:00
Benny Halevy
9d4262e264 protocol_server: add per-protocol is_server_running method
Change b0a2a9771f broke
the generic api implementation of
is_native_transport_running that relied on
the addresses list being empty agter the server is stopped.

To fix that, this change introduces a pure virtual method:
protocol_server::is_server_running that can be implemented
by each derived class.

Test: unit(dev)
DTest: nodetool_additional_test.py:TestNodetool.binary_test

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211114135248.588798-1-bhalevy@scylladb.com>
2021-11-14 16:01:31 +02:00
Avi Kivity
b0a2a9771f Merge "Sanitize hostnames resolving on start" from Pavel E
"
On start scylla resolves several hostnames into addresses. Different
places use different hostname selection logic, e.g. the API address
can be the listen one if the dedicated option not set. Failure to
resolve a hostname is reported with an exception that (sometimes)
contains the hostname, but it doesn't look very convenient -- better
to know the config option name. Also resolving of different hostnames
has different decoration around, e.g. prometheus carries a main-local
lambda just to nicely wrap the try/catch block.

This set unifies this zoo and makes main() shorter and less hairy:

1. All failures to resolve a hostname are reported with an
   exception containing the relevant config option

2. The || operator for named_value's is introduced to make
   the option selection look as short as

     resolve(cfg->some_address() || cfg->another_address())

3. All sanity checks are explicit and happen early in main

4. No dangling local variables carrying the cfg->...() value

5. Use resolved IP when logging a "... is listening on ..."
   message after a service start

tests: unit(dev)
"

* 'br-ip-resolve-on-start' of https://github.com/xemul/scylla:
  main: Move fb-utilities initialization up the main
  code: Use utils::resolve instead of inet_address::lookup
  main: Remove unused variable
  main: Sanitize resolving of listen address
  main: Sanitize resolving of broadcast address
  main: Sanitize resolving of broadcast RPC address
  main: Sanitize resolving of API address
  main: Sanitize resolving of prometheus address
  utils: Introduce || operator for named_values
  db.config: Verbose address resolver helper
  main: Remove api-port and prometheus-port variables
  alternator: Resolve address with the help of inet_address
  redis, thrift: Remove unused captures
2021-11-09 09:15:40 +02:00
Pavel Emelyanov
2f9c21644b code: Use utils::resolve instead of inet_address::lookup
There are some users of the latter call left. They all suffer
from the same problem -- the lack of verbosity on resolving
errors.

While at it also get rid of useless local variables that are
only there to carry the cfg->...() option over.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-08 17:33:27 +03:00
Pavel Emelyanov
acb7068ab5 alternator: Resolve address with the help of inet_address
Alternator needs to lookup its address without preferring ipv4
or ipv6. To do it calls seastar method, but the same effect is
achieved by calling inet_address::lookup.

This change makes all places in scylla resolve addresses in a
similar way, makes this code line shorter and removes the need
to specifically explain the alternator hunks from next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-11-08 17:04:07 +03:00
Botond Dénes
8ddfdd8aa9 alternator: controller: implement the protocol_server interface 2021-11-05 15:42:41 +02:00
Piotr Sarna
0b11771731 alternator: decouple auth from CQL query processor
Alternator auth module used to piggy-back on top of CQL query processor
to retrieve authentication data, but it's no longer the case.
Instead, storage proxy is used directly.

Closes #9538
2021-10-28 21:55:56 +03:00
Avi Kivity
4aaddd8609 alternator: remove uses of get_local_gossiper()
Replace with a gossiper parameter passed from the controller.
2021-09-07 20:08:15 +03:00
Avi Kivity
aa68927873 gossiper: remove get_local_gossiper() from some inline helpers
Some state accessors called get_local_gossiper(); this is removed
and replaced with a parameter. Some callers (redis, alternators)
now have the gossiper passed as a parameter during initialization
so they can use the adjusted API.
2021-09-07 17:03:37 +03:00
Pavel Emelyanov
e02b39ca3d code: Generalize tls::credentials_builder configuration
All the places in code that configure the mentioned creds builder
from client_|server_encryption_options now do it the same way.
This patch generalizes it all in the utils:: helper.

The alternator code "ignores" require_client_auth and truststore
keys, but it's easy to make the generalized helper be compatible.

Also make the new helper coroutinized from the beginning.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-08-20 18:05:41 +03:00
Pavel Emelyanov
aa88527375 main, config: Move options parsing helpers
The get_or_default and is_true are two aux bits that are used
to parse the config options. The former is duplicated in the
alternator code as well.

Put both in utils namespace for future.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-08-20 17:53:41 +03:00
Pavel Emelyanov
ba10e96c75 alternator: Keep storage_proxy on server
It's already available on controller and will be needed by
API handlers in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-07-29 05:12:36 +03:00
Pavel Emelyanov
fbd98e6292 alternator: Move start-stop code into controller
This move is not "just move", but also includes:

- putting the whole thing into seastar::async()
- switch from locally captured dependencies into controller's
  class members
- making smp_service_groups optional because it doesn't have
  default contructor and should somehow survive on constructed
  controller until its start()

Also copy few bits from main that can be generalized later:

- get_or_default() helper from main
- sharded_parameter lambda for cdc
- net family and preferred thing from main

( this also fixed the indentation broken by previous patch )

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-06-11 18:17:27 +03:00
Pavel Emelyanov
4aad618409 alternator: Controller skeleton
Add the controller class with all the needed dependencies. For
now completely unused (thus a bunch of (void)-s here and there).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-06-11 18:08:37 +03:00
Pavel Emelyanov
316e9af234 alternator: Controller basement
Add header and source file for transport- (and thrift-) like controller
that'll do all the bookkeeping needed to start and stop this client
service.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-06-11 18:06:10 +03:00