This is short cleanup after recent removal of creating default cassandra superuser and auth-v1 code removal.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1036
Backport: no, just code cleanup
Closesscylladb/scylladb#29004
* github.com:scylladb/scylladb:
auth: remove DEFAULT_SUPERUSER_NAME constant and dead DEFAULT_USER_PASSWORD
auth: use configurable default_superuser in describe_roles
auth: move default_superuser to common, remove _superuser member
auth: use LOCAL_ONE for all auth queries
auth: remove get_auth_ks_name indirection
Replace get_auth_ks_name(qp) with db::system_keyspace::NAME directly.
The function always returned the constant "system" and its qp
parameter was unused.
Replace all calls to SCYLLA_ASSSERT() in Alternator by the better and
safer throwing_assert() introduced in the previous patch.
As a result of this patch, if one of the call sites for these asserts
is buggy and ever fails, only the involved operation will be killed
by an exception, instead of crashing the whole server - and often the
entire cluster (as the same buggy request reaches all nodes and
crashes them all).
Additionally, this patch replaces a few existing uses in Alternator
of on_internal_error() with a non-interesting message with a
more-or-less equivalent, but shorter, throwing_assert(). The idea is
to convert the verbose idiom:
if (!condition) {
on_internal_error(logger, "some error message")
}
With the shorter
throwing_assert(condition)
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Addresses outstanding review comments from PR #22961 where SSL field
collection was refactored into generic_server::connection base class.
This patch consists of minor cosmetic enhancements for increased
readability, mainly, with some minor fixups explained in specific
commits.
Cosmetic changes, no need to backport.
Closesscylladb/scylladb#27575
* github.com:scylladb/scylladb:
test_ssl: fix indentation
generic_server: improve logging broken TLS connection
test_ssl: improve timeout and readability
alternator/server: update SSL comment
We have a test in test_compressed_response.py that reproduces a bug
where in Alternator's signature checking code, if a header had multiple
consecutive spaces its signature isn't checked correctly.
This patch fixes this and that xfailing test begins to pass.
But it turns out that the handling of multiple consecutive spaces in
headers when calculating the authentication signature is just one example
of "header canonization" that the AWS Signature V4 specification requires
us to do. There are additional types of header canonization that Alternator
must do, and this patch also adds new tests in test_authorization.py for
checking *all* the types of canonization.
Fortunately, for all other types of canonizations, we already handled
them correctly - Alternator already lowercases header names, sorts them
alphabetically and removes leading and trailing spaces before calculating
the signature. So most of the new tests added pass also without this patch,
and only one of them, test_canonization_middle_whitespace, needs this
patch to pass. As usual, all the new tests also pass on DynamoDB.
Fixes#27775
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#28102
Following 954f2cbd2f, which added proxy protocol v2 listeners
for CQL, we do the same for alternator. We add two optional ports
for plain and TLS-wrapped HTTP.
We test each new port, that the old ports still work, and that
mixing up a port with no proxy protocol and a connection with proxy
protocol (or the opposite) fails. The latter serves to show
that the testing strategy is valid and doesn't just pass whatever
happens. We also verify that the correct addresses (and TLS mode)
show up in system.clients.
Closesscylladb/scylladb#27889
Currently Alternator supports compressed requests in the gzip format
with "Content-Encoding: gzip". We did not support any other compression
formats.
It turns out that DynamoDB also supports the "deflate" encoding.
The "deflate" format is just a small variant of gzip and also supported
by the same zlib library that we already use, so it is very easy
to add support for it as well. So this patch adds it.
Beyond compatibility with DynamoDB, another benefit of this patch is
symmetry with our response compression support (PR #27454), where
we supported both gzip and deflate compression of responses - so
we should support the same for requests.
This patch also adds tests for Content-Encoding: deflate, which pass
on DynamoDB (proving that "deflate" is indeed supported there).
On Alternator the new tests failed before this patch and pass with
this patch.
Refs #27243 (which asks to support more compression formats).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#27917
Previous commit added means to decide whether client asks for compression and with which algorithm.
This patch adds actual compression of responses based on zlib library.
For now only string (not chunked) responses are compressed.
Several previously defined tests start to pass.
This is an initial patch to add support of Alternator's compressed responses.
The actual compression (gzip,deflate) will be added in the following commits.
The main functionality added in this commmit is parsing of Accept-Encoding header,
that indicates compression algorithms supported by the client.
In this commit we add also configuration parameters of response gzip/deflate compression.
They allow to enable/disable compression, set level and a size threshold below which a response is not compressed.
With current implementation it is possible to decide a compression for each response, but it is not used yet.
Optimize memory usage changing types of driver_name and driver_version be
a reference to a cached value instead of an sstring.
These fields very often have the same value among different connections hence
it makes sense to cache these values and use references to them instead of duplicating
such strings in each connection state.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
get_client_data() is used to assemble `client_data` objects from each connection
on each CPU in the context of generation of the `system.clients` virtual table data.
After collected, `client_data` objects were std::moved and arranged into a
different structure to match the table's sorting requirements.
This didn't allow having not-cross-shard-movable objects as fields in the `client_data`,
e.g. lw_shared_ptr objects.
Since we are planning to add such fields to `client_data` in following patches this patch
is solving the limitation above by making get_client_data() return `foreign_ptr<std::unique_ptr<client_data>>`
objects instead of naked `client_data` ones.
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
In this patch we implement Alternator's support for gzip-compressed
requests, i.e., requests with the "Content-Encoding: gzip" header,
other uncompressed headers, and a gzip-compressed body.
The server needs to verify the signature of the *compressed* content,
and then uncompress the body before running the request.
We only support gzip compression because this is what DynamoDB supports.
But in the future we can easily add support for other compression
algorithms like lz4 or zstd.
This patch Refs #5041 but doesn't "Fixes" it because it only implements
compressed requests (Content-Encoding), *not* compressed responses
(Accept-Encoding).
The next patch will enable several tests for this feature and make sure
it behaves like DynamoDB.
Note that while we will have now support in our server for compressed
requests, just like DynamoDB does, the clients (AWS SDKs) will probably
NOT make use of it because they do not enable request compression by
default. For example, see the tests for some hoops one needs to jump
through in boto3 (the Python SDK) to send compressed requests. However,
we are hoping that in the future Alternator's modified clients will
use compressed requests and enjoy this feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Before this patch, the configuration alternator_enforce_authorization
is a boolean: true means enforce authentication checks (i.e., each
request is signed by a valid user) and authorization checks (the user
who signed the request is allowed by RBAC to perform this request).
This patch adds a second boolean configuration option,
alternator_warn_authorization. When alternator_enforce_authorization
is false but alternator_warn_authorization is true, authentication and
authorization checks are performed as in enforce mode, but failures
are ignored and counted in two new metrics:
scylla_alternator_authentication_failures
scylla_alternator_authorization_failures
additionally,also each authentication or authorization error is logged as
a WARN-level log message. Some users prefer those log messages over
metrics, as the log messages contain additional information about the
failure that can be useful - such as the address of the misconfigured
client, or the username attempted in the request.
All combinations of the two configuration options are allowed:
* If just "enforce" is true, auth failures cause a request failure.
The failures are counted, but not logged.
* If both "enforce" and "warn" are true, auth failures cause a request
failure. The failures are both counted and logged.
* If just "warn" is true, auth failures are ignored (the request
is allowed to compelete) but are counted and logged.
* If neither "enforce" nor "warn" are true, no authentication or
authorization check are done at all. So we don't know about failures,
so naturally we don't count them and don't log them.
This patch is fairly straightforward, doing mainly the following
things:
1. Add an alternator_warn_authorization config parameter.
2. Make sure alternator_enforce_authorization is live-updatable (we'll
use this in a test in the next patch). It "almost" was, but a typo
prevented the live update from working properly.
3. Add the two new metrics, and increment them in every type of
authentication or authorization error.
Some code that needs to increment these new metrics didn't have
access to the "stats" object, so we had to pass it around more.
4. Add log messages when alternator_warn_authorization is true.
5. If alternator_enforce_authorization is false, allow the auth check
to allow the request to proceed (after having counted and/or logged
the auth error).
A separate patch will follow and add documentation suggesting to users
how to use the new "warn" options to safely switch between non-enforcing
to enforcing mode. Another patch will add tests for the new configuration
options, new metrics and new log messages.
Fixes#25308.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Following DynamoDB, Alternator also places a 16 MB limit on the size of
a request. Such a limit is necessary to avoid running out of memory -
because the AWS message authentication protocol requires reading the
entire request into memory before its signature can be verified.
Our implementation for this limit used Seastar's HTTP server's
content_length_limit feature. However, this Seastar feature is
incomplete - it only works when the request uses the Content-Length
header, and doesn't do anything if the request doesn't have a
Content-Length (it may use chunked encoding, or have no length at all).
So malicious users can cause Scylla to OOM by sending a huge request
without a Content-Length.
So in this patch we stop using the incomplete Seastar feature, and
implement the length limit in Scylla in a way that works correctly with
or without Content-Length: We read from the input stream and if we go
over 16MB, we generate an error.
Because we dropped Seastar's protection against a long Content-Length,
we also need to fix a piece of code which used Content-Length to reserve
some semaphore units to prevent reading many large requests in parallel.
We fix two problems in the code:
1. If Content-Length is over the limit, we shouldn't attempt to reserve
semaphore units - this should just be a Payload Too Large error.
2. If Content-Length is missing, the existing code did nothing and had
a TODO that we should. In this patch we implement what was suggested
in that TODO: We temporarily reserve the whole 16 MB limit, and
after reading the actual request, we return part of the reservation
according to the real request size.
That last fix is important, because typically the largest requests will be
BatchWriteItem where a well-written client would want to use chunked
encoding, not Content-Length, to avoid materializing the entire request
up-front. For such clients, the memory use semaphore did nothing, and
now it does the right thing.
Note that this patch does *not* solve the problem #12166 that existed
with Seastar's length-limiting implementation but still exists in the
new in-Scylla length-limiting implementation: The fact we send an
error response in the middle of the request and then close the
connection, while the client continues to send the request, can lead
to an RST being sent by the server kernel. Usually this will be fine -
well-written client libraries will be able to read the response before
the RST. But even with a well-written library in some rare timings
the client may get the RST before the response, and will miss the
response, and get an empty or partial response or "connection reset
by peer". This issue existed before this patch, and still exists, but
is probably of minor impact.
Fixes#8196
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#23434
In #24031 users complained, that trace message is truncated, namely it's
no longer json parsable and table name might not be part of the output.
This path enables users to configure maximum size of trace message.
In case user wanted `table` name, but didn't care about message size,
#26634 will help.
- add configuration varable `alternator_max_users_query_size_in_trace_output`
with default value of 4096 (4 times old default value).
- modify `truncated_content_view` function to use new configuration
variable for truncation limit
- update `truncated_content_view` to consistently truncate at given
size, previously trunctation would also happen when data arrived in
more than one chunk
- update `truncated_content_view` to better handle truncated value
(limit number of copies)
- fix `scylla_config_read` call - call to `query` for a configuration
name that is not existing will return `Items` array empty
(but present) - this would raise array access exception few lines
below.
- add test
Refs #26634
Refs #24031Closesscylladb/scylladb#26618
Until recently, Seastar's HTTP server's reply::write_body() only
supported a few "well-known" content types. But Alternator uses a
lesser known one - "application/x-amz-json-1.0" - so it was forced to
use a *wrong* (but legal) content type, and later override it with the
correct one. This was really ugly and we had a comment that once this
feature was fixed in Seastar, we should remove the ugly workaround.
Well, the time has finally come. We can now finally pass the correct
content type to write_body(), and don't need to call the deprecated
type-changing function later.
The new implementation is less awkward, but actually longer - whereas
previously we only set the content type in one place - just before
the done(), after this patch we actually need to do it in three places
where we write the body (string response, streaming response and
error response). But I think this is actually better - there is no
inherent reason why, for example, error messages and success messages
needed to use the same content type. We use a new constant
REPLY_CONSTANT_TYPE so that we don't need to repeat it three times.
We already have a regression test for the content-type returned by
Alternator, test_manual_requests.py::test_content_type, and this
test continues to pass after the patch. But this test only checked
the short response path, so we add additional tests for the streaming
response path and for the error response path. As usual, the new
tests pass on DynamoDB as well.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#26268
Today, the "system.clients" virtual table lists active connections (and
their various properties, like client address, logged in username and
client version) only for CQL requests. In this patch we make Alternator
active clients also be listed on this virtual table.
Unlike CQL where logged in username applies to a complete connection,
in the Alternator API, different requests, theoretically signed by
different users, can arrive over the same HTTP connection. So instead of
listing the currently open *connections*, we list the currently active
*requests*.
This means that when scanning system.clients, you will only see requests
which are being handled right now - and not inactive HTTP connections.
I think this good enough (besides being the correct thing to do) - one
of the goals of this system.clients is to be able to see what kind of
drivers are being used by the user (the "driver_name" field in the
system.clients) - on a busy server there will always be some (even many)
requests being handled, so we'll always have plenty of requests to see
in system.clients.
By the way, note that for Alternator requests, what we use for the
"driver_name" is the request's User-Agent header. AWS SDKs typically
write the driver's name, its version, and often a lot of other
information in that header. For example, Boto3 sends a User-Agent
looking like:
Boto3/1.38.46 md/Botocore#1.38.46 md/awscrt#0.24.2
ua/2.1 os/linux#6.15.4-100.fc41.x86_64 md/arch#x86_64
lang/python#3.13.5 md/pyimpl#CPython m/N,P,b,D,Z
cfg/retry-mode#legacy Botocore/1.38.46 Resource
A functional test for the new feature - adding Alternator requests to
the system.clients table - will be in the next patch.
Fixes#24993
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The previous patch introduced a function make_streamed_with_extra_array
which was a duplicate of the existing make_streamed. Reviewers
complained how baroque the new function is (just like the old function),
having to jump through hoops to return a copyable function working
on non-copyable objects, making strange-named copies and shared pointers
of everything.
We needed to return a copyable function (std::function) just because
Alternator used Seastar's json::json_return_type in the return type
from executor function (request_return_type). This json_return_type
contained either a sstring or an std::function, but neither was ever
really appropriate:
1. We want to return noncopyable_function, not an std::function!
2. We want to return an std::string (which rjson::print()) returns,
not an sstring!
So in this patch we stop using seastar::json::json_return_type
entirely in Alternator.
Alternator's request_return_type is now an std::variant of *three* types:
1. std::string for short responses,
2. noncopyable_function for long streamed response
3. api_error for errors.
The ugliest parts of make_streamed() where we made copies and shared
pointers to allow for a copyable function are all gone. Even nicer, a
lot of other ugly relics of using seastar::json_return_type are gone:
1. We no longer need obscure classes and functions like make_jsonable()
and json_string() to convert strings to response bodies - an operation
can simply return a string directly - usually returning
rjson::print(value) or a fixed string like "" and it just works.
2. There is no more usage of seastar::json in Alternator (except one
minor use of seastar::json::formatter::to_json in streams.cc that
can be removed later). Alternator uses RapidJSON for its JSON
needs, we don't need to use random pieces from a different JSON
library.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds checks validating 'BatchWriteItem' requests mostly to avoid ugly fallback message.
It changes request's behaviour in case of an empty array of WriteRequests - previously such an array was ignored and whole request might succeed, now it raises ValidationException, following the documentation and behaviour of DynamoDB.
Patch includes tests in test_manual_requests (`test_batch_write_item_invalid_payload`, `test_batch_write_item_empty_request_list`) testing with several offending cases.
Fixes#23233Closesscylladb/scylladb#23878
This patch changes gossiper to index nodes by host ids instead of ips.
The main data structure that changes is _endpoint_state_map, but this
results in a lot of changes since everything that uses the map directly
or indirectly has to be changed. The big victim of this outside of the
gossiper itself is topology over gossiper code. It works on IPs and
assumes the gossiper does the same and both need to be changed together.
Changes to other subsystems are much smaller since they already mostly
work on host ids anyway.
Alternator's "/localnodes" HTTP requests is supposed to return the list
of nodes in the local DC to which the user can send requests.
Before commit bac7c33313 we used the
gossiper is_alive() method to determine if a node should be returned.
That commit changed the check to is_normal() - because a node can be
alive but in non-normal (e.g., joining) state and not ready for
requests.
However, it turns out that checking is_normal() is not enough, because
if node is stopped abruptly, other nodes will still consider it "normal",
but down (this is so-called "DN" state). So we need to check **both**
is_alive() and is_normal().
This patch also adds a test reproducing this case, where a node is
shut down abruptly. Before this patch, the test failed ("/localnodes"
continued to return the dead node), and after it it passes.
Fixes#21538
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#21540
Our "sstring_view" is an historic alias for the standard std::string_view.
Alternator only used this alias in a couple of random names, let's change
them to the standard type name.
Refs #4062.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
the log.hh under the root of the tree was created keep the backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to `utils` directory, as it is based solely
on seastar, and can be used all subsystems.
in this change, we move log.hh into utils/log.hh to that it is more
modularized. and this also improves the readability, when one see
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` in the tree.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Before this patch, the "/localnodes" HTTP request to the Alternator server
lists all the live nodes of the current DC. This patch adds two optional
parameters to this query:
dc: allows to list the live nodes of a specific named DC instead of the
current DC of the server.
rack: allows to restrict the results to just the nodes belonging to a
specific named rack.
For both options, if no live node exists in the given dc or rack (in
particular, if such a dc or rack doesn't even exist), an empty list is
returned - it's not an error.
The default, if dc or rack is not specified - remains exactly as it is
today - look at the current DC (the one of the node being request), and
do not restrict the list to any specific rack.
We expect the new options that we added here to be useful for two use cases:
1. A client that knows of *some* Scylla node (belonging to an unknown DC),
but wants to list the nodes in *its* DC, which it knows by name.
2. A client in a multi-rack DC (e.g., multi-AZ region in AWS) that wants
to send requests to nodes in its own rack (which it knows by name),
to avoid cross-rack networking costs.
Note that in both cases, this requires clients to know the names of DCs
and AZs via some out-of-band means. The client can also get a list of DCs
and racks using the system.local system table, as the tests included in
this patch demonstrate.
This patch includes two set of tests for these new options: One in the the
single-node test/alternator framework that has a single dc and rack but
can still check the case of an unknown dc or rack (in which case an empty
list is returned). The second test is in the topology framework, and runs
an 8-node cluster with two DCs, two racks, and two nodes in each, and
checks all the combinations of "/localnodes" requests with and without
dc and rack options. This test also resolves a longstanding TODO that
asked for such a multi-DC test for "/localnodes" to be written.
Fixes#12147
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#20915
For no good reason, the "alternator_enforce_authorization" flag (which
chooses whether to enable authentication and authorization checks in
Alternator) was not live-updatable, so make it so.
Both "server" and "executor" objects use this configuration flag, the
former is fixed in this patch (to hold a live-updatable reference
instead of a copy of a boolean), the latter was already prepared for
this change and already held a live-updatable reference.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADT for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the condidates.
that is what is happening scylladb in quite a few caller sites of
`format()`, hence ADT is not able to tell which function the winner
in the name lookup:
```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
265 | return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
| ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
4290 | format(format_string<_Args...> __fmt, _Args&&... __args)
| ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
143 | format(fmt::format_string<A...> fmt, A&&... a) {
| ^
```
in this change, we
change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
`seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
because, `sstring::operator std::basic_string` would incur a deep
copy.
we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to miminize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Scylla uses a "client_state" object to encapsulate the information of
who the client is - its IP address, which user was authenticated, and so on.
For an unknown reason, Alternator created for each request an "internal"
client_state, meaning that supposedly the client for each request was
some sort of internal process (e.g., repair) rather than a real client.
This was wrong, and we even had a FIXME about not putting the client's
IP address in client_state.
So in this patch, we start using a normal "external" client_state
instead of an "internal" one. The client_state constructors are very
different in the two cases, so a few lines of code had to change.
I hope that this change will cause no functional changes. For example,
Alternator was already setting its own timeouts explicitly and not
relying on the default ones for external clients. However, we need to
fix this for the following patches which introduce permissions checks
(Role-Based Access Control - RBAC) - the client_state methods for
checking permissions become no-ops for *internal* clients (even if the
client_state contains an authenticated users). We need these functions
to do their job - so we need an *external* variant of client_state.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68Closesscylladb/scylladb#20006
Alternator's "/localnodes" HTTP request is supposed to return the list of
nodes in the local DC to which the user can send requests.
The existing implementation incorrectly used gossiper::is_alive() to check
for which nodes to return - but "alive" nodes include nodes which are still
joining the cluster and not really usable. These nodes can remain in the
JOINING state for a long time while they are copying data, and an attempt
to send requests to them will fail.
The fix for this bug is trivial: change the call to is_alive() to a call
to is_normal().
But the hard part of this test is the testing:
1. An existing multi-node test for "/localnodes" assummed that right after
a new node was created, it appears on "/localnodes". But after this
patch, it may take a bit more time for the bootstrapping to complete
and the new node to appear in /localnodes - so I had to add a retry loop.
2. I added a test that reproduces the bug fixed here, and verifies its
fix. The test is in the multi-node topology framework. It adds an
injection which delays the bootstrap, which leaves a new node in JOINING
state for a long time. The test then verifies that the new node is
alive (as checked by the REST API), but is not returned by "/localnodes".
3. The new injection for delaying the bootstrap is unfortunately not
very pretty - I had to do it in three places because we have several
code paths of how bootstrap works without repair, with repair, without
Raft and with Raft - and I wanted to delay all of them.
Fixes#19694.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#19725
This short series fixes Alternator's "/localnodes" request to allow a node's external IP address - configured with `broadcast_rpc_address` - to be listed instead of its usual, internal, IP address.
The first patch fixes a bug in gossiper::get_rpc_address(), which the second patch needs to implement the feature. The second patch also contains regression tests.
Fixes#18711.
Closesscylladb/scylladb#18828
* github.com:scylladb/scylladb:
alternator: fix "/localnodes" to use broadcast_rpc_address
gossiper: fix get_rpc_address() for this node
Alternator's non-standard "/localnodes" HTTP request returns a list of
live nodes on this DC, to consider for load balancing. The returned
node addresses should be external IP addresses usable by the clients.
Scylla has a configuration parameter - broadcast_rpc_address - which
defines for a node an external IP address. If such a configuration
exists, we need to use those external IP addresses, not the internal
ones.
Finding these broadcast_rpc_address of all nodes is easy, because the
gossiper already gossips them.
This patch also tests the new feature:
1. The existing single-node test is extended to verify that without
broadcast_rpc_address we get the usual IP address.
2. A new two-node test is added to check that when broadcast_rpc_address
is configured, we get that address and not the usual internal IP
addresses.
Fixes#18711.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch cleans up a bit the code in Alternator which splits up
the operation's X-Amz-Target header (the second part of it is the
name of the operation, e.g., CreateTable).
The patch doesn't change any functionality or change performance in
any meaningful way. I was just reviewing this code and was annoyed by
the unnecessary variable and unnecessary creation of strings and
vectors for such a simple operation - and wanted to clean it up.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#18830
The existing inet_address::to_string() calls fmt::format("{}", *this)
anyway. However, the to_string() method is declared in .cc file, while
form formatter is in the header and is equipeed with constexprs so
that converting an address to string is done as much as possible
compile-time.
Also, though minor, fmt::to_string(foo) is believed to be even faster
than fmt::format("{}", foo).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#18712
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.
Refs scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Alternator doesn't do any writes to auth
tables so it's simply change of keyspace
name.
Docs will be updated later, when auth-v2
is enabled as default.
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.
Replace with seastar::future::get(), which does the same thing.
this change silences following compiling warning due to using the
deprecated API by using the recommended API in place of the deprecated
one:
```
/home/kefu/dev/scylladb/alternator/server.cc:569:27: warning: 'set_tls_credentials' is deprecated: use listen(socket_address addr, server_credentials_ptr credentials) [-Wdeprecated-declarations]
_https_server.set_tls_credentials(creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {
^
/home/kefu/dev/scylladb/seastar/include/seastar/http/httpd.hh:186:7: note: 'set_tls_credentials' has been explicitly marked deprecated here
[[deprecated("use listen(socket_address addr, server_credentials_ptr credentials)")]]
^
1 warning generated.
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15884
While it may not be explicitly documented DynamoDB sometimes enchriches error
message by additional fields. For instance when ConditionalCheckFailedException
occurs while ReturnValuesOnConditionCheckFailure is set it will add Item object,
similarly for TransactionCanceledException it will add CancellationReasons object.
There may be more cases like this so generic json field is added to our error class.
The change will be used by future commit implementing ReturnValuesOnConditionCheckFailure
feature.