Protocol v4 added WRITE_FAILURE and READ_FAILURE. When running under v3
we downgrade these exceptions to WRITE_TIMEOUT and READ_TIMEOUT (since
the client won't understand the v4 errors), but we still send the new
error codes. This causes the client to become confused.
Fix by updating the error codes.
A better fix is to move the error code from the constructor parameter
list and hard-code it in the constructor, but that is left for a follow-up
after this minimal fix.
Fixes#5610.
Closes#10362
Recently, coordinator_result was introduced as an alternative for
exceptions. It was placed in the main "exceptions/exceptions.hh" header,
which virtually every single source file in Scylla includes.
But unfortunately, it brings in some heavy header files and templates,
leading to a lot of wasted build time - ClangBuildAnalyzer measured that
we include exceptions.hh in 323 source files, taking almost two seconds
each on average.
In this patch, we split the coordinator_result feature into a separate
header file, "exceptions/coordinator_result", and only the few places
which need it include the header file. Unfortunately, some of these
few places are themselves header, so the new header file ends up being
included in 100 source files - but 100 is still much less than 323 and
perhaps we can reduce this number 100 later.
After this patch, the total Scylla object-file size is reduced by 6.5%
(the object size is a proxy for build time, which I didn't directly
measure). ClangBuildAnalyzer reports that now each of the 323 includes
of exceptions.hh only takes 80ms, coordinator_result.hh is only included
100 times, and virtually all the cost to include it comes from Boost's
result.hh (400ms per inclusion).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220228204323.1427012-1-nyh@scylladb.com>
Now the connection_notifier is all gone, only the client_data bits are left.
To keep it consistent -- rename the files.
Also, while at it, brush up the header dependencies and remove the not
really used constexprs for client states.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This includes most of the connection_notifier stuff as well as
the auxiliary code from system_keyspace.cc and a bunch of
updating calls from the client state changing.
Other than less code and less disk updates on clients connection
paths, this removes one usage of the nasty global qctx thing.
Since the system.clients goes away rename the system.clients_v
here too so the table is always present out there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The call returns a chunked_vector with client_data's. For now
only the native transport implements it, others return empty
vector.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Right now when the client state changes the respective update is
performed on the system.clients table. While doing it some bits
from this state are lost from the in-memory structures. For the
sake of exporting this information we need to track whether the
connected client goes authenticating or is already ready.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Adds `utils::result_try` and `utils::result_futurize_try` - functions which allow to convert existing try..catch blocks into a version which handles C++ exceptions, failed results with exception containers and, depending on the function variant, exceptional futures using the same exception handling logic.
For example, you can convert the following try..catch block:
try {
return a_function_that_may_throw();
} catch (const my_exception& ex) {
return 123;
} catch (...) {
throw;
}
...to this:
return utils::result_try([&] {
return a_function_that_may_throw_or_return_a_failed_result();
}, utils::result_catch<my_exception>([&] (const Ex&) {
return 123;
}), utils::result_catch_dots([&] (auto&& handle) {
return handle.into_result();
});
Similarly, `utils::result_futurize_try` can be used to migrate `then_wrapped` or `f.handle_exception()` constructs.
As an example of the usability of the new constructs, two places in the current code which need to simultaneously handle exceptions and failed results are converted to use `result_try` and `result_futurize_try`.
Results of `perf_simple_query --smp 1 --operations-per-shard 1000000 --write`:
```
127041.61 tps ( 67.2 allocs/op, 14.2 tasks/op, 52422 insns/op)
126958.60 tps ( 67.2 allocs/op, 14.2 tasks/op, 52409 insns/op)
127088.37 tps ( 67.2 allocs/op, 14.2 tasks/op, 52411 insns/op)
127560.84 tps ( 67.2 allocs/op, 14.2 tasks/op, 52424 insns/op)
127826.61 tps ( 67.2 allocs/op, 14.2 tasks/op, 52406 insns/op)
126801.02 tps ( 67.2 allocs/op, 14.2 tasks/op, 52420 insns/op)
125371.51 tps ( 67.2 allocs/op, 14.2 tasks/op, 52425 insns/op)
126498.51 tps ( 67.2 allocs/op, 14.2 tasks/op, 52427 insns/op)
126359.41 tps ( 67.2 allocs/op, 14.2 tasks/op, 52423 insns/op)
126298.27 tps ( 67.2 allocs/op, 14.2 tasks/op, 52410 insns/op)
```
The number of tasks and allocations is unchanged. The number of instructions per operations seems similar, it may have increased slightly (by 10-20) but it's hard to tell for sure because of the noisiness of the results.
Tests: unit(dev)
Closes#10045
* github.com:scylladb/scylla:
transport: use result_try in process_request_one
storage_proxy: use result_futurize_try in mutate_end
storage_proxy: temporarily throw exception from result in mutate_end
utils: add result_try and result_futurize_try
Adapts the exception handling logic in process_request_one so that it
uses utils::result_try to handle both C++ exceptions and failed results
in a unified way.
At the point where `result_message` is converted to a
`cql_server::response`, now the result message is inspected and returned
as failed `result<>` if it contained an error.
For now, the failed `result<>` is thrown as exception in `process` and
`process_on_shard`, but that will change in the next commit.
In order to propagate exceptions as values through the CQL layer with
minimal modifications to the interfaces, a new result_message type is
introduced: result_message::exception. Similarly to
result_message::bounce_to_shard, this is an internal type which is
supposed to be handled before being returned to the client.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
The notifier is automatic friend of server and can access its
private fields without additional wrappers/decorations.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The event_notifier is private server subclass that's created once
per server to handle events from storage_service. The notifier needs
gossiper that already sits on the server, and to get it the simplest
way is to equip notifier with the server backreference. Since these
two objects are in strict 1:1 relation this reference is safe.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The gossiper is needed by the transport::event_notifier. There's
already gossiper reference on the transport controller, but it's
a local reference, because controller doesn't need more. This
patch upgrages controller reference to sharded<> and propagates
it further up to the server.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In early versions of the series which proposed protocol servers, the
interface had two methods answering pretty much the same question of
whether the server is running or not:
* listen_addresses(): empty list -> server not running
* is_server_running()
To reduce redundancy and to avoid possible inconsistencies between the
two methods, `is_server_running()` was scrapped, but re-added by a
follow-up patch because `listen_addresses()` proved to be unreliable as
a source for whether the server is running or not.
This patch restores the previous state of having only
`listen_addresses()` with two additional changes:
* rephrase the comment on `listen_addresses()` to make it clear that
implementations must return empty list when the server is not running;
* those implementations that have a reliable source of whether the
server is running or not, use it to force-return an empty list when
the server is not running
Tests: dtest(nodetool_additional_test.py)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211117062539.16932-1-bdenes@scylladb.com>
Change b0a2a9771f broke
the generic api implementation of
is_native_transport_running that relied on
the addresses list being empty agter the server is stopped.
To fix that, this change introduces a pure virtual method:
protocol_server::is_server_running that can be implemented
by each derived class.
Test: unit(dev)
DTest: nodetool_additional_test.py:TestNodetool.binary_test
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211114135248.588798-1-bhalevy@scylladb.com>
"
On start scylla resolves several hostnames into addresses. Different
places use different hostname selection logic, e.g. the API address
can be the listen one if the dedicated option not set. Failure to
resolve a hostname is reported with an exception that (sometimes)
contains the hostname, but it doesn't look very convenient -- better
to know the config option name. Also resolving of different hostnames
has different decoration around, e.g. prometheus carries a main-local
lambda just to nicely wrap the try/catch block.
This set unifies this zoo and makes main() shorter and less hairy:
1. All failures to resolve a hostname are reported with an
exception containing the relevant config option
2. The || operator for named_value's is introduced to make
the option selection look as short as
resolve(cfg->some_address() || cfg->another_address())
3. All sanity checks are explicit and happen early in main
4. No dangling local variables carrying the cfg->...() value
5. Use resolved IP when logging a "... is listening on ..."
message after a service start
tests: unit(dev)
"
* 'br-ip-resolve-on-start' of https://github.com/xemul/scylla:
main: Move fb-utilities initialization up the main
code: Use utils::resolve instead of inet_address::lookup
main: Remove unused variable
main: Sanitize resolving of listen address
main: Sanitize resolving of broadcast address
main: Sanitize resolving of broadcast RPC address
main: Sanitize resolving of API address
main: Sanitize resolving of prometheus address
utils: Introduce || operator for named_values
db.config: Verbose address resolver helper
main: Remove api-port and prometheus-port variables
alternator: Resolve address with the help of inet_address
redis, thrift: Remove unused captures
There are some users of the latter call left. They all suffer
from the same problem -- the lack of verbosity on resolving
errors.
While at it also get rid of useless local variables that are
only there to carry the cfg->...() option over.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We shouldn't be using Seastar as a text formatting library; that's
not its focus. Use fmt directly instead. fmt::print() doesn't return
the output stream which is a minor inconvenience, but that's life.
Closes#9556
We have two identical "Truncated frame" errors, at:
* read_frame_size() in serialization_visitors.hh;
* cql_server::connection::read_and_decompress_frame() in
transport/server.cc;
When such an exception is thrown, it is impossible to tell where was it
thrown from and it doesn't have any further information contained in it
(beyond the basic information it being thrown implies).
This patch solves both problems: it makes the exception messages unique
per location and it adds information about why it was thrown (the
expected vs. real size of the frame).
Ref: #9482Closes#9520
Fixes#9491
CQL server, when encountering a "general" exception (i.e. not thrown by
cql error checks), reports a wire error with simply the what() part of
exception. However, if we have nested exceptions, we will most likely
lose info here (hello encryption).
General exception case should unwind exception and give back full,
concatenated message to avoid confusion.
Closes#9492
This warning can catch a virtual function that thinks it
overrides another, but doesn't, because the two functions
have different signatures. This isn't very likely since most
of our virtual functions override pure virtuals, but it's
still worth having.
Enable the warning and fix numerous violations.
Closes#9347
"
There are 4 places out there that do the same steps parsing
"client_|server_encryption_options" and configuring the
seastar::tls::creds_builder with the values (messaging, redis,
alternator and transport).
Also to make redis and transport look slimmer main() cleans
the client_encryption_options by ... parsing it too.
This set introduces a (coroutinized) helper to configure the
creds_builder with map<string, string> and removes the options
beautification from main.
tests: unit(dev), dtest.internode_ssl_test(dev)
"
* 'br-generalize-tls-creds-builder-configuration' of https://github.com/xemul/scylla:
code: Generalize tls::credentials_builder configuration
transport, redis: Do not assume fixed encryption options
messaging: Move encryption options parsing to ms
main: Open-code internode encryption misconfig warning
main, config: Move options parsing helpers
All the places in code that configure the mentioned creds builder
from client_|server_encryption_options now do it the same way.
This patch generalizes it all in the utils:: helper.
The alternator code "ignores" require_client_auth and truststore
keys, but it's easy to make the generalized helper be compatible.
Also make the new helper coroutinized from the beginning.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
On start main() brushes up the client_encryption_options option
so that any user of it sees it in some "clean" state and can
avoid using get_or_default() to parse.
This patch removes this assumption (and the cleaning code itself).
Next patch will make use of it and relax the duplicated parsing
complexity back.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
`function_call` AST nodes are created for each function
with side effects in a CQL query, i.e. non-deterministic
functions (`uuid()`, `now()` and some others timeuuid-related).
These nodes are evaluated either when a query itself is executed
or query restrictions are computed (e.g. partition/clustering
key ranges for LWT requests).
We need to cache the calls since otherwise when handling a
`bounce_to_shard` request for an LWT query, we can possibly
enter an infinite bouncing loop (in case a function is used
to calculate partition key ranges for a query), since the
results can be different each time.
Furthermore, we don't support bouncing more than one time.
Returning `bounce_to_shard` message more than one time
will result in a crash.
Caching works only for LWT statements and only for the function
calls that affect partition key range computation for the query.
`variable_specifications` class is renamed to `prepare_context`
and generalized to record information about each `function_call`
AST node and modify them, as needed:
* Check whether a given function call is a part of partition key
statement restriction.
* Assign ids for caching if above is true and the call is a part
of an LWT statement.
There is no need to include any kind of statement identifier
in the cache key since `query_options` (which holds the cache)
is limited to a single statement, anyway.
Function calls are indexed by the order in which they appear
within a statement while parsing. There is no need to
include any kind of statement identifier to the cache key
since `query_options` (which holds the cache) is limited
to a single statement, anyway.
Note that `function_call::raw` AST nodes are not created
for selection clauses of a SELECT statement hence they
can only accept only one of the following things as parameters:
* Other function calls.
* Literal values.
* Parameter markers.
In other words, only parameters that can be immediately reduced
to a byte buffer are allowed and we don't need to handle
database inputs to non-pure functions separately since they
are not possible in this context. Anyhow, we don't even have
a single non-pure function that accepts arguments, so precautions
are not needed at the moment.
Add a test written in `cql-pytest` framework to verify
that both prepared and unprepared lwt statements handle
`bounce_to_shard` messages correctly in such scenario.
Fixes: #8604
Tests: unit(dev, debug)
NOTE: the patchset uses `query_options` as a container for
cached values. This doesn't look clean and `service::query_state`
seems to be a better place to store them. But it's not
forwarded to most of the CQL code and would mean that a huge number
of places would have to be amended.
The series presents a trade-off to avoid forwarding `query_state`
everywhere (but maybe it's the thing that needs to be done, nonetheless).
"
* 'lwt_bounce_to_shard_cached_fn_v6' of https://github.com/ManManson/scylla:
cql-pytest: add a test for non-pure CQL functions
cql3: cache function calls evaluation for non-deterministic functions
cql3: rename `variable_specifications` to `prepare_context`
And reuse these values when handling `bounce_to_shard` messages.
Otherwise such a function (e.g. `uuid()`) can yield a different
value when a statement re-executed on the other shard.
It can lead to an infinite number of `bounce_to_shard` messages
sent in case the function value is used to calculate partition
key ranges for the query. Which, in turn, will cause crashes
since we don't support bouncing more than one time and the second
hop will result in a crash.
Caching works only for LWT statements and only for the function
calls that affect partition key range computation for the query.
`variable_specifications` class is renamed to `prepare_context`
and generalized to record information about each `function_call`
AST node and modify them, as needed:
* Check whether a given function call is a part of partition key
statement restriction.
* Assign ids for caching if above is true and the call is a part
of an LWT statement.
There is no need to include any kind of statement identifier
in the cache key since `query_options` (which holds the cache)
is limited to a single statement, anyway.
Note that `function_call::raw` AST nodes are not created
for selection clauses of a SELECT statement hence they
can only accept only one of the following things as parameters:
* Other function calls.
* Literal values.
* Parameter markers.
In other words, only parameters that can be immediately reduced
to a byte buffer are allowed and we don't need to handle
database inputs to non-pure functions separately since they
are not possible in this context. Anyhow, we don't even have
a single non-pure function that accepts arguments, so precautions
are not needed at the moment.
Tests: unit(dev, debug)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
The controller code sits in the cql_transport namespace and
can omit its mentionings. Also the seastar::distributed<>
is replaced with modern seastar::sharded<> while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now the controller has the lifecycle notifier reference and
can stop using storage service to manage the subscription.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After subscription management was moved onto controller level
a bunch of code can be dropped:
- passing migration notifier beyond controller
- event_notifier's _stopped bit
- event_notifier .stop() method
- event_notifier empty constructor and destrictor
- generic_server's on_stop virtual method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's a migration notifier that's carried through cql_server
_just_ to let event-notifier (un)subscribe on it. Also there's
a call for global storage-service in there which will need to
be replaced with yet another pass-through argument which is not
great.
It's easier to establish this subscription outside of cql_server
like it's currently done for proxy and sl-manager. In case of
cql_server the "outside" is the controller.
This patch just moves the subscription management from cql_server
to controller, next two patches will make more use of this change.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This warning prevents using std::move() where it can hurt
- on an unnamed temporary or a named automatic variable being
returned from a function. In both cases the value could be
constructed directly in its final destination, but std::move()
prevents it.
Fix the handful of cases (all trivial), and enable the warning.
Closes#8992
Both controller and server only need database to get config from.
Since controller creation only happens in main() code which has the
config itself, we may remove database mentioning from transport.
Previous attempt was not to carry the config down to the server
level, but it stepped on an updateable_value landmine -- the u._v.
isn't copyable cross-shard (despite the docs) and to properly
initialize server's max_concurrent_requests we need the config's
named_value member itself.
The db::config that flies through the stack is const reference, but
its named_values do not get copied along the way -- the updateable
value accepts both references and const references to subscribe on.
tests: start-stop in debug mode
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210607135656.18522-1-xemul@scylladb.com>
Too large requests are currently handled by the CQL server by
skipping them and sending back an error response.
That's however wasteful and dangerous: bogus request sizes
will force Scylla to potentially skip gigabytes of data
- and skipping is done by simply reading from the socket,
so it may results in gigabytes of bandwidth wasted.
Even if the request size is not bogus, closing the connection
forces users to adjust their request sizes, which should be done
anyway.
Originally, there was a bug in handling too large requests which
only read their headers and then left the connection in a broken,
undefined state, trying to interpret the rest of the large request
as a next CQL header. It was later fixed to skip the request, but
closing the connection is a safer thing to do.
Fixes#8798Closes#8800
"
There's a bunch of issues with starting and stopping of cql_server with
the help of cql_controller.
fixes: #8796
tests: manual(start + stop,
start + exception on cql_set_state()
)
unit not run, they don't mess with transport controller
"
* 'br-transport-stop-fixes' of https://github.com/xemul/scylla:
transport/controller: Stop server on state change failure too
transport/controller: Rollback server start on state change failure too
transport/controller: Do not leave _server uninitialized
transport/controller: Rework try-catch into defers
Also drop a single violation in transport/server.cc. This helps
prevent dead code from piling up.
Three functions in row_cache_test that are not used in debug mode
are moved near their user, and under the same ifdef, to avoid triggering
the error.
Closes#8767
If on stop the set_cql_state() throws the local sharded<cql_server>
will be left not stopped and will fail the respective assertion on
its destruction.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If set_cql_state() throws the cserver remains started. If this
happens on start before the controller stop defer action is
scheduled the destruction of controller will fain on assertion
that checks the _server must be stopped.
Effectively this is the fix of #8796
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If an exception happens after sharded<cql_server>.start() the
controller's _server pointer is left pointing to stopped sharded
server. This makes it impossible to start the server again (via
API) since the check for if (_server) will always be true.
This is the continuation of the ae4d5a60 fix.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>