The DeleteTable operation in Alternator should return a TableDescription
object describing the table which has just been deleted, similar to what
DescribeTable returns.
Fixes scylladb#11472
Closes #11628
before this change, alternator_timeout_in_ms is not live-updatable,
as after setting executor's default timeout right before creating
sharded executor instances, they never get updated with this option
anymore. but many users would like to set the driver timers based on
server timers. we need to enable them to configure timeout even
when the server is still running.
in this change,
* `alternator_timeout_in_ms` is marked as live-updateable
* `executor::_s_default_timeout` is changed to a thread_local variable,
so it can be updated by a per-shard updateable_value. and
it is now an updateable_value, so its variable name is updated
accordingly. this value is set in the ctor of executor, and
it is disconnected from the corresponding named_value<> option
in the dtor of executor.
* alternator_timeout_in_ms is passed to the constructor of
executor via sharded_parameter, so `executor::_timeout_in_ms` can
be initialized on per-shard basis
* `executor::set_default_timeout()` is dropped, as we already pass
the option to executor in its ctor.
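The observer mechanism described in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: `option_sketch` and `executor_sketch` are hypothetical stand-ins and do not reproduce the real named_value<> / updateable_value<> API.

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <utility>

// Hypothetical stand-in for a live-updateable config option.
// This is NOT the real named_value<> / updateable_value<> API.
class option_sketch {
    uint32_t _value;
    int _next_id = 0;
    std::map<int, std::function<void(uint32_t)>> _observers;
public:
    explicit option_sketch(uint32_t v) : _value(v) {}
    uint32_t get() const { return _value; }
    // Register a callback that runs on every update; returns a handle.
    int observe(std::function<void(uint32_t)> cb) {
        int id = _next_id++;
        _observers.emplace(id, std::move(cb));
        return id;
    }
    void disconnect(int id) { _observers.erase(id); }
    void set(uint32_t v) {
        _value = v;
        for (auto& entry : _observers) {
            entry.second(v);
        }
    }
};

// Hypothetical executor: caches the timeout in a thread_local variable,
// connects to the option in its ctor and disconnects in its dtor.
class executor_sketch {
    static thread_local uint32_t s_default_timeout_ms;
    option_sketch& _option;
    int _observer_id;
public:
    explicit executor_sketch(option_sketch& option)
        : _option(option)
        , _observer_id(option.observe([](uint32_t v) { s_default_timeout_ms = v; })) {
        s_default_timeout_ms = option.get();
    }
    executor_sketch(const executor_sketch&) = delete;
    ~executor_sketch() { _option.disconnect(_observer_id); }
    static uint32_t default_timeout_ms() { return s_default_timeout_ms; }
};

thread_local uint32_t executor_sketch::s_default_timeout_ms = 0;
```

While an `executor_sketch` is alive, calling `set()` on the option is reflected in `default_timeout_ms()`; once the executor is destroyed, later updates no longer touch the cached value.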
Fixes #12232
Closes #13300
* github.com:scylladb/scylladb:
alternator: split the param list of executor ctor into multi lines
alternator,config: make alternator_timeout_in_ms live-updateable
CQL evolved several expression evaluation mechanisms: WHERE clause,
selectors (the SELECT clause), and the LWT IF clause are just some
examples. Most now use expressions, which use managed_bytes_opt
as the underlying value representation, but selectors still use bytes_opt.
This poses two problems:
1. bytes_opt generates large contiguous allocations when used with large blobs, impacting latency
2. trying to use expressions with bytes_opt will incur a copy, reducing performance
To solve the problem, we harmonize the data types to managed_bytes_opt
(#13216 notwithstanding). This is somewhat difficult since the sources of the values
are views into a bytes_ostream. However, luckily bytes_ostream and managed_bytes_view
are mostly compatible so with a little effort this can be done.
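To illustrate why a fragmented representation avoids large contiguous allocations, here is a minimal sketch. `fragmented_bytes_sketch` and its tiny fragment size are hypothetical; the real managed_bytes is considerably more involved, and only the names `is_linearized()` and `with_linearized()` are borrowed from the patch titles below.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical illustration only, not the real managed_bytes.
class fragmented_bytes_sketch {
    std::vector<std::string> _fragments;
public:
    // Unrealistically small, so the fragmentation is easy to see.
    static constexpr std::size_t max_fragment_size = 4;

    explicit fragmented_bytes_sketch(std::string_view data) {
        // Store the value as several small buffers instead of one big one.
        for (std::size_t pos = 0; pos < data.size(); pos += max_fragment_size) {
            _fragments.emplace_back(data.substr(pos, max_fragment_size));
        }
    }

    std::size_t fragment_count() const { return _fragments.size(); }
    bool is_linearized() const { return _fragments.size() <= 1; }

    // Calls func with a contiguous view. The copy into one buffer happens
    // only when the value is actually fragmented. The result of func must
    // not refer to the view, which may point at a temporary.
    template <typename Func>
    auto with_linearized(Func&& func) const {
        if (is_linearized()) {
            return func(_fragments.empty() ? std::string_view()
                                           : std::string_view(_fragments.front()));
        }
        std::string contiguous;
        for (const auto& fragment : _fragments) {
            contiguous += fragment;
        }
        return func(std::string_view(contiguous));
    }
};
```

A consumer that can work fragment by fragment never pays for the contiguous copy; only callers that insist on a linear view do.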
The series is neutral wrt performance:
before:
```
222118.61 tps ( 61.1 allocs/op, 12.1 tasks/op, 43092 insns/op, 0 errors)
224250.14 tps ( 61.1 allocs/op, 12.1 tasks/op, 43094 insns/op, 0 errors)
224115.66 tps ( 61.1 allocs/op, 12.1 tasks/op, 43092 insns/op, 0 errors)
223508.70 tps ( 61.1 allocs/op, 12.1 tasks/op, 43107 insns/op, 0 errors)
223498.04 tps ( 61.1 allocs/op, 12.1 tasks/op, 43087 insns/op, 0 errors)
```
after:
```
220708.37 tps ( 61.1 allocs/op, 12.1 tasks/op, 43118 insns/op, 0 errors)
225168.99 tps ( 61.1 allocs/op, 12.1 tasks/op, 43081 insns/op, 0 errors)
222406.00 tps ( 61.1 allocs/op, 12.1 tasks/op, 43088 insns/op, 0 errors)
224608.27 tps ( 61.1 allocs/op, 12.1 tasks/op, 43102 insns/op, 0 errors)
225458.32 tps ( 61.1 allocs/op, 12.1 tasks/op, 43098 insns/op, 0 errors)
```
Though I expect with some more effort we can eliminate some copies.
Closes #13637
* github.com:scylladb/scylladb:
cql3: untyped_result_set: switch to managed_bytes_view as the cell type
cql3: result_set: switch cell data type from bytes_opt to managed_bytes_opt
cql3: untyped_result_set: always own data
types: abstract_type: add mixed-type versions of compare() and equal()
utils/managed_bytes, serializer: add conversion between buffer_view<bytes_ostream> and managed_bytes_view
utils: managed_bytes: add bidirectional conversion between bytes_opt and managed_bytes_opt
utils: managed_bytes: add managed_bytes_view::with_linearized()
utils: managed_bytes: mark managed_bytes_view::is_linearized() const
since #13452, we switched most of the caller sites from std::regex
to boost::regex. in this change, all occurrences of `#include <regex>`
are dropped unless std::regex is used in the same source file.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13765
The expression system uses managed_bytes_opt for values, but result_set
uses bytes_opt. This means that processing values from the result set
in expressions requires a copy.
Out of the two, managed_bytes_opt is the better choice, since it prevents
large contiguous allocations for large blobs. So we switch result_set
to use managed_bytes_opt. Users of the result_set API are adjusted.
The db::function interface is not modified to limit churn; instead we
convert the types on entry and exit. This will be adjusted in a following
patch.
DynamoDB limits the allowed magnitude and precision of numbers - valid
decimal exponents are between -130 and 125 and up to 38 significant
decimal digits are allowed. In contrast, Scylla uses the CQL "decimal"
type which offers unlimited precision. This can cause two problems:
1. Users might get used to this "unofficial" feature and start relying
on it, not allowing us to switch to a more efficient limited-precision
implementation later.
2. If huge exponents are allowed, e.g., 1e-1000000, summing such a
number with 1.0 will result in a huge number, huge allocations and
stalls. This is highly undesirable.
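As a rough illustration of the limits quoted above (exponent between -130 and 125, at most 38 significant digits), a check on the textual form of a number could look like the sketch below. `within_dynamodb_number_limits()` is a hypothetical helper written for this example and is not the code added by this patch.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: returns true if the decimal literal fits the
// magnitude and precision limits quoted in the text above.
bool within_dynamodb_number_limits(std::string_view s) {
    std::size_t i = 0;
    if (i < s.size() && (s[i] == '+' || s[i] == '-')) {
        ++i;
    }
    std::string digits;   // mantissa digits without the decimal point
    long point_pos = -1;  // number of digits before the decimal point
    for (; i < s.size(); ++i) {
        const char c = s[i];
        if (std::isdigit(static_cast<unsigned char>(c))) {
            digits.push_back(c);
        } else if (c == '.' && point_pos < 0) {
            point_pos = static_cast<long>(digits.size());
        } else {
            break;
        }
    }
    if (digits.empty()) {
        return false;
    }
    if (point_pos < 0) {
        point_pos = static_cast<long>(digits.size());
    }
    long exponent = 0;
    if (i < s.size()) {
        if (s[i] != 'e' && s[i] != 'E') {
            return false;
        }
        ++i;
        bool negative = false;
        if (i < s.size() && (s[i] == '+' || s[i] == '-')) {
            negative = (s[i] == '-');
            ++i;
        }
        if (i == s.size()) {
            return false;
        }
        for (; i < s.size(); ++i) {
            if (!std::isdigit(static_cast<unsigned char>(s[i]))) {
                return false;
            }
            // Stop accumulating once the exponent is hopelessly large,
            // so absurd inputs cannot overflow the counter.
            if (exponent < 1000000) {
                exponent = exponent * 10 + (s[i] - '0');
            }
        }
        if (negative) {
            exponent = -exponent;
        }
    }
    const std::size_t first = digits.find_first_not_of('0');
    if (first == std::string::npos) {
        return true;  // the value is zero
    }
    const std::size_t last = digits.find_last_not_of('0');
    const long significant = static_cast<long>(last - first + 1);
    // Decimal exponent of the leading significant digit.
    const long magnitude = exponent + point_pos - static_cast<long>(first) - 1;
    return significant <= 38 && magnitude >= -130 && magnitude <= 125;
}
```

The point of checking the text before doing any arithmetic is that an input such as 1e-1000000 is rejected without ever materializing a huge decimal.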
After this patch, all tests in test/alternator/test_number.py now
pass. The various failing tests which verify magnitude and precision
limitations in different places (key attributes, non-key attributes,
and arithmetic expressions) now pass - so their "xfail" tags are removed.
Fixes #6794
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
this change replaces all occurrences of `boost::lexical_cast<std::string>`
in the source tree with `fmt::to_string()`, for a couple of reasons:
* `boost::lexical_cast<std::string>` is longer than `fmt::to_string()`,
so the latter is easier to parse and read.
* `boost::lexical_cast<std::string>` creates a stringstream under the
hood, so it can use the `operator<<` to stringify the given object.
but stringstream is known to be less performant than fmtlib.
* we are migrating to fmtlib based formatting, see #13245. so
using `fmt::to_string()` helps us to remove yet another dependency
on `operator<<`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #13611
The current signing code hard-codes "/" as the URL, which likely only
works for alternator. For the S3 client the URL would include the bucket and
object name, so it should become an argument rather than a constant.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
For S3, signing the whole request payload can be too resource-consuming.
Fortunately, payload signing is only enforced if used with plain http,
but with real S3 we're going to use signed requests over https only (see
next patch why).
That said, the patch turns body-content into optional reference (i.e. --
a pointer) so that the signing code could inject the UNSIGNED-PAYLOAD
mark instead of the payload signature and omit heavy payload signing.
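A minimal sketch of the idea, assuming a hypothetical helper `payload_hash_field()` and a caller-supplied hashing callback (neither is taken from the actual patch): a null body pointer selects the UNSIGNED-PAYLOAD mark and skips hashing altogether.

```cpp
#include <functional>
#include <string>

// Hypothetical callback type: returns the hex SHA-256 of its argument.
using hash_fn = std::function<std::string(const std::string&)>;

// Hypothetical helper: picks what goes into the signature in place of
// the payload hash. A null body means "do not sign the payload".
std::string payload_hash_field(const std::string* body, const hash_fn& sha256_hex) {
    if (body == nullptr) {
        return "UNSIGNED-PAYLOAD";
    }
    return sha256_hex(*body);
}
```

With a pointer parameter, the expensive hash is computed only for callers that actually pass a body.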
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Alternator's implementation of TagResource, UntagResource and UpdateTimeToLive (the latter uses tags to store the TTL configuration) was unsafe for concurrent modifications - some of these modifications may be lost. This short series fixes the bug, and also adds (in the last patch) a test that reproduces the bug and verifies that it's fixed.
The cause of the incorrect isolation was that we separately read the old tags and wrote the modified tags. In this series we introduce a new function, `modify_tags()` which can do both under one lock, so concurrent tag operations are serialized and therefore isolated as expected.
Fixes #6389.
Closes #13150
* github.com:scylladb/scylladb:
test/alternator: test concurrent TagResource / UntagResource
db/tags: drop unsafe update_tags() utility function
alternator: isolate concurrent modification to tags
db/tags: add safe modify_tags() utility functions
migration_manager: expose access to storage_proxy
S3 client cannot perform anonymous multipart uploads into any real S3
buckets regardless of their configuration. Since multipart upload is
an essential part of the sstables backend, we need to implement the
authorisation support for the client early.
(side note): with minio anonymous multipart upload works, with aws s3
anonymous PUT and DELETE can be configured, it's exactly the combination
of aws + multipart upload that does need authorization.
Fortunately, the signature generation and signature checking code is
symmetrical and we have the checking option already in alternator :) So
what this patch does is just moves the alternator::get_signature()
helper into utils/. A sad side effect of that is all tests now need to
link with gnutls :( that is used to compute the hash value itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes #13428
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `partition_region` with the help of fmt::ostream.
to help with the review process, the corresponding `to_string()` is
dropped, and its callers now switch over to `fmt::to_string()` in
this change as well. to use `fmt::to_string()` helps with consolidating
all places to use fmtlib for printing/formatting.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, the line is 249 chars long, so split it into
multiple lines for better readability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, alternator_timeout_in_ms is not live-updatable,
as after setting executor's default timeout right before creating
sharded executor instances, they never get updated with this option
anymore.
in this change,
* alternator_timeout_in_ms is marked as live-updateable
* executor::_s_default_timeout is changed to a thread_local variable,
so it can be updated by a per-shard updateable_value. and
it is now an updateable_value, so its variable name is updated
accordingly. this value is set in the ctor of executor, and
it is disconnected from the corresponding named_value<> option
in the dtor of executor.
* alternator_timeout_in_ms is passed to the constructor of
executor via sharded_parameter, so executor::_timeout_in_ms can
be initialized on per-shard basis
* executor::set_default_timeout() is dropped, as we already pass
the option to executor in its ctor.
please note, in the ctor of executor, we always update the cached
value of `s_default_timeout` with the value of `_timeout_in_ms`,
and we set the default timeout to 10s in `alternator_test_env`.
this is a design decision to avoid bending the production code for
testing, as in production, we always set the timeout with the value
specified either by the default value or by the yaml conf file.
Fixes #12232
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change is a leftover of 063b3be,
which failed to include the changes in the header files.
it turns out we have `using namespace httpd;` in seastar's
`request_parser.rl`, and we should not rely on this statement to
expose the symbols in `seastar::httpd` to `seastar` namespace.
in this change,
* api/*.hh: all httpd symbols are referenced by `httpd::*`
instead of being referenced as if they are in `seastar`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
We compare a signed variable to an unsigned one, which can
yield surprising results. In this case, it is harmless since
we already validated the signed input is positive, but
use std::cmp_less() to quench any doubts (and warnings).
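For illustration, the pitfall and the behaviour of std::cmp_less() (C++20, `<utility>`) can be sketched as below. `cmp_less_sketch()` is a hand-written equivalent for integer types so that the example does not depend on C++20; `naive_less()` spells out the conversion that a plain `a < b` performs implicitly. Both names are made up for this example.

```cpp
#include <type_traits>

// What a plain `a < b` does when a is int and b is unsigned:
// a is converted to unsigned first, so -1 becomes a huge value.
bool naive_less(int a, unsigned b) {
    return static_cast<unsigned>(a) < b;
}

// Hand-written equivalent of std::cmp_less for integer types:
// a negative value always compares less than any unsigned value.
template <typename T, typename U>
constexpr bool cmp_less_sketch(T t, U u) noexcept {
    using UT = std::make_unsigned_t<T>;
    using UU = std::make_unsigned_t<U>;
    if constexpr (std::is_signed_v<T> == std::is_signed_v<U>) {
        return t < u;
    } else if constexpr (std::is_signed_v<T>) {
        return t < 0 ? true : UT(t) < u;
    } else {
        return u < 0 ? false : t < UU(u);
    }
}
```

Here `naive_less(-1, 1u)` is false while `cmp_less_sketch(-1, 1u)` is true, which is the kind of surprise the patch rules out.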
Alternator modifies tags in three operations - TagResource, UntagResource
and UpdateTimeToLive (the latter uses a tag to store the TTL configuration).
All three operations were implemented by three separate steps:
1. Read the current tags.
2. Modify the tags according to the desired operation.
3. Write the modified tags back with update_tags().
This implementation was not safe for concurrent operations - some
modifications may be lost. We fix this in this patch by using the new
modify_tags() function introduced in the previous patch, which performs
all three steps under one lock so the tag operations are serialized and
correctly isolated.
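The difference between the two approaches can be sketched as follows. `tag_store_sketch` is a hypothetical, single-process illustration using std::mutex; the real modify_tags() operates on the schema through the migration manager and its signature is not reproduced here.

```cpp
#include <functional>
#include <map>
#include <mutex>
#include <string>
#include <utility>

using tag_map = std::map<std::string, std::string>;

// Hypothetical in-memory tag store, for illustration only.
class tag_store_sketch {
    std::mutex _lock;
    tag_map _tags;
public:
    // Read, modify and write back as one critical section, so two
    // concurrent callers cannot overwrite each other's changes.
    void modify_tags(const std::function<void(tag_map&)>& modifier) {
        std::lock_guard<std::mutex> guard(_lock);
        tag_map updated = _tags;     // 1. read the current tags
        modifier(updated);           // 2. apply the requested change
        _tags = std::move(updated);  // 3. write the result back
    }

    tag_map snapshot() {
        std::lock_guard<std::mutex> guard(_lock);
        return _tags;
    }
};
```

If steps 1 and 3 were separate calls with the lock released in between, a second writer could read the old tags after the first writer's read and before its write, and one of the two updates would be lost.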
Fixes #6389
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
we should not assume that some included header does this for us.
we'd have the following compilation failure if seastar's
src/http/request_parser.rl does not `using namespace httpd;` anymore.
```
/home/kefu/dev/scylladb/alternator/streams.cc:433:55: error: no matching literal operator for call to 'operator""h' with argument of type 'unsigned long long' or 'const char *', and no matching literal operator template
static constexpr auto dynamodb_streams_max_window = 24h;
^
```
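A minimal illustration of the kind of fix this failure calls for, assuming the change is simply to bring the chrono literals into scope explicitly in the source file that uses them:

```cpp
#include <chrono>

// Without this using-directive (or one leaking in from some header),
// the `24h` literal below does not compile.
using namespace std::chrono_literals;

static constexpr auto dynamodb_streams_max_window = 24h;
```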
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
despite that RapidJSON is a header-only library, we still need to
find it and "link" against it for adding the include directory.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
otherwise we'd have
```
In file included from /home/kefu/dev/scylladb/alternator/executor.cc:37:
/home/kefu/dev/scylladb/cql3/util.hh:21:10: fatal error: 'cql3/CqlParser.hpp' file not found
^~~~~~~~~~~~~~~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
they are part of the CQL type system, and are "closer" to types.
let's move them into "types" directory.
the building systems are updated accordingly.
the source files referencing `types.hh` were updated using the following
command:
```
find . \( -name "*.cc" -o -name "*.hh" \) -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} +
```
the source files under sstables include "types.hh", which is
indeed the one located under "sstables", so include "sstables/types.hh"
instead, so it's more explicit.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #12926
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
alternator headers are exposed to the target which links against it,
so let's expose them using the `target_include_directories()`.
also, `alternator` uses Seastar library and uses xxHash indirectly.
we should fix the latter by exposing the included header instead,
but for now, let's just link alternator directly to xxHash.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.
Closes #12858
this is the first step to reenable cmake to build scylla, so we can experiment with C++20 modules and other changes before porting them to `configure.py`. please note, this changeset alone does not address all issues yet. as this is a low priority project, i want to do this in smaller (or tiny!) steps.
* build: cmake: s/Abseil/absl/
* build: cmake: sync with source files compiled in configure.py
* build: cmake: do not generate crc_combine_table at build time
* build: cmake: use packaged libdeflate
Closes #12838
* github.com:scylladb/scylladb:
build: cmake: add rust binding
build: cmake: extract cql3 and alternator out
build: cmake: use packaged libdeflate
build: cmake: do not generate crc_combine_table at build time
build: cmake: sync with source files compiled in configure.py
build: cmake: s/Abseil/absl/
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.
mutation_reader remains in the readers/ module.
mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.
This is a step forward towards librarization or modularization of the
source base.
Closes #12788
Fixes #12601 (maybe?)
Sort the set of tables on ID. This should ensure we never
generate duplicates in a paged listing here. Can obviously miss things if they
are added between paged calls and end up with a "smaller" UUID/ARN, but that
is to be expected.
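The effect of sorting can be sketched as follows. `list_page_sketch()` and its types are hypothetical and only model the paging logic, not Alternator's actual ListTables code: each page starts strictly after the last ID returned by the previous page.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result of one paged call.
struct listing_page_sketch {
    std::vector<std::string> ids;
    bool has_more;
};

// Hypothetical paging helper: sort by ID, then return up to `limit`
// entries that come strictly after `exclusive_start` (empty = from start).
listing_page_sketch list_page_sketch(std::vector<std::string> all_ids,
                                     const std::string& exclusive_start,
                                     std::size_t limit) {
    std::sort(all_ids.begin(), all_ids.end());
    auto first = exclusive_start.empty()
        ? all_ids.begin()
        : std::upper_bound(all_ids.begin(), all_ids.end(), exclusive_start);
    const std::size_t remaining = static_cast<std::size_t>(all_ids.end() - first);
    auto last = first + static_cast<std::ptrdiff_t>(std::min(limit, remaining));
    return {std::vector<std::string>(first, last), last != all_ids.end()};
}
```

Because every page begins after the previous page's last ID in a fixed order, an ID can never be returned twice; an ID created between calls that sorts before the current position is skipped, as noted above.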
The main assumption here is that if is_big is good enough for the
GetBatchItems operation, it should also work well for Scan,
Query and GetRecords. And it's easier to maintain more unified
code.
Additionally, the documentation of the 'future<> print' overload used for
streaming suggests that it has quite a big overhead. Since the only
motivation for streaming seems to be keeping contiguous allocations
below some threshold, we should not stream when this threshold
is not exceeded.
Closes #12164
This decreases the whole alternator::get_table cpu time by 78%
(from 2.8 us to 0.6 us on my cpu).
In perf_simple_query it decreases allocs/op by 1.6% (by removing 4 allocations)
and increases median tps by 3.4%.
Raw results from running:
./build/release/test/perf/perf_simple_query_g --smp 1 \
--alternator forbid --default-log-level error \
--random-seed=1235000092 --duration=180 --write
Before the patch:
median 46903.65 tps (197.2 allocs/op, 12.1 tasks/op, 170886 insns/op, 0 errors)
median absolute deviation: 210.15
maximum: 47354.59
minimum: 42535.63
After the patch:
median 48484.76 tps (194.1 allocs/op, 12.1 tasks/op, 168512 insns/op, 0 errors)
median absolute deviation: 317.32
maximum: 49247.69
minimum: 44656.38
Closes #12445
We'll try to distinguish the case when data comes from the storage rather
than user request. Such attribute can be used in expressions and
when it can't be decoded it should make expression evaluate as
false to simply exclude the row during filter query or scan.
Note that this change focuses on binary type, for other types we
may have some inconsistencies in the implementation.
Now that we don't accept cql protocol version 1 or 2, we can
drop cql_serialization format everywhere, except when in the IDL
(since it's part of the inter-node protocol).
A few functions had duplicate versions, one with and one without
a cql_serialization_format parameter. They are deduplicated.
Care is taken that `partition_slice`, which communicates
the cql_serialization_format across nodes, still presents
a valid cql_serialization_format to other nodes when
transmitting itself and rejects protocol 1 and 2 serialization
format when receiving. The IDL is unchanged.
One test checking the 16-bit serialization format is removed.
The first patch in this small series fixes a hang during shutdown when the expired-item scanning thread can hang in a retry loop instead of quitting. These hangs were seen in some test runs (issue #12145).
The second patch is a failsafe against additional bugs like those solved by the first patch: If any bug causes the same page fetch to repeatedly time out, let's stop the attempts after 10 retries instead of retrying forever. When we stop the retries, a warning will be printed to the log, Scylla will wait until the next scan period and start a new scan from scratch - from a random position in the database, instead of hanging potentially-forever waiting for the same page.
Closes #12152
* github.com:scylladb/scylladb:
alternator ttl: in scanning thread, don't retry the same page too many times
alternator: fix hang during shutdown of expiration-scanning thread
Since fixing issue #11737, when the expiration scanner times out reading
a page of data, it retries asking for the same page instead of giving up
on the scan and starting anew later. This retry was infinite - which can
cause problems if we have a bug in the code or several nodes down, which
can lead to getting hung in the same place in the scan for a very long
(potentially infinite) time without making any progress.
An example of such a bug was issue #12145, where we forgot to handle
shutdowns, so on shutdown of the cluster we just hung forever repeating
the same request that will never succeed. It's better in this case to
just give up on the current scan, and start it anew (from a random
position) later.
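The bounded retry can be sketched like this. `fetch_page_with_retries_sketch()` and `fetch_result` are hypothetical names for this example; the limit of 10 retries is the one mentioned in the series description, and the real scanner additionally sleeps between attempts and logs a warning when it gives up.

```cpp
#include <functional>

enum class fetch_result { ok, timeout };

// Hypothetical helper: tries to fetch one page, retrying on timeout.
// Returns true if the page was fetched, false if the caller should give
// up on the current scan and start a new one later.
bool fetch_page_with_retries_sketch(const std::function<fetch_result()>& fetch_page,
                                    int max_retries = 10) {
    // One initial attempt plus at most max_retries retries.
    for (int attempt = 0; attempt <= max_retries; ++attempt) {
        if (fetch_page() == fetch_result::ok) {
            return true;
        }
        // The real code would wait briefly here before retrying.
    }
    return false;
}
```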
Refs #12145 (that issue was already fixed, by a different patch which
stops the iteration when shutting down - not waiting for an infinite
number of iterations and not even one more).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The expiration-scanning thread is a long-running thread which can scan
data for hours, but checks for its abort-source before fetching each
page to allow for timely shutdown. Recently, we added the ability to
retry the page fetching in case of timeout, but forgot to check the
abort source in this new retry loop - which led to an infinitely-long
shutdown in some tests while the retry loop retries forever.
In this patch we fix this bug by using sleep_abortable() instead of
sleep(). sleep_abortable() will throw an exception if the abort source
was triggered before or during the sleep - and this exception will
stop the scan immediately.
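The behaviour relied on here can be sketched in plain C++ as below. All names (`abort_source_sketch`, `sleep_aborted_sketch`, `sleep_abortable_sketch`) are hypothetical, and the polling loop is a simplification chosen for this example; it only models the contract described above, namely that the sleep throws if an abort was requested before or during it.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <stdexcept>
#include <thread>

// Hypothetical abort flag that another thread may set.
struct abort_source_sketch {
    std::atomic<bool> requested{false};
    void request_abort() { requested.store(true); }
};

struct sleep_aborted_sketch : std::runtime_error {
    sleep_aborted_sketch() : std::runtime_error("sleep aborted") {}
};

// Sleeps for `duration`, but throws as soon as it notices that an abort
// was requested, whether before the sleep started or during it.
void sleep_abortable_sketch(std::chrono::milliseconds duration,
                            const abort_source_sketch& as) {
    const auto slice = std::chrono::milliseconds(10);
    auto remaining = duration;
    while (true) {
        if (as.requested.load()) {
            throw sleep_aborted_sketch();
        }
        if (remaining <= std::chrono::milliseconds::zero()) {
            return;
        }
        const auto step = std::min(slice, remaining);
        std::this_thread::sleep_for(step);
        remaining -= step;
    }
}
```

A retry loop built on such a sleep needs no separate abort check: the exception propagates out of the loop and ends the scan.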
Fixes #12145
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The Alternator TTL expiration scanner scans an entire table using many
small pages. If any of those pages time out for some reason (e.g., an
overload situation), we currently consider the entire scan to have failed
and wait for the next scan period (which by default is 24 hours) when
we start the scan from scratch (at a random position). There is a risk
that if these timeouts are common enough to occur once or more per
scan, the result is that we double or more the effective expiration lag.
A better solution, done in this patch, is to retry from the same position
if a single page timed out - immediately (or almost immediately, we add
a one-second sleep).
Fixes #11737
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #12092
This bug doesn't affect anything, the reason is described in the commit:
'alternator: fix wrong 'where' condition for GSI range key'.
But it's theoretically correct to escape those key names and
the difference can be observed via CQL's describe table. Before
the patch 'where' condition is missing one double quote in variable
name making it mismatched with corresponding column name.