Commit Graph

624 Commits

Author SHA1 Message Date
Avi Kivity
f0643d1713 alternator: ttl: do not copy mutation while constructing a vector
The vector(initializer_list<T>) constructor copies the T since
initializer_list is read-only. Move the mutation instead.

This happens to fix a use-after-return on clang 15 on aarch64.
I'm fairly sure that's a miscompile, but the fix is worthwhile
regardless.

Closes #11818
2022-10-21 10:04:00 +03:00
Nadav Har'El
264f453b9d Merge 'Associate alternator user with its service level configuration' from Piotr Sarna
Until now, authentication in alternator served only two purposes:
 - refusing clients without proper credentials
 - printing user information with logs

After this series, this user information is passed to lower layers, which also means that users are capable of attaching service levels to roles, and this service level configuration will be effective with alternator requests.

tests: manually by adding more debug logs and inspecting that per-service-level timeout value was properly applied for an authenticated alternator user

Fixes #11379

Closes #11380

* github.com:scylladb/scylladb:
  alternator: propagate authenticated user in client state
  client_state: add internal constructor with auth_service
  alternator: pass auth_service and sl_controller to server
2022-10-19 23:27:48 +03:00
Alexander Turetskiy
636e14cc77 Alternator: Projection field added to return from DescribeTable which describes GSIs and LSIs.
The return from DescribeTable which describes GSIs and LSIs is missing
the Projection field. We do not yet support all the settings Projection
(see #5036), but the default which we support is ALL, and DescribeTable
should return that in its description.

Fixes #11470

Closes #11693
2022-10-19 19:01:08 +03:00
Nadav Har'El
4a7794fb64 alternator: better error message when adding a GSI to an existing table
Due to issue #11567, Alternator do not yet support adding a GSI to an
existing table via UpdateTable with the GlobalSecondaryIndexUpdates
parameter.

However, currently, we print a misleading error message in this case,
complaining about the AttributeDefinitions parameter. This parameter
is also required with GlobalSecondaryIndexUpdates, but it's not the
main problem, and the user is likely to be confused why the error message
points to that specific paramter and what it means that this parameter
is claimed to be "not supported" (while it is supported, in CreateTable).
With this patch, we report that GlobalSecondaryIndexUpdates is not
supported.

This patch does not fix the unsupported feature - it just improves
the error message saying that it's not supported.

Refs #11567

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11650
2022-09-29 09:00:31 +03:00
Piotr Sarna
481240b8b4 Merge 'Alternator: Run more TTL tests by default (and add a test for metrics)' from Nadav Har'El
We had quite a few tests for Alternator TTL in test/alternator, but most
of them did not run as part of the usual Jenkins test suite, because
they were considered "very slow" (and require a special "--runveryslow"
flag to run).

In this series we enable six tests which run quickly enough to run by
default, without an additional flag. We also make them even quicker -
the six tests now take around 2.5 seconds.

I also noticed that we don't have a test for the Alternator TTL metrics
- and added one.

Fixes #11374.
Refs https://github.com/scylladb/scylla-monitoring/issues/1783

Closes #11384

* github.com:scylladb/scylladb:
  test/alternator: insert test names into Scylla logs
  rest api: add a new /system/log operation
  alternator ttl: log warning if scan took too long.
  alternator,ttl: allow sub-second TTL scanning period, for tests
  test/alternator: skip fewer Alternator TTL tests
  test/alternator: test Alternator TTL metrics
2022-09-22 09:47:50 +02:00
Piotr Sarna
5597bc8573 Merge 'Alternator: test and fix crashes and errors...
when using ":attrs" attribute' from Nadav Har'El

This PR improves the testing for issue #5009 and fixes most of it (but
not all - see below). Issue #5009 is about what happens when a user
tries to use the name `:attrs` for an attribute - while Alternator uses
a map column with that name to hold all the schema-less attributes of an
item.  The tests we had for this issue were partial, and missed the
worst cases which could result in Scylla crashing on specially-crafted
PutItem or UpdateItem requests.

What the tests missed were the cases that `:attrs` is used as a
**non-key**. So in this PR we add additional tests for this case,
several of them fail or even crash Scylla, and then we fix all these
cases.

Issue #5009 remains open because using `:attrs` as the name of a **key**
is still not allowed. But because it results in a clean error message
when attempting to create a table with such a key, I consider this
remaining problem very minor.

Refs #5009.

Closes #11572

* github.com:scylladb/scylladb:
  alternator: fix crashes an errors when using ":attrs" attribute
  alternator: improve tests for reserved attribute name ":attrs"
2022-09-19 09:48:06 +02:00
Nadav Har'El
999ca2d588 alternator: fix crashes an errors when using ":attrs" attribute
Alternator uses a single column, a map, with the deliberately strange
name ":attrs", to hold all the schema-less attributes of an item.
The existing code is buggy when the user tries to write to an attribute
with this strange name ":attrs". Although it is extremely unlikely that
any user would happen to choose such a name, it is nevertheless a legal
attribute name in DynamoDB, and should definitely not cause Scylla to crash
as it does in some cases today.

The bug was caused by the code assuming that to check whether an attribute
is stored in its own column in the schema, we just need to check whether
a column with that name exists. This is almost true, except for the name
":attrs" - a column with this name exists, but it is a map - the attribute
with that name should be stored *in* the map, not as the map. The fix
is to modify that check to special-case ":attrs".

This fix makes the relevant tests, which used to crash or fail, now pass.

This fix solves most of #5009, but one point is not yet solved (and
perhaps we don't need to solve): It is still not allowed to use the
name ":attrs" for a **key** attribute. But trying to do that fails cleanly
(during the table creation) with an appropriate error message, so is only
a very minor compatibility issue.

Refs #5009

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-09-19 10:30:11 +03:00
Nadav Har'El
33e6a88d9a alternator ttl: comment fixes
This patch fixes a few errors and out-of-date descriptions in comments
in alternator/ttl.cc. No functional changes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-09-15 00:03:43 +03:00
Nadav Har'El
b9792ffb06 alternator ttl: log warning if scan took too long.
Currently, we log at "info" level how much time remained at the end of
a full TTL scan until the next scanning period (we sleep for that time).
If the scan was slower than the period, we didn't print anything.

Let's print a warning in this case - it can be useful for debugging,
and also users should know when their desired scan period is not being
honored because the full scan is taking longer than the desired scan
period.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-09-12 10:32:56 +03:00
Nadav Har'El
e7e9adc519 alternator,ttl: allow sub-second TTL scanning period, for tests
Alternator has the "alternator_ttl_period_in_seconds" parameter for
controlling how often the expiration thread looks for expired items to
delete. It is usually a very large number of seconds, but for tests
to finish quickly, we set it to 1 second.
With 1 second expiration latency, test/alternator/test_ttl.py took 5
seconds to run.

In this patch, we change the parameter to allow a  floating-point number
of seconds instead of just an integer. Then, this allows us to halve the
TTL period used by tests to 0.5 seconds, and as a result, the run time of
test_ttl.py halves to 2.5 seconds. I think this is fast enough for now.

I verified that even if I change the period to 0.1, there is no noticable
slowdown to other Alternator tests, so 0.5 is definitely safe.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-09-12 10:32:56 +03:00
Piotr Sarna
2379a25ade alternator: propagate authenticated user in client state
From now on, when an alternator user correctly passed an authentication
step, their assigned client_state will have that information,
which also means proper access to service level configuration.
Previously the username was only used in tracing.
2022-09-05 10:43:29 +02:00
Piotr Sarna
9511c21686 alternator: pass auth_service and sl_controller to server
It's going to be needed to recreate a client state for an authenticated
user.
2022-09-05 10:03:00 +02:00
Nadav Har'El
941c719a23 alternator: return ProvisionedThroughput in DescribeTable
DescribeTable is currently hard-coded to return PAY_PER_REQUEST billing
mode. Nevertheless, even in PAY_PER_REQUEST mode, the DescribeTable
operation must return a ProvisionedThroughput structure, listing both
ReadCapacityUnits and WriteCapacityUnits as 0. This requirement is not
stated in some DynamoDB documentation but is explictly mentioned in
https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_ProvisionedThroughput.html
Also in empirically, DynamoDB returns ProvisionedThroughput with zeros
even in PAY_PER_REQUEST mode. We even had an xfailing test to confirm this.

The ProvisionedThroughput structure being missing was a problem for
applications like DynamoDB connectors for Spark, if they implicitly
assume that ProvisionedThroughput is returned by DescribeTable, and
fail (as described in issue #11222) if it's outright missing.

So this patch adds the missing ProvisionedThroughput structure, and
the xfailing test starts to pass.

Note that this patch doesn't change the fact that attempting to set
a table to PROVISIONED billing mode is ignored: DescribeTable continues
to always return PAY_PER_REQUEST as the billing mode and zero as the
provisioned capacities.

Fixes #11222

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11298
2022-08-22 09:58:09 +02:00
Botond Dénes
d1d53f1b84 query: add tombstone-limit to read-command
Propagate the tombstone-limit from coordinator to replicas, to make sure
all is using the same limit.
2022-08-10 06:01:47 +03:00
Benny Halevy
c71ef330b2 query-request, everywhere: define and use query_id as a strong type
Define query_id as a tagged_uuid
So it can be differentiated from other uuid-class types.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:13:28 +03:00
Benny Halevy
257d74bb34 schema, everywhere: define and use table_id as a strong type
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.

Fixes #11207

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:41 +03:00
Kamil Braun
ab946e392f alternator: ttl: pass gossiper& to expiration_service
This allows us to remove the `gossiper()` getter from `storage_proxy`.
2022-08-04 12:12:43 +02:00
Michał Sala
041cb77ad0 alternator, db: move the tag code to db/tags
Tags are a useful mechanism that could be used outside of alternator
namespace. My motivation to move tags_extension and other utilities to
db/tags/ was that I wanted to use them to mark "synchronous mode" views.

I have extracted `get_tags_of_table`, `find_tag` and `update_tags`
method to db/tags/utils.cc and moved alternator/tags_extension.hh to
db/tags/.

The signature of `get_tags_of_table` was changed from `const
std::map<sstring, sstring>&` to `const std::map<sstring, sstring>*`
Original behavior of this function was to throw an
`alternator::api_error` exception. This was undesirable, as it
introduced a dependency on the alternator module. I chose to change it
to return a potentially null value, and added a wrapper function to the
alternator module - `get_tags_of_table_or_throw` to keep the previous
throwing behavior.
2022-07-25 09:53:33 +02:00
Benny Halevy
acae3cc223 treewide: stop use of deprecated coroutine::make_exception
Convert most use sites from `co_return coroutine::make_exception`
to `co_await coroutine::return_exception{,_ptr}` where possible.

In cases this is done in a catch clause, convert to
`co_return coroutine::exception`, generating an exception_ptr
if needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10972
2022-07-07 15:02:16 +03:00
Botond Dénes
2b6eeadc07 alternator: use position-in-partition in paging cookie only when reading CQL tables
Recently, we added full position-in-partition support to alternator's
paging cookie, so it can support stopping at arbitrary positions. This
support however is only really needed when tables have range tombstones
and alternator tables never have them. So to avoid having to make the
new fields in 'ExclusiveStartKey' reserved, we avoid filling these in
when reading an alternator table, as in this case it is safe to assume
the position is `after_key($clustring_key)`. We do include these new
members however when reading CQL tables through alternator. As this is
only supported for system tables, we can also be sure that the elaborate
names we used for these fields are enough to avoid naming clashes.
The condition in the code implementing this is actually even more
general: it only includes the region/weight members when the position
differs from that of a normal alternator one.
2022-06-30 15:10:30 +03:00
Botond Dénes
52058ea974 alternator: make is_alternator_keyspace() a standalone method 2022-06-30 14:18:29 +03:00
Avi Kivity
3131cbea62 Merge 'query: allow replica to provide arbitrary continue position' from Botond Dénes
Currently, we use the last row in the query result set as the position where the query is continued from on the next page. Since only live rows make it into query result set, this mandates the query to be stopped on a live row on the replica, lest any dead rows or tombstones processed after the live rows, would have to be re-processed on the next page (and the saved reader would have to be thrown away due to position mismatch). This requirement of having to stop on a live row is problematic with datasets which have lots of dead rows or tombstones, especially if these form a prefix. In the extreme case, a query can time out before it can process a single live row and the data-set becomes effectively unreadable until compaction gets rid of the tombstones.
This series prepares the way for the solution: it allows the replica to determine what position the query should continue from on the next page. This position can be that of a dead row, if the query stopped on a dead row. For now, the replica supplies the same position that would have been obtained with looking at the last row in the result set, this series merely introduces the infrastructure for transferring a position together with the query result, and it prepares the paging logic to make use of this position. If the coordinator is not prepared for the new field, it will simply fall-back to the old way of looking at the last row in the result set. As I said for now this is still the same as the content of the new field so there is no problem in mixed clusters.

Refs: https://github.com/scylladb/scylla/issues/3672
Refs: https://github.com/scylladb/scylla/issues/7689
Refs: https://github.com/scylladb/scylla/issues/7933

Tests: manual upgrade test.
I wrote a data set with:
```
./scylla-bench -mode=write -workload=sequential -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -clustering-row-size=8096 -partition-count=1000
```
This creates large, 80MB partitions, which should fill many pages if read in full. Then I started a read workload:
```
./scylla-bench -mode=read -workload=uniform -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -duration=10m -rows-per-request=9000 -page-size=100
```
I confirmed that paging is happening as expected, then upgraded the nodes one-by-one to this PR (while the read-load was ongoing). I observed no read errors or any other errors in the logs.

Closes #10829

* github.com:scylladb/scylla:
  query: have replica provide the last position
  idl/query: add last_position to query_result
  mutlishard_mutation_query: propagate compaction state to result builder
  multishard_mutation_query: defer creating result builder until needed
  querier: use full_position instead of ad-hoc struct
  querier: rely on compactor for position tracking
  mutation_compactor: add current_full_position() convenience accessor
  mutation_compactor: s/_last_clustering_pos/_last_pos/
  mutation_compactor: add state accessor to compact_mutation
  introduce full_position
  idl: move position_in_partition into own header
  service/paging: use position_in_partition instead of clustering_key for last row
  alternator/serialization: extract value object parsing logic
  service/pagers/query_pagers.cc: fix indentation
  position_in_partition: add to_string(partition_region) and parse_partition_region()
  mutation_fragment.hh: move operator<<(partition_region) to position_in_partition.hh
2022-06-27 12:23:21 +03:00
Avi Kivity
dab56b82fa Merge 'Per-partition rate limiting' from Piotr Dulikowski
Due to its sharded and token-based architecture, Scylla works best when the user workload is more or less uniformly balanced across all nodes and shards. However, a common case when this assumption is broken is the "hot partition" - suddenly, a single partition starts getting a lot more reads and writes in comparison to other partitions. Because the shards owning the partition have only a fraction of the total cluster capacity, this quickly causes latency problems for other partitions within the same shard and vnode.

This PR introduces per-partition rate limiting feature. Now, users can choose to apply per-partition limits to their tables of choice using a schema extension:

```
ALTER TABLE ks.tbl
WITH per_partition_rate_limit = {
	'max_writes_per_second': 100,
	'max_reads_per_second': 200
};
```

Reads and writes which are detected to go over that quota are rejected to the client using a new RATE_LIMIT_ERROR CQL error code - existing error codes didn't really fit well with the rate limit error, so a new error code is added. This code is implemented as a part of a CQL protocol extension and returned to clients only if they requested the extension - if not, the existing CONFIG_ERROR will be used instead.

Limits are tracked and enforced on the replica side. If a write fails with some replicas reporting rate limit being reached, the rate limit error is propagated to the client. Additionally, the following optimization is implemented: if the coordinator shard/node is also a replica, we account the operation into the rate limit early and return an error in case of exceeding the rate limit before sending any messages to other replicas at all.

The PR covers regular, non-batch writes and single-partition reads. LWT and counters are not covered here.

Results of `perf_simple_query --smp=1 --operations-per-shard=1000000`:

- Write mode:
  ```
  8f690fdd47 (PR base):
  129644.11 tps ( 56.2 allocs/op,  13.2 tasks/op,   49785 insns/op)
  This PR:
  125564.01 tps ( 56.2 allocs/op,  13.2 tasks/op,   49825 insns/op)
  ```
- Read mode:
  ```
  8f690fdd47 (PR base):
  150026.63 tps ( 63.1 allocs/op,  12.1 tasks/op,   42806 insns/op)
  This PR:
  151043.00 tps ( 63.1 allocs/op,  12.1 tasks/op,   43075 insns/op)
  ```

Manual upgrade test:
- Start 3 nodes, 4 shards each, Scylla version 8f690fdd47
- Create a keyspace with scylla-bench, RF=3
- Start reading and writing with scylla-bench with CL=QUORUM
- Manually upgrade nodes one by one to the version from this PR
- Upgrade succeeded, apart from a small number of operations which failed when each node was being put down all reads/writes succeeded
- Successfully altered the scylla-bench table to have a read and write limit and those limits were enforced as expected

Fixes: #4703

Closes #9810

* github.com:scylladb/scylla:
  storage_proxy: metrics for per-partition rate limiting of reads
  storage_proxy: metrics for per-partition rate limiting of writes
  database: add stats for per partition rate limiting
  tests: add per_partition_rate_limit_test
  config: add add_per_partition_rate_limit_extension function for testing
  cf_prop_defs: guard per-partition rate limit with a feature
  query-request: add allow_limit flag
  storage_proxy: add allow rate limit flag to get_read_executor
  storage_proxy: resultize return type of get_read_executor
  storage_proxy: add per partition rate limit info to read RPC
  storage_proxy: add per partition rate limit info to query_result_local(_digest)
  storage_proxy: add allow rate limit flag to mutate/mutate_result
  storage_proxy: add allow rate limit flag to mutate_internal
  storage_proxy: add allow rate limit flag to mutate_begin
  storage_proxy: choose the right per partition rate limit info in write handler
  storage_proxy: resultize return types of write handler creation path
  storage_proxy: add per partition rate limit to mutation_holders
  storage_proxy: add per partition rate limit info to write RPC
  storage_proxy: add per partition rate limit info to mutate_locally
  database: apply per-partition rate limiting for reads/writes
  database: move and rename: classify_query -> classify_request
  schema: add per_partition_rate_limit schema extension
  db: add rate_limiter
  storage_proxy: propagate rate_limit_exception through read RPC
  gms: add TYPED_ERRORS_IN_READ_RPC cluster feature
  storage_proxy: pass rate_limit_exception through write RPC
  replica: add rate_limit_exception and a simple serialization framework
  docs: design doc for per-partition rate limiting
  transport: add rate_limit_error
2022-06-24 01:32:13 +03:00
Botond Dénes
2b0bc11f2e service/paging: use position_in_partition instead of clustering_key for last row
The former allows for expressing more positions, like a position
before/after a clustering key. This practically enables the coordinator
side paging logic, for a query to be stopped at a tombstone (which can
have said positions).
2022-06-23 13:36:20 +03:00
Botond Dénes
adabe3b5a3 alternator/serialization: extract value object parsing logic
To make it reusable by a method added by the next patch.
2022-06-23 11:33:18 +03:00
Piotr Dulikowski
a7ad70600d query-request: add allow_limit flag
Adds allow_limit flag to the read_command. The flag decides whether rate
limiting of this operation is allowed.
2022-06-22 20:16:49 +02:00
Piotr Dulikowski
e6beab3106 storage_proxy: add allow rate limit flag to mutate/mutate_result
Now, mutate/mutate_result accept a flag which decides whether the write
should be rate limited or not.

The new parameter is mandatory and all call sites were updated.
2022-06-22 20:16:49 +02:00
Pavel Emelyanov
98a4d41e31 alternator: Get rack/datacenter from topology
It's needed in two places, both can get topology from the proxy's token
metadata.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-06-22 11:47:26 +03:00
Nadav Har'El
3aca1ca572 alternator: make BatchGetItem group reads by partition
DynamoDB API's BatchGetItem invokes a number (up to 25) of read requests
in parallel, returning when all results are available. Alternator naively
implemented this by sending all read requests in parallel, no matter which
requests these were.

That implementation was inefficient when all the requests are to different
items (clustering rows) of the same partition. In a multi-node setup this
will end up sending 25 separate requests to the same remote node(s). Even
on a single-node setup, this may result in reading from disk more than
once, and even if the partition is cached - doing an O(logN) search in
each multiple times.

What we do in this patch, instead, is to group all the BatchGetItem
requests that aimed at the same partition into a single read request
asking for a (sorted) list of clustering keys. This is similar to an
"IN" request in CQL.

As an example of the performance benefit of this patch, I tried a
BatchGetItem request asking for 20 random items from a 10-million item
partition. I measured the latency of this request on a single-node
Scylla. Before this patch, I saw a latency of 17-21 ms (the lower number
is when the request is retried and the requested items are already in
the cache). After this patch, the latency is 10-14 ms. The performance
improvement on multi-node clusters are expected to be even higher.

Unfortunately the patch is less trivial than I hoped it would be,
because some of the old code was organized under the assumption that
each read request only returned one item (and if it failed, it means
only one item failed), so this part of the code had to be reorganized
(and, for making the code more readable, coroutinized).

An unintended benefit of the code reorganization is that it also gave
me an opportunity to fail an attempt to ask BatchGetItem the same
item more than once (issue #10757).

The patch also adds a few more corner cases in the tests, to be even
more sure that the code reorganization doesn't introduce a regression
in BatchGetItem.

Fixes #10753
Fixes #10757

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-06-19 14:47:57 +03:00
Nadav Har'El
e20233dab1 alternator: improve error handling when trying to tag a GSI or LSI
In issue #10786, we raised the idea of maybe allowing to tag (with
TagResource) GSIs and LSIs, not just base tables. However, currently,
neither DynamoDB nor Syclla allows it. So in this patch we add a
test that confirms this. And while at it, we fix Alternator to
return the same error message as DynamoDB in this case.

Refs #10786.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-06-13 18:14:42 +03:00
Nadav Har'El
8866c326de alternator: forbid duplicate index (LSI and GSI) names
Adding an LSI and GSI with the same name to the same Alternator table
should be forbidden - because if both exists only one of them (the GSI)
would actually be usable. DynamoDB also forbids such duplicate name.

So in this patch we add a test for this issue, and fix it.

Since the patch involves a few more uses of the IndexName string,
we also clean up its handling a bit, to use std::string_view instead
of the old-style std::string&.

Fixes #10789

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-06-13 18:14:42 +03:00
Nadav Har'El
00866a75d8 alternator: add ARN for indexes (LSI and GSI)
DynamoDB gives an ARN ("Amazon Resource Name") to LSIs and GSIs. These
look like BASEARN/index/INDEXNAME, where BASEARN is the ARN of the base
table, and INDEXNAME is the name of the LSI or the GSI.

These ARNs should be returned by DescribeTable as part of its
description of each index, and this patch adds that missing IndexArn
field.

The ARN we're adding here is hardly useful (e.g., as explained in
issue #10786, it can't be used to add tags to the index table),
but nevertheless should exist for compatibility with DynamoDB.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-06-13 18:14:42 +03:00
Nadav Har'El
62b6179c88 alternator: remove dead code
Remove the function make_keyspace_name() that was never used.

We *could* have used this function, but we didn't, and it had
had an inconvenient API. If we later want to de-duplicate the
several copies of "executor::KEYSPACE_NAME_PREFIX + table_name"
we have in the code, we can do it with a better API.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-05-26 15:21:14 +03:00
Nadav Har'El
d0ca09a925 alternator: implement DescribeContinuousBackups operation
Although we don't yet support the DynamoDB API's backup features (see
issue #5063), we can already implement the DescribeContinuousBackups
operation. It should just say that continuous backups, and point-in-time
restores, and disabled.

This will be useful for client code which tries to inquire about
continuous backups, even if not planning to use them in practice
(e.g., see issue #10660).

Refs #5063
Refs #10660

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-05-26 15:13:50 +03:00
Nadav Har'El
8ecf1e306f alternator: allow DescribeTimeToLive even without TTL enabled
We still consider the TTL support in Alternator to be experimental, so we
don't want to allow a user to enable TTL on a table without turning on a
"--experimental-features" flag. However, there is no reason not to allow
the DescribeTimeToLive call when this experimental flag is off - this call
would simply reply with the truth - that the TTL feature is disabled for
the table!

This is important for client code (such as the Terraform module
described in issue #10660) which uses DescribeTimeToLive for
information, even when it never intends to actually enable TTL.

The patch is trivial - we simply remove the flag check in
DescribeTimeToLive, the code works just as before.

After this patch, the following test now works on Scylla without
experimental flags turned on:

    test/alternator/run test_ttl.py::test_describe_ttl_without_ttl

Refs #10660

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-05-26 10:55:36 +03:00
Pavel Emelyanov
5ac28a29d3 gossiper, code: Relax get_up/down/all_counters() helpers
These helpers count elements in the endpoint state map. It makes sense
to keep them in gossiper API, but it's worth removing the wrappers that
do invoke_on(0). This makes code shorter.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-06 10:34:48 +03:00
Avi Kivity
19ab3edd77 gms: feature_service: remove variable/helper function duplication
Each feature has a private variable and a public accessor. Since the
accessor effectively makes the variable public, avoid the intermediary
and make the variable public directly.

To ease mechanical translation, the variable name is chosen as
the function name (without the cluster_supports_ prefix).

References throughout the codebase are adjusted.
2022-05-04 18:59:56 +03:00
Pavel Emelyanov
7a5c2cdbe6 alternator: Don't use global gossiper
There's proxy at hand which can provide local gossiper reference

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-05-03 10:33:07 +03:00
Avi Kivity
582802825a treewide: use system-#include (angle brackets) for seastar
Seastar is an external library from Scylla's point of view so
we should use the angle bracket #include style. Most of the source
follows this, this patch fixes a few stragglers.

Also fix cases of #include which reached out to seastar's directory
tree directly, via #include "seastar/include/sesatar/..." to
just refer to <seastar/...>.

Closes #10433
2022-04-26 14:46:42 +03:00
Avi Kivity
40beb48176 alternator: ttl: avoid specializing class templates in non-namespace scope
The C++ standard disallows class template specialization in non-namespace
scopes. Clang apparently allows it as an extension.

Fix by not using a template - there are just two specializations and
no generic implementation. Use regular classes and std::conditional_t
to choose between the two.
2022-04-18 12:27:18 +03:00
Avi Kivity
b5e8e32c01 alternator: executor: fix signed/unsigned comparison in is_big()
Signed/unsigned comparisons are subject to C promotion rules. In is_big()
in this case the comparison is safe, but gcc warns. Use a cast to silence
the warning.

The sign/unsigned mix and int/size_t size differences still look bad, it
would be good to revisit this code, but that is left for another patch.
2022-04-18 12:23:18 +03:00
Nadav Har'El
84143c2ee5 alternator: implement Select option of Query and Scan
This patch implements the previously-unimplemented Select option of the
Query and Scan operators.

The most interesting use case of this option is Select=COUNT which means
we should only count the items, without returning their actual content.
But there are actually four different Select settings: COUNT,
ALL_ATTRIBUTES, SPECIFIC_ATTRIBUTES, and ALL_PROJECTED_ATTRIBUTES.

Five previously-failing tests now pass, and their xfail mark is removed:

 *  test_query.py::test_query_select
 *  test_scan.py::test_scan_select
 *  test_query_filter.py::test_query_filter_and_select_count
 *  test_filter_expression.py::test_filter_expression_and_select_count
 *  test_gsi.py::test_gsi_query_select_1

These tests cover many different cases of successes and errors, including
combination of Select and other options. E.g., combining Select=COUNT
with filtering requires us to get the parts of the items needed for the
filtering function - even if we don't need to return them to the user
at the end.

Because we do not yet support GSI/LSI projection (issue #5036), the
support for ALL_PROJECTED_ATTRIBUTES is a bit simpler than it will need
to be in the future, but we can only finish that after #5036 is done.

Fixes #5058.

The most intrusive part of this patch is a change from attrs_to_get -
a map of top-level attributes that a read needs to fetch - to an
optional<attrs_to_get>. This change is needed because we also need
to support the case that we want to read no attributes (Select=COUNT),
and attrs_to_get.empty() used to mean that we want to read *all*
attributes, not no attributes. After this patch, an unset
optional<attrs_to_get> means read *all* attributes, a set but empty
attrs_to_get means read *no* attributes, and a set and non-empty
attrs_to_get means read those specific attributes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220405113700.9768-2-nyh@scylladb.com>
2022-04-11 10:04:32 +02:00
Nadav Har'El
9c1ebdceea alternator: forbid empty AttributesToGet
In DynamoDB one can retrieve only a subset of the attributes using the
AttributesToGet or ProjectionExpression paramters to read requests.
Neither allows an empty list of attributes - if you don't want any
attributes, you should use Select=COUNT instead.

Currently we correctly refuse an empty ProjectionExpression - and have
a test for it:
test_projection_expression.py::test_projection_expression_toplevel_syntax

However, Alternator is missing the same empty-forbidding logic for
AttributesToGet. An empty AttributesToGet is currently allowed, and
basically says "retrieve everything", which is sort of unexpected.

So this patch adds the missing logic, and the missing test (actually
two tests for the same thing - one using GetItem and the other Query).

Fixes #10332

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220405113700.9768-1-nyh@scylladb.com>
2022-04-11 10:21:02 +03:00
Nadav Har'El
67e0590bbc alternator: remove old TODO (with test verifying it)
We had an old TODO in the Alternator "Scan" operation code which
suggested that we may need to do something to limit the size of pages
when a row limit ("Limit") isn't given.

But we do already have a built-in limit on page sizes (1 MB),
so this TODO isn't needed and can be removed.

But I also wanted to make sure we have a test that this limit works:

We already had a test that this 1 MB limit works for a single-partition
Query (test_query.py::test_query_reverse_longish - tested both forward
and reversed queries). In this patch I add a similar test for a whole-
table Scan. It turns out that although page size is limited in this case
as well, it's not exactly 1 MB... For small tables can even reach 3 MB.
I consider this "good enough" and that we can drop the TODO, but also
opened issue #10327 to document this surprising (for me) finding.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220404145240.354198-1-nyh@scylladb.com>
2022-04-05 09:23:23 +03:00
Nadav Har'El
7f89c8b3e3 alternator: clean error shutdown in case of TLS misconfigration
The way our boot-time service "controllers" are written, if a
controller's start_server() finds an error and throws, it cannot
the caller (main.cc) to call stop_server(), and must clean up
resources already created (e.g., sharded services) before returning
or risk crashes on assertion failures.

This patch fixes such a mistake in Alternator's initialization.
As noted in issue #10025, if the Alternator TLS configuration is
broken - especially the certificate or key files are missing -
Scylla would crash on an assertion failure, instead of reporting
the error as expected. Before this patch such a misconfiguration
will result in the unintelligible:

<alternator::server>::~sharded() [Service = alternator::server]:
Assertion `_instances.empty()' failed. Aborting on shard 0.

After this patch we get the right error message:

ERROR 2022-03-21 15:25:07,553 [shard 0] init - Startup failed:
std::_Nested_exception<std::runtime_error> (Failed to set up Alternator
TLS credentials): std::_Nested_exception<std::runtime_error> (Could not
read certificate file conf/scylla.crt): std::filesystem::__cxx11::
filesystem_error (error system:2, filesystem error: open failed:
No such file or directory [conf/scylla.crt])

Arguably this error message is a bit ugly, so I opened
https://github.com/scylladb/seastar/issues/1029, but at least it says
exactly what the error is.

Fixes #10025

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220321133323.3150939-1-nyh@scylladb.com>
2022-03-28 15:26:42 +03:00
Nadav Har'El
653f2df28f alternator: fix JSON escaping of error responses
In the DynamoDB API, error responses are in JSON format with specific
fields ("__type" and "message" in the x-amz-json-1.0 format currently
used). Alternator tried to be clever and build the string representation
of this JSON itself, instead of using RapidJSON. But this optimization
was a mistake - if the error message contains characters that need
escaping (such as double quotes and newlines), they weren't escaped,
and the resulting JSON was malformed. When the client library boto3
read this malformed JSON it got confused, cosidered the entire error
response to be a string, which resulted in an ugly error message.

The fix is easy - just build the JSON output as usual with RapidJSON
instead of trying to optimize using string operation.

The patch also includes two tests reproducing this bug and checking its
fix. The first test uses boto3 and shows it got confused on the type
of error (not understanding that it is a ValidationException). The
second test bypasses boto3 and shows exactly where the bug happens -
the response is an unparsable JSON.

Fixes #10278

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220327132705.3707979-1-nyh@scylladb.com>
2022-03-27 16:32:36 +03:00
Mikołaj Sielużycki
6f1b6da68a compile: Fix headers so that *-headers targets compile cleanly.
Closes #10273
2022-03-25 16:19:26 +02:00
Pavel Solodovnikov
95c8d65949 treewide: fix compilation issues with fmtlib 8.1.0+
Due to fd62fba985
scoped enums are not automatically converted to integers anymore,
this is the intended behavior, according to the fmtlib devs.

A bit nicer solution would be to use `std::to_underlying`
instead of a direct `static_cast`, but it's not available until
C++23 and some compilers are still missing the support for it.

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2022-03-16 12:31:50 +03:00
Nadav Har'El
c26230943b alternator ttl: add metrics
This patch adds metrics to the Alternator TTL feature (aka the "expiration
service").

I put these metrics deliberately in their own object in ttl.{hh,cc}, and
also with their own prefix ("expiration_*") - and *not* together with the
rest of the Alternator metrics (alternator/stats.{hh,cc}). This is
because later we may want to use the expiration service not only in
Alternator but also in CQL - to support per-item expiration with CDC
events also in CQL. So the implementation of this feature should not be
too tangled with that of Alternator.

The patch currently adds four metrics, and opens the path to easily add
more in the future. The metrics added now are:

1. scylla_expiration_scan_passes:  The number of scan passes over the
       entire table. We expect this to grow by 1 every
       alternator_ttl_period_in_seconds seconds.

2. scylla_expiration_scan_table: The number of table scans. In each scan
       pass, we scan all the tables that have the Alternator TTL feature
       enabled. Each scan of each table is counted by this counter.

3. scylla_expiration_items_deleted: Counts the number of items that
       the expiration service expired (deleted). Please remember that
       each item is considered for expiration - and then expired - on
       only one node, so each expired item is counted only once - not
       RF times.

4. scylla_expiration_secondary_ranges_scanned: If this counter is
       incremented, it means this node took over some other node's
       expiration scanning duties while the other node was down.

This patch also includes a couple of unrelated comment fixes.

I tested the new metrics manually - they aren't yet tested by the
Alternator test suite because I couldn't make up my mind if such
tests would belong in test_ttl.py or test_metrics.py :-)

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220224092419.1132655-1-nyh@scylladb.com>
2022-02-25 07:26:11 +02:00
Nadav Har'El
db7b11cfc4 alternator: make TTL expiration scanner bypass cache
The background scan for expired Alternator items (the TTL feature)
should bypass the cache to avoid poluting it with the entire content
of the table being scanned.

I tested that the flag added in this patch really works by adding a printout
to the code in table.cc which creates the reader. Although we do have a
metric for uses of BYPASS CACHE, unfortunately this metric counts usage
of "BYPASS CACHE" in CQL statements - and not does not account the low-
level calls that we use in the ttl scanner.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2022-02-25 07:26:11 +02:00