Compare commits

..

225 Commits

Author SHA1 Message Date
Calle Wilund
2bbf3cf669 system_keyspace: Prune dropped tables from truncation on start/drop
Fixes #25683

Once a table drop is complete, there should be no reason to retain
truncation records for it, as any replay should skip mutations
anyway (no CF), and iff we somehow resurrect a dropped table,
this replay-resurrected data is the least problem anyway.

Adds a prune phase to the startup drop_truncation_rp_records run,
which ignores updating, and instead deletes records for non-existant
tables (which should patch any existing servers with lingering data
as well).

Also does an explicit delete of records on actual table DROP, to
ensure we don't grow this table more than needed even in long
uptime nodes.

Small unit test included.

Closes scylladb/scylladb#25699

(cherry picked from commit bc20861afb)

Closes scylladb/scylladb#25815
2025-09-05 19:02:39 +03:00
Botond Dénes
c30c1ec40a Merge '[Backport 2025.3] drop table: fix crash on drop table with concurrent cleanup' from Scylladb[bot]
Consider the following scenario:

- A tablet is migrated away from a shard
- The tablet cleanup stage closes the storage group's async_gate
- A drop table runs truncate which attempts to disable compaction on the tablet with its gate closed. This fails, because table::parallel_foreach_compaction_group() ultimately calls storage_group_manager::parallel_foreach_storage_group() which will not disable compaction if it can't hold the storage group's gate
- Truncate calls table::discard_sstables() which checks if the compaction has been disabled, and because it hasn't, it then runs on_internal_error() with "compaction not disabled on table ks.cf during TRUNCATE" which causes a crash

Fixes: #25706

This needs to be backported to all supported versions with tablets

- (cherry picked from commit a0934cf80d)

- (cherry picked from commit 1b8a44af75)

Parent PR: #25708

Closes scylladb/scylladb#25785

* github.com:scylladb/scylladb:
  test: reproducer and test for drop with concurrent cleanup
  truncate: check for closed storage group's gate in discard_sstables
2025-09-05 19:02:04 +03:00
Andrei Chekun
2ee1082561 test.py: modify run to use different junit output filenames
Currently, run will execute twice pytest without modifying the path of the
JUnit XML report. This leads that the second execution of the pytest
will override the report. This PR fixing this issue so both reports will
be stored.

Closes scylladb/scylladb#25726

(cherry picked from commit e55c8a9936)

Closes scylladb/scylladb#25778
2025-09-05 19:01:22 +03:00
Pavel Emelyanov
f1e3dedcd6 Revert "test/gossiper: add reproducible test for race condition during node decommission"
This reverts commit 4e17330a1b because
parent PR had been reverted as per #25803
2025-09-05 10:08:29 +03:00
Nadav Har'El
5d6aa6e8c2 utils, alternator: fix detection of invalid base-64
This patch fixes an error-path bug in the base-64 decoding code in
utils/base64.cc, which among other things is used in Alternator to decode
blobs in JSON requests.

The base-64 decoding code has a lookup table, which was wrongly sized 255
bytes, but needed to be 256 bytes. This meant that if the byte 255 (0xFF)
was included in an invalid base-64 string, instead of detecting that this
is an invalid byte (since the only valid bytes in a base-64 string are
A-Z,a-z,0-9,+,/ and =), the code would either think it's valid with a
nonsense 6-bit part, or even crash on an out-of-bounds read.

Besides the trivial fix, this patch also includes a reproducing test,
which tries to write a blob as a supposedly base-64 encoded string with
a 0xFF byte in it. The test fails before this patch (the write succeeds,
unexpectedly), and passes after this patch (the write fails as
expected). The test also passes on DynamoDB.

Fixes #25701

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#25705

(cherry picked from commit ff91027eac)

Closes scylladb/scylladb#25767
2025-09-04 11:38:55 +03:00
Pavel Emelyanov
1c8e10231a Merge '[Backport 2025.3] service/qos: Modularize service level controller to avoid invalid access to auth::service' from Scylladb[bot]
Move management over effective service levels from `service_level_controller`
to a new dedicated type -- `auth_integration`.

Before these changes, it was possible for the service level controller to try
to access `auth::service` after it was deinitialized. For instance, it could
happen when reloading the cache. That HAS happened as described in the following
issue: scylladb/scylladb#24792.

Although the problem might have been mitigated or even resolved in
scylladb/scylladb@10214e13bd, it's not clear
how the service will be used in the future. It's best to prevent similar bugs
than trying to fix them later on.

The logic responsible for preventing to access an uninitialized `auth::service`
was also either non-existent, complex, or non-sufficient.

To prevent accessing `auth::service` by the service level controller, we extract
the relevant portion of the code to a separate entity -- `auth_integration`.
It's an internal helper type whose sole purpose is to manage effective service
levels.

Thanks to that, we were able to nest the lifetime of `auth_integration` within
the lifetime of `auth::service`. It's now impossible to attempt to dereference
it while it's uninitialized.

If a bug related to an invalid access is spotted again, though, it might also
be easier to debug it now.

There should be no visible change to the users of the interface of the service
level controller. We strived to make the patch minimal, and the only affected
part of the logic should be related to how `auth::service` is accessed.

The relevant portion of the initialization and deinitialization flow:

(a) Before the changes:

1. Initialize `service_level_controller`. Pass a reference to an uninitialized
   `auth::service` to it.
2. Initialize other services.
3. Initialize and start `auth::service`.
4. (work)
5. Stop and deinitialize `auth::service`.
6. Deinitialize other services.
7. Deinitialize `service_level_controller`.

(b) After the changes:

1. Initialize `service_level_controller`. Pass a reference to an uninitialized
   `auth::service` to it. (*)
2. Initialize other services.
3. Initialize and start `auth::service`.
4. Initialize `auth_integration`. Register it in `service_level_controller`.
5. (work)
6. Unregister `auth_integration` in `service_level_controller` and deinitialize
   it.
7. Stop and deinitialize `auth::service`.
8. Deinitialize other services.
9. Deinitialize `service_level_controller`.

(*):
    The reference to `auth::service` in `service_level_controller` is still
    necessary. We need to access the service when dropping a distributed
    service level.

    Although it would be best to cut that link between the service level
    controller and `auth::service` too, effectively separating the entities,
    it would require more work, so we leave it as-is for now.

    It shouldn't prove problematic as far as accessing an uninitialized service
    goes. Trying to drop a service level at the point when we're de-initializing
    auth should be impossible.

    For more context, see the function `drop_distributed_service_level` in
    `service_level_controller`.

A trivial test has been included in the PR. Although its value is questionable
as we only try to reload the service level cache at a specific moment, it's
probably the best we can deliver to provide a reproducer of the issue this patch
is resolving.

Fixes scylladb/scylladb#24792

Backport: The impact of the bug was minimal as it only affected the shutdown.
However, since CI is failing because of it, let's backport the change to all
supported versions.

- (cherry picked from commit 7d0086b093)

- (cherry picked from commit 34afb6cdd9)

- (cherry picked from commit e929279d74)

- (cherry picked from commit dd5a35dc67)

- (cherry picked from commit fc1c41536c)

Parent PR: #25478

Closes scylladb/scylladb#25753

* github.com:scylladb/scylladb:
  service/qos: Move effective SL cache to auth_integration
  service/qos: Add auth::service to auth_integration
  service/qos: Reload effective SL cache conditionally
  service/qos: Add gate to auth_integration
  service/qos: Introduce auth_integration
2025-09-04 11:38:17 +03:00
Pavel Emelyanov
d484837a2a Merge '[Backport 2025.3] db/hints: Improve logs' from Scylladb[bot]
Before these changes, the logs in hinted handoff often didn't provide
crucial information like the identifier of the node that hints were
being sent to. Also, some of the logs were misleading and referred to
other places in the code than the one where an exception or some other
situation really occurred.

We modify those logs, extending them by more valuable information
and fixing existing issues. What's more, all of the logs in
`hint_endpoint_manager` and `hint_sender` follow a consistent format
now:

```
<class_name>[<destination host ID>]:<function_name>: <message>
```

This way, we should always have AT LEAST the basic information.

Fixes scylladb/scylladb#25466

Backport:
There is no risk in backporting these changes. They only have
impact on the logs. On the other hand, they might prove helpful
when debugging an issue in hinted handoff.

- (cherry picked from commit 2327d4dfa3)

- (cherry picked from commit d7bc9edc6c)

- (cherry picked from commit 6f1fb7cfb5)

Parent PR: #25470

Closes scylladb/scylladb#25538

* github.com:scylladb/scylladb:
  db/hints: Add new logs
  db/hints: Adjust log levels
  db/hints: Improve logs
2025-09-04 11:36:30 +03:00
Pavel Emelyanov
ad6dbcfdc5 Merge '[Backport 2025.3] generic server: 2 step shutdown' from Scylladb[bot]
This PR implements solution proposed in scylladb/scylladb#24481

Instead of terminating connections immediately, the shutdown now proceeds in two stages: first closing the receive (input) side to stop new requests, then waiting for all active requests to complete before fully closing the connections.

The updated shutdown process is as follows:

1. Initial Shutdown Phase
   * Close the accept gate to block new incoming connections.
   * Abort all accept() calls.
   * For all active connections:
      * Close only the input side of the connection to prevent new requests.
      * Keep the output side open to allow responses to be sent.

2. Drain Phase
   * Wait for all in-progress requests to either complete or fail.

3. Final Shutdown Phase
   * Fully close all connections.

Fixes scylladb/scylladb#24481

- (cherry picked from commit 122e940872)

- (cherry picked from commit 3848d10a8d)

- (cherry picked from commit 3610cf0bfd)

- (cherry picked from commit 27b3d5b415)

- (cherry picked from commit 061089389c)

- (cherry picked from commit 7334bf36a4)

- (cherry picked from commit ea311be12b)

- (cherry picked from commit 4f63e1df58)

Parent PR: #24499

Closes scylladb/scylladb#25519

* github.com:scylladb/scylladb:
  test: Set `request_timeout_on_shutdown_in_seconds` to `request_timeout_in_ms`,  decrease request timeout.
  generic_server: Two-step connection shutdown.
  transport: consmetic change, remove extra blanks.
  transport: Handle sleep aborted exception in sleep_until_timeout_passes
  generic_server: replace empty destructor with `= default`
  generic_server: refactor connection::shutdown to use `shutdown_input` and `shutdown_output`
  generic_server: add `shutdown_input` and `shutdown_output` functions to `connection` class.
  test: Add test for query execution during CQL server shutdown
2025-09-04 11:35:55 +03:00
Ran Regev
a79cbd9a9a docs: backup and restore feature
added backup and restore as a feature
to documentation

Signed-off-by: Ran Regev <ran.regev@scylladb.com>

Closes scylladb/scylladb#25608

(cherry picked from commit 515d9f3e21)

Closes scylladb/scylladb#25748
2025-09-03 12:37:45 +03:00
Emil Maskovsky
4e17330a1b test/gossiper: add reproducible test for race condition during node decommission
This change introduces a targeted test that simulates the gossiper race
condition observed during node decommissioning. The test delays gossip
state application and host ID lookup to reliably reproduce the scenario
where `gossiper::get_host_id()` is called on a removed endpoint,
potentially triggering an abort in `apply_new_states`.

There is a specific error injection added to widen the race window, in
order to increase the likelihood of hitting the race condition. The
error injection is designed to delay the application of gossip state
updates, for the specific node that is being decommissioned. This should
then result in the server abort in the gossiper.

Refs: scylladb/scylladb#25621
Fixes: scylladb/scylladb#25721

Backport: The test is primarily for an issue found in 2025.1, so it
needs to be backported to all the 2025.x branches.

Closes scylladb/scylladb#25685

(cherry picked from commit 5dac4b38fb)

Closes scylladb/scylladb#25781
2025-09-02 08:29:27 +02:00
Ferenc Szili
6a7a5f5edc test: reproducer and test for drop with concurrent cleanup
This change adds a reproducer and test for issue #25706

(cherry picked from commit 1b8a44af75)
2025-09-02 02:18:56 +00:00
Ferenc Szili
34b403747a truncate: check for closed storage group's gate in discard_sstables
Consider the following scenario:

- A tablet is migrated away from a shard
- The tablet cleanup stage closes the storage group's async_gate
- A drop table runs truncate which attempts to disable compaction on the
  tablet with its gate closed. This fails, because
  table::parallel_foreach_compaction_group() ultimately calls
  storage_group_manager::parallel_foreach_storage_group() which will not
  disable compaction if it can't hold the storage group's gate
- Truncate calls table::discard_sstables() which checks if the compaction
  has been disabled, and because it hasn't, it then runs
  on_internal_error() with "compaction not disabled on table ks.cf during
  TRUNCATE" which causes a crash

This patch makes dicard_sstables check if the storage group's gate is
closed whend checking for disabled compaction.

(cherry picked from commit a0934cf80d)
2025-09-02 02:18:56 +00:00
Piotr Dulikowski
debc637ac1 Merge '[Backport 2025.3] system_keyspace: add peers cache to get_ip_from_peers_table' from Scylladb[bot]
The gossiper can call `storage_service::on_change` frequently (see  scylladb/scylla-enterprise#5613), which may cause high CPU load and even trigger OOMs or related issues.

This PR adds a temporary cache for `system.peers` to resolve host_id -> ip without hitting storage on every call. The cache is short-lived to handle the unlikely case where `system.peers` is updated directly via CQL.

This is a temporary fix; a more thorough solution is tracked in https://github.com/scylladb/scylladb/issues/25620.

Fixes scylladb/scylladb#25660

backport: this patch needs to be backported to all supported versions (2025.1/2/3).

- (cherry picked from commit 91c633371e)

- (cherry picked from commit de5dc4c362)

- (cherry picked from commit 4b907c7711)

Parent PR: #25658

Closes scylladb/scylladb#25766

* github.com:scylladb/scylladb:
  storage_service: move get_host_id_to_ip_map to system_keyspace
  system_keyspace: use peers cache in get_ip_from_peers_table
  storage_service: move get_ip_from_peers_table to system_keyspace
2025-09-01 21:21:26 +02:00
Taras Veretilnyk
ddb7c8ea12 keys: from_nodetool_style_string don't split single partition keys
Users with single-column partition keys that contain colon characters
were unable to use certain REST APIs and 'nodetool' commands, because the
API split key by colon regardless of the partition key schema.

Affected commands:
- 'nodetool getendpoints'
- 'nodetool getsstables'
Affected endpoints:
- '/column_family/sstables/by_key'
- '/storage_service/natural_endpoints'

Refs: #16596 - This does not fully fix the issue, as users with compound
keys will face the issue if any column of the partition key contains
a colon character.

Closes scylladb/scylladb#24829

Closes scylladb/scylladb#25565
2025-09-01 15:36:56 +03:00
Petr Gusev
c4386c2aa4 storage_service: move get_host_id_to_ip_map to system_keyspace
Reimplemented the function to use the peers cache. It could be replaced
with get_ip_from_peers_table, but that would create a coroutine frame for
each call.

(cherry picked from commit 4b907c7711)
2025-09-01 11:22:55 +02:00
Petr Gusev
7ec3e166c6 system_keyspace: use peers cache in get_ip_from_peers_table
The storage_service::on_change method can be called quite often
by the gossiper, see scylladb/scylla-enterprise#5613. In this commit
we introduce a temporal cache for system.peers so that we don't have
to go to the storage each time we need to resolve host_id -> ip.
We keep the cache only for a small amount of time to handle the
(unlikely) scenario when the user wants to update system.peers table
from CQL.

Fixes scylladb/scylladb#25660

(cherry picked from commit de5dc4c362)
2025-09-01 11:22:05 +02:00
Petr Gusev
5f8664757a storage_service: move get_ip_from_peers_table to system_keyspace
We plan to add a cache to get_ip_from_peers_table in upcoming commits.
It's more convenient to do this from system_keyspace, since the only two
methods that mutate system.peers (remove_endpoint and update_peers_info)
are already there.

(cherry picked from commit 91c633371e)
2025-09-01 11:21:55 +02:00
Calle Wilund
2e08d651a8 system_keyspace: Limit parallelism in drop_truncation_records
Fixes #25682
Refs scylla-enterprise#5580

If the truncation table is large in entries, we might create a
huge parallel execution, quite possibly consuming loads of resources
doing something quite trivial.
Limit concurrency to a small-ish number

Closes scylladb/scylladb#25678

(cherry picked from commit 2eccd17e70)

Closes scylladb/scylladb#25751
2025-09-01 09:13:44 +03:00
Emil Maskovsky
05f8b0d543 storage: pass host_id as parameter to maybe_reconnect_to_preferred_ip()
Previously, `maybe_reconnect_to_preferred_ip()` retrieved the host ID
using `gossiper::get_host_id()`. Since the host ID is already available
in the calling function, we now pass it directly as a parameter.

This change simplifies the code and eliminates a potential race condition
where `gossiper::get_host_id()` could fail, as described in scylladb/scylladb#25621.

Refs: scylladb/scylladb#25621
Fixes: scylladb/scylladb#25715

Backport: Recommended for 2025.x release branches to avoid potential issues
from unnecessary calls to `gossiper::get_host_id()` in subscribers.

(cherry picked from commit cfc87746b6)

Closes scylladb/scylladb#25718
2025-09-01 09:13:21 +03:00
kendrick-ren
d7a36c6d8e Update launch-on-gcp.rst
Add the missing '=' mark in --zone option. Otherwise the command complains.

Closes scylladb/scylladb#25471

(cherry picked from commit d6e62aeb6a)

Closes scylladb/scylladb#25645
2025-09-01 09:11:21 +03:00
Benny Halevy
2a6791d246 api: storage_service: fix token_range documentation
Note that the token_range type is used only by describe_ring.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#25609

(cherry picked from commit 45c496c276)

Closes scylladb/scylladb#25640
2025-09-01 09:11:12 +03:00
Pavel Emelyanov
95d953b7b9 Merge '[Backport 2025.3] cql3: Warn when creating RF-rack-invalid keyspace' from Scylladb[bot]
Although RF-rack-valid keyspaces are not universally enforced
yet (they're governed by the configuration option
`rf_rack_valid_keyspaces`), we'd like to encourage the user to
abide by the restriction.

To that end, we're introducing a warning when creating or
altering a keyspace. If the configuration option is disabled,
but the user is trying to create an RF-rack-invalid keyspace,
they'll receive a warning.

If the option is turned off, we will also log all of the
RF-rack-invalid keyspaces at start-up.

We provide validation tests.

Fixes scylladb/scylladb#23330

Backport: we'd like to encourage the user to abide by the restriction
even when they don't enforce it to make it easier in the future to
adjust the schema when there's no way to disable it anymore. Because
of that, we'd like to backport it to all relevant versions, starting with 2025.1.

- (cherry picked from commit 60ea22d887)

- (cherry picked from commit af8a3dd17b)

- (cherry picked from commit 837d267cbf)

Parent PR: #24785

Closes scylladb/scylladb#25635

* github.com:scylladb/scylladb:
  main: Log RF-rack-invalid keyspaces at startup
  cql3/statements: Fix indentation
  cql3: Warn when creating RF-rack-invalid keyspace
2025-09-01 09:11:01 +03:00
David Garcia
3db935f30f docs: expose alternator metrics
Renders in the docs some metrics introduced in https://github.com/scylladb/scylladb/pull/24046/files that were not being displayed in https://docs.scylladb.com/manual/stable/reference/metrics.html

Closes scylladb/scylladb#25561

(cherry picked from commit c3c70ba73f)

Closes scylladb/scylladb#25629
2025-09-01 09:10:41 +03:00
Michał Chojnowski
05ea29ee8d sstables/types.hh: fix fmt::formatter<sstables::deletion_time>
Obvious typo.

Fixes scylladb/scylladb#25556

Closes scylladb/scylladb#25557

(cherry picked from commit c1b513048c)

Closes scylladb/scylladb#25588
2025-09-01 09:10:21 +03:00
Dawid Mędrek
7f58681482 db/commitlog: Extend error messages for corrupted data
We're providing additional information in error messages when throwing
an exception related to data corruption: when a segment is truncated
and when it's content is invalid. That might prove helpful when debugging.

Closes scylladb/scylladb#25190

(cherry picked from commit 408b45fa7e)

Closes scylladb/scylladb#25461
2025-09-01 09:08:29 +03:00
Andrei Chekun
8163f4edaa test.py: use unique hostname for Minio
To avoid situation that port is occupied on localhost, use unique
hostname for Minio

(cherry picked from commit c6c3e9f492)

Closes scylladb/scylladb#24775
2025-09-01 08:59:00 +03:00
Pavel Emelyanov
0c6c507704 Merge '[Backport 2025.3] test.py: add missed parameters that should be passed from test.py to pytest' from Scylladb[bot]
Several parameters that `test.py` should pass to pytest->boost were missing. This PR adds handling these parameters: `--random-seed` and `--x-log2-compaction-groups`

Since this code affected with this issue in 2025.3 and this is only framework change, backport for that version needed.

Fixes: https://github.com/scylladb/scylladb/issues/24927

- (cherry picked from commit 71b875c932)

- (cherry picked from commit f7c7877ba6)

Parent PR: #24928

Closes scylladb/scylladb#25035

* github.com:scylladb/scylladb:
  test.py: add bypassing x_log2_compaction_groups to boost tests
  test.py: add bypassing random seed to boost tests
2025-09-01 08:58:48 +03:00
Jenkins Promoter
3da82e8572 Update pgo profiles - aarch64 2025-09-01 05:24:53 +03:00
Jenkins Promoter
16c7bd4c6e Update pgo profiles - x86_64 2025-09-01 05:01:14 +03:00
Jenkins Promoter
3c922d68f0 Update ScyllaDB version to: 2025.3.1 2025-08-31 11:05:24 +03:00
Calle Wilund
fe87af4674 commitlog: Ensure segment deletion is re-entrant
Fixes #25709

If we have large allocations, spanning more than one segment, and
the internal segment references from lead to secondary are the
only thing keeping a segment alive, the implicit drop in
discard_unused_segments and orphan_all can cause a recursive call
to discard_unused_segments, which in turn can lead to vector
corruption/crash, or even double free of segment (iterator confusion).

Need to separate the modification of the vector (_segments) from
actual releasing of objects. Using temporaries is the easiest
solution.

To further reduce recursion, we can also do an early clear of
segment dependencies in callbacks from segment release (cf release).

Closes scylladb/scylladb#25719

(cherry picked from commit cc9eb321a1)

Closes scylladb/scylladb#25756
2025-08-30 18:50:47 +03:00
Dawid Mędrek
5c947a936e service/qos: Move effective SL cache to auth_integration
Since `auth_integration` manages effective service levels, let's move
the relevant cache from `service_level_controller` to it.

(cherry picked from commit fc1c41536c)
2025-08-29 22:58:21 +00:00
Dawid Mędrek
fcdd21948d service/qos: Add auth::service to auth_integration
The new service, `auth_integration`, has taken over the responsibility
over managing effective service levels from `service_level_controller`.
However, before these changes, it still accessed `auth::service` via
the service level controller. Let's change that.

Note that we also remove a check that `auth::service` has been
initialized. It's not necessary anymore because the lifetime of
`auth_integration` is strictly nested within the lifetime of `auth::service`.

In actuality, `service_level_controller` should lose its reference to
`auth::service` completely. All of the management over effective service
levels has already been moved to `auth_integration`. However, the
referernce is still needed when dropping a distributed service level
because we need to update the corresponding attribute for relevant
roles.

That should not lead to invalid accesses, though. Dropping a service level
should not be possible when `auth::service` is not initialized.

(cherry picked from commit dd5a35dc67)
2025-08-29 22:58:21 +00:00
Dawid Mędrek
aea5805c1f service/qos: Reload effective SL cache conditionally
Since `service_level_controller` outlives `auth_integration`, it may
happen that we try to access it when it has already been deinitialized.
To prevent that, we only try to reload or clear the effective service
level cache when the object is still alive.

These changes solve an existing problem with an invalid memory access.
For more context, see issue scylladb/scylladb#24792.

We provide a reproducer test that consistently fails before these
changes but passes after them.

Fixes scylladb/scylladb#24792

(cherry picked from commit e929279d74)
2025-08-29 22:58:20 +00:00
Dawid Mędrek
753305763a service/qos: Add gate to auth_integration
We add a named gate to `auth_integration` that will aid us in synchronizing
ongoing tasks with stopping the service.

(cherry picked from commit 34afb6cdd9)
2025-08-29 22:58:20 +00:00
Dawid Mędrek
4b69c74385 service/qos: Introduce auth_integration
We introduce a new type, `auth_integration`, that will be used internally
by `service_level_controller`. Its purpose is to take over the responsibility
over managing effective service levels.

The main problem of the current implementation of service level controller
is its dependency on `auth::service` whose lifetime is strictly nested
within the lifetime of service level controller. That may and already have
led to invalid memory accesses; for an example, see issue
scylladb/scylladb#24792.

Our strategy is to split service level controller into smaller parts and
ensure that we access `auth::service` only when it's valid to do so.
This commit is the first step towards that.

We don't change anything in the logic yet, just add the new type. Further
adjustments will be made in following commits.

(cherry picked from commit 7d0086b093)
2025-08-29 22:58:20 +00:00
Jenkins Promoter
d9e492a90c Update ScyllaDB version to: 2025.3.0 2025-08-27 14:38:30 +03:00
Andrei Chekun
6ee92600e2 test.py: add bypassing x_log2_compaction_groups to boost tests
Bypassing argument to pytest->boost that was missing.

(cherry picked from commit f7c7877ba6)
2025-08-25 15:15:30 +02:00
Andrei Chekun
c55919242d test.py: add bypassing random seed to boost tests
Bypassing argument to pytest->boost that was missing.

Fixes: https://github.com/scylladb/scylladb/issues/24927
(cherry picked from commit 71b875c932)
2025-08-25 15:14:52 +02:00
Dawid Mędrek
9652a1260f main: Log RF-rack-invalid keyspaces at startup
When the configuration option `rf_rack_valid_keyspaces` is enabled and there
is an RF-rack-invalid keyspace, starting a node fails. However, when the
configuration option is disabled, but there still is a keyspace that violates
the condition, we'd like Scylla to print a warning informing the user about
the fact. That's what happens in this commit.

We provide a validation test.

(cherry picked from commit 837d267cbf)
2025-08-22 14:31:49 +00:00
Dawid Mędrek
5f13044627 cql3/statements: Fix indentation
(cherry picked from commit af8a3dd17b)
2025-08-22 14:31:49 +00:00
Dawid Mędrek
cd795170b4 cql3: Warn when creating RF-rack-invalid keyspace
Although RF-rack-valid keyspaces are not universally enforced
yet (they're governed by the configuration option
`rf_rack_valid_keyspaces`), we'd like to encourage the user to
abide by the restriction.

To that end, we're introducing a warning when creating or
altering a keyspace. If the configuration option is disabled,
but the user is trying to create an RF-rack-invalid keyspace,
they'll receive a warning.

We provide a validation test.

(cherry picked from commit 60ea22d887)
2025-08-22 14:31:49 +00:00
Ferenc Szili
acb542606e test: remove test_tombstone_gc_disabled_on_pending_replica
The test test_tombstone_gc_disabled_on_pending_replica was added when
we fixed (#20788) the potential problem with data resurrection during
file based streaming. The issue was occurring only in Enterprise, but
we added the fix in OSS to limit code divergence. This test was added
together with the fix in OSS with the idea to guard this change in OSS.
The real reproducer and test for this fix was added later, after the
fix was ported into Enterprise.
It is in: test/cluster/test_resurrection.py

Since Enterprise has been merged into OSS, there is no more need to
keep the test test_tombstone_gc_disabled_on_pending_replica. Also,
it is flaky with very low probability of failure, making it difficult
to investigate the cause of failure.

Fixes: #22182

Refs: scylladb/scylladb#25448

Closes scylladb/scylladb#25134

(cherry picked from commit 7ce96345bf)

Closes scylladb/scylladb#25573
2025-08-19 16:01:22 +03:00
Piotr Dulikowski
8bd92d4dd0 Merge '[Backport 2025.3] test: test_mv_backlog: fix to consider internal writes' from Scylladb[bot]
The PR fixes a test flakiness issue in test_mv_backlog related to reading metrics.

The first commit fixes a more general issue in the ScyllaMetrics helper class where it doesn't return the value of all matching lines when a specific shard is requested, but it breaks after the first match.

The second commit fixes a test issue where it expects exactly one write to be throttled, not taking into account other internal writes that may be executed during this time.

Fixes https://github.com/scylladb/scylladb/issues/23139

backport to improve CI stability - test only change

- (cherry picked from commit 5c28cffdb4)

- (cherry picked from commit 276a09ac6e)

Parent PR: #25279

Closes scylladb/scylladb#25475

* github.com:scylladb/scylladb:
  test: test_mv_backlog: fix to consider internal writes
  test/pylib/rest_client: fix ScyllaMetrics filtering
2025-08-19 09:48:01 +02:00
Patryk Jędrzejczak
e631d2e872 test: test_maintenance_socket: use cluster_con for driver sessions
The test creates all driver sessions by itself. As a consequence, all
sessions use the default request timeout of 10s. This can be too low for
the debug mode, as observed in scylladb/scylla-enterprise#5601.

In this commit, we change the test to use `cluster_con`, so that the
sessions have the request timeout set to 200s from now on.

Fixes scylladb/scylla-enterprise#5601

This commit changes only the test and is a CI stability improvement,
so it should be backported all the way to 2024.2. 2024.1 doesn't have
this test.

Closes scylladb/scylladb#25510

(cherry picked from commit 03cc34e3a0)

Closes scylladb/scylladb#25547
2025-08-18 16:41:03 +02:00
Dawid Mędrek
d12fdcaa75 db/hints: Add new logs
We're adding new logs in just a few places that may however prove
important when debugging issues in hinted handoff in the future.

(cherry picked from commit 6f1fb7cfb5)
2025-08-18 16:02:01 +02:00
Dawid Mędrek
325831afad db/hints: Adjust log levels
Some of the logs could be clogging Scylla's logs, so we demote their
level to a lower one.

On the other hand, some of the logs would most likely not do that,
and they could be useful when debugging -- we promote them to debug
level.

(cherry picked from commit d7bc9edc6c)
2025-08-18 16:02:00 +02:00
Dawid Mędrek
7b212edd0c db/hints: Improve logs
Before these changes, the logs in hinted handoff often didn't provide
crucial information like the identifier of the node that hints were
being sent to. Also, some of the logs were misleading and referred to
other places in the code than the one where an exception or some other
situation really occurred.

We modify those logs, extending them by more valuable information
and fixing existing issues. What's more, all of the logs in
`hint_endpoint_manager` and `hint_sender` follow a consistent format
now:

```
<class_name>[<destination host ID>]:<function_name>: <message>
```

This way, we should always have AT LEAST the basic information.

(cherry picked from commit 2327d4dfa3)
2025-08-18 16:01:57 +02:00
Sergey Zolotukhin
bad157453b test: Set request_timeout_on_shutdown_in_seconds to request_timeout_in_ms,
decrease request timeout.

In debug mode, queries may sometimes take longer than the default 30 seconds.
To address this, the timeout value `request_timeout_on_shutdown_in_seconds`
during tests is aligned with other request timeouts.
Change request timeout for tests from 180s to 90s since we must keep the request
timeout during shutdown significantly lower than the graceful shutdown timeout(2m),
or else a request timeout would cause a graceful shutdown timeout and fail a test.

(cherry picked from commit 4f63e1df58)
2025-08-18 15:47:08 +02:00
Sergey Zolotukhin
9b7886ed71 generic_server: Two-step connection shutdown.
When shutting down in `generic_server`, connections are now closed in two steps.
First, only the RX (receive) side is shut down. Then, after all ongoing requests
are completed, or a timeout happened the connections are fully closed.

Fixes scylladb/scylladb#24481

(cherry picked from commit ea311be12b)
2025-08-18 15:46:46 +02:00
Sergey Zolotukhin
e2aed2e860 transport: consmetic change, remove extra blanks.
(cherry picked from commit 7334bf36a4)
2025-08-18 14:55:16 +02:00
Anna Stuchlik
977a4a110a doc: add support for RHEL 10
This commit adds RHEL 10 to the list of supported platforms.

Fixes https://github.com/scylladb/scylladb/issues/25436

Closes scylladb/scylladb#25437

(cherry picked from commit 1322f301f6)

Closes scylladb/scylladb#25447
2025-08-18 12:20:19 +03:00
Wojciech Przytuła
666985bbe0 Fix link to ScyllaDB manual
The link would point to outdated OS docs. I fixed it to point to up-to-date Enterprise docs.

Closes scylladb/scylladb#25328

(cherry picked from commit 7600ccfb20)

Closes scylladb/scylladb#25486
2025-08-15 13:31:06 +03:00
Wojciech Mitros
bb6e681b58 test: run mv tests depending on metrics on a standalone instance
The test_base_partition_deletion_with_metrics test case (and the batch
variant) uses the metric of view updates done during its runtime to check
if we didn't perform too many of them. The test runs in the cqlpy suite,
which  runs all test cases sequentially on one Scylla instance. Because
of this, if another test case starts a process which generates view
updates and doesn't wait for it to finish before it exists, we may
observe too many view updates in test_base_partition_deletion_with_metrics
and fail the test.
In all test cases we make sure that all tables that were created
during the test are dropped at the end. However, that doesn't
stop the view building process immediately, so the issue can happen
even if we drop the view. I confirmed it by adding a test just before
test_base_partition_deletion_with_metrics which builds a big
materialized view and drops it at the end - the metrics check still failed.

The issue could be caused by any of the existing test cases where we create
a view and don't wait for it to be built. Note that even if we start adding
rows after creating the view, some of them may still be included in the view
building, as the view building process is started asynchronously. In such
a scenario, the view building also doesn't cause any issues with the data in
these tests - writes performed after view creation generate view updates
synchronously when they're local (and we're running a single Scylla server),
the corresponding view udpates generated during view building are redundant.

Because we have many test cases which could be causing this issue, instead
of waiting for the view building to finish in every single one of them, we
move the susceptible test cases to be run on separate Scylla instances, in
the "cluster" suite. There, no other test cases will influence the results.

Fixes https://github.com/scylladb/scylladb/issues/20379

Closes scylladb/scylladb#25209

(cherry picked from commit 2ece08ba43)

Closes scylladb/scylladb#25504
2025-08-15 13:30:53 +03:00
Ernest Zaslavsky
8a017834a0 s3_client: add memory fallback in chunked_download_source
Introduce fallback logic in `chunked_download_source` to handle
memory exhaustion. When memory is low, feed the `deque` with only
one uncounted buffer at a time. This allows slow but steady progress
without getting stuck on the memory semaphore.

Fixes: https://github.com/scylladb/scylladb/issues/25453
Fixes: https://github.com/scylladb/scylladb/issues/25262

Closes scylladb/scylladb#25452

(cherry picked from commit dd51e50f60)

Closes scylladb/scylladb#25511
2025-08-15 13:30:38 +03:00
Anna Stuchlik
d8d5ab1032 doc: document support for new z3 instance types
This commit adds new z3 instances we now support to the list of GCP instance types.

Fixes https://github.com/scylladb/scylladb/issues/25438

Closes scylladb/scylladb#25446

(cherry picked from commit 841ba86609)

Closes scylladb/scylladb#25512
2025-08-15 13:30:11 +03:00
Andrzej Jackowski
82ee1bf9cb test: audit: add logging of get_audit_log_list and set_of_rows_before
Without those logs, analysing some test failures is difficult.

Refs: scylladb/scylladb#25442

Closes scylladb/scylladb#25485

(cherry picked from commit bf8be01086)

Closes scylladb/scylladb#25514
2025-08-15 13:29:56 +03:00
Abhinav Jha
5e018831f8 raft: replication test: change rpc_propose_conf_change test to SEASTAR_THREAD_TEST_CASE
RAFT_TEST_CASE macro creates 2 test cases, one with random 20% packet
loss named name_drops. The framework makes hard coded assumptions about
leader which doesn't hold well in case of packet losses.

This short term fix disables the packet drop variant of the specified test.
It should be safe to re-enable it once the whole framework is re-worked to
remove these hard coded assumptions.

This PR fixes a bug. Hence we need to backport it.

Fixes: scylladb/scylladb#23816

Closes scylladb/scylladb#25489

(cherry picked from commit a0ee5e4b85)

Closes scylladb/scylladb#25528
2025-08-15 13:29:42 +03:00
Jenkins Promoter
a162e0256e Update pgo profiles - aarch64 2025-08-15 05:28:08 +03:00
Jenkins Promoter
adbbbf87c3 Update pgo profiles - x86_64 2025-08-15 05:05:35 +03:00
Sergey Zolotukhin
e2dcd559b6 transport: Handle sleep aborted exception in sleep_until_timeout_passes
In PR #23156, a new function `sleep_until_timeout_passes` was introduced
to wait until a read request times out or completes. However, the function
did not handle cases where the sleep is aborted via _abort_source, which
could result in WARN messages like "Exceptional future is ignored" during
shutdown.

This change adds proper handling for that exception, eliminating the warning.

(cherry picked from commit 061089389c)
2025-08-14 13:22:36 +00:00
Sergey Zolotukhin
665530e479 generic_server: replace empty destructor with = default
This change improves code readability by explicitly marking the destructor as defaulted.

(cherry picked from commit 27b3d5b415)
2025-08-14 13:22:36 +00:00
Sergey Zolotukhin
d729529226 generic_server: refactor connection::shutdown to use shutdown_input and shutdown_output
This change improves logging and modifies the behavior to attempt closing
the output side of a connection even if an error occurs while closing the input side.

(cherry picked from commit 3610cf0bfd)
2025-08-14 13:22:36 +00:00
Sergey Zolotukhin
2fef421534 generic_server: add shutdown_input and shutdown_output functions to
`connection` class.

The functions are just wrappers for  _fd.shutdown_input() and  _fd.shutdown_output(), with added error reporting.
Needed by later changes.

(cherry picked from commit 3848d10a8d)
2025-08-14 13:22:36 +00:00
Sergey Zolotukhin
0f99fa76de test: Add test for query execution during CQL server shutdown
This test simulates a scenario where a query is being executed while
the query coordinator begins shutting down the CQL server and client
connections. The shutdown process should wait until the query execution
is either completed or timed out.

Test for scylladb/scylladb#24481

(cherry picked from commit 122e940872)
2025-08-14 13:22:36 +00:00
Ernest Zaslavsky
c70ba8384e s3_client: make memory semaphore acquisition abortable
Add `abort_source` to the `get_units` call for the memory semaphore
in the S3 client, allowing the acquisition process to be aborted.

Fixes: https://github.com/scylladb/scylladb/issues/25454

Closes scylladb/scylladb#25469

(cherry picked from commit 380c73ca03)

Closes scylladb/scylladb#25499
2025-08-14 10:34:28 +02:00
Michael Litvak
761f722b6f test: test_mv_backlog: fix to consider internal writes
The test executes a single write, fetching metrics before and after the
write, and expects the total throttled writes count to be increased
exactly by one.

However, other internal writes (compaction for example) may be executed
during this time and be throttled, causing the metrics to be increased
by more than expected.

To address this, we filter the metrics by the scheduling group label of
the user write, to filter out the compaction writes that run in the
compaction scheduling group.

Fixes scylladb/scylladb#23139

(cherry picked from commit 276a09ac6e)
2025-08-12 14:51:10 +00:00
Michael Litvak
3a3b5bb14c test/pylib/rest_client: fix ScyllaMetrics filtering
In the ScyllaMetrics `get` function, when requesting the value for a
specific shard, it is expected to return the sum of all values of
metrics for that shard that match the labels.

However, it would return the value of the first matching line it finds
instead of summing all matching lines.

For example, if we have two lines for one shard like:
some_metric{scheduling_group_name="compaction",shard="0"} 1
some_metric{scheduling_group_name="sl:default",shard="0"} 2

The result of this call would be 1 instead of 3:
get('some_metric', shard="0")

We fix this to sum all matching lines.

The filtering of lines by labels is fixed to allow specifying only some
of the labels. Previously, for the line to match the filter, either the
filter needs to be empty, or all the labels in the metric line had to be
specified in the filter parameter and match its value, which is
unexpected, and breaks when more labels are added.

We also simplify the function signature and the implementation - instead
of having the shard as a separate parameter, it can be specified as a
label, like any other label.

(cherry picked from commit 5c28cffdb4)
2025-08-12 14:51:09 +00:00
Patryk Jędrzejczak
b999aa85b9 docs: Raft recovery procedure: recommend verifying participation in Raft recovery
This instruction adds additional safety. The faster we notice that
a node didn't restart properly, the better.

The old gossip-based recovery procedure had a similar recommendation
to verify that each restarting node entered `RECOVERY` mode.

Fixes #25375

This is a documentation improvement. We should backport it to all
branches with the new recovery procedure, so 2025.2 and 2025.3.

Closes scylladb/scylladb#25376

(cherry picked from commit 7b77c6cc4a)

Closes scylladb/scylladb#25440
2025-08-11 15:49:20 +02:00
Anna Stuchlik
a655c0e193 doc: add new and removed metrics to the 2025.3 upgrade guide
This commit adds the list of new and removed metrics to the already existing upgrade guide
from 2025.2 to 2025.3.

Fixes https://github.com/scylladb/scylladb/issues/24697

Closes scylladb/scylladb#25385

(cherry picked from commit f3d9d0c1c7)

Closes scylladb/scylladb#25416
2025-08-11 06:56:31 +03:00
Botond Dénes
9775b2768b Merge '[Backport 2025.3] GCP Key Provider: Fix authentication issues' from Scylladb[bot]
* Fix discovery of application default credentials by using fully expanded pathnames (no tildes).
* Fix grant type in token request with user credentials.

Fixes #25345.

- (cherry picked from commit 77cc6a7bad)

- (cherry picked from commit b1d5a67018)

Parent PR: #25351

Closes scylladb/scylladb#25407

* github.com:scylladb/scylladb:
  encryption: gcp: Fix the grant type for user credentials
  encryption: gcp: Expand tilde in pathnames for credentials file
2025-08-11 06:52:38 +03:00
Botond Dénes
2048ac88f1 Merge '[Backport 2025.3] test.py: native pytest repeats' from Scylladb[bot]
Previous way of execution repeat was to launch pytest for each repeat.
That was resource consuming, since each time pytest was doing discovery
of the tests. Now all repeats are done inside one pytest process.

Backport for 2025.3 is needed, since this functionality is framework only, and 2025.3 affected with this slow repeats as well.

Fixes: https://github.com/scylladb/scylladb/issues/25391

- (cherry picked from commit cc75197efd)

- (cherry picked from commit 557293995b)

- (cherry picked from commit 853bdec3ec)

- (cherry picked from commit d0e4045103)

Parent PR: #25073

Closes scylladb/scylladb#25392

* github.com:scylladb/scylladb:
  test.py: add repeats in pytest
  test.py: add directories and filename to the log files
  test.py: rename log sink file for boost tests
  test.py: better error handling in boost facade
2025-08-11 06:51:55 +03:00
Szymon Malewski
4c375b257b test/alternator: enable more relevant logs in CI.
This patch sets, for alternator test suite, all 'alternator-*' loggers and 'paxos' logger to trace level. This should significantly ease debugging of failed tests, while it has no effect on test time and increases log size only by 7%.
This affects running alternator tests only with `test.py`, not with `test/alternator/run`.

Closes #24645

Closes scylladb/scylladb#25327

(cherry picked from commit eb11485969)

Closes scylladb/scylladb#25383
2025-08-11 06:51:23 +03:00
Botond Dénes
ea6d0c880a Merge '[Backport 2025.3] test: audit: ignore cassandra user audit logs in AUTH tests' from Scylladb[bot]
Audit tests are vulnerable to noise from LOGIN queries (because AUTH
audit logs can appear at any time). Most tests already use the
`filter_out_noise` mechanism to remove this noise, but tests
focused on AUTH verification did not, leading to sporadic failures.

This change adds a filter to ignore AUTH logs generated by the default
"cassandra" user, so tests only verify logs from the user created
specifically for each test.

Additionally, this PR:
 - Adds missing `nonlocal new_rows` statement that prevented some checks from being called
 - Adds a testcase for audit logs of `cassandra` user

Fixes: https://github.com/scylladb/scylladb/issues/25069

Better backport those test changes to 2025.3. 2025.2 and earlier don't have `./cluster/dtest/audit_test.py`.

- (cherry picked from commit e634a2cb4f)

- (cherry picked from commit daf1c58e21)

- (cherry picked from commit aef6474537)

- (cherry picked from commit 21aedeeafb)

Parent PR: #25111

Closes scylladb/scylladb#25140

* github.com:scylladb/scylladb:
  test: audit: add cassandra user test case
  test: audit: ignore cassandra user audit logs in AUTH tests
  test: audit: change names of `filter_out_noise` parameters
2025-08-11 06:49:36 +03:00
Andrei Chekun
a4ea7b42c8 test.py: add repeats in pytest
Previous way of executin repeat was to launch pytest for each repeat.
That was resource consuming, since each time pytest was doing discovery
of the tests. Now all repeats are done inside one pytest process.

(cherry picked from commit d0e4045103)
2025-08-08 15:27:25 +02:00
Benny Halevy
9a34622a47 scylla-sstable: print_query_results_json: continue loop if row is disengaged
Otherwise it is accessed right when exiting the if block.
Add a unit test reproducing the issue and validating the fix.

Fixes #25325

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#25326

(cherry picked from commit 5e5e63af10)

Closes scylladb/scylladb#25379
2025-08-08 11:43:34 +03:00
Nikos Dragazis
8838d8df5f encryption: gcp: Fix the grant type for user credentials
Exchanging a refresh token for an access token requires the
"refresh_token" grant type [1].

[1] https://datatracker.ietf.org/doc/html/rfc6749#section-6

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
(cherry picked from commit b1d5a67018)
2025-08-07 21:46:24 +00:00
Nikos Dragazis
a69afb0d0b encryption: gcp: Expand tilde in pathnames for credentials file
The GCP host searches for application default credentials in known
locations within the user's home directory using
`seastar::file_exists()`. However, this function does not perform tilde
expansion in pathnames.

Replace tildes with the home directory from the HOME environment
variable.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
(cherry picked from commit 77cc6a7bad)
2025-08-07 21:46:24 +00:00
Andrei Chekun
8766750228 test.py: add directories and filename to the log files
Currently, only test function name used for output and log files. For better
clarity adding the relative path from the test directory of the file name
without extension to these files.
Before:
test_aggregate_avg.1.log
test_aggregate_avg_stdout.1.log
After:
boost.aggregate_fcts_test.test_aggregate_avg.1.log
boost.aggregate_fcts_test.test_aggregate_avg_stdout.3.log

(cherry picked from commit 853bdec3ec)
2025-08-07 10:46:58 +00:00
Andrei Chekun
c4cefc5195 test.py: rename log sink file for boost tests
Log sink is outputted in XML format not just simple text file. Renaming to have better clarity

(cherry picked from commit 557293995b)
2025-08-07 10:46:58 +00:00
Andrei Chekun
5f8e69a5d9 test.py: better error handling in boost facade
If test was not executed for some reason, for example not known parameter passed to the test, but boost framework was able to finish correctly, log file will have data but it will be parsed to an empty list. This will raise an exception in pytest execution, rather than produce test output. This change will handle this situation.

(cherry picked from commit cc75197efd)
2025-08-07 10:46:58 +00:00
Avi Kivity
0d54b72f21 Merge '[Backport 2025.3] truncate: change check for write during truncate into a log warning' from Scylladb[bot]
TRUNCATE TABLE performs a memtable flush and then discards the sstables of the table being truncated. It collects the highest replay position for both of these. When the highest replay position of the discarded sstables is higher than the highest replay position of the flushed memtable, that means that we have had writes during truncate which have been flushed to disk independently of the truncate process. We check for this and trigger an on_internal_error() which throws an exception, informing the user that writing data concurrently with TRUNCATE TABLE is not advised.

The problem with this is that truncate is also called from DROP KEYSPACE and DROP TABLE. These are raft operations and exceptions thrown by them are caught by the (...) exception handler in the raft applier fiber, which then exits leaving the node without the ability to execute subsequent raft commands.

This commit changes the on_internal_error() into a warning log entry. It also outputs to keyspace/table names, and the offending replay positions which caused the check to fail.

This PR also adds a test which validates that TRUNCATE works correctly with concurrent writes. More specifically, it checks that:
- all data written before TRUNCATE starts is deleted
- none of the data after TRUNCATE completes is deleted

Fixes: #25173
Fixes: #25013

Backport is needed in versions which check for truncate with concurrent writes using `on_internal_error()`: 2025.3 2025.2 2025.1

- (cherry picked from commit 268ec72dc9)

- (cherry picked from commit 33488ba943)

Parent PR: #25174

Closes scylladb/scylladb#25350

* github.com:scylladb/scylladb:
  truncate: add test for truncate with concurrent writes
  truncate: change check for write during truncate into a log warning
2025-08-07 12:19:45 +03:00
Andrzej Jackowski
1be1306233 test: audit: add cassandra user test case
Audit tests use the `filter_out_noise` function to remove noise from
audit logs generated by user authentication. As a result, none of the
existing tests covered audit logs for the default `cassandra` user.
This change adds a test case for that user.

Refs: scylladb/scylladb#25069
(cherry picked from commit 21aedeeafb)
2025-08-07 10:03:27 +02:00
Patryk Jędrzejczak
1863386bc8 Merge '[Backport 2025.3] Raft-based recovery procedure: simplify rolling restart with recovery_leader' from Scylladb[bot]
The following steps are performed in sequence as part of the
Raft-based recovery procedure:
- set `recovery_leader` to the host ID of the recovery leader in
  `scylla.yaml` on all live nodes,
- send the `SIGHUP` signal to all Scylla processes to reload the config,
- perform a rolling restart (with the recovery leader being restarted
  first).

These steps are not intuitive and more complicated than they could be.

In this PR, we simplify these steps. From now on, we will be able to
simply set `recovery_leader` on each node just before restarting it.

Apart from making necessary changes in the code, we also update all
tests of the Raft-based recovery procedure and the user-facing
documentation.

Fixes scylladb/scylladb#25015

The Raft-based procedure was added in 2025.2. This PR makes the
procedure simpler and less error-prone, so it should be backported
to 2025.2 and 2025.3.

- (cherry picked from commit ec69028907)

- (cherry picked from commit 445a15ff45)

- (cherry picked from commit 23f59483b6)

- (cherry picked from commit ba5b5c7d2f)

- (cherry picked from commit 9e45e1159b)

- (cherry picked from commit f408d1fa4f)

Parent PR: #25032

Closes scylladb/scylladb#25335

* https://github.com/scylladb/scylladb:
  docs: document the option to set recovery_leader later
  test: delay setting recovery_leader in the recovery procedure tests
  gossip: add recovery_leader to gossip_digest_syn
  db: system_keyspace: peers_table_read_fixup: remove rows with null host_id
  db/config, gms/gossiper: change recovery_leader to UUID
  db/config, utils: allow using UUID as a config option
2025-08-07 09:58:13 +02:00
Taras Veretilnyk
606db56cf3 docs: Sort commands list in nodetool.rst
Fixes scylladb/scylladb#25330

Closes scylladb/scylladb#25331

(cherry picked from commit bcb90c42e4)

Closes scylladb/scylladb#25372
2025-08-06 20:49:21 +03:00
Nikos Dragazis
26174a9c67 test: kmip: Fix segfault from premature destruction of port_promise
`kmip_test_helper()` is a utility function to spawn a dedicated PyKMIP
server for a particular Boost test case. The function runs the server as
an external process and uses a thread to parse the port from the
server's logs. The thread communicates the port to the main thread via
a promise.

The current implementation has a bug where the thread may set a value
to the promise after its destruction, causing a segfault. This happens
when the server does not start within 20 seconds, in which case the port
future throws and the stack unwinding machinery destroys the port
promise before the thread that writes to it.

Fix the bug by declaring the promise before the cleanup action.

The bug has been encountered in CI runs on slow machines, where the
PyKMIP server takes too long to create its internal tables (due to slow
fdatasync calls from SQLite). This patch does not improve CI stability -
it only ensures that the error condition is properly reflected in the
test output.

This patch is not a backport. The same bug has been fixed in master as
part of a larger rewrite of the `kmip_test_helper()` (see 722e2bce96).

Refs #24747, #24842.
Fixes #24574.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#25030
2025-08-06 11:59:40 +03:00
Pavel Emelyanov
4fcf0a620c Merge '[Backport 2025.3] Simplify credential reload: remove internal expiration checks' from Scylladb[bot]
This PR introduces a refinement in how credential renewal is triggered. Previously, the system attempted to renew credentials one hour before their expiration, but the credentials provider did not recognize them as expired—resulting in a no-op renewal that returned existing credentials. This led the timer fiber to immediately retry renewal, causing a renewal storm.

To resolve this, we remove expiration (or any other checks) in `reload` method, assuming that whoever calls this method knows what he does.

Fixes: https://github.com/scylladb/scylladb/issues/25044

Should be backported to 2025.3 since we need this fix for the restore

- (cherry picked from commit 68855c90ca)

- (cherry picked from commit e4ebe6a309)

- (cherry picked from commit 837475ec6f)

Parent PR: #24961

Closes scylladb/scylladb#25347

* github.com:scylladb/scylladb:
  s3_creds: code cleanup
  s3_creds: Make `reload` unconditional
  s3_creds: Add test exposing credentials renewal issue
2025-08-06 11:33:19 +03:00
Aleksandra Martyniuk
2282a11405 tasks: change _finished_children type
Parent task keeps a vector of statuses (task_essentials) of its finished
children. When the children number is large - for example because we
have many tables and a child task is created for each table - we may hit
oversize allocation while adding a new child essentials to the vector.

Keep task_essentails of children in chunked_vector.

Fixes: #25040.

Closes scylladb/scylladb#25064

(cherry picked from commit b5026edf49)

Closes scylladb/scylladb#25319
2025-08-06 07:36:04 +03:00
Michał Jadwiszczak
c31f47026d storage_service, group0_state_machine: move SL cache update from topology_state_load() to load_snapshot()
Currently the service levels cache is unnecessarily updated in every
call of `topology_state_load()`.
But it is enough to reload it only when a snapshot is loaded.
(The cache is also already updated when there is a change to one of
`service_levels_v2`, `role_members`, `role_attributes` tables.)

Fixes scylladb/scylladb#25114
Fixes scylladb/scylladb#23065

Closes scylladb/scylladb#25116

(cherry picked from commit 10214e13bd)

Closes scylladb/scylladb#25305
2025-08-06 07:27:48 +03:00
Aleksandra Martyniuk
4f0e5bf429 api: storage_service: do not log the exception that is passed to user
The exceptions that are thrown by the tasks started with API are
propagated to users. Hence, there is no need to log it.

Remove the logs about exception in user started tasks.

Fixes: https://github.com/scylladb/scylladb/issues/16732.

Closes scylladb/scylladb#25153

(cherry picked from commit e607ef10cd)

Closes scylladb/scylladb#25298
2025-08-06 07:27:05 +03:00
Andrei Chekun
85769131a2 docs: update documentation with new way of running C++ tests
Documentation had outdated information how to run C++ test.
Additionally, some information added about gathered test metrics.

Closes scylladb/scylladb#25180

(cherry picked from commit a6a3d119e8)

Closes scylladb/scylladb#25291
2025-08-06 07:25:44 +03:00
Dawid Mędrek
c5e1e28076 test: Enable RF-rack-valid keyspaces in all Python suites
We're enabling the configuration option `rf_rack_valid_keyspaces`
in all Python test suites. All relevant tests have been adjusted
to work with it enabled.

That encompasses the following suites:

* alternator,
* broadcast_tables,
* cluster (already enabled in scylladb/scylladb@ee96f8dcfc),
* cql,
* cqlpy (already enabled in scylladb/scylladb@be0877ce69),
* nodetool,
* rest_api.

Two remaining suites that use tests written in Python, redis and scylla_gdb,
are not affected, at least not directly.

The redis suite requires creating an instance of Scylla manually, and the tests
don't do anything that could violate the restriction.

The scylla_gdb suite focuses on testing the capabilities of scylla-gdb.py, but
even then it reuses the `run` file from the cqlpy suite.

Fixes scylladb/scylladb#25126

Closes scylladb/scylladb#24617

(cherry picked from commit b41151ff1a)

Closes scylladb/scylladb#25231
2025-08-06 07:17:40 +03:00
Tomasz Grabiec
71dd30fc25 topology_coordinator: Trigger load stats refresh after replace
Otherwise, tablet rebuilt will be delayed for up to 60s, as the tablet
scheduler needs load stats for the new node (replacing) to make
decisisons.

Fixes #25163

Closes scylladb/scylladb#25181

(cherry picked from commit 55116ee660)

Closes scylladb/scylladb#25216
2025-08-06 07:16:45 +03:00
Ferenc Szili
db3777c703 truncate: add test for truncate with concurrent writes
test_validate_truncate_with_concurrent_writes checks if truncate deletes
all the data written before the truncate starts, and does not delete any
data after truncate completes.

(cherry picked from commit 33488ba943)
2025-08-06 00:52:15 +00:00
Ferenc Szili
0248f555da truncate: change check for write during truncate into a log warning
TRUNCATE TABLE performs a memtable flush and then discards the sstables
of the table being truncated. It collects the highest replay position
for both of these. When the highest replay position of the discarded
sstables is higher than the highest replay position of the flushed
memtable, that means that we have had writes during truncate which have
been flushed to disk independently of the truncate process. We check for
this and trigger an on_internal_error() which throws an exception,
informing the user that writing data concurrently with TRUNCATE TABLE is
not advised.

The problem with this is that truncate is also called from DROP KEYSPACE
and DROP TABLE. These are raft operations and exceptions thrown by them
are caught by the (...) exception handler in the raft applier fiber,
which then exits leaving the node without the ability to execute
subsequent raft commands.

This commit changes the on_internal_error() into a warning log entry. It
also outputs to keyspace/table names, the truncated_at timepoint, the
offending replay positions which caused the check to fail.

Fixes: #25173
Fixes: #25013
(cherry picked from commit 268ec72dc9)
2025-08-06 00:52:15 +00:00
Ernest Zaslavsky
88779a6884 s3_creds: code cleanup
Remove unnecessary code which is no more used

(cherry picked from commit 837475ec6f)
2025-08-06 00:50:36 +00:00
Ernest Zaslavsky
ccdc98c8f0 s3_creds: Make reload unconditional
Assume that any caller invoking `reload` intends to refresh credentials.
Remove conditional logic that checks for expiration before reloading.

(cherry picked from commit e4ebe6a309)
2025-08-06 00:50:36 +00:00
Ernest Zaslavsky
bd79ae3826 s3_creds: Add test exposing credentials renewal issue
Add a test demonstrating that renewing credentials does not update
their expiration. After requesting credentials again, the expiration
remains unchanged, indicating no actual update occurred.

(cherry picked from commit 68855c90ca)
2025-08-06 00:50:36 +00:00
Botond Dénes
f212f6af28 Merge 'repair: distribute tablet_repair_task_metas between shards' from Aleksandra Martyniuk
Currently, in repair_service::repair_tablets a shard that initiates
the repair keeps repair_tablet_metas of all tablets that have a replica
on this node (on any shard). This may lead to oversized allocations.

Modify tablet_repair_task_impl to repair only the tablets which replicas
are kept on this shard. Modify repair_service::repair_tablets to gather
repair_tablet_metas only on local shard. repair_tablets is invoked on
all shards.

Add a new legacy_tablet_repair_task_impl that covers tablet repair started with
async_repair. A user can use sequence number of this task to manage the repair
using storage_service API.

In a test that reproduced this, we have seen 11136 tablets and 5636096 bytes
allocation failure. If we had a node with 250 shards, 100 tablets each, we could
reach 12MB kept on one shard for the whole repair time.

Fixes: https://github.com/scylladb/scylladb/issues/23632

Needs backport to all live branches as they are all vulnerable to such crashes.

Closes scylladb/scylladb#24194

* github.com:scylladb/scylladb:
  repair: distribute tablet_repair_task_meta among shards
  repair: do not keep erm in tablet_repair_task_meta
2025-08-05 22:36:59 +03:00
Avi Kivity
a8193bd503 Merge '[Backport 2025.3] transport: remove throwing protocol_exception on connection start' from Dario Mirovic
`protocol_exception` is thrown in several places. This has become a performance issue, especially when starting/restarting a server. To alleviate this issue, throwing the exception has to be replaced with returning it as a result or an exceptional future.

This PR replaces throws in the `transport/server` module. This is achieved by using result_with_exception, and in some places, where suitable, just by creating and returning an exceptional future.

There are four commits in this PR. The first commit introduces tests in `test/cqlpy`. The second commit refactors transport server `handle_error` to not rethrow exceptions. The third commit refactors reusable buffer writer callbacks. The fourth commit replaces throwing `protocol_exception` to returning it.

Based on the comments on an issue linked in https://github.com/scylladb/scylladb/issues/24567, the main culprit from the side of protocol exceptions is the invalid protocol version one, so I tested that exception for performance.

In order to see if there is a measurable difference, a modified version of `test_protocol_version_mismatch` Python is used, with 100'000 runs across 10 processes (not threads, to avoid Python GIL). One test run consisted of 1 warm-up run and 5 measured runs. First test run has been executed on the current code, with throwing protocol exceptions. Second test urn has been executed on the new code, with returning protocol exceptions. The performance report is in https://github.com/scylladb/scylladb/pull/24738#issuecomment-3051611069. It shows ~10% gains in real, user, and sys time for this test.

Testing

Build: `release`

Test file: `test/cqlpy/test_protocol_exceptions.py`
Test name: `test_protocol_version_mismatch` (modified for mass connection requests)

Test arguments:
```
max_attempts=100'000
num_parallel=10
```

Throwing `protocol_exception` results:
```
real=1:26.97  user=10:00.27  sys=2:34.55  cpu=867%
real=1:26.95  user=9:57.10  sys=2:32.50  cpu=862%
real=1:26.93  user=9:56.54  sys=2:35.59  cpu=865%
real=1:26.96  user=9:54.95  sys=2:32.33  cpu=859%
real=1:26.96  user=9:53.39  sys=2:33.58  cpu=859%

real=1:26.95 user=9:56.85 sys=2:34.11 cpu=862%   # average
```

Returning `protocol_exception` as `result_with_exception` or an exceptional future:
```
real=1:18.46  user=9:12.21  sys=2:19.08  cpu=881%
real=1:18.44  user=9:04.03  sys=2:17.91  cpu=869%
real=1:18.47  user=9:12.94  sys=2:19.68  cpu=882%
real=1:18.49  user=9:13.60  sys=2:19.88  cpu=883%
real=1:18.48  user=9:11.76  sys=2:17.32  cpu=878%

real=1:18.47 user=9:10.91 sys=2:18.77 cpu=879%   # average
```

This PR replaced `transport/server` throws of `protocol_exception` with returns. There are a few other places where protocol exceptions are thrown, and there are many places where `invalid_request_exception` is thrown. That is out of scope of this single PR, so the PR just refs, and does not resolve issue #24567.

Refs: #24567
Fixes: #25271

This PR improves performance in cases when protocol exceptions happen, for example during connection storms. It will require backporting.

* (cherry picked from commit 7aaeed012e)

* (cherry picked from commit 30d424e0d3)

* (cherry picked from commit 9f4344a435)

* (cherry picked from commit 5390f92afc)

* (cherry picked from commit 4a6f71df68)

Parent PR: #24738

Closes scylladb/scylladb#25117

* github.com:scylladb/scylladb:
  test/cqlpy: add cpp exception metric test conditions
  transport/server: replace protocol_exception throws with returns
  utils/reusable_buffer: accept non-throwing writer callbacks via result_with_exception
  transport/server: avoid exception-throw overhead in handle_error
  test/cqlpy: add protocol_exception tests
2025-08-05 14:16:14 +03:00
Patryk Jędrzejczak
eb8ea703d5 docs: document the option to set recovery_leader later
In one of the previous commits, we made it possible to set
`recovery_leader` on each node just before restarting it. Here, we
update the corresponding documentation.

(cherry picked from commit f408d1fa4f)
2025-08-05 10:59:39 +00:00
Patryk Jędrzejczak
ac7945e044 test: delay setting recovery_leader in the recovery procedure tests
In the previous commit, we made it possible to set `recovery_leader`
on each node just before restarting it. Here, we change all the
tests of the Raft-based recovery procedure to use and test this option.

(cherry picked from commit 9e45e1159b)
2025-08-05 10:59:39 +00:00
Patryk Jędrzejczak
79c27454d4 gossip: add recovery_leader to gossip_digest_syn
In the new Raft-based recovery procedure, live nodes join the new
group 0 one by one during a rolling restart. There is a time window when
some of them are in the old group 0, while others are in the new group
0. This causes a group 0 mismatch in `gossiper::handle_syn_msg`. The
current solution for this problem is to ignore group 0 mismatches if
`recovery_leader` is set on the local node and to ask the administrator
to perform the rolling restart in the following way:
- set `recovery_leader` in `scylla.yaml` on all live nodes,
- send the `SIGHUP` signal to all Scylla processes to reload the config,
- proceed with the rolling restart.

This commit makes `gossiper::handle_syn_msg` ignore group 0 mismatches
when exactly one of the two gossiping nodes has `recovery_leader` set.
We achieve this by adding `recovery_leader` to `gossip_digest_syn`.
This change makes setting `recovery_leader` earlier on all nodes and
reloading the config unnecessary. From now on, the administrator can
simply restart each node with `recovery_leader` set.

However, note that nodes that join group 0 must have `recovery_leader`
set until all nodes join the new group 0. For example, assume that we
are in the middle of the rolling restart and one of the nodes in the new
group 0 crashes. It must be restarted with `recovery_leader` set, or
else it would reject `gossip_digest_syn` messages from nodes in the old
group 0. To avoid problems in such cases, we will continue to recommend
setting `recovery_leader` in `scylla.yaml` instead of passing it as
a command line argument.

(cherry picked from commit ba5b5c7d2f)
2025-08-05 10:59:39 +00:00
Patryk Jędrzejczak
4294669e72 db: system_keyspace: peers_table_read_fixup: remove rows with null host_id
Currently, `peers_table_read_fixup` removes rows with no `host_id`, but
not with null `host_id`. Null host IDs are known to appear in system
tables, for example in `system.cluster_status` after a failed bootstrap.
We better make sure we handle them properly if they ever appear in
`system.peers`.

This commit guarantees that null UUID cannot belong to
`loaded_endpoints` in `storage_service::join_cluster`, which in
particular ensures that we throw a runtime error when a user sets
`recovery_leader` to null UUID during the recovery procedure. This is
handled by the code verifying that `recovery_leader` belongs to
`loaded_endpoints`.

(cherry picked from commit 23f59483b6)
2025-08-05 10:59:39 +00:00
Patryk Jędrzejczak
74cf95a675 db/config, gms/gossiper: change recovery_leader to UUID
We change the type of the `recovery_leader` config parameter and
`gossip_config::recovery_leader` from sstring to UUID. `recovery_leader`
is supposed to store host ID, so UUID is a natural choice.

After changing the type to UUID, if the user provides an incorrect UUID,
parsing `recovery_leader` will fail early, but the start-up will
continue. Outside the recovery procedure, `recovery_leader` will then be
ignored. In the recovery procedure, the start-up will fail on:

```
throw std::runtime_error(
        "Cannot start - Raft-based topology has been enabled but persistent group 0 ID is not present. "
        "If you are trying to run the Raft-based recovery procedure, you must set recovery_leader.");
```

(cherry picked from commit 445a15ff45)
2025-08-05 10:59:39 +00:00
Patryk Jędrzejczak
d18d2fa0cf db/config, utils: allow using UUID as a config option
We change the `recovery_leader` option to UUID in the following commit.

(cherry picked from commit ec69028907)
2025-08-05 10:59:39 +00:00
Jenkins Promoter
3d4ec918ff Update ScyllaDB version to: 2025.3.0-rc3 2025-08-03 15:50:47 +03:00
Nikos Dragazis
257ebbeca9 test: Use in-memory SQLite for PyKMIP server
The PyKMIP server uses an SQLite database to store artifacts such as
encryption keys. By default, SQLite performs a full journal and data
flush to disk on every CREATE TABLE operation. Each operation triggers
three fdatasync(2) calls. If we multiply this by 16, that is the number
of tables created by the server, we get a significant number of file
syncs, which can last for several seconds on slow machines.

This behavior has led to CI stability issues from KMIP unit tests where
the server failed to complete its schema creation within the 20-second
timeout (observed on spider9 and spider11).

Fix this by configuring the server to use an in-memory SQLite.

Fixes #24842.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#24995

(cherry picked from commit 2656fca504)

Closes scylladb/scylladb#25300
2025-08-02 17:12:05 +03:00
Ran Regev
7aa7f50b3a scylla.yaml: add recommended value for stream_io_throughput_mb_per_sec
Fixes: #24758

Updated scylla.yaml and the help for
scylla --help

Closes scylladb/scylladb#24793

(cherry picked from commit db4f301f0c)

Closes scylladb/scylladb#25280
2025-08-01 15:02:01 +03:00
Piotr Dulikowski
0dc700de70 Merge '[Backport 2025.3] qos: don't populate effective service level cache until auth is migrated to raft' from Scylladb[bot]
Right now, service levels are migrated in one group0 command and auth is migrated in the next one. This has a bad effect on the group0 state reload logic - modifying service levels in group0 causes the effective service levels cache to be recalculated, and to do so we need to fetch information about all roles. If the reload happens after SL upgrade and before auth upgrade, the query for roles will be directed to the legacy auth tables in system_auth - and the query, being a potentially remote query, has a timeout. If the query times out, it will throw an exception which will break the group0 apply fiber and the node will need to be restarted to bring it back to work.

In order to solve this issue, make sure that the service level module does not start populating and using the service level cache until both service levels and auth are migrated to raft. This is achieved by adding the check both to the cache population logic and the effective service level getter - they now look at service level's accessor new method, `can_use_effective_service_level_cache` which takes a look at the auth version.

Fixes: scylladb/scylladb#24963

Should be backported to all versions which support upgrade to topology over raft - the issue described here may put the cluster into a state which is difficult to get out of (group0 apply fiber can break on multiple nodes, which necessitates their restart).

- (cherry picked from commit 2bb800c004)

- (cherry picked from commit 3a082d314c)

Parent PR: #25188

Closes scylladb/scylladb#25285

* github.com:scylladb/scylladb:
  test: sl: verify that legacy auth is not queried in sl to raft upgrade
  qos: don't populate effective service level cache until auth is migrated to raft
2025-08-01 08:49:13 +02:00
Jenkins Promoter
308400895f Update pgo profiles - aarch64 2025-08-01 05:19:18 +03:00
Jenkins Promoter
54b259bec9 Update pgo profiles - x86_64 2025-08-01 05:02:34 +03:00
Piotr Dulikowski
f27a3be62b test: sl: verify that legacy auth is not queried in sl to raft upgrade
Adjust `test_service_levels_upgrade`: right before upgrade to topology
on raft, enable an error injection which triggers when the standard role
manager is about to query the legacy auth tables in the
system_auth keyspace. The preceding commit which fixes
scylladb/scylladb#24963 makes sure that the legacy tables are not
queried during upgrade to topology on raft, so the error injection does
not trigger and does not cause a problem; without that commit, the test
fails.

(cherry picked from commit 3a082d314c)
2025-07-31 15:13:57 +00:00
Piotr Dulikowski
ba70b39486 qos: don't populate effective service level cache until auth is migrated to raft
Right now, service levels are migrated in one group0 command and auth
is migrated in the next one. This has a bad effect on the group0 state
reload logic - modifying service levels in group0 causes the effective
service levels cache to be recalculated, and to do so we need to fetch
information about all roles. If the reload happens after SL upgrade and
before auth upgrade, the query for roles will be directed to the legacy
auth tables in system_auth - and the query, being a potentially remote
query, has a timeout. If the query times out, it will throw
an exception which will break the group0 apply fiber and the node will
need to be restarted to bring it back to work.

In order to solve this issue, make sure that the service level module
does not start populating and using the service level cache until both
service levels and auth are migrated to raft. This is achieved by adding
the check both to the cache population logic and the effective service
level getter - they now look at service level's accessor new method,
`can_use_effective_service_level_cache` which takes a look at the auth
version.

Fixes: scylladb/scylladb#24963
(cherry picked from commit 2bb800c004)
2025-07-31 15:13:57 +00:00
Andrzej Jackowski
12866e8f2e test: audit: ignore cassandra user audit logs in AUTH tests
Audit tests are vulnerable to noise from LOGIN queries (because AUTH
audit logs can appear at any time). Most tests already use the
`filter_out_noise` mechanism to remove this noise, but tests
focused on AUTH verification did not, leading to sporadic failures.

This change adds a filter to ignore AUTH logs generated by the default
"cassandra" user, so tests only verify logs from the user created
specifically for each test.

Fixes: scylladb/scylladb#25069
(cherry picked from commit aef6474537)
2025-07-31 17:01:29 +02:00
Andrzej Jackowski
e77a190f1a test: audit: change names of filter_out_noise parameters
This is a refactoring commit that changes the names of the parameters
of the `filter_out_noise` function, as well as names of related
variables. The motiviation for the change is introduction of more
complex filtering logic in next commit of this patch series.

Refs: scylladb/scylladb#25069
(cherry picked from commit daf1c58e21)
2025-07-31 16:58:36 +02:00
Aleksandra Martyniuk
132e6495a3 repair: distribute tablet_repair_task_meta among shards
Currently, in repair_service::repair_tablets a shard that initiates
the repair keeps tablet_repair_task_meta of all tablets that have a replica
on this node (on any shard). This may lead to oversized allocations.

Add remote_metas class which takes care of distributing tablet_repair_task_meta
among different shards. An additional class remote_metas_builder was
added in order to ensure safety and separate writes and reads to meta
vectors.

Fixes: #23632
2025-07-31 15:56:53 +02:00
Aleksandra Martyniuk
603a2dbb10 repair: do not keep erm in tablet_repair_task_meta
Do not keep erm in tablet_repair_task_meta to avoid non-owner shared
pointer access when metas will be distributes among shards.

Pass std::chunked_vector of erms to tablet_repair_task_impl to
preserve safety.
2025-07-31 15:56:43 +02:00
Dario Mirovic
7d300367c0 test/cqlpy: add cpp exception metric test conditions
Tested code paths should not throw exceptions. `scylla_reactor_cpp_exceptions`
metric is used. This is a global metric. To address potential test flakiness,
each test runs multiple times:
- `run_count = 100`
- `cpp_exception_threshold = 10`

If a change in the code introduced an exception, expectation is that the number
of registered exceptions will be > `cpp_exception_threshold` in `run_count` runs.
In which case the test fails.

Fixes: #25271
(cherry picked from commit 4a6f71df68)
2025-07-31 11:53:00 +02:00
Anna Stuchlik
4bc531d48d doc: add the upgrade guide from 2025.2 to 2025.3
This PR adds the upgrade guide from version 2025.2 to 2025.3.
Also, it removes the upgrade guide existing for the previous version
that is irrelevant in 2025.2 (upgrade from 2025.1 to 2025.2).

Note that the new guide does not include the "Enable Consistent Topology Updates" page and note,
as users upgrading to 2025.3 have consistent topology updates already enabled.

Fixes https://github.com/scylladb/scylladb/issues/24696

Closes scylladb/scylladb#25219

(cherry picked from commit 8365219d40)

Closes scylladb/scylladb#25248
2025-07-31 12:19:33 +03:00
Anna Stuchlik
f3ca644a55 doc: add OS support for ScyllaDB 2025.3
This commit adds the information about support for platforms in ScyllaDB version 2025.3.

Fixes https://github.com/scylladb/scylladb/issues/24698

Closes scylladb/scylladb#25220

(cherry picked from commit b67bb641bc)

Closes scylladb/scylladb#25249
2025-07-31 12:17:36 +03:00
Anna Stuchlik
573bbace20 doc: add tablets support information to the Drivers table
This commit:

- Extends the Drivers support table with information on which driver supports tablets
  and since which version.
- Adds the driver support policy to the Drivers page.
- Reorganizes the Drivers page to accommodate the updates.

In addition:
- The CPP-over-Rust driver is added to the table.
- The information about Serverless (which we don't support) is removed
  and replaced with tablets to correctly describe the contents of the table.

Fixes https://github.com/scylladb/scylladb/issues/19471

Refs https://github.com/scylladb/scylladb-docs-homepage/issues/69

Closes scylladb/scylladb#24635

(cherry picked from commit 18b4d4a77c)

Closes scylladb/scylladb#25251
2025-07-31 12:17:21 +03:00
Aleksandra Martyniuk
4630a2f9c5 streaming: close sink when exception is thrown
If an exception is thrown in result_handling_cont in streaming,
then the sink does not get closed. This leads to a node crash.

Close sink in exception handler.

Fixes: https://github.com/scylladb/scylladb/issues/25165.

Closes scylladb/scylladb#25238

(cherry picked from commit 99ff08ae78)

Closes scylladb/scylladb#25268
2025-07-31 12:17:05 +03:00
Dario Mirovic
38a8318466 transport/server: replace protocol_exception throws with returns
Replace throwing protocol_exception with returning it as a result
or an exceptional future in the transport server module. This
improves performance, for example during connection storms and
server restarts, where protocol exceptions are more frequent.

In functions already returning a future, protocol exceptions are
propagated using an exceptional future. In functions not already
returning a future, result_with_exception is used.

Notable change is checking v.failed() before calling v.get() in
process_request function, to avoid throwing in case of an
exceptional future.

Refs: #24567
Fixes: #25271
(cherry picked from commit 5390f92afc)
2025-07-30 21:35:24 +02:00
Dario Mirovic
1078a1f03a utils/reusable_buffer: accept non-throwing writer callbacks via result_with_exception
Make make_bytes_ostream and make_fragmented_temporary_buffer accept
writer callbacks that return utils::result_with_exception instead of
forcing them to throw on error. This lets callers propagate failures
by returning an error result rather than throwing an exception.

Introduce buffer_writer_for, bytes_ostream_writer, and fragmented_buffer_writer
concepts to simplify and document the template requirements on writer callbacks.

This patch does not modify the actual callbacks passed, except for the syntax
changes needed for successful compilation, without changing the logic.

Refs: #24567
Fixes: #25271
(cherry picked from commit 9f4344a435)
2025-07-30 21:35:15 +02:00
Dario Mirovic
0679a7bb78 transport/server: avoid exception-throw overhead in handle_error
Previously, connection::handle_error always called f.get() inside a try/catch,
forcing every failed future to throw and immediately catch an exception just to
classify it. This change eliminates that extra throw/catch cycle by first checking
f.failed(), getting the stored std::exception_ptr via f.get_exception(), and
then dispatching on its type via utils::try_catch<T>(eptr).

The error-response logic is not changed - cassandra_exception, std::exception,
and unknown exceptions are caught and processed, and any exceptions thrown by
write_response while handling those exceptions continues to escape handle_error.

Refs: #24567
Fixes: #25271
(cherry picked from commit 30d424e0d3)
2025-07-30 21:34:56 +02:00
Dario Mirovic
918d4ab5fb test/cqlpy: add protocol_exception tests
Add a helper to fetch scylla_transport_cql_errors_total{type="protocol_error"} counter
from Scylla's metrics endpoint. These metrics are used to track protocol error
count before and after each test.

Add cql_with_protocol context manager utility for session creation with parameterized
protocol_version value. This is used for testing connection establishment with
different protocol versions, and proper disposal of successfully established sessions.

The tests cover two failure scenarios:
- Protocol version mismatch in test_protocol_version_mismatch which tests both supported
and unsupported protocol version
- Malformed frames via raw socket in _protocol_error_impl, used by several test functions,
and also test_no_protocol_exceptions test to assert that the error counters never decrease
during test execution, catching unintended metric resets

Refs: #24567
Fixes: #25271
(cherry picked from commit 7aaeed012e)
2025-07-30 21:34:31 +02:00
Patryk Jędrzejczak
7164f11b99 Merge '[Backport 2025.3] Revert 24418: main.cc: fix group0 shutdown order' from Petr Gusev
This PR reverts the changes of #24418 since they can cause use-after-free.

The `raft_group0::abort()` was called in `storage_service::do_drain` (introduced in #24418) to stop the group0 Raft server before destroying local storage. This was necessary because `raft::server` depends on storage (via `raft_sys_table_storage` and `group0_state_machine`).

However, this caused issues: services like `sstable_dict_autotrainer` and `auth::service`, which use `group0_client` but are not stopped by `storage_service`, could trigger use-after-free if `raft_group0` was destroyed too early. This can happen both during normal shutdown and when 'nodetool drain' is used.

This PR reverts two of the three commits from #24418. The commit [e456d2d](e456d2d507) is not reverted because it only affects logging and does not impact correctness.

Fixes scylladb/scylladb#25221

Backport: this PR is a backport

Closes scylladb/scylladb#25206

* https://github.com/scylladb/scylladb:
  Revert "main.cc: fix group0 shutdown order"
  Revert "storage_service: test_group0_apply_while_node_is_being_shutdown"
2025-07-30 16:18:13 +02:00
Pavel Emelyanov
99f328b7a7 Merge '[Backport 2025.3] s3_client: Enhance s3_client error handling' from Scylladb[bot]
Enhance and fix error handling in the `chunked_download_source` to prevent errors seeping from the request callback. Also stop retrying on seastar's side since it is going to break the integrity of data which maybe downloaded more than once for the same range.

Fixes: https://github.com/scylladb/scylladb/issues/25043

Should be backported to 2025.3 since we have an intention to release native backup/restore feature

- (cherry picked from commit d53095d72f)

- (cherry picked from commit b7ae6507cd)

- (cherry picked from commit ba910b29ce)

- (cherry picked from commit fc2c9dd290)

Parent PR: #24883

Closes scylladb/scylladb#25137

* github.com:scylladb/scylladb:
  s3_client: Disable Seastar-level retries in HTTP client creation
  s3_test: Validate handling of non-`aws_error` exceptions
  s3_client: Improve error handling in chunked_download_source
  aws_error: Add factory method for `aws_error` from exception
2025-07-29 14:42:45 +03:00
Pavel Emelyanov
07f46a4ad5 Merge '[Backport 2025.3] storage_service: cancel all write requests after stopping transports' from Scylladb[bot]
When a node shuts down, in storage service, after storage_proxy RPCs are stopped, some write handlers within storage_proxy may still be waiting for background writes to complete. These handlers hold appropriate ERMs to block schema changes before the write finishes. After the RPCs are stopped, these writes cannot receive the replies anymore.

If, at the same time, there are RPC commands executing `barrier_and_drain`, they may get stuck waiting for these ERM holders to finish, potentially blocking node shutdown until the writes time out.

This change introduces cancellation of all outstanding write handlers from storage_service after the storage proxy RPCs were stopped.

Fixes scylladb/scylladb#23665

Backport: since this fixes an issue that frequently causes issues in CI, backport to 2025.1, 2025.2, and 2025.3.

- (cherry picked from commit bc934827bc)

- (cherry picked from commit e0dc73f52a)

Parent PR: #24714

Closes scylladb/scylladb#25170

* github.com:scylladb/scylladb:
  storage_service: Cancel all write requests on storage_proxy shutdown
  test: Add test for unfinished writes during shutdown and topology change
2025-07-29 14:42:25 +03:00
Taras Veretilnyk
a9f5e7d18f docs: fix typo in command name enbleautocompaction -> enableautocompaction
Renamed the file and updated all references from 'enbleautocompaction' to the correct 'enableautocompaction'.

Fixes scylladb/scylladb#25172

Closes scylladb/scylladb#25175

(cherry picked from commit 6b6622e07a)

Closes scylladb/scylladb#25218
2025-07-29 14:41:50 +03:00
Petr Gusev
d8f6a497a5 Revert "main.cc: fix group0 shutdown order"
This reverts commit 6b85ab79d6.
2025-07-28 17:50:38 +02:00
Petr Gusev
c98dde92db Revert "storage_service: test_group0_apply_while_node_is_being_shutdown"
This reverts commit b1050944a3.
2025-07-28 17:49:03 +02:00
Aleksandra Martyniuk
8efee38d6f tasks: do not use binary progress for task manager tasks
Currently, progress of a parent task depends on expected_total_workload,
expected_children_number, and children progresses. Basically, if total
workload is known or all children have already been created, progresses
of children are summed up. Otherwise binary progress is returned.

As a result, two tasks of the same type may return progress in different
units. If they are children of the same task and this parent gathers the
progress - it becomes meaningless.

Drop expected_children_number as we can't assume that children are able
to show their progresses.

Modify get_progress method - progress is calculated based on children
progresses. If expected_total_workload isn't specified, the total
progress of a task may grow. If expected_total_workload isn't specified
and no children are created, empty progress (0/0) is returned.

Fixes: https://github.com/scylladb/scylladb/issues/24650.

Closes scylladb/scylladb#25113

(cherry picked from commit a7ee2bbbd8)

Closes scylladb/scylladb#25200
2025-07-28 13:11:45 +03:00
Michael Litvak
934260e9a9 storage service: drain view builder before group0
The view builder uses group0 operations to coordinate view building, so
we should drain the view builder before stopping group0.

Fixes scylladb/scylladb#25096

Closes scylladb/scylladb#25101

(cherry picked from commit 3ff388cd94)

Closes scylladb/scylladb#25198
2025-07-28 13:05:14 +03:00
Nadav Har'El
583c118ccd Merge '[Backport 2025.3] alternator: avoid oversized allocation in Query/Scan' from Scylladb[bot]
This series fixes one cause of oversized allocations - and therefore potentially stalls and increased tail latencies - in Alternator.

The first patch in the series is the main fix - the later patches are cleanups requested by reviewers but also involved other pre-existing code, so I did those cleanups as separate patches.

Alternator's Scan or Query operation return a page of results. When the number of items is not limited by a "Limit" parameter, the default is to return a 1 MB page. If items are short, a large number of them can fit in that 1MB. The test test_query.py::test_query_large_page_small_rows has 30,000 items returned in a single page.

In the response JSON, all these items are returned in a single array "Items". Before this patch, we build the full response as a RapidJSON object before sending it. The problem is that unfortunately, RapidJSON stores arrays as contiguous allocations. This results in large contiguous allocations in workloads that scan many small items, and large contiguous allocations can also cause stalls and high tail latencies. For example, before this patch, running

    test/alternator/run --runveryslow \
        test_query.py::test_query_large_page_small_rows

reports in the log:

    oversized allocation: 573440 bytes.

After this patch, this warning no longer appears.
The patch solves the problem by collecting the scanned items not in a RapidJSON array, but rather in a chunked_vector<rjson::value>, i.e, a chunked (non-contiguous) array of items (each a JSON value). After collecting this array separately from the response object, we need to print its content without actually inserting it into the object - we add a new function print_with_extra_array() to do that.

The new separate-chunked-vector technique is used when a large number (currently, >256) of items were scanned. When there is a smaller number of items in a page (this is typical when each item is longer), we just insert those items in the object and print it as before.

Beyond the original slow test that demonstrated the oversized allocation (which is now gone), this patch also includes a new test which exercises the new code with a scan of 700 (>256) items in a page - but this new test is fast enough to be permanently in our test suite and not a manual "veryslow" test as the other test.

Fixes #23535

The stalls caused by large allocations was seen by actual users, so it makes sense to backport this patch. On the other hand, the patch while not big is fairly intrusive (modifies the nomal Scan and Query path and also the later patches do some cleanup of additional code) so there is some small risk involved in the backport.

- (cherry picked from commit 2385fba4b6)

- (cherry picked from commit d8fab2a01a)

- (cherry picked from commit 13ec94107a)

- (cherry picked from commit a248336e66)

Parent PR: #24480

Closes scylladb/scylladb#25194

* github.com:scylladb/scylladb:
  alternator: clean up by co-routinizing
  alternator: avoid spamming the log when failing to write response
  alternator: clean up and simplify request_return_type
  alternator: avoid oversized allocation in Query/Scan
2025-07-27 14:12:49 +03:00
Nadav Har'El
f1c5350141 alternator: clean up by co-routinizing
Reviewers of the previous patch complained on some ugly pre-existing
code in alternator/executor.cc, where returning from an asynchronous
(future) function require lengthy verbose casts. So this patch cleans
up a few instances of these ugly casts by using co_return instead of
return.

For example, the long and verbose

    return make_ready_future<executor::request_return_type>(
        rjson::print(std::move(response)));

can be changed to the shorter and more readable

    co_return rjson::print(std::move(response));

This patch should not have any functional implications, and also not any
performance implications: I only coroutinized slow-path functions and
one function that was already "partially" coroutinized (and this was
expecially ugly and deserved being fixed).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit a248336e66)
2025-07-27 07:42:01 +00:00
Nadav Har'El
f897f38003 alternator: avoid spamming the log when failing to write response
Both make_streamed() and new make_streamed_with_extra_array() functions,
used when returning a long response in Alternator, would write an error-
level log message if it failed to write the response. This log message
is probably not helpful, and may spam the log if the application causes
repeated errors intentionally or accidentally.

So drop these log messages. The exception is still thrown as usual.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 13ec94107a)
2025-07-27 07:42:01 +00:00
Nadav Har'El
fe037663ea alternator: clean up and simplify request_return_type
The previous patch introduced a function make_streamed_with_extra_array
which was a duplicate of the existing make_streamed. Reviewers
complained how baroque the new function is (just like the old function),
having to jump through hoops to return a copyable function working
on non-copyable objects, making strange-named copies and shared pointers
of everything.

We needed to return a copyable function (std::function) just because
Alternator used Seastar's json::json_return_type in the return type
from executor function (request_return_type). This json_return_type
contained either a sstring or an std::function, but neither was ever
really appropriate:

  1. We want to return noncopyable_function, not an std::function!
  2. We want to return an std::string (which rjson::print()) returns,
     not an sstring!

So in this patch we stop using seastar::json::json_return_type
entirely in Alternator.

Alternator's request_return_type is now an std::variant of *three* types:
  1. std::string for short responses,
  2. noncopyable_function for long streamed response
  3. api_error for errors.

The ugliest parts of make_streamed() where we made copies and shared
pointers to allow for a copyable function are all gone. Even nicer, a
lot of other ugly relics of using seastar::json_return_type are gone:

1. We no longer need obscure classes and functions like make_jsonable()
   and json_string() to convert strings to response bodies - an operation
   can simply return a string directly - usually returning
   rjson::print(value) or a fixed string like "" and it just works.

2. There is no more usage of seastar::json in Alternator (except one
   minor use of seastar::json::formatter::to_json in streams.cc that
   can be removed later). Alternator uses RapidJSON for its JSON
   needs, we don't need to use random pieces from a different JSON
   library.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit d8fab2a01a)
2025-07-27 07:42:01 +00:00
Nadav Har'El
b7da50d781 alternator: avoid oversized allocation in Query/Scan
This patch fixes one cause of oversized allocations - and therefore
potentially stalls and increased tail latencies - in Alternator.

Alternator's Scan or Query operation return a page of results. When the
number of items is not limited by a "Limit" parameter, the default is
to return a 1 MB page. If items are short, a large number of them can
fit in that 1MB. The test test_query.py::test_query_large_page_small_rows
has 30,000 items returned in a single page.

In the response JSON, all these items are returned in a single array
"Items". Before this patch, we build the full response as a RapidJSON
object before sending it. The problem is that unfortunately, RapidJSON
stores arrays as contiguous allocations. This results in large
contiguous allocations in workloads that scan many small items, and
large contiguous allocations can also cause stalls and high tail
latencies. For example, before this patch, running

    test/alternator/run --runveryslow \
        test_query.py::test_query_large_page_small_rows

reports in the log:

    oversized allocation: 573440 bytes.

After this patch, this warning no longer appears.
The patch solves the problem by collecting the scanned items not in a
RapidJSON array, but rather in a chunked_vector<rjson::value>, i.e,
a chunked (non-contiguous) array of items (each a JSON value).
After collecting this array separately from the response object, we
need to print its content without actually inserting it into the object -
we add a new function print_with_extra_array() to do that.

The new separate-chunked-vector technique is used when a large number
(currently, >256) of items were scanned. When there is a smaller number
of items in a page (this is typical when each item is longer), we just
insert those items in the object and print it as before.

Beyond the original slow test that demonstrated the oversized allocation
(which is now gone), this patch also includes a new test which
exercises the new code with a scan of 700 (>256) items in a page -
but this new test is fast enough to be permanently in our test suite
and not a manual "veryslow" test as the other test.

Fixes #23535

(cherry picked from commit 2385fba4b6)
2025-07-27 07:42:01 +00:00
Pavel Emelyanov
7c04619ecf Merge '[Backport 2025.3] encryption_at_rest_test: Fix some spurious errors' from Scylladb[bot]
Fixes #24574

* Ensure we close the embedded load_cache objects on encryption shutdown, otherwise we can, in unit testing, get destruction of these while a timer is still active -> assert
* Add extra exception handling to `network_error_test_helper`, so even if test framework might exception-escape, we properly stop the network proxy to avoid use after free.

- (cherry picked from commit ee98f5d361)

- (cherry picked from commit 8d37e5e24b)

Parent PR: #24633

Closes scylladb/scylladb#24772

* github.com:scylladb/scylladb:
  encryption_at_rest_test: Add exception handler to ensure proxy stop
  encryption: Ensure stopping timers in provider cache objects
2025-07-24 16:35:53 +03:00
Pavel Emelyanov
b07f4fb26b Merge '[Backport 2025.3] streaming: Avoid deadlock by running view checks in a separate scheduling group' from Scylladb[bot]
This issue happens with removenode, when RBNO is disabled, so range
streamer is used.

The deadlock happens in a scenario like this:
1. Start 3 nodes: {A, B, C}, RF=2
2. Node A is lost
3. removenode A
4. Both B and C gain ownership of ranges.
5. Streaming sessions are started with crossed directions: B->C, C->B

Readers created by sender side exhaust streaming semaphore on B and C.
Receiver side attempts to obtain a permit indirectly by calling
check_needs_view_update_path(), which reads local tables. That read is
blocked and times-out, causing streaming to fail. The streaming writer
is already using a tracking-only permit.

Even if we didn't deadlock, and the streaming semaphore was simply exhausted
by other receiving sessions (via tracking-only permit), the query may still time-out due to starvation.

To avoid that, run the query under a different scheduling group, which
translates to the system semaphore instead of the maintenance
semaphore, to break the dependency. The gossip group was chosen
because it shouldn't be contended and this change should not interfere
with it much.

Fixes #24807
Fixes #24925

- (cherry picked from commit ee2fa58bd6)

- (cherry picked from commit dff2b01237)

Parent PR: #24929

Closes scylladb/scylladb#25058

* github.com:scylladb/scylladb:
  streaming: Avoid deadlock by running view checks in a separate scheduling group
  service: migration_manager: Run group0 barrier in gossip scheduling group
2025-07-24 16:35:24 +03:00
Ran Regev
c5f4ad3665 nodetool restore: sstable list from a file
Fixes: #25045

added the ability to supply the list of files to
restore from the a given file.
mainly required for local testing.

Signed-off-by: Ran Regev <ran.regev@scylladb.com>

Closes scylladb/scylladb#25077

(cherry picked from commit dd67d22825)

Closes scylladb/scylladb#25124
2025-07-24 16:35:04 +03:00
Ran Regev
013e0d685c docs: update nodetool restore documentation for --sstables-file-list
Fixes: #25128
A leftover from #25077

Closes scylladb/scylladb#25129

(cherry picked from commit 3d82b9485e)

Closes scylladb/scylladb#25139
2025-07-24 16:34:39 +03:00
Jakub Smolar
800f819b5b gdb: handle zero-size reads in managed_bytes
Fixes: https://github.com/scylladb/scylladb/issues/25048

Closes scylladb/scylladb#25050

(cherry picked from commit 6e0a063ce3)

Closes scylladb/scylladb#25142
2025-07-24 16:34:04 +03:00
Sergey Zolotukhin
8ac6aaadaf storage_service: Cancel all write requests on storage_proxy shutdown
During a graceful node shutdown, RPC listeners are stopped in `storage_service::drain_on_shutdown`
as one of the first steps. However, even after RPCs are shut down, some write handlers in
`storage_proxy` may still be waiting for background writes to complete. These handlers retain the ERM.
Since the RPC subsystem is no longer active, replies cannot be received, and if any RPC commands are
concurrently executing `barrier_and_drain`, they may get stuck waiting for those writes. This can block
the messaging server shutdown and delay the entire shutdown process until the write timeout occurs.

This change introduces the cancellation of all outstanding write handlers in `storage_proxy`
during shutdown to prevent unnecessary delays.

Fixes scylladb/scylladb#23665

(cherry picked from commit e0dc73f52a)
2025-07-24 13:03:32 +00:00
Sergey Zolotukhin
16a8cd9514 test: Add test for unfinished writes during shutdown and topology change
This test reproduces an issue where a topology change and an ongoing write query
during query coordinator shutdown can cause the node to get stuck.

When a node receives a write request, it creates a write handler that holds
a copy of the current table's ERM (Effective Replication Map). The ERM ensures
that no topology or schema changes occur while the request is being processed.

After the query coordinator receives the required number of replica write ACKs
to satisfy the consistency level (CL), it sends a reply to the client. However,
the write response handler remains alive until all replicas respond — the remaining
writes are handled in the background.

During shutdown, when all network connections are closed, these responses can no longer
be received. As a result, the write response handler is only destroyed once the write
timeout is reached.

This becomes problematic because the ERM held by the handler blocks topology or schema
change commands from executing. Since shutdown waits for these commands to complete,
this can lead to unnecessary delays in node shutdown and restarts, and occasional
test case failures.

Test for: scylladb/scylladb#23665

(cherry picked from commit bc934827bc)
2025-07-24 13:03:32 +00:00
Ernest Zaslavsky
e45852a595 s3_client: Disable Seastar-level retries in HTTP client creation
Prevent Seastar from retrying HTTP requests to avoid buffer double-feed
issues when an entire request is retried. This could cause data
corruption in `chunked_download_source`. The change is global for every
instance of `s3_client`, but it is still safe because:
* Seastar's `http_client` resets connections regardless of retry behavior
* `s3_client` retry logic handles all error types—exceptions, HTTP errors,
  and AWS-specific errors—via `http_retryable_client`

(cherry picked from commit fc2c9dd290)
2025-07-22 16:46:54 +00:00
Ernest Zaslavsky
fdf706a6eb s3_test: Validate handling of non-aws_error exceptions
Inject exceptions not wrapped in `aws_error` from request callback
lambda to verify they are properly caught and handled.

(cherry picked from commit ba910b29ce)
2025-07-22 16:46:53 +00:00
Ernest Zaslavsky
2bc3accf9c s3_client: Improve error handling in chunked_download_source
Create aws_error from raised exceptions when possible and respond
appropriately. Previously, non-aws_exception types leaked from the
request handler and were treated as non-retryable, causing potential
data corruption during download.

(cherry picked from commit b7ae6507cd)
2025-07-22 16:46:53 +00:00
Ernest Zaslavsky
0106d132bd aws_error: Add factory method for aws_error from exception
Move `aws_error` creation logic out of `retryable_http_client` and
into the `aws_error` class to support reuse across components.

(cherry picked from commit d53095d72f)
2025-07-22 16:46:53 +00:00
Pavel Emelyanov
53637fdf61 Merge '[Backport 2025.3] storage: add make_data_or_index_source to the storages' from Scylladb[bot]
Add `make_data_or_index_source` to the storages to utilize new S3 based data source which should improve restore performance

* Introduce the `encrypted_data_source` class that wraps an existing data source to read and decrypt data on the fly using block encryption. Also add unit tests to verify correct decryption behavior.
* Add `make_data_or_index_source` to the `storage` interface, implement it  for `filesystem_storage` storage which just creates `data_source` from a file and for the `s3_storage` create a (maybe) decrypting source from s3 make_download_source. This change should solve performance improvement for reading large objects from S3 and should not affect anything for the `filesystem_storage`

Fixes: https://github.com/scylladb/scylladb/issues/22458

- (cherry picked from commit 211daeaa40)

- (cherry picked from commit 7e5e3c5569)

- (cherry picked from commit 0de61f56a2)

- (cherry picked from commit 8ac2978239)

- (cherry picked from commit dff9a229a7)

- (cherry picked from commit 8d49bb8af2)

Parent PR: #23695

Closes scylladb/scylladb#25016

* github.com:scylladb/scylladb:
  sstables: Start using `make_data_or_index_source` in `sstable`
  sstables: refactor readers and sources to use coroutines
  sstables: coroutinize futurized readers
  sstables: add `make_data_or_index_source` to the `storage`
  encryption: refactor key retrieval
  encryption: add `encrypted_data_source` class
2025-07-21 18:05:53 +03:00
Piotr Dulikowski
fdfcd67a6e Merge '[Backport 2025.3] cdc: Forbid altering columns of CDC log tables directly' from Scylladb[bot]
The set of columns of a CDC log table should be managed automatically
by Scylla, and the user should not have the ability to manipulate them
directly. That could lead to disastrous consequences such as a
segmentation fault.

In this commit, we're restricting those operations. We also provide two
validation tests.

One of the existing tests had to be adjusted as it modified the type
of a column in a CDC log table. Since the test simply verifies that
the user has sufficient permissions to perform `ALTER TABLE` on the log
table, the test is still valid.

Fixes scylladb/scylladb#24643

Backport: we should backport the change to all affected
branches to prevent the consequences that may affect the user.

- (cherry picked from commit 20d0050f4e)

- (cherry picked from commit 59800b1d66)

Parent PR: #25008

Closes scylladb/scylladb#25108

* github.com:scylladb/scylladb:
  cdc: Forbid altering columns of inactive CDC log table
  cdc: Forbid altering columns of CDC log tables directly
2025-07-21 16:22:31 +02:00
Dawid Mędrek
dc6cb5cfad cdc: Forbid altering columns of inactive CDC log table
When CDC becomes disabled on the base table, the CDC log table
still exsits (cf. scylladb/scylladb@adda43edc7).
If it continues to exist up to the point when CDC is re-enabled
on the base table, no new log table will be created -- instead,
the old olg table will be *re-attached*.

Since we want to avoid situations when the definition of the log
table has become misaligned with the definition of the base table
due to actions of the user, we forbid modifying the set of columns
or renaming them in CDC log tables, even when they're inactive.

Validation tests are provided.

(cherry picked from commit 59800b1d66)
2025-07-21 11:43:49 +00:00
Dawid Mędrek
10a9ced4d1 cdc: Forbid altering columns of CDC log tables directly
The set of columns of a CDC log table should be managed automatically
by Scylla, and the user should not have the ability to manipulate them
directly. That could lead to disastrous consequences such as a
segmentation fault.

In this commit, we're restricting those operations. We also provide two
validation tests.

One of the existing tests had to be adjusted as it modified the type
of a column in a CDC log table. Since the test simply verifies that
the user has sufficient permissions to perform `ALTER TABLE` on the log
table, the test is still valid.

Fixes scylladb/scylladb#24643

(cherry picked from commit 20d0050f4e)
2025-07-21 11:43:49 +00:00
Ernest Zaslavsky
934359ea28 s3_client: parse multipart response XML defensively
Ensure robust handling of XML responses when initiating multipart
uploads. Check for the existence of required nodes before access,
and throw an exception if the XML is empty or malformed.

Refs: https://github.com/scylladb/scylladb/issues/24676

Closes scylladb/scylladb#24990

(cherry picked from commit 342e94261f)

Closes scylladb/scylladb#25057
2025-07-21 12:03:00 +02:00
Piotr Dulikowski
74d97711fd Merge '[Backport 2025.3] cdc: throw error if column doesn't exist' from Scylladb[bot]
in the CDC log transformer, when creating a CDC mutation based on some
base table mutation, for each value of a base column we set the value in
the CDC column with the same name.

When looking up the column in the CDC schema by name, we may get a null
pointer if a column by that name is not found. This shouldn't happen
normally because the base schema and CDC schema should be compatible,
and for each base column there should be a CDC column with the same
name.

However, there are scenarios where the base schema and CDC schema are
incompatible for a short period of time when they are being altered.
When a base column is being added or dropped, we could get a base
mutation with this column set, and then the CDC transformer picks up the
latest CDC schema which doesn't have this column.

If such thing happens, we fix the code to throw an exception instead of
crashing on null pointer dereference. Currently we don't have a safer
approach to handle this, but this might be changed in the future. The
other alternative is dropping that data silently which we prefer not to
do.

Throwing an error is acceptable because this scenario most likely
indicates this behavior by the user:
* The user adds a new column, and start writing values to the column
  before the ALTER is complete. or,
* The user drops a column, and continues writing values to the column
  while it's being dropped.

Both cases might as well fail with an error because the column is not
found in the base table.

Fixes scylladb/scylladb#/24952

backport needed - simple fix for a node crash

- (cherry picked from commit b336f282ae)

- (cherry picked from commit 86dfa6324f)

Parent PR: #24986

Closes scylladb/scylladb#25067

* github.com:scylladb/scylladb:
  test: cdc: add test_cdc_with_alter
  cdc: throw error if column doesn't exist
2025-07-21 11:18:06 +02:00
Jenkins Promoter
fc7a6b66e2 Update ScyllaDB version to: 2025.3.0-rc2 2025-07-20 15:44:21 +03:00
Michael Litvak
594ec7d66d test: cdc: add test_cdc_with_alter
Add a test that tests adding and dropping a column to a table with CDC
enabled while writing to it.

(cherry picked from commit 86dfa6324f)
2025-07-20 09:04:00 +02:00
Michael Litvak
338ff18dfe cdc: throw error if column doesn't exist
in the CDC log transformer, when creating a CDC mutation based on some
base table mutation, for each value of a base column we set the value in
the CDC column with the same name.

When looking up the column in the CDC schema by name, we may get a null
pointer if a column by that name is not found. This shouldn't happen
normally because the base schema and CDC schema should be compatible,
and for each base column there should be a CDC column with the same
name.

However, there are scenarios where the base schema and CDC schema are
incompatible for a short period of time when they are being altered.
When a base column is being added or dropped, we could get a base
mutation with this column set, and then the CDC transformer picks up the
latest CDC schema which doesn't have this column.

If such thing happens, we fix the code to throw an exception instead of
crashing on null pointer dereference. Currently we don't have a safer
approach to handle this, but this might be changed in the future. The
other alternative is dropping that data silently which we prefer not to
do.

Throwing an error is acceptable because this scenario most likely
indicates this behavior by the user:
* The user adds a new column, and start writing values to the column
  before the ALTER is complete. or,
* The user drops a column, and continues writing values to the column
  while it's being dropped.

Both cases might as well fail with an error because the column is not
found in the base table.

Fixes scylladb/scylladb#24952

(cherry picked from commit b336f282ae)
2025-07-18 10:36:44 +00:00
Tomasz Grabiec
888e92c969 streaming: Avoid deadlock by running view checks in a separate scheduling group
This issue happens with removenode, when RBNO is disabled, so range
streamer is used.

The deadlock happens in a scenario like this:
1. Start 3 nodes: {A, B, C}, RF=2
2. Node A is lost
3. removenode A
4. Both B and C gain ownership of ranges.
5. Streaming sessions are started with crossed directions: B->C, C->B

Readers created by sender side exhaust streaming semaphore on B and C.
Receiver side attempts to obtain a permit indirectly by calling
check_needs_view_update_path(), which reads local tables. That read is
blocked and times-out, causing streaming to fail. The streaming writer
is already using a tracking-only permit.

To avoid that, run the query under a different scheduling group, which
translates to the system semaphore instead of the maintenance
semaphore, to break the dependency. The gossip group was chosen
because it shouldn't be contended and this change should not interfere
with it much.

Fixes: #24807
(cherry picked from commit dff2b01237)
2025-07-17 17:25:44 +00:00
Tomasz Grabiec
f424c773a4 service: migration_manager: Run group0 barrier in gossip scheduling group
Fixes two issues.

One is potential priority inversion. The barrier will be executed
using scheduling group of the first fiber which triggers it, the rest
will block waiting on it. For example, CQL statements which need to
sync the schema on replica side can block on the barrier triggered by
streaming. That's undesirable. This is theoretical, not proved in the
field.

The second problem is blocking the error path. This barrier is called
from the streaming error handling path. If the streaming concurrency
semaphore is exhausted, and streaming fails due to timeout on
obtaining the permit in check_needs_view_update_path(), the error path
will block too because it will also attempt to obtain the permit as
part of the group0 barrier. Running it in the gossip scheduling group
prevents this.

Fixes #24925

(cherry picked from commit ee2fa58bd6)
2025-07-17 17:25:44 +00:00
Piotr Dulikowski
e49b312be9 auth: fix crash when migration code runs parallel with raft upgrade
The functions password_authenticator::start and
standard_role_manager::start have a similar structure: they spawn a
fiber which invokes a callback that performs some migration until that
migration succeeds. Both handlers set a shared promise called
_superuser_created_promise (those are actually two promises, one for the
password authenticator and the other for the role manager).

The handlers are similar in both cases. They check if auth is in legacy
mode, and behave differently depending on that. If in legacy mode, the
promise is set (if it was not set before), and some legacy migration
actions follow. In auth-on-raft mode, the superuser is attempted to be
created, and if it succeeds then the promise is _unconditionally_ set.

While it makes sense at a glance to set the promise unconditionally,
there is a non-obvious corner case during upgrade to topology on raft.
During the upgrade, auth switches from the legacy mode to auth on raft
mode. Thus, if the callback didn't succeed in legacy mode and then tries
to run in auth-on-raft mode and succeds, it will unconditionally set a
promise that was already set - this is a bug and triggers an assertion
in seastar.

Fix the issue by surrounding the `shared_promise::set_value` call with
an `if` - like it is already done for the legacy case.

Fixes: scylladb/scylladb#24975

Closes scylladb/scylladb#24976

(cherry picked from commit a14b7f71fe)

Closes scylladb/scylladb#25019
2025-07-17 13:32:35 +02:00
Ernest Zaslavsky
549d139e84 sstables: Start using make_data_or_index_source in sstable
Convert all necessary methods to be awaitable. Start using `make_data_or_index_source`
when creating data_source for data and index components.

For proper working of compressed/checksummed input streams, start passing
stream creator functors to `make_(checksummed/compressed)_file_(k_l/m)_format_input_stream`.

(cherry picked from commit 8d49bb8af2)
2025-07-16 12:45:58 +00:00
Ernest Zaslavsky
4a47262167 sstables: refactor readers and sources to use coroutines
Refactor readers and sources to support coroutine usage in
preparation for integration with `make_data_or_index_source`.
Move coroutine-based member initialization out of constructors
where applicable, and defer initialization until first use.

(cherry picked from commit dff9a229a7)
2025-07-16 12:45:58 +00:00
Ernest Zaslavsky
81d356315b sstables: coroutinize futurized readers
Coroutinize futurized readers and sources to get ready for using `make_data_or_index_source` in `sstable`

(cherry picked from commit 8ac2978239)
2025-07-16 12:45:58 +00:00
Ernest Zaslavsky
4ffd72e597 sstables: add make_data_or_index_source to the storage
Add `make_data_or_index_source` to the `storage` interface, implement it
for `filesystem_storage` storage which just creates `data_source` from a
file and for the `s3_storage` create a (maybe) decrypting source from s3
make_download_source.

This change should solve performance improvement for reading large objects
from S3 and should not affect anything for the `filesystem_storage`.

(cherry picked from commit 0de61f56a2)
2025-07-16 12:45:58 +00:00
Ernest Zaslavsky
8998f221ab encryption: refactor key retrieval
Get the encryption schema extension retrieval code out of
`wrap_file` method to make it reusable elsewhere

(cherry picked from commit 7e5e3c5569)
2025-07-16 12:45:58 +00:00
Ernest Zaslavsky
243ba1fb66 encryption: add encrypted_data_source class
Introduce the `encrypted_data_source` class that wraps an existing data
source to read and decrypt data on the fly using block encryption. Also add
unit tests to verify correct decryption behavior.
NOTE: The wrapped source MUST read from offset 0, `encrypted_data_source` assumes it is

Co-authored-by: Calle Wilund <calle@scylladb.com>
(cherry picked from commit 211daeaa40)
2025-07-16 12:45:58 +00:00
Patryk Jędrzejczak
7caacf958b test: test_zero_token_nodes_multidc: properly handle reads with CL=ONE
The test could fail with RF={DC1: 2, DC2: 0} and CL=ONE when:
- both writes succeeded with the same replica responding first,
- one of the following reads succeeded with the other replica
  responding before it applied mutations from any of the writes.

We fix the test by not expecting reads with CL=ONE to return a row.

We also harden the test by inserting different rows for every pair
(CL, coordinator), where one of the two coordinators is a normal
node from DC1, and the other one is a zero-token node from DC2.
This change makes sure that, for example, every write really
inserts a row.

Fixes scylladb/scylladb#22967

The fix addresses CI flakiness and only changes the test, so it
should be backported.

Closes scylladb/scylladb#23518

(cherry picked from commit 21edec1ace)

Closes scylladb/scylladb#24985
2025-07-15 15:47:43 +02:00
Botond Dénes
489e4fdb4e Merge '[Backport 2025.3] S3 chunked download source bug fixes' from Scylladb[bot]
- Fix missing negation in the `if` in the background downloading fiber
- Add test to catch this case
- Improve the s3 proxy to inject errors if the same resource requested more than once
- Suppress client retry since retrying the same request when each produces multiple buffers may lead to the same data appear more than once in the buffer deque
- Inject exception from the test to simulate response callback failure in the middle

No need to backport anything since this class in not used yet

- (cherry picked from commit f1d0690194)

- (cherry picked from commit e73b83e039)

- (cherry picked from commit 6d9cec558a)

- (cherry picked from commit ec59fcd5e4)

- (cherry picked from commit c75acd274c)

- (cherry picked from commit d2d69cbc8c)

- (cherry picked from commit e50f247bf1)

- (cherry picked from commit 49e8c14a86)

- (cherry picked from commit a5246bbe53)

- (cherry picked from commit acf15eba8e)

Parent PR: #24657

Closes scylladb/scylladb#24943

* github.com:scylladb/scylladb:
  s3_test: Add s3_client test for non-retryable error handling
  s3_test: Add trace logging for default_retry_strategy
  s3_client: Fix edge case when the range is exhausted
  s3_client: Fix indentation in try..catch block
  s3_client: Stop retries in chunked download source
  s3_client: Enhance test coverage for retry logic
  s3_client: Add test for Content-Range fix
  s3_client: Fix missing negation
  s3_client: Refine logging
  s3_client: Improve logging placement for current_range output
2025-07-15 15:28:48 +03:00
Michael Litvak
26738588db tablets: stop storage group on deallocation
When a tablet transitions to a post-cleanup stage on the leaving replica
we deallocate its storage group. Before the storage can be deallocated
and destroyed, we must make sure it's cleaned up and stopped properly.

Normally this happens during the tablet cleanup stage, when
table::cleanup_table is called, so by the time we transition to the next
stage the storage group is already stopped.

However, it's possible that tablet cleanup did not run in some scenario:
1. The topology coordinator runs tablet cleanup on the leaving replica.
2. The leaving replica is restarted.
3. When the leaving replica starts, still in `cleanup` stage, it
   allocates a storage group for the tablet.
4. The topology coordinator moves to the next stage.
5. The leaving replica deallocates the storage group, but it was not
   stopped.

To address this scenario, we always stop the storage group when
deallocating it. Usually it will be already stopped and complete
immediately, and otherwise it will be stopped in the background.

Fixes scylladb/scylladb#24857
Fixes scylladb/scylladb#24828

Closes scylladb/scylladb#24896

(cherry picked from commit fa24fd7cc3)

Closes scylladb/scylladb#24909
2025-07-15 13:14:35 +03:00
Aleksandra Martyniuk
f69f59afbd repair: Reduce max row buf size when small table optimization is on
If small_table_optimization is on, a repair works on a whole table
simultaneously. It may be distributed across the whole cluster and
all nodes might participate in repair.

On a repair master, row buffer is copied for each repair peer.
This means that the memory scales with the number of peers.

In large clusters, repair with small_table_optimization leads to OOM.

Divide the max_row_buf_size by the number of repair peers if
small_table_optimization is on.

Use max_row_buf_size to calculate number of units taken from mem_sem.

Fixes: https://github.com/scylladb/scylladb/issues/22244.

Closes scylladb/scylladb#24868

(cherry picked from commit 17272c2f3b)

Closes scylladb/scylladb#24907
2025-07-15 13:13:49 +03:00
Łukasz Paszkowski
e1e0c721e7 test.py: Fix test_compactionhistory_rows_merged_time_window_compaction_strategy
The test has two major problems
1. Wrongly computed time windows. Data was not spread across two 1-minute
   windows causing the test to generate even three sstables instead
   of two
2. Timestamp was not propagated to the prepared CQL statements. So
   in fact, a current time was used implicitly
3. Because of the incorrect timestamp issue, the remaining tests
   testing purged tombstones were affected as well.

Fixes https://github.com/scylladb/scylladb/issues/24532

Closes scylladb/scylladb#24609

(cherry picked from commit a22d1034af)

Closes scylladb/scylladb#24791
2025-07-15 13:12:39 +03:00
Yaron Kaikov
05a6d4da23 dist/common/scripts/scylla_sysconfig_setup: fix SyntaxWarning: invalid escape sequence
There are invalid escape sequence warnings where raw strings should be used for the regex patterns

Fixes: https://github.com/scylladb/scylladb/issues/24915

Closes scylladb/scylladb#24916

(cherry picked from commit fdcaa9a7e7)

Closes scylladb/scylladb#24970
2025-07-15 11:01:28 +02:00
Yaron Kaikov
1e1aeed3cd auto-backport.py: Avoid bot push to existing backport branches
Changed the backport logic so that the bot only pushes the backport branch if it does not already exist in the remote fork.
If the branch exists, the bot skips the push, allowing only users to update (force-push) the branch after the backport PR is open.

Fixes: https://github.com/scylladb/scylladb/issues/24953

Closes scylladb/scylladb#24954

(cherry picked from commit ed7c7784e4)

Closes scylladb/scylladb#24969
2025-07-15 10:25:30 +02:00
Jenkins Promoter
af10d6f03b Update pgo profiles - aarch64 2025-07-15 05:21:25 +03:00
Jenkins Promoter
0d3742227d Update pgo profiles - x86_64 2025-07-15 04:58:36 +03:00
Yaron Kaikov
c6987e3fed packaging: add ps command to dependancies
ScyllaDB container image doesn't have ps command installed, while this command is used by perftune.py script shipped within the same image. This breaks node and container tuning in Scylla Operator.

Fixes: #24827

Closes scylladb/scylladb#24830

(cherry picked from commit 66ff6ab6f9)

Closes scylladb/scylladb#24956
2025-07-14 14:19:17 +03:00
Ernest Zaslavsky
873c8503cd s3_test: Add s3_client test for non-retryable error handling
Introduce a test that injects a non-retryable error and verifies
that the chunked download source throws an exception as expected.

(cherry picked from commit acf15eba8e)
2025-07-13 13:17:14 +00:00
Ernest Zaslavsky
dbf4bd162e s3_test: Add trace logging for default_retry_strategy
Introduce trace-level logging for `default_retry_strategy` in
`s3_test` to improve visibility into retry logic during test
execution.

(cherry picked from commit a5246bbe53)
2025-07-13 13:17:14 +00:00
Ernest Zaslavsky
7f303bfda3 s3_client: Fix edge case when the range is exhausted
Handle case where the download loop exits after consuming all data,
but before receiving an empty buffer signaling EOF. Without this, the
next request is sent with a non-zero offset and zero length, resulting
in "Range request cannot be satisfied" errors. Now, an empty buffer is
pushed to indicate completion and exit the fiber properly.

(cherry picked from commit 49e8c14a86)
2025-07-13 13:17:14 +00:00
Ernest Zaslavsky
22739df69f s3_client: Fix indentation in try..catch block
Correct indentation in the `try..catch` block to improve code
readability and maintain consistent formatting.

(cherry picked from commit e50f247bf1)
2025-07-13 13:17:14 +00:00
Ernest Zaslavsky
54db6ca088 s3_client: Stop retries in chunked download source
Disable retries for S3 requests in the chunked download source to
prevent duplicate chunks from corrupting the buffer queue. The
response handler now throws an exception to bypass the retry
strategy, allowing the next range to be attempted cleanly.

This exception is only triggered for retryable errors; unretryable
ones immediately halt further requests.

(cherry picked from commit d2d69cbc8c)
2025-07-13 13:17:14 +00:00
Ernest Zaslavsky
c841ffe398 s3_client: Enhance test coverage for retry logic
Extend the S3 proxy to support error injection when the client
makes multiple requests to the same resource—useful for testing
retry behavior and failure handling.

(cherry picked from commit c75acd274c)
2025-07-13 13:17:14 +00:00
Ernest Zaslavsky
c748a97170 s3_client: Add test for Content-Range fix
Introduce a test that accurately verifies the Content-Range
behavior, ensuring the previous fix is properly validated.

(cherry picked from commit ec59fcd5e4)
2025-07-13 13:17:14 +00:00
Ernest Zaslavsky
00f10e7f1d s3_client: Fix missing negation
Restore a missing `not` in a conditional check that caused
incorrect behavior during S3 client execution.

(cherry picked from commit 6d9cec558a)
2025-07-13 13:17:14 +00:00
Ernest Zaslavsky
4cd1792528 s3_client: Refine logging
Fix typo in log message to improve clarity and accuracy during
S3 operations.

(cherry picked from commit e73b83e039)
2025-07-13 13:17:14 +00:00
Ernest Zaslavsky
115e8c85e4 s3_client: Improve logging placement for current_range output
Relocated logging to occur after determining the `current_range`,
ensuring more relevant output during S3 client operations.

(cherry picked from commit f1d0690194)
2025-07-13 13:17:14 +00:00
Gleb Natapov
087d3bb957 api: unregister raft_topology_get_cmd_status on shutdown
In c8ce9d1c60 we introduced
raft_topology_get_cmd_status REST api but the commit forgot to
unregister the handler during shutdown.

Fixes #24910

Closes scylladb/scylladb#24911

(cherry picked from commit 89f2edf308)

Closes scylladb/scylladb#24923
2025-07-13 15:15:52 +03:00
Avi Kivity
f3297824e3 Revert "config: decrease default large allocation warning threshold to 128k"
This reverts commit 04fb2c026d. 2025.3 got
the reduced threshold, but won't get many of the fixes the warning will
generate, leaving it very noisy. Better to avoid the noise for this release.

Fixes #24384.
2025-07-10 14:12:14 +03:00
Avi Kivity
4eb220d3ab service: tablet_allocator: avoid large contiguous vector in make_repair_plan()
make_repair_plan() allocates a temporary vector which can grow larger
than our 128k basic allocation unit. Use a chunked vector to avoid
stalls due to large allocations.

Fixes #24713.

Closes scylladb/scylladb#24801

(cherry picked from commit 0138afa63b)

Closes scylladb/scylladb#24902
2025-07-10 12:41:35 +03:00
Patryk Jędrzejczak
c9de7d68f2 Merge '[Backport 2025.3] Make it easier to debug stuck raft topology operation.' from Scylladb[bot]
The series adds more logging and provides new REST api around topology command rpc execution to allow easier debugging of stuck topology operations.

Backport since we want to have in the production as quick as possible.

Fixes #24860

- (cherry picked from commit c8ce9d1c60)

- (cherry picked from commit 4e6369f35b)

Parent PR: #24799

Closes scylladb/scylladb#24881

* https://github.com/scylladb/scylladb:
  topology coordinator: log a start and an end of topology coordinator command execution at info level
  topology coordinator: add REST endpoint to query the status of ongoing topology cmd rpc
2025-07-09 12:55:48 +02:00
Piotr Dulikowski
b535f44db2 Merge '[Backport 2025.3] batchlog_manager: abort replay of a failed batch on shutdown or node down' from Scylladb[bot]
When replaying a failed batch and sending the mutation to all replicas, make the write response handler cancellable and abort it on shutdown or if some target is marked down. also set a reasonable timeout so it gets aborted if it's stuck for some other unexpected reason.

Previously, the write response handler is not cancellable and has no timeout. This can cause a scenario where some write operation by the batchlog manager is stuck indefinitely, and node shutdown gets stuck as well because it waits for the batchlog manager to complete, without aborting the operation.

backport to relevant versions since the issue can cause node shutdown to hang

Fixes scylladb/scylladb#24599

- (cherry picked from commit 8d48b27062)

- (cherry picked from commit fc5ba4a1ea)

- (cherry picked from commit 7150632cf2)

- (cherry picked from commit 74a3fa9671)

- (cherry picked from commit a9b476e057)

- (cherry picked from commit d7af26a437)

Parent PR: #24595

Closes scylladb/scylladb#24882

* github.com:scylladb/scylladb:
  test: test_batchlog_manager: batchlog replay includes cdc
  test: test_batchlog_manager: test batch replay when a node is down
  batchlog_manager: set timeout on writes
  batchlog_manager: abort writes on shutdown
  batchlog_manager: create cancellable write response handler
  storage_proxy: add write type parameter to mutate_internal
2025-07-08 12:35:55 +02:00
Michael Litvak
ec1dd1bf31 test: test_batchlog_manager: batchlog replay includes cdc
Add a new test that verifies that when replaying batch mutations from
the batchlog, the mutations include cdc augmentation if needed.

This is done in order to verify that it works currently as expected and
doesn't break in the future.

(cherry picked from commit d7af26a437)
2025-07-08 06:25:36 +00:00
Michael Litvak
7b30f487dd test: test_batchlog_manager: test batch replay when a node is down
Add a test of the batchlog manager replay loop applying failed batches
while some replica is down.

The test reproduces an issue where the batchlog manager tries to replay
a failed batch, doesn't get a response from some replica, and becomes
stuck.

It verifies that the batchlog manager can eventually recover from this
situation and continue applying failed batches.

(cherry picked from commit a9b476e057)
2025-07-08 06:25:36 +00:00
Michael Litvak
c3c489d3d4 batchlog_manager: set timeout on writes
Set a timeout on writes of replayed batches by the batchlog manager.

We want to avoid having infinite timeout for the writes in case it gets
stuck for some unexpected reason.

The timeout is set to be high enough to allow any reasonable write to
complete.

(cherry picked from commit 74a3fa9671)
2025-07-08 06:25:36 +00:00
Michael Litvak
6fb6bb8dc7 batchlog_manager: abort writes on shutdown
On shutdown of batchlog manager, abort all writes of replayed batches
by the batchlog manager.

To achieve this we set the appropriate write_type to BATCH, and on
shutdown cancel all write handlers with this type.

(cherry picked from commit 7150632cf2)
2025-07-08 06:25:36 +00:00
Michael Litvak
02c038efa8 batchlog_manager: create cancellable write response handler
When replaying a batch mutation from the batchlog manager and sending it
to all replicas, create the write response handler as cancellable.

To achieve this we define a new wrapper type for batchlog mutations -
batchlog_replay_mutation, and this allows us to overload
create_write_response_handler for this type. This is similar to how it's
done with hint_wrapper and read_repair_mutation.

(cherry picked from commit fc5ba4a1ea)
2025-07-08 06:25:36 +00:00
Michael Litvak
d3175671b7 storage_proxy: add write type parameter to mutate_internal
Currently mutate_internal has a boolean parameter `counter_write` that
indicates whether the write is of counter type or not.

We replace it with a more general parameter that allows to indicate the
write type.

It is compatible with the previous behavior - for a counter write, the
type COUNTER is passed, and otherwise a default value will be used
as before.

(cherry picked from commit 8d48b27062)
2025-07-08 06:25:36 +00:00
Gleb Natapov
4651c44747 topology coordinator: log a start and an end of topology coordinator command execution at info level
Those calls a relatively rare and the output may help to analyze issues
in production.

(cherry picked from commit 4e6369f35b)
2025-07-08 06:24:22 +00:00
Gleb Natapov
0e67f6f6c2 topology coordinator: add REST endpoint to query the status of ongoing topology cmd rpc
The topology coordinator executes several topology cmd rpc against some nodes
during a topology change. A topology operation will not proceed unless
rpc completes (successfully or not), but sometimes it appears that it
hangs and it is hard to tell on which nodes it did not complete yet.
Introduce new REST endpoint that can help with debugging such cases.
If executed on the topology coordinator it returns currently running
topology rpc (if any) and a list of nodes that did not reply yet.

(cherry picked from commit c8ce9d1c60)
2025-07-08 06:24:21 +00:00
Avi Kivity
859d9dd3b1 Merge '[Backport 2025.3] Improve background disposal of tablet_metadata' from Scylladb[bot]
As seen in #23284, when the tablet_metadata contains many tables, even empty ones,
we're seeing a long queue of seastar tasks coming from the individual destruction of
`tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>`.

This change improves `tablet_metadata::clear_gently` to destroy the `tablet_map_ptr` objects
on their owner shard by sorting them into vectors, per- owner shard.

Also, background call to clear_gently was added to `~token_metadata`, as it is destroyed
arbitrarily when automatic token_metadata_ptr variables go out of scope, so that the
contained tablet_metadata would be cleared gently.

Finally, a unit test was added to reproduce the `Too long queue accumulated for gossip` symptom
and verify that it is gone with this change.

Fixes #24814
Refs #23284

This change is not marked as fixing the issue since we still need to verify that there is no impact on query performance, reactor stalls, or large allocations, with a large number of tablet-based tables.

* Since the issue exists in 2025.1, requesting backport to 2025.1 and upwards

- (cherry picked from commit 3acca0aa63)

- (cherry picked from commit 493a2303da)

- (cherry picked from commit e0a19b981a)

- (cherry picked from commit 2b2cfaba6e)

- (cherry picked from commit 2c0bafb934)

- (cherry picked from commit 4a3d14a031)

- (cherry picked from commit 6e4803a750)

Parent PR: #24618

Closes scylladb/scylladb#24864

* github.com:scylladb/scylladb:
  token_metadata_impl: clear_gently: release version tracker early
  test: cluster: test_tablets_merge: add test_tablet_split_merge_with_many_tables
  token_metadata: clear_and_destroy_impl when destroyed
  token_metadata: keep a reference to shared_token_metadata
  token_metadata: move make_token_metadata_ptr into shared_token_metadata class
  replica: database: get and expose a mutable locator::shared_token_metadata
  locator: tablets: tablet_metadata: clear_gently: optimize foreign ptr destruction
2025-07-07 14:02:19 +03:00
Gleb Natapov
a25bd068bf topology coordinator: do not set request_type field for truncation command if topology_global_request_queue feature is not enabled yet
Old nodes do not expect global topology request names to be in
request_type field, so set it only if a cluster is fully upgraded
already.

Closes scylladb/scylladb#24731

(cherry picked from commit ca7837550d)

Closes scylladb/scylladb#24833
2025-07-07 11:50:55 +02:00
Benny Halevy
9bc487e79e token_metadata_impl: clear_gently: release version tracker early
No need to wait for all members to be cleared gently.
We can release the version earlier since the
held version may be awaited for in barriers.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 6e4803a750)
2025-07-07 09:42:29 +03:00
Benny Halevy
41dc86ffa8 test: cluster: test_tablets_merge: add test_tablet_split_merge_with_many_tables
Reproduces #23284

Currently skipped in release mode since it requires
the `short_tablet_stats_refresh_interval` interval.
Ref #24641

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4a3d14a031)
2025-07-07 09:42:26 +03:00
Benny Halevy
f78a352a29 token_metadata: clear_and_destroy_impl when destroyed
We have a lot of places in the code where
a token_metadata_ptr is kept in an automatic
variable and destroyed when it leaves the scope.
since it's a referenced counted lw_shared_ptr,
the token_metadata object is rarely destroyed in
those cases, but when it is, it doesn't go through
clear_gently, and in particular its tablet_metadata
is not cleared gently, leading to inefficient destruction
of potentially many foreign_ptr:s.

This patch calls clear_and_destroy_impl that gently
clears and destroys the impl object in the background
using the shared_token_metadata.

Fixes #13381

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 2c0bafb934)
2025-07-07 09:38:17 +03:00
Benny Halevy
b647dbd547 token_metadata: keep a reference to shared_token_metadata
To be used by a following patch to gently clean and destroy
the token_data_impl in the background.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 2b2cfaba6e)
2025-07-07 09:34:10 +03:00
Benny Halevy
0e7d3b4eb9 token_metadata: move make_token_metadata_ptr into shared_token_metadata class
So we can use the local shared_token_metadata instance
for safe background destroy of token_metadata_impl:s.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit e0a19b981a)
2025-07-07 09:30:01 +03:00
Benny Halevy
c8043e05c1 replica: database: get and expose a mutable locator::shared_token_metadata
Prepare for next patch, the will use this shared_token_metadata
to make mutable_token_metadata_ptr:s

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 493a2303da)
2025-07-07 09:27:06 +03:00
Benny Halevy
54fb9ed03b locator: tablets: tablet_metadata: clear_gently: optimize foreign ptr destruction
Sort all tablet_map_ptr:s by shard_id
and then destroy them on each shard to prevent
long cross-shard task queues for foreign_ptr destructions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 3acca0aa63)
2025-07-07 09:27:01 +03:00
Avi Kivity
f60c54df77 storage_proxy: avoid large allocation when storing batch in system.batchlog
Currently, when computing the mutation to be stored in system.batchlog,
we go through data_value. In turn this goes through `bytes` type
(#24810), so it causes a large contiguous allocation if the batch is
large.

Fix by going through the more primitive, but less contiguous,
atomic_cell API.

Fixes #24809.

Closes scylladb/scylladb#24811

(cherry picked from commit 60f407bff4)

Closes scylladb/scylladb#24846
2025-07-05 00:37:09 +03:00
Patryk Jędrzejczak
f1ec51133e docs: handling-node-failures: fix typo
Replacing "from" is incorrect. The typo comes from recently
merged #24583.

Fixes #24732

Requires backport to 2025.2 since #24583 has been backported to 2025.2.

Closes scylladb/scylladb#24733

(cherry picked from commit fa982f5579)

Closes scylladb/scylladb#24832
2025-07-04 19:35:00 +02:00
Jenkins Promoter
648fe6a4e8 Update ScyllaDB version to: 2025.3.0-rc1 2025-07-03 11:35:01 +03:00
Michał Chojnowski
1bd536a228 utils/alien_worker: fix a data race in submit()
We move a `seastar::promise` on the external worker thread,
after the matching `seastar::future` was returned to the shard.

That's illegal. If the `promise` move occurs concurrently with some
operation (move, await) on the `future`, it becomes a data race
which could cause various kinds of corruption.

This patch fixes that by keeping the promise at a stable address
on the shard (inside a coroutine frame) and only passing through
the worker.

Fixes #24751

Closes scylladb/scylladb#24752

(cherry picked from commit a29724479a)

Closes scylladb/scylladb#24780
2025-07-03 10:45:51 +03:00
Avi Kivity
d5b11098e8 repair: row_level: unstall to_repair_rows_on_wire() destroying its input
to_repair_rows_on_wire() moves the contents of its input std::list
and is careful to yield after each element, but the final destruction
of the input list still deals with all of the list elements without
yielding. This is expensive as not all contents of repair_row are moved
(_dk_with_hash is of type lw_shared_ptr<const decorated_key_with_hash>).

To fix, destroy each row element as we move along. This is safe as we
own the input and don't reference row_list other than for the iteration.

Fixes #24725.

Closes scylladb/scylladb#24726

(cherry picked from commit 6aa71205d8)

Closes scylladb/scylladb#24771
2025-07-03 10:44:58 +03:00
Tomasz Grabiec
775916132e Merge '[Backport 2025.3] repair: postpone repair until topology is not busy ' from Scylladb[bot]
Currently, repair_service::repair_tablets starts repair if there
is no ongoing tablet operations. The check does not consider global
topology operations, like tablet resize finalization.

Hence, if:
- topology is in the tablet_resize_finalization state;
- repair starts (as there is no tablet transitions) and holds the erm;
- resize finalization finishes;

then the repair sees a topology state different than the actual -
it does not see that the storage groups were already split.
Repair code does not handle this case and it results with
on_internal_error.

Start repair when topology is not busy. The check isn't atomic,
as it's done on a shard 0. Thus, we compare the topology versions
to ensure that the business check is valid.

Fixes: https://github.com/scylladb/scylladb/issues/24195.

Needs backport to all branches since they are affected

- (cherry picked from commit df152d9824)

- (cherry picked from commit 83c9af9670)

Parent PR: #24202

Closes scylladb/scylladb#24783

* github.com:scylladb/scylladb:
  test: add test for repair and resize finalization
  repair: postpone repair until topology is not busy
2025-07-02 13:17:08 +02:00
Calle Wilund
46e3794bde encryption_at_rest_test: Add exception handler to ensure proxy stop
If boost test is run such that we somehow except even in a test macro
such as BOOST_REQUIRE_THROW, we could end up not stopping the net proxy
used, causing a use after free.

(cherry picked from commit 8d37e5e24b)
2025-07-02 10:13:08 +00:00
Calle Wilund
b7a82898f0 encryption: Ensure stopping timers in provider cache objects
utils::loading cache has a timer that can, if we're unlucky, be runnnig
while the encryption context/extensions referencing the various host
objects containing them are destroyed in the case of unit testing.

Add a stop phase in encryption context shutdown closing the caches.

(cherry picked from commit ee98f5d361)
2025-07-02 10:13:08 +00:00
Jenkins Promoter
76bf279e0e Update pgo profiles - aarch64 2025-07-02 13:06:18 +03:00
Jenkins Promoter
61364624e3 Update pgo profiles - x86_64 2025-07-02 12:34:58 +03:00
Botond Dénes
6e6c00dcfe docs: cql/types.rst: remove reference to frozen-only UDTs
ScyllaDB supports non-frozen UDTs since 3.2, no need to keep referencing
this limitation in the current docs. Replace the description of the
limitation with general description of frozen semantics for UDTs.

Fixes: #22929

Closes scylladb/scylladb#24763

(cherry picked from commit 37ef9efb4e)

Closes scylladb/scylladb#24784
2025-07-02 12:11:25 +03:00
Aleksandra Martyniuk
c26eb8ef14 test: add test for repair and resize finalization
Add test that checks whether repair does not start if there is an
ongoing resize finalization.

(cherry picked from commit 83c9af9670)
2025-07-01 20:26:53 +00:00
Aleksandra Martyniuk
8a1d09862e repair: postpone repair until topology is not busy
Currently, repair_service::repair_tablets starts repair if there
is no ongoing tablet operations. The check does not consider global
topology operations, like tablet resize finalization. This may cause
a data race and unexpected behavior.

Start repair when topology is not busy.

(cherry picked from commit df152d9824)
2025-07-01 20:26:53 +00:00
Yaron Kaikov
e64bb3819c Update ScyllaDB version to: 2025.3.0-rc0 2025-07-01 10:34:39 +03:00
4742 changed files with 68723 additions and 205941 deletions

17
.github/CODEOWNERS vendored
View File

@@ -1,5 +1,5 @@
# AUTH
auth/* @nuivall
auth/* @nuivall @ptrsmrn
# CACHE
row_cache* @tgrabiec
@@ -25,15 +25,15 @@ compaction/* @raphaelsc
transport/*
# CQL QUERY LANGUAGE
cql3/* @tgrabiec @nuivall
cql3/* @tgrabiec @nuivall @ptrsmrn
# COUNTERS
counters* @nuivall
tests/counter_test* @nuivall
counters* @nuivall @ptrsmrn
tests/counter_test* @nuivall @ptrsmrn
# DOCS
/docs/ @annastuchlik @tzach
/docs/alternator/ @annastuchlik @tzach @nyh
docs/* @annastuchlik @tzach
docs/alternator @annastuchlik @tzach @nyh
# GOSSIP
gms/* @tgrabiec @asias @kbr-scylla
@@ -57,6 +57,7 @@ repair/* @tgrabiec @asias
# SCHEMA MANAGEMENT
db/schema_tables* @tgrabiec
db/legacy_schema_migrator* @tgrabiec
service/migration* @tgrabiec
schema* @tgrabiec
@@ -92,10 +93,6 @@ test/boost/querier_cache_test.cc @denesb
# PYTEST-BASED CQL TESTS
test/cqlpy/* @nyh
# TEST FRAMEWORK
test/pylib/* @xtrey
test.py @xtrey
# RAFT
raft/* @kbr-scylla @gleb-cloudius @kostja
test/raft/* @kbr-scylla @gleb-cloudius @kostja

View File

@@ -1,104 +0,0 @@
# ScyllaDB Development Instructions
## Project Context
High-performance distributed NoSQL database. Core values: performance, correctness, readability.
## Build System
### Using native OS environment
```bash
# Configure (run once)
./configure.py
# Build everything
ninja <mode>-build # modes: dev, debug, release, sanitize
# dev is recommended for development (fastest compilation)
# Build Scylla binary only (sufficient for Python integration tests)
ninja build/<mode>/scylla
# Build specific test
ninja build/<mode>/test/boost/<test_name>
```
### Using frozen toolchain (Docker)
Prefix any build command with `./tools/toolchain/dbuild`.
## Running Tests
### C++ Unit Tests
```bash
# Run all tests in a file
./test.py --mode=<mode> test/<suite>/<test_name>.cc
# Run a single test case from a file
./test.py --mode=<mode> test/<suite>/<test_name>.cc::<test_case_name>
# Examples
./test.py --mode=dev test/boost/memtable_test.cc
./test.py --mode=dev test/raft/raft_server_test.cc::test_check_abort_on_client_api
```
**Important:**
- Use full path with `.cc` extension (e.g., `test/boost/memtable_test.cc`)
- To run a single test case, append `::<test_case_name>` to the file path
- If you encounter permission issues with cgroup metrics, add `--no-gather-metrics` to the `./test.py` command
**Rebuilding Tests:**
- test.py does NOT automatically rebuild when test source files are modified
- Many tests are part of composite binaries (e.g., `combined_tests` in test/boost contains multiple test files)
- To find which binary contains a test, check `configure.py` in the repository root (primary source) or `test/<suite>/CMakeLists.txt`
- To rebuild a specific test binary: `ninja build/<mode>/test/<suite>/<binary_name>`
- Examples:
- `ninja build/dev/test/boost/combined_tests` (contains group0_voter_calculator_test.cc and others)
- `ninja build/dev/test/raft/replication_test` (standalone Raft test)
### Python Integration Tests
```bash
# Only requires Scylla binary (full build usually not needed)
ninja build/<mode>/scylla
# Run all tests in a file
./test.py --mode=<mode> test/<suite>/<test_name>.py
# Run a single test case from a file
./test.py --mode=<mode> test/<suite>/<test_name>.py::<test_function_name>
# Examples
./test.py --mode=dev test/alternator/
./test.py --mode=dev test/cqlpy/test_json.py
./test.py --mode=dev test/cluster/test_raft_voters.py::test_raft_limited_voters_retain_coordinator
# Optional flags
./test.py --mode=dev test/cluster/test_raft_no_quorum.py -v --repeat 5
```
**Important:**
- Use full path with `.py` extension
- To run a single test case, append `::<test_function_name>` to the file path
- Add `-v` for verbose output
- Add `--repeat <num>` to repeat a test multiple times
- After modifying C++ source files, only rebuild the Scylla binary for Python tests
## Code Philosophy
- Performance matters in hot paths (data read/write, inner loops)
- Self-documenting code through clear naming
- Comments explain "why", not "what"
- Prefer standard library over custom implementations
- Strive for simplicity and clarity, add complexity only when clearly justified
- Question requests: don't blindly implement requests - evaluate trade-offs, identify issues, and suggest better alternatives when appropriate
- Consider different approaches, weigh pros and cons, and recommend the best fit for the specific context
## Test Philosophy
- Performance matters. Tests should run as quickly as possible. Sleeps in the code are highly discouraged and should be avoided, to reduce run time and flakiness.
- Stability matters. Tests should be stable. New tests should be executed 100 times at least to ensure they pass 100 out of 100 times. (use --repeat 100 --max-failures 1 when running it)
- Unit tests should ideally test one thing only.
- Tests for bug fixes should run before the fix - and show the failure and after the fix - and show they now pass.
- Tests for bug fixes should have in their comments which bug fixes (GitHub or JIRA issue) they test.
- Tests in debug are always slower, so if needed, reduce number of iterations, rows, data used, cycles, etc. in debug mode.
- Tests should strive to be repeatable, and not use random input that will make their results unpredictable.
- Tests should consume as little resources as possible. Prefer running tests on a single node if it is sufficient, for example.
## New Files
- Include `LicenseRef-ScyllaDB-Source-Available-1.1` in the SPDX header
- Use the current year for new files; for existing code keep the year as is

View File

@@ -1,6 +1,6 @@
version: 2
updates:
- package-ecosystem: "uv"
- package-ecosystem: "pip"
directory: "/docs"
schedule:
interval: "daily"

View File

@@ -1,121 +0,0 @@
---
applyTo: "**/*.{cc,hh}"
---
# C++ Guidelines
**Important:** Always match the style and conventions of existing code in the file and directory.
## Memory Management
- Prefer stack allocation whenever possible
- Use `std::unique_ptr` by default for dynamic allocations
- `new`/`delete` are forbidden (use RAII)
- Use `seastar::lw_shared_ptr` or `seastar::shared_ptr` for shared ownership within same shard
- Use `seastar::foreign_ptr` for cross-shard sharing
- Avoid `std::shared_ptr` except when interfacing with external C++ APIs
- Avoid raw pointers except for non-owning references or C API interop
## Seastar Asynchronous Programming
- Use `seastar::future<T>` for all async operations
- Prefer coroutines (`co_await`, `co_return`) over `.then()` chains for readability
- Coroutines are preferred over `seastar::do_with()` for managing temporary state
- In hot paths where futures are ready, continuations may be more efficient than coroutines
- Chain futures with `.then()`, don't block with `.get()` (unless in `seastar::thread` context)
- All I/O must be asynchronous (no blocking calls)
- Use `seastar::gate` for shutdown coordination
- Use `seastar::semaphore` for resource limiting (not `std::mutex`)
- Break long loops with `maybe_yield()` to avoid reactor stalls
- Most Scylla code runs on a single shard where atomics are unnecessary
- Use Seastar message passing for cross-shard communication
## Coroutines
```cpp
seastar::future<T> func() {
auto result = co_await async_operation();
co_return result;
}
```
## Error Handling
- Throw exceptions for errors (futures propagate them automatically)
- In coroutines, use `co_await coroutine::return_exception_ptr()` or `co_return coroutine::exception()` to avoid the overhead of throwing
- In data path: avoid exceptions, use `std::expected` (or `boost::outcome`) instead
- Use standard exceptions (`std::runtime_error`, `std::invalid_argument`)
- Database-specific: throw appropriate schema/query exceptions
## Invariant Checking
- Prefer `throwing_assert()` (`utils/assert.hh`), it logs and throws instead of aborting
- Use `SCYLLA_ASSERT` where critical to system stability where no clean shutdown is possible, it aborts
- Use `on_internal_error()` for should-never-happen conditions that should be logged with backtrace
## Performance
- Pass large objects by `const&` or `&&` (move semantics)
- Use `std::string_view` for non-owning string references
- Avoid copies: prefer move semantics
- Use `utils::chunked_vector` instead of `std::vector` for large allocations (>128KB)
- Minimize dynamic allocations in hot paths
## Database-Specific Types
- Use `schema_ptr` for schema references
- Use `mutation` and `mutation_partition` for data modifications
- Use `partition_key` and `clustering_key` for keys
- Use `api::timestamp_type` for database timestamps
- Use `gc_clock` for garbage collection timing
## Style
- C++23 standard (prefer modern features, especially coroutines)
- Use `auto` when type is obvious from RHS
- Avoid `auto` when it obscures the type
- Use range-based for loops: `for (const auto& item : container)`
- Use standard algorithms when they clearly simplify code (e.g., replacing 10-line loops)
- Avoid chaining multiple algorithms if a straightforward loop is clearer
- Mark functions and variables `const` whenever possible
- Use scoped enums: `enum class` (not unscoped `enum`)
## Headers
- Use `#pragma once`
- Include order: own header, C++ std, Seastar, Boost, project headers
- Forward declare when possible
- Never `using namespace` in headers. Exception: most headers include `seastarx.hh`, which provides `using namespace seastar` project-wide.
## Documentation
- Public APIs require clear documentation
- Implementation details should be self-evident from code
- Use `///` or Doxygen `/** */` for public documentation, `//` for implementation notes - follow the existing style
## Naming
- `snake_case` for most identifiers (classes, functions, variables, namespaces)
- Template parameters: `CamelCase` (e.g., `template<typename ValueType>`)
- Member variables: prefix with `_` (e.g., `int _count;`)
- Structs (value-only): no `_` prefix on members
- Constants and `constexpr`: `snake_case` (e.g., `static constexpr int max_size = 100;`)
- Files: `.hh` for headers, `.cc` for source
## Formatting
- 4 spaces indentation, never tabs
- Opening braces on same line as control structure (except namespaces)
- Space after keywords: `if (`, `while (`, `return `
- Whitespace around operators matches precedence: `*a + *b` not `* a+* b`
- Line length: keep reasonable (<160 chars), use continuation lines with double indent if needed
- Brace all nested scopes, even single statements
- Minimal patches: only format code you modify, never reformat entire files
## Logging
- Use structured logging with appropriate levels: DEBUG, INFO, WARN, ERROR
- Include context in log messages (e.g., request IDs)
- Never log sensitive data (credentials, PII)
## Forbidden
- `malloc`/`free`
- `printf` family (use logging or fmt)
- Raw pointers for ownership
- Blocking operations: `std::sleep`, `std::read`, `std::mutex` (use Seastar equivalents)
- New ad-hoc macros (prefer `inline`, `constexpr`, or templates; established project macros like `SCYLLA_ASSERT` are fine)
## Testing
When modifying existing code, follow TDD: create/update test first, then implement.
- Examine existing tests for style and structure
- Use Boost.Test framework
- Use `SEASTAR_THREAD_TEST_CASE` for Seastar asynchronous tests
- Aim for high code coverage, especially for new features and bug fixes
- Maintain bisectability: all tests must pass in every commit. Mark failing tests with `BOOST_FAIL()` or similar, then fix in subsequent commit

View File

@@ -1,51 +0,0 @@
---
applyTo: "**/*.py"
---
# Python Guidelines
**Important:** Match existing code style. Some directories (like `test/cqlpy` and `test/alternator`) prefer simplicity over type hints and docstrings.
## Style
- Match style of the file and directory you are editing; fall back to PEP 8 if unclear
- Use type hints for function signatures (unless directory style omits them)
- Use f-strings for formatting
- Line length: 160 characters max
- 4 spaces for indentation
## Imports
Order: standard library, third-party, local imports
```python
import os
import sys
import pytest
from cassandra.cluster import Cluster
from test.utils import setup_keyspace
```
Avoid wildcard imports (`from module import *`).
## Documentation
All public functions/classes need docstrings (unless the current directory conventions omit them):
```python
def my_function(arg1: str, arg2: int) -> bool:
"""
Brief summary of function purpose.
Args:
arg1: Description of first argument.
arg2: Description of second argument.
Returns:
Description of return value.
"""
pass
```
## Testing Best Practices
- Maintain bisectability: all tests must pass in every commit
- Mark currently-failing tests with `@pytest.mark.xfail`, unmark when fixed
- Use descriptive names that convey intent
- Docstrings/comments should explain what the test verifies and why, and if it reproduces a specific issue or how it fits into the larger test suite

View File

@@ -47,29 +47,13 @@ def create_pull_request(repo, new_branch_name, base_branch_name, pr, backport_pr
draft=is_draft
)
logging.info(f"Pull request created: {backport_pr.html_url}")
labels_to_add = []
priority_labels = {"P0", "P1"}
parent_pr_labels = [label.name for label in pr.labels]
for label in priority_labels:
if label in parent_pr_labels:
labels_to_add.append(label)
labels_to_add.append("force_on_cloud")
logging.info(f"Adding {label} and force_on_cloud labels from parent PR to backport PR")
break # Only apply the highest priority label
if is_collaborator:
backport_pr.add_to_assignees(pr.user)
if is_draft:
labels_to_add.append("conflicts")
backport_pr.add_to_labels("conflicts")
pr_comment = f"@{pr.user.login} - This PR was marked as draft because it has conflicts\n"
pr_comment += "Please resolve them and remove the 'conflicts' label. The PR will be made ready for review automatically."
pr_comment += "Please resolve them and mark this PR as ready for review"
backport_pr.create_issue_comment(pr_comment)
# Apply all labels at once if we have any
if labels_to_add:
backport_pr.add_to_labels(*labels_to_add)
logging.info(f"Added labels to backport PR: {labels_to_add}")
logging.info(f"Assigned PR to original author: {pr.user}")
return backport_pr
except GithubException as e:
@@ -142,31 +126,20 @@ def backport(repo, pr, version, commits, backport_base_branch, is_collaborator):
def with_github_keyword_prefix(repo, pr):
# GitHub issue pattern: #123, scylladb/scylladb#123, or full GitHub URLs
github_pattern = rf"(?:fix(?:|es|ed))\s*:?\s*(?:(?:(?:{repo.full_name})?#)|https://github\.com/{repo.full_name}/issues/)(\d+)"
# JIRA issue pattern: PKG-92 or https://scylladb.atlassian.net/browse/PKG-92
jira_pattern = r"(?:fix(?:|es|ed))\s*:?\s*(?:(?:https://scylladb\.atlassian\.net/browse/)?([A-Z]+-\d+))"
# Check PR body for GitHub issues
github_match = re.findall(github_pattern, pr.body, re.IGNORECASE)
# Check PR body for JIRA issues
jira_match = re.findall(jira_pattern, pr.body, re.IGNORECASE)
match = github_match or jira_match
if match:
pattern = rf"(?:fix(?:|es|ed))\s*:?\s*(?:(?:(?:{repo.full_name})?#)|https://github\.com/{repo.full_name}/issues/)(\d+)"
match = re.findall(pattern, pr.body, re.IGNORECASE)
if not match:
for commit in pr.get_commits():
match = re.findall(pattern, commit.commit.message, re.IGNORECASE)
if match:
print(f'{pr.number} has a valid close reference in commit message {commit.sha}')
break
if not match:
print(f'No valid close reference for {pr.number}')
return False
else:
return True
for commit in pr.get_commits():
github_match = re.findall(github_pattern, commit.commit.message, re.IGNORECASE)
jira_match = re.findall(jira_pattern, commit.commit.message, re.IGNORECASE)
if github_match or jira_match:
print(f'{pr.number} has a valid close reference in commit message {commit.sha}')
return True
print(f'No valid close reference for {pr.number}')
return False
def main():
args = parse_args()

View File

@@ -4,7 +4,7 @@
# Copyright (C) 2024-present ScyllaDB
#
#
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
#
import argparse

View File

@@ -30,13 +30,8 @@ def copy_labels_from_linked_issues(repo, pr_number):
try:
issue = repo.get_issue(int(issue_number))
for label in issue.labels:
# Copy ALL labels from issues to PR when PR is opened
pr.add_to_labels(label.name)
print(f"Copied label '{label.name}' from issue #{issue_number} to PR #{pr_number}")
if label.name in ['P0', 'P1']:
pr.add_to_labels('force_on_cloud')
print(f"Added force_on_cloud label to PR #{pr_number} due to {label.name} label")
print(f"All labels from issue #{issue_number} copied to PR #{pr_number}")
print(f"Labels from issue #{issue_number} copied to PR #{pr_number}")
except Exception as e:
print(f"Error processing issue #{issue_number}: {e}")
@@ -79,22 +74,9 @@ def sync_labels(repo, number, label, action, is_issue=False):
target = repo.get_issue(int(pr_or_issue_number))
if action == 'labeled':
target.add_to_labels(label)
if label in ['P0', 'P1'] and is_issue:
# Only add force_on_cloud to PRs when P0/P1 is added to an issue
target.add_to_labels('force_on_cloud')
print(f"Added 'force_on_cloud' label to PR #{pr_or_issue_number} due to {label} label")
print(f"Label '{label}' successfully added.")
elif action == 'unlabeled':
target.remove_from_labels(label)
if label in ['P0', 'P1'] and is_issue:
# Check if any other P0/P1 labels remain before removing force_on_cloud
remaining_priority_labels = [l.name for l in target.labels if l.name in ['P0', 'P1']]
if not remaining_priority_labels:
try:
target.remove_from_labels('force_on_cloud')
print(f"Removed 'force_on_cloud' label from PR #{pr_or_issue_number} as no P0/P1 labels remain")
except Exception as e:
print(f"Warning: Could not remove force_on_cloud label: {e}")
print(f"Label '{label}' successfully removed.")
elif action == 'opened':
copy_labels_from_linked_issues(repo, number)

View File

@@ -10,9 +10,6 @@ on:
types: [labeled, unlabeled]
branches: [master, next, enterprise]
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
check-commit:
runs-on: ubuntu-latest
@@ -33,7 +30,7 @@ jobs:
echo "DEFAULT_BRANCH=master" >> $GITHUB_ENV
fi
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@v4
with:
repository: ${{ github.repository }}
ref: ${{ env.DEFAULT_BRANCH }}
@@ -57,13 +54,10 @@ jobs:
GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --commits ${{ github.event.before }}..${{ github.sha }}
- name: Check if a valid backport label exists and no backport_error
env:
LABELS_JSON: ${{ toJson(github.event.pull_request.labels) }}
id: check_label
run: |
labels_json="$LABELS_JSON"
echo "Checking labels:"
echo "$labels_json" | jq -r '.[].name'
labels_json='${{ toJson(github.event.pull_request.labels) }}'
echo "Checking labels: $(echo "$labels_json" | jq -r '.[].name')"
# Check if a valid backport label exists
if echo "$labels_json" | jq -e 'any(.[] | .name; test("backport/[0-9]+\\.[0-9]+$"))' > /dev/null; then

View File

@@ -5,18 +5,12 @@ on:
types: [opened, reopened, edited]
branches: [branch-*]
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
check-fixes-prefix:
runs-on: ubuntu-latest
permissions:
contents: read
issues: write
steps:
- name: Check PR body for "Fixes" prefix patterns
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
uses: actions/github-script@v7
with:
script: |
const body = context.payload.pull_request.body;
@@ -24,7 +18,7 @@ jobs:
// Regular expression pattern to check for "Fixes" prefix
// Adjusted to dynamically insert the repository full name
const pattern = `Fixes:? ((?:#|${repo.replace('/', '\\/')}#|https://github\\.com/${repo.replace('/', '\\/')}/issues/)(\\d+)|(?:https://scylladb\\.atlassian\\.net/browse/)?([A-Z]+-\\d+))`;
const pattern = `Fixes:? (?:#|${repo.replace('/', '\\/')}#|https://github\\.com/${repo.replace('/', '\\/')}/issues/)(\\d+)`;
const regex = new RegExp(pattern);
if (!regex.test(body)) {

View File

@@ -12,9 +12,6 @@ on:
description: 'the md5sum for scylla executable'
value: ${{ jobs.build.outputs.md5sum }}
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
read-toolchain:
uses: ./.github/workflows/read-toolchain.yaml
@@ -27,7 +24,7 @@ jobs:
outputs:
md5sum: ${{ steps.checksum.outputs.md5sum }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@v4
with:
submodules: recursive
- name: Generate the building system

View File

@@ -1,53 +0,0 @@
name: Backport with Jira Integration
on:
push:
branches:
- master
- next-*.*
- branch-*.*
pull_request_target:
types: [labeled, closed]
branches:
- master
- next
- next-*.*
- branch-*.*
jobs:
backport-on-push:
if: github.event_name == 'push'
uses: scylladb/github-automation/.github/workflows/backport-with-jira.yaml@main
with:
event_type: 'push'
base_branch: ${{ github.ref }}
commits: ${{ github.event.before }}..${{ github.sha }}
secrets:
gh_token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}
backport-on-label:
if: github.event_name == 'pull_request_target' && github.event.action == 'labeled'
uses: scylladb/github-automation/.github/workflows/backport-with-jira.yaml@main
with:
event_type: 'labeled'
base_branch: refs/heads/${{ github.event.pull_request.base.ref }}
pull_request_number: ${{ github.event.pull_request.number }}
head_commit: ${{ github.event.pull_request.base.sha }}
label_name: ${{ github.event.label.name }}
pr_state: ${{ github.event.pull_request.state }}
secrets:
gh_token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}
backport-chain:
if: github.event_name == 'pull_request_target' && github.event.action == 'closed' && github.event.pull_request.merged == true
uses: scylladb/github-automation/.github/workflows/backport-with-jira.yaml@main
with:
event_type: 'chain'
base_branch: refs/heads/${{ github.event.pull_request.base.ref }}
pull_request_number: ${{ github.event.pull_request.number }}
pr_body: ${{ github.event.pull_request.body }}
secrets:
gh_token: ${{ secrets.AUTO_BACKPORT_TOKEN }}
jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

View File

@@ -1,18 +0,0 @@
name: Sync Jira Based on PR Events
on:
pull_request_target:
types: [opened, edited, ready_for_review, review_requested, labeled, unlabeled, closed]
permissions:
contents: read
pull-requests: write
issues: write
jobs:
jira-sync:
uses: scylladb/github-automation/.github/workflows/main_pr_events_jira_sync.yml@main
with:
caller_action: ${{ github.event.action }}
secrets:
caller_jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

View File

@@ -1,22 +0,0 @@
name: Sync Jira Based on PR Milestone Events
on:
pull_request_target:
types: [milestoned, demilestoned]
permissions:
contents: read
pull-requests: read
jobs:
jira-sync-milestone-set:
if: github.event.action == 'milestoned'
uses: scylladb/github-automation/.github/workflows/main_jira_sync_pr_milestone_set.yml@main
secrets:
caller_jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}
jira-sync-milestone-removed:
if: github.event.action == 'demilestoned'
uses: scylladb/github-automation/.github/workflows/main_jira_sync_pr_milestone_removed.yml@main
secrets:
caller_jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

View File

@@ -1,14 +0,0 @@
name: Call Jira release creation for new milestone
on:
milestone:
types: [created, closed]
jobs:
sync-milestone-to-jira:
uses: scylladb/github-automation/.github/workflows/main_sync_milestone_to_jira_release.yml@main
with:
# Comma-separated list of Jira project keys
jira_project_keys: "SCYLLADB,CUSTOMER,SMI,RELENG,VECTOR"
secrets:
caller_jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

View File

@@ -1,18 +0,0 @@
name: validate_pr_author_email
on:
pull_request_target:
types:
- opened
- synchronize
- reopened
permissions:
contents: read
pull-requests: write
statuses: write
jobs:
validate_pr_author_email:
uses: scylladb/github-automation/.github/workflows/validate_pr_author_email.yml@main

View File

@@ -7,9 +7,8 @@ on:
env:
HEADER_CHECK_LINES: 10
LICENSE: "LicenseRef-ScyllaDB-Source-Available-1.1"
LICENSE: "LicenseRef-ScyllaDB-Source-Available-1.0"
CHECKED_EXTENSIONS: ".cc .hh .py"
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
check-license-headers:
@@ -20,7 +19,7 @@ jobs:
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@v4
with:
fetch-depth: 0
@@ -41,7 +40,7 @@ jobs:
- name: Comment on PR if check fails
if: failure()
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
uses: actions/github-script@v7
with:
script: |
const license = '${{ env.LICENSE }}';

View File

@@ -9,7 +9,6 @@ env:
# use the development branch explicitly
CLANG_VERSION: 21
BUILD_DIR: build
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
permissions: {}
@@ -33,7 +32,7 @@ jobs:
steps:
- run: |
sudo dnf -y install git
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@v4
with:
submodules: true
- name: Install build dependencies

View File

@@ -18,7 +18,6 @@ env:
BUILD_TYPE: RelWithDebInfo
BUILD_DIR: build
CLANG_TIDY_CHECKS: '-*,bugprone-use-after-move'
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
permissions: {}
@@ -43,7 +42,7 @@ jobs:
IMAGE: ${{ needs.read-toolchain.image }}
run: |
echo ${{ needs.read-toolchain.image }}
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@v4
with:
submodules: true
- run: |

View File

@@ -1,65 +0,0 @@
name: Close issues created by Scylla associates
on:
issues:
types: [opened, reopened]
permissions:
issues: write
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
comment-and-close:
runs-on: ubuntu-latest
steps:
- name: Comment and close if author email is scylladb.com
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
with:
github-token: ${{ secrets.GITHUB_TOKEN }}
script: |
const issue = context.payload.issue;
const actor = context.actor;
// Get user data (only public email is available)
const { data: user } = await github.rest.users.getByUsername({
username: actor,
});
const email = user.email || "";
console.log(`Actor: ${actor}, public email: ${email || "<none>"}`);
// Only continue if email exists and ends with @scylladb.com
if (!email || !email.toLowerCase().endsWith("@scylladb.com")) {
console.log("User is not a scylladb.com email (or email not public); skipping.");
return;
}
const owner = context.repo.owner;
const repo = context.repo.repo;
const issue_number = issue.number;
const body = "Issues in this repository are closed automatically. Scylla associates should use Jira to manage issues.\nPlease move this issue to Jira https://scylladb.atlassian.net/jira/software/c/projects/SCYLLADB/list";
// Add the comment
await github.rest.issues.createComment({
owner,
repo,
issue_number,
body,
});
console.log(`Comment added to #${issue_number}`);
// Close the issue
await github.rest.issues.update({
owner,
repo,
issue_number,
state: "closed",
state_reason: "not_planned"
});
console.log(`Issue #${issue_number} closed.`);

View File

@@ -4,16 +4,14 @@ on:
branches:
- master
permissions: {}
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
codespell:
name: Check for spelling errors
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: codespell-project/actions-codespell@8f01853be192eb0f849a5c7d721450e7a467c579 # v2.2
- uses: actions/checkout@v4
- uses: codespell-project/actions-codespell@master
with:
only_warn: 1
ignore_words_list: "ans,datas,fo,ser,ue,crate,nd,reenable,strat,stap,te,raison,iif,tread"
ignore_words_list: "ans,datas,fo,ser,ue,crate,nd,reenable,strat,stap,te,raison"
skip: "./.git,./build,./tools,*.js,*.lock,./test,./licenses,./redis/lolwut.cc,*.svg"

View File

@@ -1,38 +0,0 @@
name: Compare Build Systems
on:
pull_request:
branches:
- master
paths:
- 'configure.py'
- '**/CMakeLists.txt'
- 'cmake/**'
- 'scripts/compare_build_systems.py'
workflow_dispatch:
permissions:
contents: read
# cancel the in-progress run upon a repush
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
jobs:
read-toolchain:
uses: ./.github/workflows/read-toolchain.yaml
compare:
name: Compare configure.py vs CMake
needs:
- read-toolchain
runs-on: ubuntu-latest
container: ${{ needs.read-toolchain.outputs.image }}
steps:
- uses: actions/checkout@v4
with:
submodules: true
- name: Compare build systems
run: |
git config --global --add safe.directory $GITHUB_WORKSPACE
python3 scripts/compare_build_systems.py --ci

View File

@@ -10,10 +10,7 @@ on:
- 'master'
- 'branch-*'
schedule:
- cron: '0 10 * * 1' # Runs every Monday at 10:00am
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
- cron: '0 10 * * 1,4' # Runs every Monday and Thursday at 10:00am
jobs:
notify_conflict_prs:
@@ -21,7 +18,7 @@ jobs:
steps:
- name: Notify PR Authors of Conflicts
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
uses: actions/github-script@v7
with:
script: |
console.log("Starting conflict reminder script...");
@@ -45,10 +42,10 @@ jobs:
const recentPrs = prs.filter(pr => new Date(pr.created_at) >= twoMonthsAgo);
const validBaseBranches = ['master'];
const branchPrefix = 'branch-';
const oneWeekAgo = new Date();
const threeDaysAgo = new Date();
const conflictLabel = 'conflicts';
oneWeekAgo.setDate(oneWeekAgo.getDate() - 7);
console.log(`One week ago: ${oneWeekAgo.toISOString()}`);
threeDaysAgo.setDate(threeDaysAgo.getDate() - 3);
console.log(`Three days ago: ${threeDaysAgo.toISOString()}`);
for (const pr of recentPrs) {
console.log(`Checking PR #${pr.number} on base branch '${pr.base.ref}'`);
@@ -60,8 +57,8 @@ jobs:
}
const updatedDate = new Date(pr.updated_at);
console.log(`PR #${pr.number} last updated at: ${updatedDate.toISOString()}`);
if (!isPushEvent && updatedDate >= oneWeekAgo) {
console.log(`PR #${pr.number} skipped: updated within last week`);
if (!isPushEvent && updatedDate >= threeDaysAgo) {
console.log(`PR #${pr.number} skipped: updated within last 3 days`);
continue;
}
if (pr.assignee === null) {
@@ -93,41 +90,34 @@ jobs:
const hasConflictLabel = pr.labels.some(label => label.name === conflictLabel);
console.log(`PR #${pr.number} has conflict label: ${hasConflictLabel}`);
// Fetch comments to check for existing notifications
const comments = await github.paginate(github.rest.issues.listComments, {
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: pr.number,
per_page: 100,
});
// Find last notification comment from the bot
const notificationPrefix = `@${pr.assignee.login}, this PR has merge conflicts with the base branch.`;
const lastNotification = comments
.filter(c =>
c.user.type === "Bot" &&
c.body.startsWith(notificationPrefix)
)
.sort((a, b) => new Date(b.created_at) - new Date(a.created_at))[0];
// Check if we should skip notification based on recent notification
let shouldSkipNotification = false;
if (lastNotification) {
const lastNotified = new Date(lastNotification.created_at);
if (lastNotified >= oneWeekAgo) {
console.log(`PR #${pr.number} skipped: last notification was less than 1 week ago`);
shouldSkipNotification = true;
}
}
// Additional check for push events on draft PRs with conflict labels
if (
isPushEvent &&
pr.draft === true &&
hasConflictLabel &&
shouldSkipNotification
hasConflictLabel
) {
continue;
// Fetch comments to find last bot notification
const comments = await github.paginate(github.rest.issues.listComments, {
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: pr.number,
per_page: 100,
});
// Find last notification comment from the bot (by body and user)
const botLogin = context.actor;
const notificationPrefix = `@${pr.assignee.login}, this PR has merge conflicts with the base branch.`;
const lastNotification = comments
.filter(c =>
c.user.type === "Bot" &&
c.body.startsWith(notificationPrefix)
)
.sort((a, b) => new Date(b.created_at) - new Date(a.created_at))[0];
if (lastNotification) {
const lastNotified = new Date(lastNotification.created_at);
if (lastNotified >= threeDaysAgo) {
console.log(`PR #${pr.number} skipped: last notification was less than 3 days ago`);
continue;
}
}
}
if (!hasConflictLabel) {
@@ -139,9 +129,8 @@ jobs:
});
console.log(`Added 'conflicts' label to PR #${pr.number}`);
}
const assignee = pr.assignee.login;
if (assignee && !shouldSkipNotification) {
if (assignee) {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,

View File

@@ -13,9 +13,6 @@ on:
permissions:
contents: read
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
lint:
runs-on: ubuntu-latest
@@ -24,12 +21,12 @@ jobs:
security-events: write
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Differential ShellCheck
uses: redhat-plumbers-in-action/differential-shellcheck@d965e66ec0b3b2f821f75c8eff9b12442d9a7d1e # v5.5.6
uses: redhat-plumbers-in-action/differential-shellcheck@v5
with:
severity: warning
token: ${{ secrets.GITHUB_TOKEN }}

View File

@@ -5,7 +5,6 @@ name: "Docs / Publish"
env:
FLAG: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'opensource' }}
DEFAULT_BRANCH: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'master' }}
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
on:
push:
@@ -19,24 +18,18 @@ on:
jobs:
release:
permissions:
pages: write
id-token: write
contents: write
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@v4
with:
ref: ${{ env.DEFAULT_BRANCH }}
persist-credentials: false
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Install uv
uses: astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57 # v8.0.0
python-version: "3.10"
- name: Set up env
run: make -C docs FLAG="${{ env.FLAG }}" setupenv
- name: Build docs

View File

@@ -2,12 +2,8 @@ name: "Docs / Build PR"
# For more information,
# see https://sphinx-theme.scylladb.com/stable/deployment/production.html#available-workflows
permissions:
contents: read
env:
FLAG: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'opensource' }}
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
on:
pull_request:
@@ -23,16 +19,14 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@v4
with:
persist-credentials: false
fetch-depth: 0
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
uses: actions/setup-python@v5
with:
python-version: "3.12"
- name: Install uv
uses: astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57 # v8.0.0
python-version: "3.10"
- name: Set up env
run: make -C docs FLAG="${{ env.FLAG }}" setupenv
- name: Build docs

View File

@@ -1,40 +0,0 @@
name: Docs / Validate metrics
permissions:
contents: read
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
on:
pull_request:
branches:
- master
- enterprise
paths:
- '**/*.cc'
- 'scripts/metrics-config.yml'
- 'scripts/get_description.py'
- 'docs/_ext/scylladb_metrics.py'
jobs:
validate-metrics:
runs-on: ubuntu-latest
name: Check metrics documentation coverage
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
submodules: true
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: '3.10'
- name: Install dependencies
run: pip install PyYAML
- name: Validate metrics
run: python3 scripts/get_description.py --validate -c scripts/metrics-config.yml

View File

@@ -13,10 +13,8 @@ env:
# supposed to be processed by idl-compiler.py, so we don't check them using the cleaner
CLEANER_DIRS: test/unit exceptions alternator api auth cdc compaction db dht gms index lang message mutation mutation_writer node_ops raft redis replica service
SEASTAR_BAD_INCLUDE_OUTPUT_PATH: build/seastar-bad-include.log
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
permissions:
contents: read
permissions: {}
# cancel the in-progress run upon a repush
concurrency:
@@ -33,9 +31,11 @@ jobs:
runs-on: ubuntu-latest
container: ${{ needs.read-toolchain.outputs.image }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@v4
with:
submodules: true
- run: |
sudo dnf -y install clang-tools-extra
- name: Generate compilation database
run: |
cmake \
@@ -90,7 +90,7 @@ jobs:
| tee "$SEASTAR_BAD_INCLUDE_OUTPUT_PATH"
- run: |
echo "::remove-matcher owner=seastar-bad-include::"
- uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
- uses: actions/upload-artifact@v4
with:
name: Logs
path: |

View File

@@ -7,7 +7,6 @@ on:
env:
DEFAULT_BRANCH: 'master'
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
mark-ready:
@@ -18,7 +17,7 @@ jobs:
steps:
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@v4
with:
repository: ${{ github.repository }}
ref: ${{ env.DEFAULT_BRANCH }}

View File

@@ -5,8 +5,6 @@ on:
branches:
- master
- next
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
label:
if: github.event.pull_request.draft == false
@@ -17,7 +15,7 @@ jobs:
steps:
- name: Wait for label to be added
run: sleep 1m
- uses: mheap/github-action-required-labels@0ac283b4e65c1fb28ce6079dea5546ceca98ccbe # v5.5.2
- uses: mheap/github-action-required-labels@v5
with:
mode: minimum
count: 1

View File

@@ -7,18 +7,13 @@ on:
description: "the toolchain docker image"
value: ${{ jobs.read-toolchain.outputs.image }}
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
read-toolchain:
runs-on: ubuntu-latest
permissions:
contents: read
outputs:
image: ${{ steps.read.outputs.image }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@v4
with:
sparse-checkout: tools/toolchain/image
sparse-checkout-cone-mode: false

View File

@@ -13,7 +13,6 @@ concurrency:
env:
BUILD_DIR: build
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
read-toolchain:
@@ -30,12 +29,12 @@ jobs:
- RelWithDebInfo
- Dev
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@v4
with:
submodules: true
- run: |
rm -rf seastar
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@v4
with:
repository: scylladb/seastar
submodules: true

View File

@@ -7,9 +7,6 @@ on:
issues:
types: [labeled, unlabeled]
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
label-sync:
if: ${{ github.repository == 'scylladb/scylladb' }}
@@ -24,7 +21,7 @@ jobs:
GITHUB_CONTEXT: ${{ toJson(github) }}
run: echo "$GITHUB_CONTEXT"
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@v4
with:
sparse-checkout: |
.github/scripts/sync_labels.py
@@ -40,13 +37,13 @@ jobs:
run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.number }} --action ${{ github.event.action }}
- name: Pull request labeled or unlabeled event
if: github.event_name == 'pull_request_target' && (startsWith(github.event.label.name, 'backport/') || github.event.label.name == 'P0' || github.event.label.name == 'P1')
if: github.event_name == 'pull_request_target' && startsWith(github.event.label.name, 'backport/')
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.number }} --action ${{ github.event.action }} --label ${{ github.event.label.name }}
- name: Issue labeled or unlabeled event
if: github.event_name == 'issues' && (startsWith(github.event.label.name, 'backport/') || github.event.label.name == 'P0' || github.event.label.name == 'P1')
if: github.event_name == 'issues' && startsWith(github.event.label.name, 'backport/')
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.issue.number }} --action ${{ github.event.action }} --is_issue --label ${{ github.event.label.name }}

View File

@@ -1,66 +0,0 @@
name: Trigger Scylla CI Route
permissions:
contents: read
on:
issue_comment:
types: [created]
pull_request_target:
types:
- unlabeled
jobs:
trigger-jenkins:
if: (github.event_name == 'issue_comment' && github.event.comment.user.login != 'scylladbbot') || github.event.label.name == 'conflicts'
runs-on: ubuntu-latest
steps:
- name: Verify Org Membership
id: verify_author
env:
EVENT_NAME: ${{ github.event_name }}
PR_AUTHOR: ${{ github.event.pull_request.user.login }}
PR_ASSOCIATION: ${{ github.event.pull_request.author_association }}
COMMENT_AUTHOR: ${{ github.event.comment.user.login }}
COMMENT_ASSOCIATION: ${{ github.event.comment.author_association }}
shell: bash
run: |
if [[ "$EVENT_NAME" == "pull_request_target" ]]; then
AUTHOR="$PR_AUTHOR"
ASSOCIATION="$PR_ASSOCIATION"
else
AUTHOR="$COMMENT_AUTHOR"
ASSOCIATION="$COMMENT_ASSOCIATION"
fi
if [[ "$ASSOCIATION" == "MEMBER" || "$ASSOCIATION" == "OWNER" ]]; then
echo "member=true" >> $GITHUB_OUTPUT
else
echo "::warning::${AUTHOR} is not a member of scylladb (association: ${ASSOCIATION}); skipping CI trigger."
echo "member=false" >> $GITHUB_OUTPUT
fi
- name: Validate Comment Trigger
if: github.event_name == 'issue_comment'
id: verify_comment
env:
COMMENT_BODY: ${{ github.event.comment.body }}
shell: bash
run: |
CLEAN_BODY=$(echo "$COMMENT_BODY" | grep -v '^[[:space:]]*>')
if echo "$CLEAN_BODY" | grep -qi '@scylladbbot' && echo "$CLEAN_BODY" | grep -qi 'trigger-ci'; then
echo "trigger=true" >> $GITHUB_OUTPUT
else
echo "trigger=false" >> $GITHUB_OUTPUT
fi
- name: Trigger Scylla-CI-Route Jenkins Job
if: steps.verify_author.outputs.member == 'true' && (github.event_name == 'pull_request_target' || steps.verify_comment.outputs.trigger == 'true')
env:
JENKINS_USER: ${{ secrets.JENKINS_USERNAME }}
JENKINS_API_TOKEN: ${{ secrets.JENKINS_TOKEN }}
JENKINS_URL: "https://jenkins.scylladb.com"
PR_NUMBER: "${{ github.event.issue.number || github.event.pull_request.number }}"
PR_REPO_NAME: "${{ github.event.repository.full_name }}"
run: |
curl -X POST "$JENKINS_URL/job/releng/job/Scylla-CI-Route/buildWithParameters?PR_NUMBER=$PR_NUMBER&PR_REPO_NAME=$PR_REPO_NAME" \
--user "$JENKINS_USER:$JENKINS_API_TOKEN" --fail

View File

@@ -1,245 +0,0 @@
name: Trigger next gating
on:
pull_request_target:
types: [opened, reopened, synchronize]
issue_comment:
types: [created]
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
jobs:
trigger-ci:
runs-on: ubuntu-latest
steps:
- name: Dump GitHub context
env:
GITHUB_CONTEXT: ${{ toJson(github) }}
run: echo "$GITHUB_CONTEXT"
- name: Checkout PR code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0 # Needed to access full history
ref: ${{ github.event.pull_request.head.ref }}
- name: Fetch before commit if needed
run: |
if ! git cat-file -e ${{ github.event.before }} 2>/dev/null; then
echo "Fetching before commit ${{ github.event.before }}"
git fetch --depth=1 origin ${{ github.event.before }}
fi
- name: Compare commits for file changes
if: github.action == 'synchronize'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
echo "Base: ${{ github.event.before }}"
echo "Head: ${{ github.event.after }}"
TREE_BEFORE=$(git show -s --format=%T ${{ github.event.before }})
TREE_AFTER=$(git show -s --format=%T ${{ github.event.after }})
echo "TREE_BEFORE=$TREE_BEFORE" >> $GITHUB_ENV
echo "TREE_AFTER=$TREE_AFTER" >> $GITHUB_ENV
- name: Check if last push has file changes
run: |
if [[ "${{ env.TREE_BEFORE }}" == "${{ env.TREE_AFTER }}" ]]; then
echo "No file changes detected in the last push, only commit message edit."
echo "has_file_changes=false" >> $GITHUB_ENV
else
echo "File changes detected in the last push."
echo "has_file_changes=true" >> $GITHUB_ENV
fi
- name: Rule 1 - Check PR draft or conflict status
run: |
# Check if PR is in draft mode
IS_DRAFT="${{ github.event.pull_request.draft }}"
# Check if PR has 'conflict' label
HAS_CONFLICT_LABEL="false"
LABELS='${{ toJson(github.event.pull_request.labels) }}'
if echo "$LABELS" | jq -r '.[].name' | grep -q "^conflict$"; then
HAS_CONFLICT_LABEL="true"
fi
# Set draft_or_conflict variable
if [[ "$IS_DRAFT" == "true" || "$HAS_CONFLICT_LABEL" == "true" ]]; then
echo "draft_or_conflict=true" >> $GITHUB_ENV
echo "✅ Rule 1: PR is in draft mode or has conflict label - setting draft_or_conflict=true"
else
echo "draft_or_conflict=false" >> $GITHUB_ENV
echo "✅ Rule 1: PR is ready and has no conflict label - setting draft_or_conflict=false"
fi
echo "Draft status: $IS_DRAFT"
echo "Has conflict label: $HAS_CONFLICT_LABEL"
echo "Result: draft_or_conflict = $draft_or_conflict"
- name: Rule 2 - Check labels
run: |
# Check if PR has P0 or P1 labels
HAS_P0_P1_LABEL="false"
LABELS='${{ toJson(github.event.pull_request.labels) }}'
if echo "$LABELS" | jq -r '.[].name' | grep -E "^(P0|P1)$" > /dev/null; then
HAS_P0_P1_LABEL="true"
fi
# Check if PR already has force_on_cloud label
echo "HAS_FORCE_ON_CLOUD_LABEL=false" >> $GITHUB_ENV
if echo "$LABELS" | jq -r '.[].name' | grep -q "^force_on_cloud$"; then
HAS_FORCE_ON_CLOUD_LABEL="true"
echo "HAS_FORCE_ON_CLOUD_LABEL=true" >> $GITHUB_ENV
fi
echo "Has P0/P1 label: $HAS_P0_P1_LABEL"
echo "Has force_on_cloud label: $HAS_FORCE_ON_CLOUD_LABEL"
# Add force_on_cloud label if PR has P0/P1 and doesn't already have force_on_cloud
if [[ "$HAS_P0_P1_LABEL" == "true" && "$HAS_FORCE_ON_CLOUD_LABEL" == "false" ]]; then
echo "✅ Rule 2: PR has P0 or P1 label - adding force_on_cloud label"
curl -X POST \
-H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" \
-H "Accept: application/vnd.github.v3+json" \
"https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/labels" \
-d '{"labels":["force_on_cloud"]}'
elif [[ "$HAS_P0_P1_LABEL" == "true" && "$HAS_FORCE_ON_CLOUD_LABEL" == "true" ]]; then
echo "✅ Rule 2: PR has P0 or P1 label and already has force_on_cloud label - no action needed"
else
echo "✅ Rule 2: PR does not have P0 or P1 label - no force_on_cloud label needed"
fi
SKIP_UNIT_TEST_CUSTOM="false"
if echo "$LABELS" | jq -r '.[].name' | grep -q "^ci/skip_unit-tests_custom$"; then
SKIP_UNIT_TEST_CUSTOM="true"
fi
echo "SKIP_UNIT_TEST_CUSTOM=$SKIP_UNIT_TEST_CUSTOM" >> $GITHUB_ENV
- name: Rule 3 - Analyze changed files and set build requirements
run: |
# Get list of changed files
CHANGED_FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha }} ${{ github.event.pull_request.head.sha }})
echo "Changed files:"
echo "$CHANGED_FILES"
echo ""
# Initialize all requirements to false
REQUIRE_BUILD="false"
REQUIRE_DTEST="false"
REQUIRE_UNITTEST="false"
REQUIRE_ARTIFACTS="false"
REQUIRE_SCYLLA_GDB="false"
# Check each file against patterns
while IFS= read -r file; do
if [[ -n "$file" ]]; then
echo "Checking file: $file"
# Build pattern: ^(?!scripts\/pull_github_pr.sh).*$
# Everything except scripts/pull_github_pr.sh
if [[ "$file" != "scripts/pull_github_pr.sh" ]]; then
REQUIRE_BUILD="true"
echo " ✓ Matches build pattern"
fi
# Dtest pattern: ^(?!test(.py|\/)|dist\/docker\/|dist\/common\/scripts\/).*$
# Everything except test files, dist/docker/, dist/common/scripts/
if [[ ! "$file" =~ ^test\.(py|/).*$ ]] && [[ ! "$file" =~ ^dist/docker/.*$ ]] && [[ ! "$file" =~ ^dist/common/scripts/.*$ ]]; then
REQUIRE_DTEST="true"
echo " ✓ Matches dtest pattern"
fi
# Unittest pattern: ^(?!dist\/docker\/|dist\/common\/scripts).*$
# Everything except dist/docker/, dist/common/scripts/
if [[ ! "$file" =~ ^dist/docker/.*$ ]] && [[ ! "$file" =~ ^dist/common/scripts.*$ ]]; then
REQUIRE_UNITTEST="true"
echo " ✓ Matches unittest pattern"
fi
# Artifacts pattern: ^(?:dist|tools\/toolchain).*$
# Files starting with dist or tools/toolchain
if [[ "$file" =~ ^dist.*$ ]] || [[ "$file" =~ ^tools/toolchain.*$ ]]; then
REQUIRE_ARTIFACTS="true"
echo " ✓ Matches artifacts pattern"
fi
# Scylla GDB pattern: ^(scylla-gdb.py).*$
# Files starting with scylla-gdb.py
if [[ "$file" =~ ^scylla-gdb\.py.*$ ]]; then
REQUIRE_SCYLLA_GDB="true"
echo " ✓ Matches scylla_gdb pattern"
fi
fi
done <<< "$CHANGED_FILES"
# Set environment variables
echo "requireBuild=$REQUIRE_BUILD" >> $GITHUB_ENV
echo "requireDtest=$REQUIRE_DTEST" >> $GITHUB_ENV
echo "requireUnittest=$REQUIRE_UNITTEST" >> $GITHUB_ENV
echo "requireArtifacts=$REQUIRE_ARTIFACTS" >> $GITHUB_ENV
echo "requireScyllaGdb=$REQUIRE_SCYLLA_GDB" >> $GITHUB_ENV
echo ""
echo "✅ Rule 3: File analysis complete"
echo "Build required: $REQUIRE_BUILD"
echo "Dtest required: $REQUIRE_DTEST"
echo "Unittest required: $REQUIRE_UNITTEST"
echo "Artifacts required: $REQUIRE_ARTIFACTS"
echo "Scylla GDB required: $REQUIRE_SCYLLA_GDB"
- name: Determine Jenkins Job Name
run: |
if [[ "${{ github.ref_name }}" == "next" ]]; then
FOLDER_NAME="scylla-master"
elif [[ "${{ github.ref_name }}" == "next-enterprise" ]]; then
FOLDER_NAME="scylla-enterprise"
else
VERSION=$(echo "${{ github.ref_name }}" | awk -F'-' '{print $2}')
if [[ "$VERSION" =~ ^202[0-4]\.[0-9]+$ ]]; then
FOLDER_NAME="enterprise-$VERSION"
elif [[ "$VERSION" =~ ^[0-9]+\.[0-9]+$ ]]; then
FOLDER_NAME="scylla-$VERSION"
fi
fi
echo "JOB_NAME=${FOLDER_NAME}/job/scylla-ci" >> $GITHUB_ENV
- name: Trigger Jenkins Job
if: env.draft_or_conflict == 'false' && env.has_file_changes == 'true' && github.action == 'opened' || github.action == 'reopened'
env:
JENKINS_USER: ${{ secrets.JENKINS_USERNAME }}
JENKINS_API_TOKEN: ${{ secrets.JENKINS_TOKEN }}
JENKINS_URL: "https://jenkins.scylladb.com"
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
run: |
PR_NUMBER=${{ github.event.issue.number }}
PR_REPO_NAME=${{ github.event.repository.full_name }}
echo "Triggering Jenkins Job: $JOB_NAME"
curl -X POST \
"$JENKINS_URL/job/$JOB_NAME/buildWithParameters? \
PR_NUMBER=$PR_NUMBER& \
RUN_DTEST=$REQUIRE_DTEST& \
RUN_ONLY_SCYLLA_GDB=$REQUIRE_SCYLLA_GDB& \
RUN_UNIT_TEST=$REQUIRE_UNITTEST& \
FORCE_ON_CLOUD=$HAS_FORCE_ON_CLOUD_LABEL& \
SKIP_UNIT_TEST_CUSTOM=$SKIP_UNIT_TEST_CUSTOM& \
RUN_ARTIFACT_TESTS=$REQUIRE_ARTIFACTS" \
--fail \
--user "$JENKINS_USER:$JENKINS_API_TOKEN" \
-i -v
trigger-ci-via-comment:
if: github.event.comment.user.login != 'scylladbbot' && contains(github.event.comment.body, '@scylladbbot') && contains(github.event.comment.body, 'trigger-ci')
runs-on: ubuntu-latest
steps:
- name: Trigger Scylla-CI Jenkins Job
env:
JENKINS_USER: ${{ secrets.JENKINS_USERNAME }}
JENKINS_API_TOKEN: ${{ secrets.JENKINS_TOKEN }}
JENKINS_URL: "https://jenkins.scylladb.com"
run: |
PR_NUMBER=${{ github.event.issue.number }}
PR_REPO_NAME=${{ github.event.repository.full_name }}
curl -X POST "$JENKINS_URL/job/$JOB_NAME/buildWithParameters?PR_NUMBER=$PR_NUMBER" \
--user "$JENKINS_USER:$JENKINS_API_TOKEN" --fail -i -v

View File

@@ -1,8 +1,5 @@
name: Trigger next gating
permissions:
contents: read
on:
push:
branches:

View File

@@ -2,10 +2,7 @@ name: Urgent Issue Reminder
on:
schedule:
- cron: '10 8 * * *' # Runs daily at 8 AM
env:
FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
- cron: '10 8 * * 1' # Runs every Monday at 8 AM
jobs:
reminder:
@@ -13,7 +10,7 @@ jobs:
steps:
- name: Send reminders
uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
uses: actions/github-script@v7
with:
script: |
const labelFilters = ['P0', 'P1', 'Field-Tier1','status/release blocker', 'status/regression'];

2
.gitignore vendored
View File

@@ -35,5 +35,3 @@ compile_commands.json
.envrc
clang_build
.idea/
nuke
rust/target

View File

@@ -1,16 +0,0 @@
# ScyllaDB — AI Agent Instructions
This file routes you to the relevant instruction files.
Do NOT load all files at once — read only what applies to your current task.
## Instruction Files
- `.github/copilot-instructions.md` — build system, test runner, code philosophy, test philosophy
- `.github/instructions/cpp.instructions.md` — C++ style, Seastar patterns, memory, error handling (for `*.cc`, `*.hh`)
- `.github/instructions/python.instructions.md` — Python style, testing conventions (for `*.py`)
## Which files to read
- **Always read** `.github/copilot-instructions.md` for build/test commands and project values
- **If editing C++ files** (`*.cc`, `*.hh`): also read `.github/instructions/cpp.instructions.md`
- **If editing Python files** (`*.py`): also read `.github/instructions/python.instructions.md`

View File

@@ -2,12 +2,6 @@ cmake_minimum_required(VERSION 3.27)
project(scylla)
# Disable CMake's automatic -fcolor-diagnostics injection (CMake 3.24+ adds
# it for Clang+Ninja). configure.py does not add any color diagnostics flags,
# so we clear the internal CMake variable to prevent injection.
set(CMAKE_CXX_COMPILE_OPTIONS_COLOR_DIAGNOSTICS "")
set(CMAKE_C_COMPILE_OPTIONS_COLOR_DIAGNOSTICS "")
list(APPEND CMAKE_MODULE_PATH
${CMAKE_CURRENT_SOURCE_DIR}/cmake
${CMAKE_CURRENT_SOURCE_DIR}/seastar/cmake)
@@ -55,17 +49,7 @@ include(limit_jobs)
set(CMAKE_CXX_STANDARD "23" CACHE INTERNAL "")
set(CMAKE_CXX_EXTENSIONS ON CACHE INTERNAL "")
set(CMAKE_CXX_SCAN_FOR_MODULES OFF CACHE INTERNAL "")
set(CMAKE_VISIBILITY_INLINES_HIDDEN ON)
# Global defines matching configure.py
# Since gcc 13, libgcc doesn't need the exception workaround
add_compile_definitions(SEASTAR_NO_EXCEPTION_HACK)
# Hacks needed to expose internal APIs for xxhash dependencies
add_compile_definitions(XXH_PRIVATE_API)
# SEASTAR_TESTING_MAIN is added later (after add_subdirectory(seastar) and
# add_subdirectory(abseil)) to avoid leaking into the seastar subdirectory.
# If SEASTAR_TESTING_MAIN is defined globally before seastar, it causes a
# duplicate 'main' symbol in seastar_testing.
set(CMAKE_CXX_VISIBILITY_PRESET hidden)
if(is_multi_config)
find_package(Seastar)
@@ -106,39 +90,18 @@ if(is_multi_config)
add_dependencies(Seastar::seastar_testing Seastar)
else()
set(Seastar_TESTING ON CACHE BOOL "" FORCE)
set(Seastar_API_LEVEL 9 CACHE STRING "" FORCE)
set(Seastar_API_LEVEL 7 CACHE STRING "" FORCE)
set(Seastar_DEPRECATED_OSTREAM_FORMATTERS OFF CACHE BOOL "" FORCE)
set(Seastar_APPS ON CACHE BOOL "" FORCE)
set(Seastar_EXCLUDE_APPS_FROM_ALL ON CACHE BOOL "" FORCE)
set(Seastar_EXCLUDE_TESTS_FROM_ALL ON CACHE BOOL "" FORCE)
set(Seastar_IO_URING ON CACHE BOOL "" FORCE)
set(Seastar_SCHEDULING_GROUPS_COUNT 24 CACHE STRING "" FORCE)
set(Seastar_SCHEDULING_GROUPS_COUNT 19 CACHE STRING "" FORCE)
set(Seastar_UNUSED_RESULT_ERROR ON CACHE BOOL "" FORCE)
# Match configure.py's build_seastar_shared_libs: Debug and Dev
# build Seastar as a shared library, others build it static.
if(CMAKE_BUILD_TYPE STREQUAL "Debug" OR CMAKE_BUILD_TYPE STREQUAL "Dev")
set(BUILD_SHARED_LIBS ON CACHE BOOL "" FORCE)
else()
set(BUILD_SHARED_LIBS OFF CACHE BOOL "" FORCE)
endif()
add_subdirectory(seastar)
# Coverage mode sets cmake_build_type='Debug' for Seastar
# (configure.py:515), so Seastar's pkg-config output includes sanitizer
# link flags in seastar_libs_coverage (configure.py:2514,2649).
# Seastar's own CMake only activates sanitizer targets for Debug/Sanitize
# configs, so we inject link options on the seastar target for Coverage.
# Using PUBLIC ensures they propagate to all targets linking Seastar
# (but not standalone tools like patchelf), matching configure.py's
# behavior. Compile-time flags and defines are handled globally in
# cmake/mode.Coverage.cmake.
if(CMAKE_BUILD_TYPE STREQUAL "Coverage")
target_link_options(seastar
PUBLIC
-fsanitize=address
-fsanitize=undefined
-fsanitize=vptr)
endif()
target_compile_definitions (seastar
PRIVATE
SEASTAR_NO_EXCEPTION_HACK)
endif()
set(ABSL_PROPAGATE_CXX_STD ON CACHE BOOL "" FORCE)
@@ -148,14 +111,11 @@ if(Scylla_ENABLE_LTO)
endif()
find_package(Sanitizers QUIET)
# Match configure.py:2192 — abseil gets sanitizer flags with -fno-sanitize=vptr
# to exclude vptr checks which are incompatible with abseil's usage.
list(APPEND absl_cxx_flags
$<$<CONFIG:Debug,Sanitize>:$<TARGET_PROPERTY:Sanitizers::address,INTERFACE_COMPILE_OPTIONS>;$<TARGET_PROPERTY:Sanitizers::undefined_behavior,INTERFACE_COMPILE_OPTIONS>;-fno-sanitize=vptr>)
$<$<CONFIG:Debug,Sanitize>:$<TARGET_PROPERTY:Sanitizers::address,INTERFACE_COMPILE_OPTIONS>;$<TARGET_PROPERTY:Sanitizers::undefined_behavior,INTERFACE_COMPILE_OPTIONS>>)
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
list(APPEND ABSL_GCC_FLAGS ${absl_cxx_flags})
elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
list(APPEND absl_cxx_flags "-Wno-deprecated-builtins")
list(APPEND ABSL_LLVM_FLAGS ${absl_cxx_flags})
endif()
set(ABSL_DEFAULT_LINKOPTS
@@ -176,38 +136,9 @@ add_library(absl::headers ALIAS absl-headers)
# unfortunately.
set_target_properties(absl_strerror PROPERTIES EXCLUDE_FROM_ALL TRUE)
# Now that seastar and abseil subdirectories are fully processed, add
# SEASTAR_TESTING_MAIN globally. This matches configure.py's global define
# without leaking into seastar (which would cause duplicate main symbols).
add_compile_definitions(SEASTAR_TESTING_MAIN)
# System libraries dependencies
find_package(Boost REQUIRED
COMPONENTS filesystem program_options system thread regex unit_test_framework)
# When using shared Boost libraries, define BOOST_ALL_DYN_LINK (matching configure.py)
if(NOT Boost_USE_STATIC_LIBS)
add_compile_definitions(BOOST_ALL_DYN_LINK)
endif()
# CMake's Boost package config adds per-component defines like
# BOOST_UNIT_TEST_FRAMEWORK_DYN_LINK, BOOST_REGEX_DYN_LINK, etc. on the
# imported targets. configure.py only uses BOOST_ALL_DYN_LINK (which covers
# all components), so strip the per-component defines to align the two build
# systems.
foreach(_boost_target
Boost::unit_test_framework
Boost::regex
Boost::filesystem
Boost::program_options
Boost::system
Boost::thread)
if(TARGET ${_boost_target})
# Completely remove all INTERFACE_COMPILE_DEFINITIONS from the Boost target.
# This prevents per-component *_DYN_LINK and *_NO_LIB defines from
# propagating. BOOST_ALL_DYN_LINK (set globally) covers all components.
set_property(TARGET ${_boost_target} PROPERTY INTERFACE_COMPILE_DEFINITIONS)
endif()
endforeach()
target_link_libraries(Boost::regex
INTERFACE
ICU::i18n
@@ -232,70 +163,38 @@ file(MAKE_DIRECTORY "${scylla_gen_build_dir}")
include(add_version_library)
generate_scylla_version()
option(Scylla_USE_PRECOMPILED_HEADER "Use precompiled header for Scylla" ON)
add_library(scylla-precompiled-header STATIC exported_templates.cc)
target_link_libraries(scylla-precompiled-header PRIVATE
absl::headers
absl::btree
absl::hash
absl::raw_hash_set
Seastar::seastar
Snappy::snappy
systemd
ZLIB::ZLIB
lz4::lz4_static
zstd::zstd_static)
if (Scylla_USE_PRECOMPILED_HEADER)
set(Scylla_USE_PRECOMPILED_HEADER_USE ON)
find_program(DISTCC_EXEC NAMES distcc OPTIONAL)
if (DISTCC_EXEC)
if(DEFINED ENV{DISTCC_HOSTS})
set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
message(STATUS "Disabling precompiled header usage because distcc exists and DISTCC_HOSTS is set, assuming you're using distributed compilation.")
else()
file(REAL_PATH "~/.distcc/hosts" DIST_CC_HOSTS_PATH EXPAND_TILDE)
if (EXISTS ${DIST_CC_HOSTS_PATH})
set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
message(STATUS "Disabling precompiled header usage because distcc and ~/.distcc/hosts exists, assuming you're using distributed compilation.")
endif()
endif()
endif()
if (Scylla_USE_PRECOMPILED_HEADER_USE)
message(STATUS "Using precompiled header for Scylla - remember to add `sloppiness = pch_defines,time_macros` to ccache.conf, if you're using ccache.")
target_precompile_headers(scylla-precompiled-header PRIVATE "stdafx.hh")
target_compile_definitions(scylla-precompiled-header PRIVATE SCYLLA_USE_PRECOMPILED_HEADER)
# Match configure.py: -fpch-validate-input-files-content tells the compiler
# to check content of stdafx.hh if timestamps don't match (important for
# ccache/git workflows where timestamps may not be preserved).
add_compile_options(-fpch-validate-input-files-content)
endif()
else()
set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
endif()
add_library(scylla-main STATIC)
target_sources(scylla-main
PRIVATE
absl-flat_hash_map.cc
bytes.cc
client_data.cc
clocks-impl.cc
collection_mutation.cc
compress.cc
converting_mutation_partition_applier.cc
counters.cc
sstable_dict_autotrainer.cc
duration.cc
exceptions/exceptions.cc
frozen_schema.cc
generic_server.cc
debug.cc
init.cc
keys/keys.cc
keys.cc
multishard_mutation_query.cc
mutation_query.cc
node_ops/task_manager_module.cc
partition_slice_builder.cc
query/query.cc
querier.cc
query.cc
query_ranges_to_vnodes.cc
query/query-result-set.cc
query-result-set.cc
tombstone_gc_options.cc
tombstone_gc.cc
reader_concurrency_semaphore.cc
reader_concurrency_semaphore_group.cc
schema_mutations.cc
serializer.cc
service/direct_failure_detector/failure_detector.cc
sstables_loader.cc
@@ -319,7 +218,6 @@ target_link_libraries(scylla-main
ZLIB::ZLIB
lz4::lz4_static
zstd::zstd_static
scylla-precompiled-header
)
option(Scylla_CHECK_HEADERS
@@ -372,7 +270,9 @@ add_subdirectory(locator)
add_subdirectory(message)
add_subdirectory(mutation)
add_subdirectory(mutation_writer)
add_subdirectory(node_ops)
add_subdirectory(readers)
add_subdirectory(redis)
add_subdirectory(replica)
add_subdirectory(raft)
add_subdirectory(repair)
@@ -387,7 +287,6 @@ add_subdirectory(tracing)
add_subdirectory(transport)
add_subdirectory(types)
add_subdirectory(utils)
add_subdirectory(vector_search)
add_version_library(scylla_version
release.cc)
@@ -417,6 +316,7 @@ set(scylla_libs
mutation_writer
raft
readers
redis
repair
replica
schema
@@ -429,8 +329,7 @@ set(scylla_libs
tracing
transport
types
utils
vector_search)
utils)
target_link_libraries(scylla PRIVATE
${scylla_libs})

View File

@@ -12,7 +12,7 @@ Please use the [issue tracker](https://github.com/scylladb/scylla/issues/) to re
## Contributing code to Scylla
Before you can contribute code to Scylla for the first time, you should sign the [Contributor License Agreement](https://www.scylladb.com/open-source/contributor-agreement/) and send the signed form to cla@scylladb.com. You can then submit your changes as patches to the [scylladb-dev mailing list](https://groups.google.com/forum/#!forum/scylladb-dev) or as a pull request to the [Scylla project on github](https://github.com/scylladb/scylla).
Before you can contribute code to Scylla for the first time, you should sign the [Contributor License Agreement](https://www.scylladb.com/open-source/contributor-agreement/) and send the signed form cla@scylladb.com. You can then submit your changes as patches to the [scylladb-dev mailing list](https://groups.google.com/forum/#!forum/scylladb-dev) or as a pull request to the [Scylla project on github](https://github.com/scylladb/scylla).
If you need help formatting or sending patches, [check out these instructions](https://github.com/scylladb/scylla/wiki/Formatting-and-sending-patches).
The Scylla C++ source code uses the [Seastar coding style](https://github.com/scylladb/seastar/blob/master/coding-style.md) so please adhere to that in your patches. Note that Scylla code is written with `using namespace seastar`, so should not explicitly add the `seastar::` prefix to Seastar symbols. You will usually not need to add `using namespace seastar` to new source files, because most Scylla header files have `#include "seastarx.hh"`, which does this.

View File

@@ -43,7 +43,7 @@ $ ./tools/toolchain/dbuild ninja build/release/scylla
$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1
```
Note: do not mix environments - either perform all your work with dbuild, or natively on the host.
Note: do not mix environemtns - either perform all your work with dbuild, or natively on the host.
Note2: you can get to an interactive shell within dbuild by running it without any parameters:
```bash
$ ./tools/toolchain/dbuild
@@ -91,7 +91,7 @@ You can also specify a single mode. For example
$ ninja-build release
```
Will build everything in release mode. The valid modes are
Will build everytihng in release mode. The valid modes are
* Debug: Enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer)
and other sanity checks. It has no optimizations, which allows for debugging with tools like
@@ -361,7 +361,7 @@ avoid that the gold linker can be told to create an index with
More info at https://gcc.gnu.org/wiki/DebugFission.
Both options can be enabled by passing `--split-dwarf` to configure.py.
Both options can be enable by passing `--split-dwarf` to configure.py.
Note that distcc is *not* compatible with it, but icecream
(https://github.com/icecc/icecream) is.
@@ -370,7 +370,7 @@ Note that distcc is *not* compatible with it, but icecream
Sometimes Scylla development is closely tied with a feature being developed in Seastar. It can be useful to compile Scylla with a particular check-out of Seastar.
One way to do this is to create a local remote for the Seastar submodule in the Scylla repository:
One way to do this it to create a local remote for the Seastar submodule in the Scylla repository:
```bash
$ cd $HOME/src/scylla

View File

@@ -1,8 +1,8 @@
## **SCYLLADB SOFTWARE LICENSE AGREEMENT**
| Version: | 1.1 |
| Version: | 1.0 |
| :---- | :---- |
| Last updated: | April 12, 2026 |
| Last updated: | December 18, 2024 |
**Your Acceptance**
@@ -12,48 +12,20 @@ The terms "**You**" or "**Licensee**" refer to any individual accessing or using
**Grant of License**
* **Definitions:**
1. **Software:** Software means the ScyllaDB software provided by Licensor, including the source code, object code, and any accompanying documentation or tools, or any part thereof, as made available under this Agreement.
2. **Commercial Customer**: means any legal entity (including its Affiliates) that has entered into a transaction with Licensor, or an authorized reseller/distributor, for the provision of any ScyllaDB products or services. This includes, without limitation: (a) Scope of Service: Any paid subscription, enterprise license, "BYOA" or Database-as-a-Service (DBaaS) offering, technical support, professional services, consulting, or training. (b) Scale and Volume: Any deployment regardless of size, capacity, or performance metrics (c) Payment Method: Any compensation model, including but not limited to, fixed-fee, consumption-based (On-Demand), committed spend, third-party marketplace credits (e.g., AWS, GCP, Azure), or promotional credits and discounts.
* **Grant of License:** Subject to the terms and conditions of this Agreement, including the Eligibility and Exclusive Use Restrictions clause, Licensor grants You a limited, non-exclusive, revocable, non-sublicensable, non-transferable, royalty free license to Use the Software, in each case solely for the purposes of:
* **Software Definitions:** Software means the ScyllaDB software provided by Licensor, including the source code, object code, and any accompanying documentation or tools, or any part thereof, as made available under this Agreement.
* **Grant of License:** Subject to the terms and conditions of this Agreement, Licensor grants You a limited, non-exclusive, revocable, non-sublicensable, non-transferable, royalty free license to Use the Software, in each case solely for the purposes of:
1) Copying, distributing, evaluating (including performing benchmarking or comparative tests or evaluations , subject to the limitations below) and improving the Software and ScyllaDB; and
2) create a modified version of the Software (each, a "**Licensed Work**"); provided however, that each such Licensed Work keeps all or substantially all of the functions and features of the Software, and/or using all or substantially all of the source code of the Software. You hereby agree that all the Licensed Work are, upon creation, considered Licensed Work of the Licensor, shall be the sole property of the Licensor and its assignees, and the Licensor and its assignees shall be the sole owner of all rights of any kind or nature, in connection with such Licensed Work. You hereby irrevocably and unconditionally assign to the Licensor all the Licensed Work and any part thereof. This License applies separately for each version of the Licensed Work, which shall be considered "Software" for the purpose of this Agreement.
* **Eligibility and Exclusive Use Restrictions**
i. Restricted to "Never Customers" Only. The license granted under this Agreement is strictly limited to Never Customers. For purposes of this Agreement, a "Never Customer" is an entity (including its Affiliates) that does not have, and has not had within the previous twelve (12) months, a paid commercial subscription, professional services agreement, or any other commercial relationship with Licensor. Satisfaction of the Never Customer criteria is a strict condition precedent to the effectiveness of this License.
ii. Total Prohibition for Existing Commercial Customers. If You (or any of Your Affiliates) are an existing Commercial Customer of Licensor within the last twelve (12) months, no license is deemed to have been offered or extended to You, and any download or installation of the Software by You is unauthorized. This prohibition applies to all deployments, including but not limited to:
(a) existing commercial workloads;
(b) any new use cases, new applications, or new workloads
iii. **No "Hybrid" Usage**. Licensee is expressly prohibited from combining free tier usage under this Agreement with paid commercial units.
If You are a Commercial Customer, all use of the Software across Your entire organization (and any of your Affiliates) must be governed by a valid, paid commercial agreement. Use of the Software under this license by a Commercial Customer (which is not a "Never Customer") shall:
(a) Void this license *ab initio*;
(b) Be deemed a material breach of both this Agreement and any existing commercial terms; and
(c) Entitle Licensor to invoice Licensee for such unauthorized usage at Licensor's standard list prices, retroactive to the date of first use.
Notwithstanding anything to the contrary in the Eligibility or License Limitations sections above a Commercial Customer may use the Software exclusively for non-production purposes, including Continuous Integration (CI), automated testing, and quality assurance environments, provided that such use at all times remains compliant with the Usage Limit.
iv. **Verification**. Licensor reserves the right to audit Licensee's environment and corporate identity to ensure compliance with these eligibility criteria.
For the purposes of this Agreement an "**Affiliate**" means any entity that directly or indirectly controls, is controlled by, or is under common control with a party, where "control" means ownership of more than 50% of the voting stock or decision-making authority
**License Limitations, Restrictions and Obligations:** The license grant above is subject to the following limitations, restrictions, and obligations. If Licensees Use of the Software does not comply with the above license grant or the terms of this section (including exceeding the Usage Limit set forth below), Licensee must: (i) refrain from any Use of the Software; and (ii) unless Licensee is a Never Customer, purchase a [commercial paid license](https://www.scylladb.com/scylladb-proprietary-software-license-agreement/) from the Licensor.
**License Limitations, Restrictions and Obligations:** The license grant above is subject to the following limitations, restrictions, and obligations. If Licensees Use of the Software does not comply with the above license grant or the terms of this section (including exceeding the Usage Limit set forth below), Licensee must: (i) refrain from any Use of the Software; and (ii) purchase a [commercial paid license](https://www.scylladb.com/scylladb-proprietary-software-license-agreement/) from the Licensor.
* **Updates:** You shall be solely responsible for providing all equipment, systems, assets, access, and ancillary goods and services needed to access and Use the Software. Licensor may modify or update the Software at any time, without notification, in its sole and absolute discretion. After the effective date of each such update, Licensor shall bear no obligation to run, provide or support legacy versions of the Software.
* **"Usage Limit":** Licensee's total overall available storage across all deployments and clusters of the Software and the Licensed Work under this License shall not exceed 10TB and/or an upper limit of 50 VCPUs (hyper threads).
* **IP Markings:** Licensee must retain all copyright, trademark, and other proprietary notices contained in the Software. You will not modify, delete, alter, remove, or obscure any intellectual property, including without limitations licensing, copyright, trademark, or any other notices of Licensor in the Software.
* **License Reproduction:** You must conspicuously display this Agreement on each copy of the Software. If You receive the Software from a third party, this Agreement still applies to Your Use of the Software. You will be responsible for any breach of this Agreement by any such third-party.
* Distribution of any Licensed Works is permitted, provided that: (i) You must include in any Licensed Work prominent notices stating that You have modified the Software, (ii) You include a copy of this Agreement with the Licensed Work, and (iii) You clearly identify all modifications made in the Licensed Work and provides attribution to the Licensor as the original author(s) of the Software.
* **Commercial Use Restrictions:** Licensee may not offer the Software as a software-as-a-service (SaaS) or commercial database-as-as-service (dBaaS) offering. Licensee may not use the Software to compete with Licensor's existing or future products or services. If your Use of the Software does not comply with the requirements currently in effect as described in this License, you must purchase a commercial license from the Licensor, its Affiliated entities, or you must refrain from using the Software and all Licensed Work. Furthermore, if You make any written claim of patent infringement relating to the Software, Your patent license for the Software granted under this Agreement terminates immediately.
* **Commercial Use Restrictions:** Licensee may not offer the Software as a software-as-a-service (SaaS) or commercial database-as-as-service (dBaaS) offering. Licensee may not use the Software to compete with Licensor's existing or future products or services. If your Use of the Software does not comply with the requirements currently in effect as described in this License, you must purchase a commercial license from the Licensor, its affiliated entities, or you must refrain from using the Software and all Licensed Work. Furthermore, if You make any written claim of patent infringement relating to the Software, Your patent license for the Software granted under this Agreement terminates immediately.
* Notwithstanding anything to the contrary, under the License granted hereunder, You shall not and shall not permit others to: (i) transfer the Software or any portions thereof to any other party except as expressly permitted herein; (ii) attempt to circumvent or overcome any technological protection measures incorporated into the Software; (iii) incorporate the Software into the structure, machinery or controls of any aircraft, other aerial device, military vehicle, hovercraft, waterborne craft or any medical equipment of any kind; or (iv) use the Software or any part thereof in any unlawful, harmful or illegal manner, or in a manner which infringes third parties rights in any way, including intellectual property rights.
**Monitoring; Audit**
@@ -69,14 +41,14 @@ For the purposes of this Agreement an "**Affiliate**" means any entity that dire
**Indemnity; Disclaimer; Limitation of Liability**
* **Indemnity:** Licensee hereby agrees to indemnify, defend and hold harmless Licensor and its Affiliates from any losses or damages incurred due to a third party claim arising out of: (i) Licensees breach of this Agreement; (ii) Licensees negligence, willful misconduct or violation of law, or (iii) Licensees products or services.
* **Indemnity:** Licensee hereby agrees to indemnify, defend and hold harmless Licensor and its affiliates from any losses or damages incurred due to a third party claim arising out of: (i) Licensees breach of this Agreement; (ii) Licensees negligence, willful misconduct or violation of law, or (iii) Licensees products or services.
* DISCLAIMER OF WARRANTIES: LICENSEE AGREES THAT LICENSOR HAS MADE NO EXPRESS WARRANTIES REGARDING THE SOFTWARE AND THAT THE SOFTWARE IS BEING PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. LICENSOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THE SOFTWARE, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE; TITLE; MERCHANTABILITY; OR NON-INFRINGEMENT OF THIRD PARTY RIGHTS. LICENSOR DOES NOT WARRANT THAT THE SOFTWARE WILL OPERATE UNINTERRUPTED OR ERROR FREE, OR THAT ALL ERRORS WILL BE CORRECTED. LICENSOR DOES NOT GUARANTEE ANY PARTICULAR RESULTS FROM THE USE OF THE SOFTWARE, AND DOES NOT WARRANT THAT THE SOFTWARE IS FIT FOR ANY PARTICULAR PURPOSE.
* LIMITATION OF LIABILITY: TO THE FULLEST EXTENT PERMISSIBLE UNDER APPLICABLE LAW, IN NO EVENT WILL LICENSOR AND/OR ITS AFFILIATES, EMPLOYEES, OFFICERS AND DIRECTORS BE LIABLE TO LICENSEE FOR (I) ANY LOSS OF USE OR DATA; INTERRUPTION OF BUSINESS; OR ANY INDIRECT; SPECIAL; INCIDENTAL; OR CONSEQUENTIAL DAMAGES OF ANY KIND (INCLUDING LOST PROFITS); AND (II) ANY DIRECT DAMAGES EXCEEDING THE TOTAL AMOUNT OF ONE THOUSAND US DOLLARS ($1,000). THE FOREGOING PROVISIONS LIMITING THE LIABILITY OF LICENSOR SHALL APPLY REGARDLESS OF THE FORM OR CAUSE OF ACTION, WHETHER IN STRICT LIABILITY, CONTRACT OR TORT.
**Proprietary Rights; No Other Rights**
* **Ownership:** Licensor retains sole and exclusive ownership of all rights, interests and title in the Software and any scripts, processes, techniques, methodologies, inventions, know-how, concepts, formatting, arrangements, visual attributes, ideas, database rights, copyrights, patents, trade secrets, and other intellectual property related thereto, and all derivatives, enhancements, modifications and improvements thereof. Except for the limited license rights granted herein, Licensee has no rights in or to the Software and/ or Licensors trademarks, logo, or branding and You acknowledge that such Software, trademarks, logo, or branding is the sole property of Licensor.
* **Feedback:** Licensee is not required to provide any suggestions, enhancement requests, recommendations or other feedback regarding the Software ("Feedback"). If, notwithstanding this policy, Licensee submits Feedback, Licensee understands and acknowledges that such Feedback is not submitted in confidence and Licensor assumes no obligation, expressed or implied, by considering it. All right in any trademark or logo of Licensor or its Affiliates and You shall make no claim of right to the Software or any part thereof to be supplied by Licensor hereunder and acknowledges that as between Licensor and You, such Software is the sole proprietary, title and interest in and to Licensor.such Feedback shall be assigned to, and shall become the sole and exclusive property of, Licensor upon its creation.
* **Feedback:** Licensee is not required to provide any suggestions, enhancement requests, recommendations or other feedback regarding the Software ("Feedback"). If, notwithstanding this policy, Licensee submits Feedback, Licensee understands and acknowledges that such Feedback is not submitted in confidence and Licensor assumes no obligation, expressed or implied, by considering it. All right in any trademark or logo of Licensor or its affiliates and You shall make no claim of right to the Software or any part thereof to be supplied by Licensor hereunder and acknowledges that as between Licensor and You, such Software is the sole proprietary, title and interest in and to Licensor.such Feedback shall be assigned to, and shall become the sole and exclusive property of, Licensor upon its creation.
* Except for the rights expressly granted to You under this Agreement, You are not granted any other licenses or rights in the Software or otherwise. This Agreement constitutes the entire agreement between You and the Licensor with respect to the subject matter hereof and supersedes all prior or contemporaneous communications, representations, or agreements, whether oral or written.
* **Third-Party Software:** Customer acknowledges that the Software may contain open and closed source components (“OSS Components”) that are governed separately by certain licenses, in each case as further provided by Company upon request. Any applicable OSS Component license is solely between Licensee and the applicable licensor of the OSS Component and Licensee shall comply with the applicable OSS Component license.
* If any provision of this Agreement is held to be invalid or unenforceable, such provision shall be struck and the remaining provisions shall remain in full force and effect.
@@ -84,7 +56,7 @@ For the purposes of this Agreement an "**Affiliate**" means any entity that dire
**Miscellaneous**
* **Miscellaneous:** This Agreement may be modified at any time by Licensor, and constitutes the entire agreement between the parties with respect to the subject matter hereof. Licensee may not assign or subcontract its rights or obligations under this Agreement. This Agreement does not, and shall not be construed to create any relationship, partnership, joint venture, employer-employee, agency, or franchisor-franchisee relationship between the parties.
* **Modifications**: Licensor reserves the right to modify this Agreement at any time. Changes will be effective upon posting to the Website or within the Software repository. Continued use of the Software after such changes constitutes acceptance.
* **Governing Law & Jurisdiction:** This Agreement shall be governed and construed in accordance with the laws of Israel, without giving effect to their respective conflicts of laws provisions, and the competent courts situated in Tel Aviv, Israel, shall have sole and exclusive jurisdiction over the parties and any conflict and/or dispute arising out of, or in connection to, this Agreement
\[*End of ScyllaDB Software License Agreement*\]

View File

@@ -1,6 +1,9 @@
This project includes code developed by the Apache Software Foundation (http://www.apache.org/),
especially Apache Cassandra.
It includes files from https://github.com/antonblanchard/crc32-vpmsum (author Anton Blanchard <anton@au.ibm.com>, IBM).
These files are located in utils/arch/powerpc/crc32-vpmsum. Their license may be found in licenses/LICENSE-crc32-vpmsum.TXT.
It includes modified code from https://gitbox.apache.org/repos/asf?p=cassandra-dtest.git (owned by The Apache Software Foundation)
It includes modified tests from https://github.com/etcd-io/etcd.git (owned by The etcd Authors)

View File

@@ -18,7 +18,7 @@ Scylla is fairly fussy about its build environment, requiring very recent
versions of the C++23 compiler and of many libraries to build. The document
[HACKING.md](HACKING.md) includes detailed information on building and
developing Scylla, but to get Scylla building quickly on (almost) any build
machine, Scylla offers a [frozen toolchain](tools/toolchain/README.md).
machine, Scylla offers a [frozen toolchain](tools/toolchain/README.md),
This is a pre-configured Docker image which includes recent versions of all
the required compilers, libraries and build tools. Using the frozen toolchain
allows you to avoid changing anything in your build machine to meet Scylla's
@@ -43,7 +43,7 @@ For further information, please see:
[developer documentation]: HACKING.md
[build documentation]: docs/dev/building.md
[docker image build documentation]: dist/docker/redhat/README.md
[docker image build documentation]: dist/docker/debian/README.md
## Running Scylla

View File

@@ -78,7 +78,7 @@ fi
# Default scylla product/version tags
PRODUCT=scylla
VERSION=2026.3.0-dev
VERSION=2025.3.1
if test -f version
then

2
abseil

Submodule abseil updated: 255c84dadd...d7aaad83b4

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "absl-flat_hash_map.hh"

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once

View File

@@ -9,8 +9,6 @@ target_sources(alternator
controller.cc
server.cc
executor.cc
executor_read.cc
executor_util.cc
stats.cc
serialization.cc
expressions.cc
@@ -19,8 +17,6 @@ target_sources(alternator
streams.cc
consumed_capacity.cc
ttl.cc
parsed_expression_cache.cc
http_compression.cc
${cql_grammar_srcs})
target_include_directories(alternator
PUBLIC
@@ -37,8 +33,5 @@ target_link_libraries(alternator
idl
absl::headers)
if (Scylla_USE_PRECOMPILED_HEADER_USE)
target_precompile_headers(alternator REUSE_FROM scylla-precompiled-header)
endif()
check_headers(check-headers alternator
GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)

View File

@@ -1,253 +0,0 @@
/*
* Copyright 2019-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
*/
#pragma once
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>
#include <variant>
#include "utils/rjson.hh"
#include "utils/overloaded_functor.hh"
#include "alternator/error.hh"
#include "alternator/expressions_types.hh"
namespace alternator {
// An attribute_path_map object is used to hold data for various attributes
// paths (parsed::path) in a hierarchy of attribute paths. Each attribute path
// has a root attribute, and then modified by member and index operators -
// for example in "a.b[2].c" we have "a" as the root, then ".b" member, then
// "[2]" index, and finally ".c" member.
// Data can be added to an attribute_path_map using the add() function, but
// requires that attributes with data not be *overlapping* or *conflicting*:
//
// 1. Two attribute paths which are identical or an ancestor of one another
// are considered *overlapping* and not allowed. If a.b.c has data,
// we can't add more data in a.b.c or any of its descendants like a.b.c.d.
//
// 2. Two attribute paths which need the same parent to have both a member and
// an index are considered *conflicting* and not allowed. E.g., if a.b has
// data, you can't add a[1]. The meaning of adding both would be that the
// attribute a is both a map and an array, which isn't sensible.
//
// These two requirements are common to the two places where Alternator uses
// this abstraction to describe how a hierarchical item is to be transformed:
//
// 1. In ProjectExpression: for filtering from a full top-level attribute
// only the parts for which user asked in ProjectionExpression.
//
// 2. In UpdateExpression: for taking the previous value of a top-level
// attribute, and modifying it based on the instructions in the user
// wrote in UpdateExpression.
template<typename T>
class attribute_path_map_node {
public:
using data_t = T;
// We need the extra unique_ptr<> here because libstdc++ unordered_map
// doesn't work with incomplete types :-(
using members_t = std::unordered_map<std::string, std::unique_ptr<attribute_path_map_node<T>>>;
// The indexes list is sorted because DynamoDB requires handling writes
// beyond the end of a list in index order.
using indexes_t = std::map<unsigned, std::unique_ptr<attribute_path_map_node<T>>>;
// The prohibition on "overlap" and "conflict" explained above means
// That only one of data, members or indexes is non-empty.
std::optional<std::variant<data_t, members_t, indexes_t>> _content;
bool is_empty() const { return !_content; }
bool has_value() const { return _content && std::holds_alternative<data_t>(*_content); }
bool has_members() const { return _content && std::holds_alternative<members_t>(*_content); }
bool has_indexes() const { return _content && std::holds_alternative<indexes_t>(*_content); }
// get_members() assumes that has_members() is true
members_t& get_members() { return std::get<members_t>(*_content); }
const members_t& get_members() const { return std::get<members_t>(*_content); }
indexes_t& get_indexes() { return std::get<indexes_t>(*_content); }
const indexes_t& get_indexes() const { return std::get<indexes_t>(*_content); }
T& get_value() { return std::get<T>(*_content); }
const T& get_value() const { return std::get<T>(*_content); }
};
template<typename T>
using attribute_path_map = std::unordered_map<std::string, attribute_path_map_node<T>>;
using attrs_to_get_node = attribute_path_map_node<std::monostate>;
// attrs_to_get lists which top-level attribute are needed, and possibly also
// which part of the top-level attribute is really needed (when nested
// attribute paths appeared in the query).
// Most code actually uses optional<attrs_to_get>. There, a disengaged
// optional means we should get all attributes, not specific ones.
using attrs_to_get = attribute_path_map<std::monostate>;
// takes a given JSON value and drops its parts which weren't asked to be
// kept. It modifies the given JSON value, or returns false to signify that
// the entire object should be dropped.
// Note that The JSON value is assumed to be encoded using the DynamoDB
// conventions - i.e., it is really a map whose key has a type string,
// and the value is the real object.
template<typename T>
bool hierarchy_filter(rjson::value& val, const attribute_path_map_node<T>& h) {
if (!val.IsObject() || val.MemberCount() != 1) {
// This shouldn't happen. We shouldn't have stored malformed objects.
// But today Alternator does not validate the structure of nested
// documents before storing them, so this can happen on read.
throw api_error::internal(format("Malformed value object read: {}", val));
}
const char* type = val.MemberBegin()->name.GetString();
rjson::value& v = val.MemberBegin()->value;
if (h.has_members()) {
const auto& members = h.get_members();
if (type[0] != 'M' || !v.IsObject()) {
// If v is not an object (dictionary, map), none of the members
// can match.
return false;
}
rjson::value newv = rjson::empty_object();
for (auto it = v.MemberBegin(); it != v.MemberEnd(); ++it) {
std::string attr = rjson::to_string(it->name);
auto x = members.find(attr);
if (x != members.end()) {
if (x->second) {
// Only a part of this attribute is to be filtered, do it.
if (hierarchy_filter(it->value, *x->second)) {
// because newv started empty and attr are unique
// (keys of v), we can use add() here
rjson::add_with_string_name(newv, attr, std::move(it->value));
}
} else {
// The entire attribute is to be kept
rjson::add_with_string_name(newv, attr, std::move(it->value));
}
}
}
if (newv.MemberCount() == 0) {
return false;
}
v = newv;
} else if (h.has_indexes()) {
const auto& indexes = h.get_indexes();
if (type[0] != 'L' || !v.IsArray()) {
return false;
}
rjson::value newv = rjson::empty_array();
const auto& a = v.GetArray();
for (unsigned i = 0; i < v.Size(); i++) {
auto x = indexes.find(i);
if (x != indexes.end()) {
if (x->second) {
if (hierarchy_filter(a[i], *x->second)) {
rjson::push_back(newv, std::move(a[i]));
}
} else {
// The entire attribute is to be kept
rjson::push_back(newv, std::move(a[i]));
}
}
}
if (newv.Size() == 0) {
return false;
}
v = newv;
}
return true;
}
// Add a path to an attribute_path_map. Throws a validation error if the path
// "overlaps" with one already in the filter (one is a sub-path of the other)
// or "conflicts" with it (both a member and index is requested).
template<typename T>
void attribute_path_map_add(const char* source, attribute_path_map<T>& map, const parsed::path& p, T value = {}) {
using node = attribute_path_map_node<T>;
// The first step is to look for the top-level attribute (p.root()):
auto it = map.find(p.root());
if (it == map.end()) {
if (p.has_operators()) {
it = map.emplace(p.root(), node {std::nullopt}).first;
} else {
(void) map.emplace(p.root(), node {std::move(value)}).first;
// Value inserted for top-level node. We're done.
return;
}
} else if(!p.has_operators()) {
// If p is top-level and we already have it or a part of it
// in map, it's a forbidden overlapping path.
throw api_error::validation(fmt::format(
"Invalid {}: two document paths overlap at {}", source, p.root()));
} else if (it->second.has_value()) {
// If we're here, it != map.end() && p.has_operators && it->second.has_value().
// This means the top-level attribute already has a value, and we're
// trying to add a non-top-level value. It's an overlap.
throw api_error::validation(fmt::format("Invalid {}: two document paths overlap at {}", source, p.root()));
}
node* h = &it->second;
// The second step is to walk h from the top-level node to the inner node
// where we're supposed to insert the value:
for (const auto& op : p.operators()) {
std::visit(overloaded_functor {
[&] (const std::string& member) {
if (h->is_empty()) {
*h = node {typename node::members_t()};
} else if (h->has_indexes()) {
throw api_error::validation(format("Invalid {}: two document paths conflict at {}", source, p));
} else if (h->has_value()) {
throw api_error::validation(format("Invalid {}: two document paths overlap at {}", source, p));
}
typename node::members_t& members = h->get_members();
auto it = members.find(member);
if (it == members.end()) {
it = members.insert({member, std::make_unique<node>()}).first;
}
h = it->second.get();
},
[&] (unsigned index) {
if (h->is_empty()) {
*h = node {typename node::indexes_t()};
} else if (h->has_members()) {
throw api_error::validation(format("Invalid {}: two document paths conflict at {}", source, p));
} else if (h->has_value()) {
throw api_error::validation(format("Invalid {}: two document paths overlap at {}", source, p));
}
typename node::indexes_t& indexes = h->get_indexes();
auto it = indexes.find(index);
if (it == indexes.end()) {
it = indexes.insert({index, std::make_unique<node>()}).first;
}
h = it->second.get();
}
}, op);
}
// Finally, insert the value in the node h.
if (h->is_empty()) {
*h = node {std::move(value)};
} else {
throw api_error::validation(format("Invalid {}: two document paths overlap at {}", source, p));
}
}
// A very simplified version of the above function for the special case of
// adding only top-level attribute. It's not only simpler, we also use a
// different error message, referring to a "duplicate attribute" instead of
// "overlapping paths". DynamoDB also has this distinction (errors in
// AttributesToGet refer to duplicates, not overlaps, but errors in
// ProjectionExpression refer to overlap - even if it's an exact duplicate).
template<typename T>
void attribute_path_map_add(const char* source, attribute_path_map<T>& map, const std::string& attr, T value = {}) {
using node = attribute_path_map_node<T>;
auto it = map.find(attr);
if (it == map.end()) {
map.emplace(attr, node {std::move(value)});
} else {
throw api_error::validation(fmt::format(
"Invalid {}: Duplicate attribute: {}", source, attr));
}
}
} // namespace alternator

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "alternator/error.hh"
@@ -11,10 +11,10 @@
#include "utils/log.hh"
#include <string>
#include <string_view>
#include "bytes.hh"
#include "alternator/auth.hh"
#include <fmt/format.h>
#include "db/consistency_level_type.hh"
#include "db/system_keyspace.hh"
#include "auth/password_authenticator.hh"
#include "service/storage_proxy.hh"
#include "alternator/executor.hh"
#include "cql3/selection/selection.hh"
@@ -26,8 +26,8 @@ namespace alternator {
static logging::logger alogger("alternator-auth");
future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::string username) {
schema_ptr schema = proxy.data_dictionary().find_schema(db::system_keyspace::NAME, "roles");
future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::service& as, std::string username) {
schema_ptr schema = proxy.data_dictionary().find_schema(auth::get_auth_ks_name(as.query_processor()), "roles");
partition_key pk = partition_key::from_single_value(*schema, utf8_type->decompose(username));
dht::partition_range_vector partition_ranges{dht::partition_range(dht::decorate_key(*schema, pk))};
std::vector<query::clustering_range> bounds{query::clustering_range::make_open_ended_both_sides()};
@@ -40,7 +40,7 @@ future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::strin
auto partition_slice = query::partition_slice(std::move(bounds), {}, query::column_id_vector{salted_hash_col->id, can_login_col->id}, selection->get_query_options());
auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice,
proxy.get_max_result_size(partition_slice), query::tombstone_limit(proxy.get_tombstone_limit()));
auto cl = db::consistency_level::LOCAL_ONE;
auto cl = auth::password_authenticator::consistency_for_user(username);
service::client_state client_state{service::client_state::internal_tag()};
service::storage_proxy::coordinator_query_result qr = co_await proxy.query(schema, std::move(command), std::move(partition_ranges), cl,

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
@@ -20,6 +20,6 @@ namespace alternator {
using key_cache = utils::loading_cache<std::string, std::string, 1>;
future<std::string> get_key_from_roles(service::storage_proxy& proxy, std::string username);
future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::service& as, std::string username);
}

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include <string_view>
@@ -42,7 +42,7 @@ comparison_operator_type get_comparison_operator(const rjson::value& comparison_
if (!comparison_operator.IsString()) {
throw api_error::validation(fmt::format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));
}
std::string op = rjson::to_string(comparison_operator);
std::string op = comparison_operator.GetString();
auto it = ops.find(op);
if (it == ops.end()) {
throw api_error::validation(fmt::format("Unsupported comparison operator {}", op));
@@ -377,8 +377,8 @@ bool check_compare(const rjson::value* v1, const rjson::value& v2, const Compara
return cmp(unwrap_number(*v1, cmp.diagnostic), unwrap_number(v2, cmp.diagnostic));
}
if (kv1.name == "S") {
return cmp(rjson::to_string_view(kv1.value),
rjson::to_string_view(kv2.value));
return cmp(std::string_view(kv1.value.GetString(), kv1.value.GetStringLength()),
std::string_view(kv2.value.GetString(), kv2.value.GetStringLength()));
}
if (kv1.name == "B") {
auto d_kv1 = unwrap_bytes(kv1.value, v1_from_query);
@@ -470,9 +470,9 @@ static bool check_BETWEEN(const rjson::value* v, const rjson::value& lb, const r
return check_BETWEEN(unwrap_number(*v, diag), unwrap_number(lb, diag), unwrap_number(ub, diag), bounds_from_query);
}
if (kv_v.name == "S") {
return check_BETWEEN(rjson::to_string_view(kv_v.value),
rjson::to_string_view(kv_lb.value),
rjson::to_string_view(kv_ub.value),
return check_BETWEEN(std::string_view(kv_v.value.GetString(), kv_v.value.GetStringLength()),
std::string_view(kv_lb.value.GetString(), kv_lb.value.GetStringLength()),
std::string_view(kv_ub.value.GetString(), kv_ub.value.GetStringLength()),
bounds_from_query);
}
if (kv_v.name == "B") {
@@ -618,7 +618,7 @@ conditional_operator_type get_conditional_operator(const rjson::value& req) {
// Check if the existing values of the item (previous_item) match the
// conditions given by the Expected and ConditionalOperator parameters
// (if they exist) in the request (an UpdateItem, PutItem or DeleteItem).
// This function can throw a ValidationException API error if there
// This function can throw an ValidationException API error if there
// are errors in the format of the condition itself.
bool verify_expected(const rjson::value& req, const rjson::value* previous_item) {
const rjson::value* expected = rjson::find(req, "Expected");

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
/*

View File

@@ -3,13 +3,11 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "consumed_capacity.hh"
#include "error.hh"
#include "utils/rjson.hh"
#include <fmt/format.h>
namespace alternator {
@@ -34,18 +32,18 @@ bool consumed_capacity_counter::should_add_capacity(const rjson::value& request)
if (!return_consumed->IsString()) {
throw api_error::validation("Non-string ReturnConsumedCapacity field in request");
}
std::string_view consumed = rjson::to_string_view(*return_consumed);
std::string consumed = return_consumed->GetString();
if (consumed == "INDEXES") {
throw api_error::validation("INDEXES consumed capacity is not supported");
}
if (consumed != "TOTAL") {
throw api_error::validation(fmt::format("Unknown consumed capacity {}", consumed));
throw api_error::validation("Unknown consumed capacity "+ consumed);
}
return true;
}
void consumed_capacity_counter::add_consumed_capacity_to_response_if_needed(rjson::value& response) const noexcept {
if (_should_add_to_response) {
if (_should_add_to_reponse) {
auto consumption = rjson::empty_object();
rjson::add(consumption, "CapacityUnits", get_consumed_capacity_units());
rjson::add(response, "ConsumedCapacity", std::move(consumption));

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
@@ -28,9 +28,9 @@ namespace alternator {
class consumed_capacity_counter {
public:
consumed_capacity_counter() = default;
consumed_capacity_counter(bool should_add_to_response) : _should_add_to_response(should_add_to_response){}
consumed_capacity_counter(bool should_add_to_reponse) : _should_add_to_reponse(should_add_to_reponse){}
bool operator()() const noexcept {
return _should_add_to_response;
return _should_add_to_reponse;
}
consumed_capacity_counter& operator +=(uint64_t bytes);
@@ -44,7 +44,7 @@ public:
uint64_t _total_bytes = 0;
static bool should_add_capacity(const rjson::value& request);
protected:
bool _should_add_to_response = false;
bool _should_add_to_reponse = false;
};
class rcu_consumed_capacity_counter : public consumed_capacity_counter {

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include <seastar/core/with_scheduling_group.hh>
@@ -18,7 +18,6 @@
#include "service/memory_limiter.hh"
#include "auth/service.hh"
#include "service/qos/service_level_controller.hh"
#include "vector_search/vector_store_client.hh"
using namespace seastar;
@@ -29,29 +28,23 @@ static logging::logger logger("alternator_controller");
controller::controller(
sharded<gms::gossiper>& gossiper,
sharded<service::storage_proxy>& proxy,
sharded<service::storage_service>& ss,
sharded<service::migration_manager>& mm,
sharded<db::system_distributed_keyspace>& sys_dist_ks,
sharded<db::system_keyspace>& sys_ks,
sharded<cdc::generation_service>& cdc_gen_svc,
sharded<service::memory_limiter>& memory_limiter,
sharded<auth::service>& auth_service,
sharded<qos::service_level_controller>& sl_controller,
sharded<vector_search::vector_store_client>& vsc,
const db::config& config,
seastar::scheduling_group sg)
: protocol_server(sg)
, _gossiper(gossiper)
, _proxy(proxy)
, _ss(ss)
, _mm(mm)
, _sys_dist_ks(sys_dist_ks)
, _sys_ks(sys_ks)
, _cdc_gen_svc(cdc_gen_svc)
, _memory_limiter(memory_limiter)
, _auth_service(auth_service)
, _sl_controller(sl_controller)
, _vsc(vsc)
, _config(config)
{
}
@@ -96,8 +89,8 @@ future<> controller::start_server() {
auto get_timeout_in_ms = [] (const db::config& cfg) -> utils::updateable_value<uint32_t> {
return cfg.alternator_timeout_in_ms;
};
_executor.start(std::ref(_gossiper), std::ref(_proxy), std::ref(_ss), std::ref(_mm), std::ref(_sys_dist_ks), std::ref(_sys_ks),
sharded_parameter(get_cdc_metadata, std::ref(_cdc_gen_svc)), std::ref(_vsc), _ssg.value(),
_executor.start(std::ref(_gossiper), std::ref(_proxy), std::ref(_mm), std::ref(_sys_dist_ks),
sharded_parameter(get_cdc_metadata, std::ref(_cdc_gen_svc)), _ssg.value(),
sharded_parameter(get_timeout_in_ms, std::ref(_config))).get();
_server.start(std::ref(_executor), std::ref(_proxy), std::ref(_gossiper), std::ref(_auth_service), std::ref(_sl_controller)).get();
// Note: from this point on, if start_server() throws for any reason,
@@ -110,23 +103,11 @@ future<> controller::start_server() {
alternator_port = _config.alternator_port();
_listen_addresses.push_back({addr, *alternator_port});
}
std::optional<uint16_t> alternator_port_proxy_protocol;
if (_config.alternator_port_proxy_protocol()) {
alternator_port_proxy_protocol = _config.alternator_port_proxy_protocol();
_listen_addresses.push_back({addr, *alternator_port_proxy_protocol});
}
std::optional<uint16_t> alternator_https_port;
std::optional<uint16_t> alternator_https_port_proxy_protocol;
std::optional<tls::credentials_builder> creds;
if (_config.alternator_https_port() || _config.alternator_https_port_proxy_protocol()) {
if (_config.alternator_https_port()) {
alternator_https_port = _config.alternator_https_port();
_listen_addresses.push_back({addr, *alternator_https_port});
}
if (_config.alternator_https_port_proxy_protocol()) {
alternator_https_port_proxy_protocol = _config.alternator_https_port_proxy_protocol();
_listen_addresses.push_back({addr, *alternator_https_port_proxy_protocol});
}
if (_config.alternator_https_port()) {
alternator_https_port = _config.alternator_https_port();
_listen_addresses.push_back({addr, *alternator_https_port});
creds.emplace();
auto opts = _config.alternator_encryption_options();
if (opts.empty()) {
@@ -152,29 +133,18 @@ future<> controller::start_server() {
}
}
_server.invoke_on_all(
[this, addr, alternator_port, alternator_https_port, alternator_port_proxy_protocol, alternator_https_port_proxy_protocol, creds = std::move(creds)] (server& server) mutable {
return server.init(addr, alternator_port, alternator_https_port, alternator_port_proxy_protocol, alternator_https_port_proxy_protocol, creds,
[this, addr, alternator_port, alternator_https_port, creds = std::move(creds)] (server& server) mutable {
return server.init(addr, alternator_port, alternator_https_port, creds,
_config.alternator_enforce_authorization,
_config.alternator_warn_authorization,
_config.alternator_max_users_query_size_in_trace_output,
&_memory_limiter.local().get_semaphore(),
_config.max_concurrent_requests_per_shard);
}).handle_exception([this, addr, alternator_port, alternator_https_port, alternator_port_proxy_protocol, alternator_https_port_proxy_protocol] (std::exception_ptr ep) {
logger.error("Failed to set up Alternator HTTP server on {} port {}, TLS port {}, proxy-protocol port {}, TLS proxy-protocol port {}: {}",
addr,
alternator_port ? std::to_string(*alternator_port) : "OFF",
alternator_https_port ? std::to_string(*alternator_https_port) : "OFF",
alternator_port_proxy_protocol ? std::to_string(*alternator_port_proxy_protocol) : "OFF",
alternator_https_port_proxy_protocol ? std::to_string(*alternator_https_port_proxy_protocol) : "OFF",
ep);
}).handle_exception([this, addr, alternator_port, alternator_https_port] (std::exception_ptr ep) {
logger.error("Failed to set up Alternator HTTP server on {} port {}, TLS port {}: {}",
addr, alternator_port ? std::to_string(*alternator_port) : "OFF", alternator_https_port ? std::to_string(*alternator_https_port) : "OFF", ep);
return stop_server().then([ep = std::move(ep)] { return make_exception_future<>(ep); });
}).then([addr, alternator_port, alternator_https_port, alternator_port_proxy_protocol, alternator_https_port_proxy_protocol] {
logger.info("Alternator server listening on {}, HTTP port {}, HTTPS port {}, proxy-protocol port {}, TLS proxy-protocol port {}",
addr,
alternator_port ? std::to_string(*alternator_port) : "OFF",
alternator_https_port ? std::to_string(*alternator_https_port) : "OFF",
alternator_port_proxy_protocol ? std::to_string(*alternator_port_proxy_protocol) : "OFF",
alternator_https_port_proxy_protocol ? std::to_string(*alternator_https_port_proxy_protocol) : "OFF");
}).then([addr, alternator_port, alternator_https_port] {
logger.info("Alternator server listening on {}, HTTP port {}, HTTPS port {}",
addr, alternator_port ? std::to_string(*alternator_port) : "OFF", alternator_https_port ? std::to_string(*alternator_https_port) : "OFF");
}).get();
});
}
@@ -197,8 +167,4 @@ future<> controller::request_stop_server() {
});
}
future<utils::chunked_vector<foreign_ptr<std::unique_ptr<client_data>>>> controller::get_client_data() {
return _server.local().get_client_data();
}
}

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
@@ -11,18 +11,16 @@
#include <seastar/core/sharded.hh>
#include <seastar/core/smp.hh>
#include "transport/protocol_server.hh"
#include "protocol_server.hh"
namespace service {
class storage_proxy;
class storage_service;
class migration_manager;
class memory_limiter;
}
namespace db {
class system_distributed_keyspace;
class system_keyspace;
class config;
}
@@ -44,10 +42,6 @@ namespace qos {
class service_level_controller;
}
namespace vector_search {
class vector_store_client;
}
namespace alternator {
// This is the official DynamoDB API version.
@@ -63,15 +57,12 @@ class server;
class controller : public protocol_server {
sharded<gms::gossiper>& _gossiper;
sharded<service::storage_proxy>& _proxy;
sharded<service::storage_service>& _ss;
sharded<service::migration_manager>& _mm;
sharded<db::system_distributed_keyspace>& _sys_dist_ks;
sharded<db::system_keyspace>& _sys_ks;
sharded<cdc::generation_service>& _cdc_gen_svc;
sharded<service::memory_limiter>& _memory_limiter;
sharded<auth::service>& _auth_service;
sharded<qos::service_level_controller>& _sl_controller;
sharded<vector_search::vector_store_client>& _vsc;
const db::config& _config;
std::vector<socket_address> _listen_addresses;
@@ -83,15 +74,12 @@ public:
controller(
sharded<gms::gossiper>& gossiper,
sharded<service::storage_proxy>& proxy,
sharded<service::storage_service>& ss,
sharded<service::migration_manager>& mm,
sharded<db::system_distributed_keyspace>& sys_dist_ks,
sharded<db::system_keyspace>& sys_ks,
sharded<cdc::generation_service>& cdc_gen_svc,
sharded<service::memory_limiter>& memory_limiter,
sharded<auth::service>& auth_service,
sharded<qos::service_level_controller>& sl_controller,
sharded<vector_search::vector_store_client>& vsc,
const db::config& config,
seastar::scheduling_group sg);
@@ -102,10 +90,6 @@ public:
virtual future<> start_server() override;
virtual future<> stop_server() override;
virtual future<> request_stop_server() override;
// This virtual function is called (on each shard separately) when the
// virtual table "system.clients" is read. It is expected to generate a
// list of clients connected to this server (on this shard).
virtual future<utils::chunked_vector<foreign_ptr<std::unique_ptr<client_data>>>> get_client_data() override;
};
}

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
@@ -94,9 +94,6 @@ public:
static api_error internal(std::string msg) {
return api_error("InternalServerError", std::move(msg), http::reply::status_type::internal_server_error);
}
static api_error payload_too_large(std::string msg) {
return api_error("PayloadTooLarge", std::move(msg), status_type::payload_too_large);
}
// Provide the "std::exception" interface, to make it easier to print this
// exception in log messages. Note that this function is *not* used to

File diff suppressed because it is too large Load Diff

View File

@@ -3,15 +3,13 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
#include <seastar/core/future.hh>
#include "audit/audit.hh"
#include "seastarx.hh"
#include <seastar/core/future.hh>
#include <seastar/core/sharded.hh>
#include <seastar/util/noncopyable_function.hh>
@@ -19,26 +17,16 @@
#include "service/client_state.hh"
#include "service_permit.hh"
#include "db/timeout_clock.hh"
#include "db/config.hh"
#include "alternator/error.hh"
#include "alternator/attribute_path.hh"
#include "alternator/stats.hh"
#include "alternator/executor_util.hh"
#include "stats.hh"
#include "utils/rjson.hh"
#include "utils/updateable_value.hh"
#include "tracing/trace_state.hh"
namespace db {
class system_distributed_keyspace;
class system_keyspace;
}
namespace audit {
class audit_info_alternator;
}
namespace query {
@@ -52,12 +40,6 @@ namespace cql3::selection {
namespace service {
class storage_proxy;
class cas_shard;
class storage_service;
}
namespace vector_search {
class vector_store_client;
}
namespace cdc {
@@ -72,40 +54,96 @@ class gossiper;
class schema_builder;
namespace alternator {
enum class table_status;
class rmw_operation;
class put_or_delete_item;
namespace parsed {
class expression_cache;
}
class path;
};
schema_ptr get_table(service::storage_proxy& proxy, const rjson::value& request);
bool is_alternator_keyspace(const sstring& ks_name);
// Wraps the db::get_tags_of_table and throws if the table is missing the tags extension.
const std::map<sstring, sstring>& get_tags_of_table_or_throw(schema_ptr schema);
// An attribute_path_map object is used to hold data for various attributes
// paths (parsed::path) in a hierarchy of attribute paths. Each attribute path
// has a root attribute, and then modified by member and index operators -
// for example in "a.b[2].c" we have "a" as the root, then ".b" member, then
// "[2]" index, and finally ".c" member.
// Data can be added to an attribute_path_map using the add() function, but
// requires that attributes with data not be *overlapping* or *conflicting*:
//
// 1. Two attribute paths which are identical or an ancestor of one another
// are considered *overlapping* and not allowed. If a.b.c has data,
// we can't add more data in a.b.c or any of its descendants like a.b.c.d.
//
// 2. Two attribute paths which need the same parent to have both a member and
// an index are considered *conflicting* and not allowed. E.g., if a.b has
// data, you can't add a[1]. The meaning of adding both would be that the
// attribute a is both a map and an array, which isn't sensible.
//
// These two requirements are common to the two places where Alternator uses
// this abstraction to describe how a hierarchical item is to be transformed:
//
// 1. In ProjectExpression: for filtering from a full top-level attribute
// only the parts for which user asked in ProjectionExpression.
//
// 2. In UpdateExpression: for taking the previous value of a top-level
// attribute, and modifying it based on the instructions in the user
// wrote in UpdateExpression.
template<typename T>
class attribute_path_map_node {
public:
using data_t = T;
// We need the extra unique_ptr<> here because libstdc++ unordered_map
// doesn't work with incomplete types :-(
using members_t = std::unordered_map<std::string, std::unique_ptr<attribute_path_map_node<T>>>;
// The indexes list is sorted because DynamoDB requires handling writes
// beyond the end of a list in index order.
using indexes_t = std::map<unsigned, std::unique_ptr<attribute_path_map_node<T>>>;
// The prohibition on "overlap" and "conflict" explained above means
// That only one of data, members or indexes is non-empty.
std::optional<std::variant<data_t, members_t, indexes_t>> _content;
bool is_empty() const { return !_content; }
bool has_value() const { return _content && std::holds_alternative<data_t>(*_content); }
bool has_members() const { return _content && std::holds_alternative<members_t>(*_content); }
bool has_indexes() const { return _content && std::holds_alternative<indexes_t>(*_content); }
// get_members() assumes that has_members() is true
members_t& get_members() { return std::get<members_t>(*_content); }
const members_t& get_members() const { return std::get<members_t>(*_content); }
indexes_t& get_indexes() { return std::get<indexes_t>(*_content); }
const indexes_t& get_indexes() const { return std::get<indexes_t>(*_content); }
T& get_value() { return std::get<T>(*_content); }
const T& get_value() const { return std::get<T>(*_content); }
};
template<typename T>
using attribute_path_map = std::unordered_map<std::string, attribute_path_map_node<T>>;
using attrs_to_get_node = attribute_path_map_node<std::monostate>;
// attrs_to_get lists which top-level attribute are needed, and possibly also
// which part of the top-level attribute is really needed (when nested
// attribute paths appeared in the query).
// Most code actually uses optional<attrs_to_get>. There, a disengaged
// optional means we should get all attributes, not specific ones.
using attrs_to_get = attribute_path_map<std::monostate>;
class executor : public peering_sharded_service<executor> {
gms::gossiper& _gossiper;
service::storage_service& _ss;
service::storage_proxy& _proxy;
service::migration_manager& _mm;
db::system_distributed_keyspace& _sdks;
db::system_keyspace& _system_keyspace;
cdc::metadata& _cdc_metadata;
vector_search::vector_store_client& _vsc;
utils::updateable_value<bool> _enforce_authorization;
utils::updateable_value<bool> _warn_authorization;
seastar::sharded<audit::audit>& _audit;
// An smp_service_group to be used for limiting the concurrency when
// forwarding Alternator request between shards - if necessary for LWT.
smp_service_group _ssg;
std::unique_ptr<parsed::expression_cache> _parsed_expression_cache;
struct describe_table_info_manager;
std::unique_ptr<describe_table_info_manager> _describe_table_info_manager;
future<> cache_newly_calculated_size_on_all_shards(schema_ptr schema, std::uint64_t size_in_bytes, std::chrono::nanoseconds ttl);
future<> fill_table_size(rjson::value &table_description, schema_ptr schema, bool deleting);
public:
using client_state = service::client_state;
// request_return_type is the return type of the executor methods, which
@@ -119,6 +157,7 @@ public:
// is written in chunks to the output_stream. This allows for efficient
// handling of large responses without needing to allocate a large buffer
// in memory.
using body_writer = noncopyable_function<future<>(output_stream<char>&&)>;
using request_return_type = std::variant<std::string, body_writer, api_error>;
stats _stats;
// The metric_groups object holds this stat object's metrics registered
@@ -130,103 +169,108 @@ public:
executor(gms::gossiper& gossiper,
service::storage_proxy& proxy,
service::storage_service& ss,
service::migration_manager& mm,
db::system_distributed_keyspace& sdks,
db::system_keyspace& system_keyspace,
cdc::metadata& cdc_metadata,
vector_search::vector_store_client& vsc,
smp_service_group ssg,
utils::updateable_value<uint32_t> default_timeout_in_ms);
~executor();
future<request_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> update_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> put_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> update_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> list_tables(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> scan(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> describe_endpoints(client_state& client_state, service_permit permit, rjson::value request, std::string host_header, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> batch_write_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> batch_get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> query(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> tag_resource(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> untag_resource(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> list_tags_of_resource(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> update_time_to_live(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> describe_time_to_live(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> list_streams(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> describe_stream(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> get_shard_iterator(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> get_records(client_state& client_state, tracing::trace_state_ptr, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> describe_continuous_backups(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<request_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> update_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> put_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> update_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> list_tables(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> scan(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> describe_endpoints(client_state& client_state, service_permit permit, rjson::value request, std::string host_header);
future<request_return_type> batch_write_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> batch_get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> query(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> tag_resource(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> untag_resource(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> list_tags_of_resource(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> update_time_to_live(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> describe_time_to_live(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> list_streams(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> describe_stream(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> get_shard_iterator(client_state& client_state, service_permit permit, rjson::value request);
future<request_return_type> get_records(client_state& client_state, tracing::trace_state_ptr, service_permit permit, rjson::value request);
future<request_return_type> describe_continuous_backups(client_state& client_state, service_permit permit, rjson::value request);
future<> start();
future<> stop();
future<> stop() {
// disconnect from the value source, but keep the value unchanged.
s_default_timeout_in_ms = utils::updateable_value<uint32_t>{s_default_timeout_in_ms()};
return make_ready_future<>();
}
static sstring table_name(const schema&);
static db::timeout_clock::time_point default_timeout();
private:
static thread_local utils::updateable_value<uint32_t> s_default_timeout_in_ms;
public:
static schema_ptr find_table(service::storage_proxy&, const rjson::value& request);
private:
friend class rmw_operation;
// Helper to set up auditing for an Alternator operation. Checks whether
// the operation should be audited (via will_log()) and if so, allocates
// and populates audit_info. No allocation occurs when auditing is disabled.
void maybe_audit(std::unique_ptr<audit::audit_info_alternator>& audit_info,
audit::statement_category category,
std::string_view ks_name,
std::string_view table_name,
std::string_view operation_name,
const rjson::value& request,
std::optional<db::consistency_level> cl = std::nullopt);
future<rjson::value> fill_table_description(schema_ptr schema, table_status tbl_status, service::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit);
future<executor::request_return_type> create_table_on_shard0(service::client_state&& client_state, tracing::trace_state_ptr trace_state, rjson::value request, bool enforce_authorization,
bool warn_authorization, const db::tablets_mode_t::mode tablets_mode, std::unique_ptr<audit::audit_info_alternator>& audit_info);
future<> do_batch_write(
std::vector<std::pair<schema_ptr, put_or_delete_item>> mutation_builders,
service::client_state& client_state,
tracing::trace_state_ptr trace_state,
service_permit permit);
future<> cas_write(schema_ptr schema, service::cas_shard cas_shard, const dht::decorated_key& dk,
const std::vector<put_or_delete_item>& mutation_builders, service::client_state& client_state,
tracing::trace_state_ptr trace_state, service_permit permit);
static void describe_key_schema(rjson::value& parent, const schema&, std::unordered_map<std::string,std::string> * = nullptr);
public:
static bool add_stream_options(const rjson::value& stream_spec, schema_builder&, service::storage_proxy& sp);
static void describe_key_schema(rjson::value& parent, const schema& schema, std::unordered_map<std::string,std::string>&);
static std::optional<rjson::value> describe_single_item(schema_ptr,
const query::partition_slice&,
const cql3::selection::selection&,
const query::result&,
const std::optional<attrs_to_get>&,
uint64_t* = nullptr);
static future<std::vector<rjson::value>> describe_multi_item(schema_ptr schema,
const query::partition_slice&& slice,
shared_ptr<cql3::selection::selection> selection,
foreign_ptr<lw_shared_ptr<query::result>> query_result,
shared_ptr<const std::optional<attrs_to_get>> attrs_to_get,
uint64_t& rcu_half_units);
static void describe_single_item(const cql3::selection::selection&,
const std::vector<managed_bytes_opt>&,
const std::optional<attrs_to_get>&,
rjson::value&,
uint64_t* item_length_in_bytes = nullptr,
bool = false);
static void add_stream_options(const rjson::value& stream_spec, schema_builder&, service::storage_proxy& sp);
static void supplement_table_info(rjson::value& descr, const schema& schema, service::storage_proxy& sp);
static void supplement_table_stream_info(rjson::value& descr, const schema& schema, const service::storage_proxy& sp);
};
// returns table creation time in seconds since epoch for `db_clock`
double get_table_creation_time(const schema &schema);
// is_big() checks approximately if the given JSON value is "bigger" than
// the given big_size number of bytes. The goal is to *quickly* detect
// oversized JSON that, for example, is too large to be serialized to a
// contiguous string - we don't need an accurate size for that. Moreover,
// as soon as we detect that the JSON is indeed "big", we can return true
// and don't need to continue calculating its exact size.
// For simplicity, we use a recursive implementation. This is fine because
// Alternator limits the depth of JSONs it reads from inputs, and doesn't
// add more than a couple of levels in its own output construction.
bool is_big(const rjson::value& val, int big_size = 100'000);
// result of parsing ARN (Amazon Resource Name)
// ARN format is `arn:<partition>:<service>:<region>:<account-id>:<resource-type>/<resource-id>/<postfix>`
// we ignore partition, service and account-id
// resource-type must be string "table"
// resource-id will be returned as table_name
// region will be returned as keyspace_name
// postfix is a string after resource-id and will be returned as is (whole), including separator.
struct arn_parts {
std::string_view keyspace_name;
std::string_view table_name;
std::string_view postfix;
};
// arn - arn to parse
// arn_field_name - identifier of the ARN, used only when reporting an error (in error messages), for example "Incorrect resource identifier `<arn_field_name>`"
// type_name - used only when reporting an error (in error messages), for example "... is not a valid <type_name> ARN ..."
// expected_postfix - optional filter of postfix value (part of ARN after resource-id, including separator, see comments for struct arn_parts).
// If is empty - then postfix value must be empty as well
// if not empty - postfix value must start with expected_postfix, but might be longer
arn_parts parse_arn(std::string_view arn, std::string_view arn_field_name, std::string_view type_name, std::string_view expected_postfix);
// Check CQL's Role-Based Access Control (RBAC) permission (MODIFY,
// SELECT, DROP, etc.) on the given table. When permission is denied an
// appropriate user-readable api_error::access_denied is thrown.
future<> verify_permission(bool enforce_authorization, const service::client_state&, const schema_ptr&, auth::permission);
/**
* Make return type for serializing the object "streamed",
* i.e. direct to HTTP output stream. Note: only useful for
* (very) large objects as there are overhead issues with this
* as well, but for massive lists of return objects this can
* help avoid large allocations/many re-allocs
*/
executor::body_writer make_streamed(rjson::value&&);
// The format is ks1|ks2|ks3... and table1|table2|table3...
sstring print_names_for_audit(const std::set<sstring>& names);
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,559 +0,0 @@
/*
* Copyright 2019-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
*/
#include "alternator/executor_util.hh"
#include "alternator/executor.hh"
#include "alternator/error.hh"
#include "auth/resource.hh"
#include "auth/service.hh"
#include "cdc/log.hh"
#include "data_dictionary/data_dictionary.hh"
#include "db/tags/utils.hh"
#include "replica/database.hh"
#include "cql3/selection/selection.hh"
#include "cql3/result_set.hh"
#include "serialization.hh"
#include "service/storage_proxy.hh"
#include "types/map.hh"
#include <fmt/format.h>
namespace alternator {
extern logging::logger elogger; // from executor.cc
std::optional<int> get_int_attribute(const rjson::value& value, std::string_view attribute_name) {
const rjson::value* attribute_value = rjson::find(value, attribute_name);
if (!attribute_value)
return {};
if (!attribute_value->IsInt()) {
throw api_error::validation(fmt::format("Expected integer value for attribute {}, got: {}",
attribute_name, value));
}
return attribute_value->GetInt();
}
std::string get_string_attribute(const rjson::value& value, std::string_view attribute_name, const char* default_return) {
const rjson::value* attribute_value = rjson::find(value, attribute_name);
if (!attribute_value)
return default_return;
if (!attribute_value->IsString()) {
throw api_error::validation(fmt::format("Expected string value for attribute {}, got: {}",
attribute_name, value));
}
return rjson::to_string(*attribute_value);
}
bool get_bool_attribute(const rjson::value& value, std::string_view attribute_name, bool default_return) {
const rjson::value* attribute_value = rjson::find(value, attribute_name);
if (!attribute_value) {
return default_return;
}
if (!attribute_value->IsBool()) {
throw api_error::validation(fmt::format("Expected boolean value for attribute {}, got: {}",
attribute_name, value));
}
return attribute_value->GetBool();
}
std::optional<std::string> find_table_name(const rjson::value& request) {
const rjson::value* table_name_value = rjson::find(request, "TableName");
if (!table_name_value) {
return std::nullopt;
}
if (!table_name_value->IsString()) {
throw api_error::validation("Non-string TableName field in request");
}
std::string table_name = rjson::to_string(*table_name_value);
return table_name;
}
std::string get_table_name(const rjson::value& request) {
auto name = find_table_name(request);
if (!name) {
throw api_error::validation("Missing TableName field in request");
}
return *name;
}
schema_ptr find_table(service::storage_proxy& proxy, const rjson::value& request) {
auto table_name = find_table_name(request);
if (!table_name) {
return nullptr;
}
return find_table(proxy, *table_name);
}
schema_ptr find_table(service::storage_proxy& proxy, std::string_view table_name) {
try {
return proxy.data_dictionary().find_schema(sstring(executor::KEYSPACE_NAME_PREFIX) + sstring(table_name), table_name);
} catch(data_dictionary::no_such_column_family&) {
// DynamoDB returns validation error even when table does not exist
// and the table name is invalid.
validate_table_name(table_name);
throw api_error::resource_not_found(
fmt::format("Requested resource not found: Table: {} not found", table_name));
}
}
schema_ptr get_table(service::storage_proxy& proxy, const rjson::value& request) {
auto schema = find_table(proxy, request);
if (!schema) {
// if we get here then the name was missing, since syntax or missing actual CF
// checks throw. Slow path, but just call get_table_name to generate exception.
get_table_name(request);
}
return schema;
}
map_type attrs_type() {
static thread_local auto t = map_type_impl::get_instance(utf8_type, bytes_type, true);
return t;
}
const std::map<sstring, sstring>& get_tags_of_table_or_throw(schema_ptr schema) {
auto tags_ptr = db::get_tags_of_table(schema);
if (tags_ptr) {
return *tags_ptr;
} else {
throw api_error::validation(format("Table {} does not have valid tagging information", schema->ks_name()));
}
}
bool is_alternator_keyspace(std::string_view ks_name) {
return ks_name.starts_with(executor::KEYSPACE_NAME_PREFIX);
}
// This tag is set on a GSI when the user did not specify a range key, causing
// Alternator to add the base table's range key as a spurious range key. It is
// used by describe_key_schema() to suppress reporting that key.
extern const sstring SPURIOUS_RANGE_KEY_ADDED_TO_GSI_AND_USER_DIDNT_SPECIFY_RANGE_KEY_TAG_KEY;
void describe_key_schema(rjson::value& parent, const schema& schema, std::unordered_map<std::string, std::string>* attribute_types, const std::map<sstring, sstring>* tags) {
rjson::value key_schema = rjson::empty_array();
const bool ignore_range_keys_as_spurious = tags != nullptr && tags->contains(SPURIOUS_RANGE_KEY_ADDED_TO_GSI_AND_USER_DIDNT_SPECIFY_RANGE_KEY_TAG_KEY);
for (const column_definition& cdef : schema.partition_key_columns()) {
rjson::value key = rjson::empty_object();
rjson::add(key, "AttributeName", rjson::from_string(cdef.name_as_text()));
rjson::add(key, "KeyType", "HASH");
rjson::push_back(key_schema, std::move(key));
if (attribute_types) {
(*attribute_types)[cdef.name_as_text()] = type_to_string(cdef.type);
}
}
if (!ignore_range_keys_as_spurious) {
// NOTE: user requested key (there can be at most one) will always come first.
// There might be more keys following it, which were added, but those were
// not requested by the user, so we ignore them.
for (const column_definition& cdef : schema.clustering_key_columns()) {
rjson::value key = rjson::empty_object();
rjson::add(key, "AttributeName", rjson::from_string(cdef.name_as_text()));
rjson::add(key, "KeyType", "RANGE");
rjson::push_back(key_schema, std::move(key));
if (attribute_types) {
(*attribute_types)[cdef.name_as_text()] = type_to_string(cdef.type);
}
break;
}
}
rjson::add(parent, "KeySchema", std::move(key_schema));
}
// Check if the given string has valid characters for a table name, i.e. only
// a-z, A-Z, 0-9, _ (underscore), - (dash), . (dot). Note that this function
// does not check the length of the name - instead, use validate_table_name()
// to validate both the characters and the length.
static bool valid_table_name_chars(std::string_view name) {
for (auto c : name) {
if ((c < 'a' || c > 'z') &&
(c < 'A' || c > 'Z') &&
(c < '0' || c > '9') &&
c != '_' &&
c != '-' &&
c != '.') {
return false;
}
}
return true;
}
std::string view_name(std::string_view table_name, std::string_view index_name, const std::string& delim, bool validate_len) {
if (index_name.length() < 3) {
throw api_error::validation("IndexName must be at least 3 characters long");
}
if (!valid_table_name_chars(index_name)) {
throw api_error::validation(
fmt::format("IndexName '{}' must satisfy regular expression pattern: [a-zA-Z0-9_.-]+", index_name));
}
std::string ret = std::string(table_name) + delim + std::string(index_name);
if (ret.length() > max_auxiliary_table_name_length && validate_len) {
throw api_error::validation(
fmt::format("The total length of TableName ('{}') and IndexName ('{}') cannot exceed {} characters",
table_name, index_name, max_auxiliary_table_name_length - delim.size()));
}
return ret;
}
std::string gsi_name(std::string_view table_name, std::string_view index_name, bool validate_len) {
return view_name(table_name, index_name, ":", validate_len);
}
std::string lsi_name(std::string_view table_name, std::string_view index_name, bool validate_len) {
return view_name(table_name, index_name, "!:", validate_len);
}
void check_key(const rjson::value& key, const schema_ptr& schema) {
if (key.MemberCount() != (schema->clustering_key_size() == 0 ? 1 : 2)) {
throw api_error::validation("Given key attribute not in schema");
}
}
void verify_all_are_used(const rjson::value* field,
const std::unordered_set<std::string>& used, const char* field_name, const char* operation) {
if (!field) {
return;
}
for (auto it = field->MemberBegin(); it != field->MemberEnd(); ++it) {
if (!used.contains(rjson::to_string(it->name))) {
throw api_error::validation(
format("{} has spurious '{}', not used in {}",
field_name, rjson::to_string_view(it->name), operation));
}
}
}
// This function increments the authorization_failures counter, and may also
// log a warn-level message and/or throw an access_denied exception, depending
// on what enforce_authorization and warn_authorization are set to.
// Note that if enforce_authorization is false, this function will return
// without throwing. So a caller that doesn't want to continue after an
// authorization_error must explicitly return after calling this function.
static void authorization_error(stats& stats, bool enforce_authorization, bool warn_authorization, std::string msg) {
stats.authorization_failures++;
if (enforce_authorization) {
if (warn_authorization) {
elogger.warn("alternator_warn_authorization=true: {}", msg);
}
throw api_error::access_denied(std::move(msg));
} else {
if (warn_authorization) {
elogger.warn("If you set alternator_enforce_authorization=true the following will be enforced: {}", msg);
}
}
}
future<> verify_permission(
bool enforce_authorization,
bool warn_authorization,
const service::client_state& client_state,
const schema_ptr& schema,
auth::permission permission_to_check,
stats& stats) {
if (!enforce_authorization && !warn_authorization) {
co_return;
}
// Unfortunately, the fix for issue #23218 did not modify the function
// that we use here - check_has_permissions(). So if we want to allow
// writes to internal tables (from try_get_internal_table()) only to a
// superuser, we need to explicitly check it here.
if (permission_to_check == auth::permission::MODIFY && is_internal_keyspace(schema->ks_name())) {
if (!client_state.user() ||
!client_state.user()->name ||
!co_await client_state.get_auth_service()->underlying_role_manager().is_superuser(*client_state.user()->name)) {
sstring username = "<anonymous>";
if (client_state.user() && client_state.user()->name) {
username = client_state.user()->name.value();
}
authorization_error(stats, enforce_authorization, warn_authorization, fmt::format(
"Write access denied on internal table {}.{} to role {} because it is not a superuser",
schema->ks_name(), schema->cf_name(), username));
co_return;
}
}
auto resource = auth::make_data_resource(schema->ks_name(), schema->cf_name());
if (!client_state.user() || !client_state.user()->name ||
!co_await client_state.check_has_permission(auth::command_desc(permission_to_check, resource))) {
sstring username = "<anonymous>";
if (client_state.user() && client_state.user()->name) {
username = client_state.user()->name.value();
}
// Using exceptions for errors makes this function faster in the
// success path (when the operation is allowed).
authorization_error(stats, enforce_authorization, warn_authorization, fmt::format(
"{} access on table {}.{} is denied to role {}, client address {}",
auth::permissions::to_string(permission_to_check),
schema->ks_name(), schema->cf_name(), username, client_state.get_client_address()));
}
}
// Similar to verify_permission() above, but just for CREATE operations.
// Those do not operate on any specific table, so require permissions on
// ALL KEYSPACES instead of any specific table.
future<> verify_create_permission(bool enforce_authorization, bool warn_authorization, const service::client_state& client_state, stats& stats) {
if (!enforce_authorization && !warn_authorization) {
co_return;
}
auto resource = auth::resource(auth::resource_kind::data);
if (!co_await client_state.check_has_permission(auth::command_desc(auth::permission::CREATE, resource))) {
sstring username = "<anonymous>";
if (client_state.user() && client_state.user()->name) {
username = client_state.user()->name.value();
}
authorization_error(stats, enforce_authorization, warn_authorization, fmt::format(
"CREATE access on ALL KEYSPACES is denied to role {}", username));
}
}
schema_ptr try_get_internal_table(const data_dictionary::database& db, std::string_view table_name) {
size_t it = table_name.find(executor::INTERNAL_TABLE_PREFIX);
if (it != 0) {
return schema_ptr{};
}
table_name.remove_prefix(executor::INTERNAL_TABLE_PREFIX.size());
size_t delim = table_name.find_first_of('.');
if (delim == std::string_view::npos) {
return schema_ptr{};
}
std::string_view ks_name = table_name.substr(0, delim);
table_name.remove_prefix(ks_name.size() + 1);
// Only internal keyspaces can be accessed to avoid leakage
auto ks = db.try_find_keyspace(ks_name);
if (!ks || !ks->is_internal()) {
return schema_ptr{};
}
try {
return db.find_schema(ks_name, table_name);
} catch (data_dictionary::no_such_column_family&) {
// DynamoDB returns validation error even when table does not exist
// and the table name is invalid.
validate_table_name(table_name);
throw api_error::resource_not_found(
fmt::format("Requested resource not found: Internal table: {}.{} not found", ks_name, table_name));
}
}
schema_ptr get_table_from_batch_request(const service::storage_proxy& proxy, const rjson::value::ConstMemberIterator& batch_request) {
sstring table_name = rjson::to_sstring(batch_request->name); // JSON keys are always strings
try {
return proxy.data_dictionary().find_schema(sstring(executor::KEYSPACE_NAME_PREFIX) + table_name, table_name);
} catch(data_dictionary::no_such_column_family&) {
// DynamoDB returns validation error even when table does not exist
// and the table name is invalid.
validate_table_name(table_name);
throw api_error::resource_not_found(format("Requested resource not found: Table: {} not found", table_name));
}
}
lw_shared_ptr<stats> get_stats_from_schema(service::storage_proxy& sp, const schema& schema) {
try {
replica::table& table = sp.local_db().find_column_family(schema.id());
if (!table.get_stats().alternator_stats) {
table.get_stats().alternator_stats = seastar::make_shared<table_stats>(schema.ks_name(), schema.cf_name());
}
return table.get_stats().alternator_stats->_stats;
} catch (std::runtime_error&) {
// If we're here it means that a table we are currently working on was deleted before the
// operation completed, returning a temporary object is fine, if the table get deleted so will its metrics
return make_lw_shared<stats>();
}
}
void describe_single_item(const cql3::selection::selection& selection,
const std::vector<managed_bytes_opt>& result_row,
const std::optional<attrs_to_get>& attrs_to_get,
rjson::value& item,
uint64_t* item_length_in_bytes,
bool include_all_embedded_attributes)
{
const auto& columns = selection.get_columns();
auto column_it = columns.begin();
for (const managed_bytes_opt& cell : result_row) {
if (!cell) {
++column_it;
continue;
}
std::string column_name = (*column_it)->name_as_text();
if (column_name != executor::ATTRS_COLUMN_NAME) {
if (item_length_in_bytes) {
(*item_length_in_bytes) += column_name.length() + cell->size();
}
if (!attrs_to_get || attrs_to_get->contains(column_name)) {
// item is expected to start empty, and column_name are unique
// so add() makes sense
rjson::add_with_string_name(item, column_name, rjson::empty_object());
rjson::value& field = item[column_name.c_str()];
cell->with_linearized([&] (bytes_view linearized_cell) {
rjson::add_with_string_name(field, type_to_string((*column_it)->type), json_key_column_value(linearized_cell, **column_it));
});
}
} else {
auto deserialized = attrs_type()->deserialize(*cell);
auto keys_and_values = value_cast<map_type_impl::native_type>(deserialized);
for (auto entry : keys_and_values) {
std::string attr_name = value_cast<sstring>(entry.first);
if (item_length_in_bytes) {
(*item_length_in_bytes) += attr_name.length();
}
if (include_all_embedded_attributes || !attrs_to_get || attrs_to_get->contains(attr_name)) {
bytes value = value_cast<bytes>(entry.second);
if (item_length_in_bytes && value.length()) {
// ScyllaDB uses one extra byte compared to DynamoDB for the bytes length
(*item_length_in_bytes) += value.length() - 1;
}
rjson::value v = deserialize_item(value);
if (attrs_to_get) {
auto it = attrs_to_get->find(attr_name);
if (it != attrs_to_get->end()) {
// attrs_to_get may have asked for only part of
// this attribute. hierarchy_filter() modifies v,
// and returns false when nothing is to be kept.
if (!hierarchy_filter(v, it->second)) {
continue;
}
}
}
// item is expected to start empty, and attribute
// names are unique so add() makes sense
rjson::add_with_string_name(item, attr_name, std::move(v));
} else if (item_length_in_bytes) {
(*item_length_in_bytes) += value_cast<bytes>(entry.second).length() - 1;
}
}
}
++column_it;
}
}
std::optional<rjson::value> describe_single_item(schema_ptr schema,
const query::partition_slice& slice,
const cql3::selection::selection& selection,
const query::result& query_result,
const std::optional<attrs_to_get>& attrs_to_get,
uint64_t* item_length_in_bytes) {
rjson::value item = rjson::empty_object();
cql3::selection::result_set_builder builder(selection, gc_clock::now());
query::result_view::consume(query_result, slice, cql3::selection::result_set_builder::visitor(builder, *schema, selection));
auto result_set = builder.build();
if (result_set->empty()) {
if (item_length_in_bytes) {
// empty results is counted as having a minimal length (e.g. 1 byte).
(*item_length_in_bytes) += 1;
}
// If there is no matching item, we're supposed to return an empty
// object without an Item member - not one with an empty Item member
return {};
}
if (result_set->size() > 1) {
// If the result set contains multiple rows, the code should have
// called describe_multi_item(), not this function.
throw std::logic_error("describe_single_item() asked to describe multiple items");
}
describe_single_item(selection, *result_set->rows().begin(), attrs_to_get, item, item_length_in_bytes);
return item;
}
static void check_big_array(const rjson::value& val, int& size_left);
static void check_big_object(const rjson::value& val, int& size_left);
// For simplicity, we use a recursive implementation. This is fine because
// Alternator limits the depth of JSONs it reads from inputs, and doesn't
// add more than a couple of levels in its own output construction.
bool is_big(const rjson::value& val, int big_size) {
if (val.IsString()) {
return ssize_t(val.GetStringLength()) > big_size;
} else if (val.IsObject()) {
check_big_object(val, big_size);
return big_size < 0;
} else if (val.IsArray()) {
check_big_array(val, big_size);
return big_size < 0;
}
return false;
}
static void check_big_array(const rjson::value& val, int& size_left) {
// Assume a fixed size of 10 bytes for each number, boolean, etc., or
// beginning of a sub-object. This doesn't have to be accurate.
size_left -= 10 * val.Size();
for (const auto& v : val.GetArray()) {
if (size_left < 0) {
return;
}
// Note that we avoid recursive calls for the leaves (anything except
// array or object) because usually those greatly outnumber the trunk.
if (v.IsString()) {
size_left -= v.GetStringLength();
} else if (v.IsObject()) {
check_big_object(v, size_left);
} else if (v.IsArray()) {
check_big_array(v, size_left);
}
}
}
static void check_big_object(const rjson::value& val, int& size_left) {
size_left -= 10 * val.MemberCount();
for (const auto& m : val.GetObject()) {
if (size_left < 0) {
return;
}
size_left -= m.name.GetStringLength();
if (m.value.IsString()) {
size_left -= m.value.GetStringLength();
} else if (m.value.IsObject()) {
check_big_object(m.value, size_left);
} else if (m.value.IsArray()) {
check_big_array(m.value, size_left);
}
}
}
void validate_table_name(std::string_view name, const char* source) {
if (name.length() < 3 || name.length() > max_table_name_length) {
throw api_error::validation(
format("{} must be at least 3 characters long and at most {} characters long", source, max_table_name_length));
}
if (!valid_table_name_chars(name)) {
throw api_error::validation(
format("{} must satisfy regular expression pattern: [a-zA-Z0-9_.-]+", source));
}
}
void validate_cdc_log_name_length(std::string_view table_name) {
if (cdc::log_name(table_name).length() > max_auxiliary_table_name_length) {
// CDC will add cdc_log_suffix ("_scylla_cdc_log") to the table name
// to create its log table, and this will exceed the maximum allowed
// length. To provide a more helpful error message, we assume that
// cdc::log_name() always adds a suffix of the same length.
int suffix_len = cdc::log_name(table_name).length() - table_name.length();
throw api_error::validation(fmt::format("Streams or vector search cannot be enabled on a table whose name is longer than {} characters: {}",
max_auxiliary_table_name_length - suffix_len, table_name));
}
}
body_writer make_streamed(rjson::value&& value) {
return [value = std::move(value)](output_stream<char>&& _out) mutable -> future<> {
auto out = std::move(_out);
std::exception_ptr ex;
try {
co_await rjson::print(value, out);
} catch (...) {
ex = std::current_exception();
}
co_await out.close();
co_await rjson::destroy_gently(std::move(value));
if (ex) {
co_await coroutine::return_exception_ptr(std::move(ex));
}
};
}
} // namespace alternator

View File

@@ -1,247 +0,0 @@
/*
* Copyright 2019-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
*/
// This header file, and the implementation file executor_util.cc, contain
// various utility functions that are reused in many different operations
// (API requests) across Alternator's code - in files such as executor.cc,
// executor_read.cc, streams.cc, ttl.cc, and more. These utility functions
// include things like extracting and validating pieces from a JSON request,
// checking permissions, constructing auxiliary table names, and more.
#pragma once
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <unordered_set>
#include <seastar/core/future.hh>
#include <seastar/util/noncopyable_function.hh>
#include "utils/rjson.hh"
#include "schema/schema_fwd.hh"
#include "types/types.hh"
#include "auth/permission.hh"
#include "alternator/stats.hh"
#include "alternator/attribute_path.hh"
#include "utils/managed_bytes.hh"
namespace query { class partition_slice; class result; }
namespace cql3::selection { class selection; }
namespace data_dictionary { class database; }
namespace service { class storage_proxy; class client_state; }
namespace alternator {
/// The body_writer is used for streaming responses - where the response body
/// is written in chunks to the output_stream. This allows for efficient
/// handling of large responses without needing to allocate a large buffer in
/// memory. It is one of the variants of executor::request_return_type.
using body_writer = noncopyable_function<future<>(output_stream<char>&&)>;
/// Get the value of an integer attribute, or an empty optional if it is
/// missing. If the attribute exists, but is not an integer, a descriptive
/// api_error is thrown.
std::optional<int> get_int_attribute(const rjson::value& value, std::string_view attribute_name);
/// Get the value of a string attribute, or a default value if it is missing.
/// If the attribute exists, but is not a string, a descriptive api_error is
/// thrown.
std::string get_string_attribute(const rjson::value& value, std::string_view attribute_name, const char* default_return);
/// Get the value of a boolean attribute, or a default value if it is missing.
/// If the attribute exists, but is not a bool, a descriptive api_error is
/// thrown.
bool get_bool_attribute(const rjson::value& value, std::string_view attribute_name, bool default_return);
/// Extract table name from a request.
/// Most requests expect the table's name to be listed in a "TableName" field.
/// get_table_name() returns the name or api_error in case the table name is
/// missing or not a string.
std::string get_table_name(const rjson::value& request);
/// find_table_name() is like get_table_name() except that it returns an
/// optional table name - it returns an empty optional when the TableName
/// is missing from the request, instead of throwing as get_table_name()
/// does. However, find_table_name() still throws if a TableName exists but
/// is not a string.
std::optional<std::string> find_table_name(const rjson::value& request);
/// Extract table schema from a request.
/// Many requests expect the table's name to be listed in a "TableName" field
/// and need to look it up as an existing table. The get_table() function
/// does this, with the appropriate validation and api_error in case the table
/// name is missing, invalid or the table doesn't exist. If everything is
/// successful, it returns the table's schema.
schema_ptr get_table(service::storage_proxy& proxy, const rjson::value& request);
/// This find_table() variant is like get_table() excepts that it returns a
/// nullptr instead of throwing if the request does not mention a TableName.
/// In other cases of errors (i.e., a table is mentioned but doesn't exist)
/// this function throws too.
schema_ptr find_table(service::storage_proxy& proxy, const rjson::value& request);
/// This find_table() variant is like the previous one except that it takes
/// the table name directly instead of a request object. It is used in cases
/// where we already have the table name extracted from the request.
schema_ptr find_table(service::storage_proxy& proxy, std::string_view table_name);
// We would have liked to support table names up to 255 bytes, like DynamoDB.
// But Scylla creates a directory whose name is the table's name plus 33
// bytes (dash and UUID), and since directory names are limited to 255 bytes,
// we need to limit table names to 222 bytes, instead of 255. See issue #4480.
// We actually have two limits here,
// * max_table_name_length is the limit that Alternator will impose on names
// of new Alternator tables.
// * max_auxiliary_table_name_length is the potentially higher absolute limit
// that Scylla imposes on the names of auxiliary tables that Alternator
// wants to create internally - i.e. materialized views or CDC log tables.
// The second limit might mean that it is not possible to add a GSI to an
// existing table, because the name of the new auxiliary table may go over
// the limit. The second limit is also one of the reasons why the first limit
// is set lower than 222 - to have room to enable streams which add the extra
// suffix "_scylla_cdc_log" to the table name.
inline constexpr int max_table_name_length = 192;
inline constexpr int max_auxiliary_table_name_length = 222;
/// validate_table_name() validates the TableName parameter in a request - it
/// should be called in CreateTable, and in other requests only when noticing
/// that the named table doesn't exist.
/// The DynamoDB developer guide, https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.NamingRules
/// specifies that table "names must be between 3 and 255 characters long and
/// can contain only the following characters: a-z, A-Z, 0-9, _ (underscore),
/// - (dash), . (dot)". However, Alternator only allows max_table_name_length
/// characters (see above) - not 255.
/// validate_table_name() throws the appropriate api_error if this validation
/// fails.
void validate_table_name(std::string_view name, const char* source = "TableName");
/// Validate that a CDC log table could be created for the base table with a
/// given table_name, and if not, throw a user-visible api_error::validation.
/// It is not possible to create a CDC log table if the table name is so long
/// that adding the 15-character suffix "_scylla_cdc_log" (cdc_log_suffix)
/// makes it go over max_auxiliary_table_name_length.
/// Note that if max_table_name_length is set to less than 207 (which is
/// max_auxiliary_table_name_length-15), then this function will never
/// fail. However, it's still important to call it in UpdateTable, in case
/// we have pre-existing tables with names longer than this to avoid #24598.
void validate_cdc_log_name_length(std::string_view table_name);
/// Checks if a keyspace, given by its name, is an Alternator keyspace.
/// This just checks if the name begins in executor::KEYSPACE_NAME_PREFIX,
/// a prefix that all keyspaces created by Alternator's CreateTable use.
bool is_alternator_keyspace(std::string_view ks_name);
/// Wraps db::get_tags_of_table() and throws api_error::validation if the
/// table is missing the tags extension.
const std::map<sstring, sstring>& get_tags_of_table_or_throw(schema_ptr schema);
/// Returns a type object representing the type of the ":attrs" column used
/// by Alternator to store all non-key attribute. This type is a map from
/// string (attribute name) to bytes (serialized attribute value).
map_type attrs_type();
// In DynamoDB index names are local to a table, while in Scylla, materialized
// view names are global (in a keyspace). So we need to compose a unique name
// for the view taking into account both the table's name and the index name.
// We concatenate the table and index name separated by a delim character
// (a character not allowed by DynamoDB in ordinary table names, default: ":").
// The downside of this approach is that it limits the sum of the lengths,
// instead of each component individually as DynamoDB does.
// The view_name() function assumes the table_name has already been validated
// but validates the legality of index_name and the combination of both.
std::string view_name(std::string_view table_name, std::string_view index_name,
const std::string& delim = ":", bool validate_len = true);
std::string gsi_name(std::string_view table_name, std::string_view index_name,
bool validate_len = true);
std::string lsi_name(std::string_view table_name, std::string_view index_name,
bool validate_len = true);
/// After calling pk_from_json() and ck_from_json() to extract the pk and ck
/// components of a key, and if that succeeded, call check_key() to further
/// check that the key doesn't have any spurious components.
void check_key(const rjson::value& key, const schema_ptr& schema);
/// Fail with api_error::validation if the expression if has unused attribute
/// names or values. This is how DynamoDB behaves, so we do too.
void verify_all_are_used(const rjson::value* field,
const std::unordered_set<std::string>& used,
const char* field_name,
const char* operation);
/// Check CQL's Role-Based Access Control (RBAC) permission (MODIFY,
/// SELECT, DROP, etc.) on the given table. When permission is denied an
/// appropriate user-readable api_error::access_denied is thrown.
future<> verify_permission(bool enforce_authorization, bool warn_authorization, const service::client_state&, const schema_ptr&, auth::permission, stats& stats);
/// Similar to verify_permission() above, but just for CREATE operations.
/// Those do not operate on any specific table, so require permissions on
/// ALL KEYSPACES instead of any specific table.
future<> verify_create_permission(bool enforce_authorization, bool warn_authorization, const service::client_state&, stats& stats);
// Sets a KeySchema JSON array inside the given parent object describing the
// key attributes of the given schema as HASH or RANGE keys. Additionally,
// adds mappings from key attribute names to their DynamoDB type string into
// attribute_types.
void describe_key_schema(rjson::value& parent, const schema&, std::unordered_map<std::string, std::string>* attribute_types = nullptr, const std::map<sstring, sstring>* tags = nullptr);
/// is_big() checks approximately if the given JSON value is "bigger" than
/// the given big_size number of bytes. The goal is to *quickly* detect
/// oversized JSON that, for example, is too large to be serialized to a
/// contiguous string - we don't need an accurate size for that. Moreover,
/// as soon as we detect that the JSON is indeed "big", we can return true
/// and don't need to continue calculating its exact size.
bool is_big(const rjson::value& val, int big_size = 100'000);
/// try_get_internal_table() handles the special case that the given table_name
/// begins with INTERNAL_TABLE_PREFIX (".scylla.alternator."). In that case,
/// this function assumes that the rest of the name refers to an internal
/// Scylla table (e.g., system table) and returns the schema of that table -
/// or an exception if it doesn't exist. Otherwise, if table_name does not
/// start with INTERNAL_TABLE_PREFIX, this function returns an empty schema_ptr
/// and the caller should look for a normal Alternator table with that name.
schema_ptr try_get_internal_table(const data_dictionary::database& db, std::string_view table_name);
/// get_table_from_batch_request() is used by batch write/read operations to
/// look up the schema for a table named in a batch request, by the JSON member
/// name (which is the table name in a BatchWriteItem or BatchGetItem request).
schema_ptr get_table_from_batch_request(const service::storage_proxy& proxy, const rjson::value::ConstMemberIterator& batch_request);
/// Returns (or lazily creates) the per-table stats object for the given schema.
/// If the table has been deleted, returns a temporary stats object.
lw_shared_ptr<stats> get_stats_from_schema(service::storage_proxy& sp, const schema& schema);
/// Writes one item's attributes into `item` from the given selection result
/// row. If include_all_embedded_attributes is true, all attributes from the
/// ATTRS_COLUMN map column are included regardless of attrs_to_get.
void describe_single_item(const cql3::selection::selection&,
const std::vector<managed_bytes_opt>&,
const std::optional<attrs_to_get>&,
rjson::value&,
uint64_t* item_length_in_bytes = nullptr,
bool include_all_embedded_attributes = false);
/// Converts a single result row to a JSON item, or returns an empty optional
/// if the result is empty.
std::optional<rjson::value> describe_single_item(schema_ptr,
const query::partition_slice&,
const cql3::selection::selection&,
const query::result&,
const std::optional<attrs_to_get>&,
uint64_t* item_length_in_bytes = nullptr);
/// Make a body_writer (function that can write output incrementally to the
/// HTTP stream) from the given JSON object.
/// Note: only useful for (very) large objects as there are overhead issues
/// with this as well, but for massive lists of return objects this can
/// help avoid large allocations/many re-allocs.
body_writer make_streamed(rjson::value&&);
} // namespace alternator

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "expressions.hh"
@@ -744,7 +744,7 @@ void validate_attr_name_length(std::string_view supplementary_context, size_t at
constexpr const size_t DYNAMODB_NONKEY_ATTR_NAME_SIZE_MAX = 65535;
const size_t max_length = is_key ? DYNAMODB_KEY_ATTR_NAME_SIZE_MAX : DYNAMODB_NONKEY_ATTR_NAME_SIZE_MAX;
if (attr_name_length > max_length || attr_name_length == 0) {
if (attr_name_length > max_length) {
std::string error_msg;
if (!error_msg_prefix.empty()) {
error_msg += error_msg_prefix;
@@ -754,11 +754,7 @@ void validate_attr_name_length(std::string_view supplementary_context, size_t at
error_msg += supplementary_context;
error_msg += " - ";
}
if (attr_name_length == 0) {
error_msg += "Empty attribute name";
} else {
error_msg += fmt::format("Attribute name is too large, must be less than {} bytes", std::to_string(max_length + 1));
}
error_msg += fmt::format("Attribute name is too large, must be less than {} bytes", std::to_string(max_length + 1));
throw api_error::validation(error_msg);
}
}

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
/*
@@ -91,18 +91,6 @@ options {
throw expressions_syntax_error(format("{} at char {}", err,
ex->get_charPositionInLine()));
}
// ANTLR3 tries to recover missing tokens - it tries to finish parsing
// and create valid objects, as if the missing token was there.
// But it has a bug and leaks these tokens.
// We override offending method and handle abandoned pointers.
std::vector<std::unique_ptr<TokenType>> _missing_tokens;
TokenType* getMissingSymbol(IntStreamType* istream, ExceptionBaseType* e,
ANTLR_UINT32 expectedTokenType, BitsetListType* follow) {
auto token = BaseType::getMissingSymbol(istream, e, expectedTokenType, follow);
_missing_tokens.emplace_back(token);
return token;
}
}
@lexer::context {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
@@ -196,13 +184,7 @@ path_component: NAME | NAMEREF;
path returns [parsed::path p]:
root=path_component { $p.set_root($root.text); }
( '.' name=path_component { $p.add_dot($name.text); }
| '[' INTEGER ']' {
try {
$p.add_index(std::stoi($INTEGER.text));
} catch(std::out_of_range&) {
throw expressions_syntax_error("list index out of integer range");
}
}
| '[' INTEGER ']' { $p.add_index(std::stoi($INTEGER.text)); }
)*;
/* See comment above why the "depth" counter was needed here */
@@ -248,7 +230,7 @@ update_expression_clause returns [parsed::update_expression e]:
// Note the "EOF" token at the end of the update expression. We want to the
// parser to match the entire string given to it - not just its beginning!
update_expression returns [parsed::update_expression e]:
(update_expression_clause { e.append($update_expression_clause.e); })+ EOF;
(update_expression_clause { e.append($update_expression_clause.e); })* EOF;
projection_expression returns [std::vector<parsed::path> v]:
p=path { $v.push_back(std::move($p.p)); }
@@ -275,13 +257,6 @@ primitive_condition returns [parsed::primitive_condition c]:
(',' v=value[0] { $c.add_value(std::move($v.v)); })*
')'
)?
{
// Post-parse check to reject non-function single values
if ($c._op == parsed::primitive_condition::type::VALUE &&
!$c._values.front().is_func()) {
throw expressions_syntax_error("Single value must be a function");
}
}
;
// The following rules for parsing boolean expressions are verbose and

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
@@ -18,8 +18,6 @@
#include "expressions_types.hh"
#include "utils/rjson.hh"
#include "utils/updateable_value.hh"
#include "stats.hh"
namespace alternator {
@@ -28,26 +26,6 @@ public:
using runtime_error::runtime_error;
};
namespace parsed {
class expression_cache_impl;
class expression_cache {
std::unique_ptr<expression_cache_impl> _impl;
public:
struct config {
utils::updateable_value<uint32_t> max_cache_entries;
};
expression_cache(config cfg, stats& stats);
~expression_cache();
// stop background tasks, if any
future<> stop();
update_expression parse_update_expression(std::string_view query);
std::vector<path> parse_projection_expression(std::string_view query);
condition_expression parse_condition_expression(std::string_view query, const char* caller);
};
} // namespace parsed
// Preferably use parsed::expression_cache instance instead of this free functions.
parsed::update_expression parse_update_expression(std::string_view query);
std::vector<parsed::path> parse_projection_expression(std::string_view query);
parsed::condition_expression parse_condition_expression(std::string_view query, const char* caller);

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
@@ -50,7 +50,7 @@ public:
_operators.emplace_back(i);
check_depth_limit();
}
void add_dot(std::string name) {
void add_dot(std::string(name)) {
_operators.emplace_back(std::move(name));
check_depth_limit();
}
@@ -85,7 +85,7 @@ struct constant {
}
};
// "value" is a value used in the right hand side of an assignment
// "value" is is a value used in the right hand side of an assignment
// expression, "SET a = ...". It can be a constant (a reference to a value
// included in the request, e.g., ":val"), a path to an attribute from the
// existing item (e.g., "a.b[3].c"), or a function of other such values.
@@ -205,11 +205,13 @@ public:
// The supported primitive conditions are:
// 1. Binary operators - v1 OP v2, where OP is =, <>, <, <=, >, or >= and
// v1 and v2 are values - from the item (an attribute path), the query
// (a ":val" reference), or a function of the above (only the size()
// (a ":val" reference), or a function of the the above (only the size()
// function is supported).
// 2. Ternary operator - v1 BETWEEN v2 and v3 (means v1 >= v2 AND v1 <= v3).
// 3. N-ary operator - v1 IN ( v2, v3, ... )
// 4. A single function call (attribute_exists etc.).
// 4. A single function call (attribute_exists etc.). The parser actually
// accepts a more general "value" here but later stages reject a value
// which is not a function call (because DynamoDB does it too).
class primitive_condition {
public:
enum class type {

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
@@ -13,7 +13,7 @@
#include "utils/rjson.hh"
#include "serialization.hh"
#include "schema/column_computation.hh"
#include "column_computation.hh"
#include "db/view/regular_column_transformation.hh"
namespace alternator {

View File

@@ -1,301 +0,0 @@
/*
* Copyright 2025-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
*/
#include "alternator/http_compression.hh"
#include "alternator/server.hh"
#include <seastar/coroutine/maybe_yield.hh>
#include <zlib.h>
static logging::logger slogger("alternator-http-compression");
namespace alternator {
static constexpr size_t compressed_buffer_size = 1024;
class zlib_compressor {
z_stream _zs;
temporary_buffer<char> _output_buf;
noncopyable_function<future<>(temporary_buffer<char>&&)> _write_func;
public:
zlib_compressor(bool gzip, int compression_level, noncopyable_function<future<>(temporary_buffer<char>&&)> write_func)
: _write_func(std::move(write_func)) {
memset(&_zs, 0, sizeof(_zs));
if (deflateInit2(&_zs, std::clamp(compression_level, Z_NO_COMPRESSION, Z_BEST_COMPRESSION), Z_DEFLATED,
(gzip ? 16 : 0) + MAX_WBITS, 8, Z_DEFAULT_STRATEGY) != Z_OK) {
// Should only happen if memory allocation fails
throw std::bad_alloc();
}
}
~zlib_compressor() {
deflateEnd(&_zs);
}
future<> close() {
return compress(nullptr, 0, true);
}
future<> compress(const char* buf, size_t len, bool is_last_chunk = false) {
_zs.next_in = reinterpret_cast<unsigned char*>(const_cast<char*>(buf));
_zs.avail_in = (uInt) len;
int mode = is_last_chunk ? Z_FINISH : Z_NO_FLUSH;
while(_zs.avail_in > 0 || is_last_chunk) {
co_await coroutine::maybe_yield();
if (_output_buf.empty()) {
if (is_last_chunk) {
uint32_t max_buffer_size = 0;
deflatePending(&_zs, &max_buffer_size, nullptr);
max_buffer_size += deflateBound(&_zs, _zs.avail_in) + 1;
_output_buf = temporary_buffer<char>(std::min(compressed_buffer_size, (size_t) max_buffer_size));
} else {
_output_buf = temporary_buffer<char>(compressed_buffer_size);
}
_zs.next_out = reinterpret_cast<unsigned char*>(_output_buf.get_write());
_zs.avail_out = compressed_buffer_size;
}
int e = deflate(&_zs, mode);
if (e < Z_OK) {
throw api_error::internal("Error during compression of response body");
}
if (e == Z_STREAM_END || _zs.avail_out < compressed_buffer_size / 4) {
_output_buf.trim(compressed_buffer_size - _zs.avail_out);
co_await _write_func(std::move(_output_buf));
if (e == Z_STREAM_END) {
break;
}
}
}
}
};
// Helper string_view functions for parsing Accept-Encoding header
struct case_insensitive_cmp_sv {
bool operator()(std::string_view s1, std::string_view s2) const {
return std::equal(s1.begin(), s1.end(), s2.begin(), s2.end(),
[](char a, char b) { return ::tolower(a) == ::tolower(b); });
}
};
static inline std::string_view trim_left(std::string_view sv) {
while (!sv.empty() && std::isspace(static_cast<unsigned char>(sv.front())))
sv.remove_prefix(1);
return sv;
}
static inline std::string_view trim_right(std::string_view sv) {
while (!sv.empty() && std::isspace(static_cast<unsigned char>(sv.back())))
sv.remove_suffix(1);
return sv;
}
static inline std::string_view trim(std::string_view sv) {
return trim_left(trim_right(sv));
}
inline std::vector<std::string_view> split(std::string_view text, char separator) {
std::vector<std::string_view> tokens;
if (text == "") {
return tokens;
}
while (true) {
auto pos = text.find_first_of(separator);
if (pos != std::string_view::npos) {
tokens.emplace_back(text.data(), pos);
text.remove_prefix(pos + 1);
} else {
tokens.emplace_back(text);
break;
}
}
return tokens;
}
constexpr response_compressor::compression_type response_compressor::get_compression_type(std::string_view encoding) {
for (size_t i = 0; i < static_cast<size_t>(compression_type::count); ++i) {
if (case_insensitive_cmp_sv{}(encoding, compression_names[i])) {
return static_cast<compression_type>(i);
}
}
return compression_type::unknown;
}
response_compressor::compression_type response_compressor::find_compression(std::string_view accept_encoding, size_t response_size) {
std::optional<float> ct_q[static_cast<size_t>(compression_type::count)];
ct_q[static_cast<size_t>(compression_type::none)] = std::numeric_limits<float>::min(); // enabled, but lowest priority
compression_type selected_ct = compression_type::none;
std::vector<std::string_view> entries = split(accept_encoding, ',');
for (auto& e : entries) {
std::vector<std::string_view> params = split(e, ';');
if (params.size() == 0) {
continue;
}
compression_type ct = get_compression_type(trim(params[0]));
if (ct == compression_type::unknown) {
continue; // ignore unknown encoding types
}
if (ct_q[static_cast<size_t>(ct)].has_value() && ct_q[static_cast<size_t>(ct)] != 0.0f) {
continue; // already processed this encoding
}
if (response_size < _threshold[static_cast<size_t>(ct)]) {
continue; // below threshold treat as unknown
}
for (size_t i = 1; i < params.size(); ++i) { // find "q=" parameter
auto pos = params[i].find("q=");
if (pos == std::string_view::npos) {
continue;
}
std::string_view param = params[i].substr(pos + 2);
param = trim(param);
// parse quality value
float q_value = 1.0f;
auto [ptr, ec] = std::from_chars(param.data(), param.data() + param.size(), q_value);
if (ec != std::errc() || ptr != param.data() + param.size()) {
continue;
}
if (q_value < 0.0) {
q_value = 0.0;
} else if (q_value > 1.0) {
q_value = 1.0;
}
ct_q[static_cast<size_t>(ct)] = q_value;
break; // we parsed quality value
}
if (!ct_q[static_cast<size_t>(ct)].has_value()) {
ct_q[static_cast<size_t>(ct)] = 1.0f; // default quality value
}
// keep the highest encoding (in the order, unless 'any')
if (selected_ct == compression_type::any) {
if (ct_q[static_cast<size_t>(ct)] >= ct_q[static_cast<size_t>(selected_ct)]) {
selected_ct = ct;
}
} else {
if (ct_q[static_cast<size_t>(ct)] > ct_q[static_cast<size_t>(selected_ct)]) {
selected_ct = ct;
}
}
}
if (selected_ct == compression_type::any) {
// select any not mentioned or highest quality
selected_ct = compression_type::none;
for (size_t i = 0; i < static_cast<size_t>(compression_type::compressions_count); ++i) {
if (!ct_q[i].has_value()) {
return static_cast<compression_type>(i);
}
if (ct_q[i] > ct_q[static_cast<size_t>(selected_ct)]) {
selected_ct = static_cast<compression_type>(i);
}
}
}
return selected_ct;
}
static future<chunked_content> compress(response_compressor::compression_type ct, const db::config& cfg, std::string str) {
chunked_content compressed;
auto write = [&compressed](temporary_buffer<char>&& buf) -> future<> {
compressed.push_back(std::move(buf));
return make_ready_future<>();
};
zlib_compressor compressor(ct != response_compressor::compression_type::deflate,
cfg.alternator_response_gzip_compression_level(), std::move(write));
co_await compressor.compress(str.data(), str.size(), true);
co_return compressed;
}
static sstring flatten(chunked_content&& cc) {
size_t total_size = 0;
for (const auto& chunk : cc) {
total_size += chunk.size();
}
sstring result = sstring{ sstring::initialized_later{}, total_size };
size_t offset = 0;
for (const auto& chunk : cc) {
std::copy(chunk.begin(), chunk.end(), result.begin() + offset);
offset += chunk.size();
}
return result;
}
future<std::unique_ptr<http::reply>> response_compressor::generate_reply(std::unique_ptr<http::reply> rep, sstring accept_encoding, const char* content_type, std::string&& response_body) {
response_compressor::compression_type ct = find_compression(accept_encoding, response_body.size());
if (ct != response_compressor::compression_type::none) {
rep->add_header("Content-Encoding", get_encoding_name(ct));
rep->set_content_type(content_type);
return compress(ct, cfg, std::move(response_body)).then([rep = std::move(rep)] (chunked_content compressed) mutable {
rep->_content = flatten(std::move(compressed));
return make_ready_future<std::unique_ptr<http::reply>>(std::move(rep));
});
} else {
// Note that despite the move, there is a copy here -
// as str is std::string and rep->_content is sstring.
rep->_content = std::move(response_body);
rep->set_content_type(content_type);
}
return make_ready_future<std::unique_ptr<http::reply>>(std::move(rep));
}
template<typename Compressor>
class compressed_data_sink_impl : public data_sink_impl {
output_stream<char> _out;
Compressor _compressor;
public:
template<typename... Args>
compressed_data_sink_impl(output_stream<char>&& out, Args&&... args)
: _out(std::move(out)), _compressor(std::forward<Args>(args)..., [this](temporary_buffer<char>&& buf) {
return _out.write(std::move(buf));
}) { }
future<> put(std::span<temporary_buffer<char>> data) override {
return data_sink_impl::fallback_put(data, [this] (temporary_buffer<char>&& buf) {
return do_put(std::move(buf));
});
}
private:
future<> do_put(temporary_buffer<char> buf) {
co_return co_await _compressor.compress(buf.get(), buf.size());
}
future<> close() override {
return _compressor.close().then([this] {
return _out.close();
});
}
};
body_writer compress(response_compressor::compression_type ct, const db::config& cfg, body_writer&& bw) {
return [bw = std::move(bw), ct, level = cfg.alternator_response_gzip_compression_level()](output_stream<char>&& out) mutable -> future<> {
output_stream_options opts;
opts.trim_to_size = true;
std::unique_ptr<data_sink_impl> data_sink_impl;
switch (ct) {
case response_compressor::compression_type::gzip:
data_sink_impl = std::make_unique<compressed_data_sink_impl<zlib_compressor>>(std::move(out), true, level);
break;
case response_compressor::compression_type::deflate:
data_sink_impl = std::make_unique<compressed_data_sink_impl<zlib_compressor>>(std::move(out), false, level);
break;
case response_compressor::compression_type::none:
case response_compressor::compression_type::any:
case response_compressor::compression_type::unknown:
on_internal_error(slogger,"Compression not selected");
default:
on_internal_error(slogger, "Unsupported compression type for data sink");
}
return bw(output_stream<char>(data_sink(std::move(data_sink_impl)), compressed_buffer_size, opts));
};
}
future<std::unique_ptr<http::reply>> response_compressor::generate_reply(std::unique_ptr<http::reply> rep, sstring accept_encoding, const char* content_type, body_writer&& body_writer) {
response_compressor::compression_type ct = find_compression(accept_encoding, std::numeric_limits<size_t>::max());
if (ct != response_compressor::compression_type::none) {
rep->add_header("Content-Encoding", get_encoding_name(ct));
rep->write_body(content_type, compress(ct, cfg, std::move(body_writer)));
} else {
rep->write_body(content_type, std::move(body_writer));
}
return make_ready_future<std::unique_ptr<http::reply>>(std::move(rep));
}
} // namespace alternator

View File

@@ -1,91 +0,0 @@
/*
* Copyright 2025-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
*/
#pragma once
#include "alternator/executor.hh"
#include <seastar/http/httpd.hh>
#include "db/config.hh"
namespace alternator {
class response_compressor {
public:
enum class compression_type {
gzip,
deflate,
compressions_count,
any = compressions_count,
none,
count,
unknown = count
};
static constexpr std::string_view compression_names[] = {
"gzip",
"deflate",
"*",
"identity"
};
static sstring get_encoding_name(compression_type ct) {
return sstring(compression_names[static_cast<size_t>(ct)]);
}
static constexpr compression_type get_compression_type(std::string_view encoding);
sstring get_accepted_encoding(const http::request& req) {
if (get_threshold() == 0) {
return "";
}
return req.get_header("Accept-Encoding");
}
compression_type find_compression(std::string_view accept_encoding, size_t response_size);
response_compressor(const db::config& cfg)
: cfg(cfg)
,_gzip_level_observer(
cfg.alternator_response_gzip_compression_level.observe([this](int v) {
update_threshold();
}))
,_gzip_threshold_observer(
cfg.alternator_response_compression_threshold_in_bytes.observe([this](uint32_t v) {
update_threshold();
}))
{
update_threshold();
}
response_compressor(const response_compressor& rhs) : response_compressor(rhs.cfg) {}
private:
const db::config& cfg;
utils::observable<int>::observer _gzip_level_observer;
utils::observable<uint32_t>::observer _gzip_threshold_observer;
uint32_t _threshold[static_cast<size_t>(compression_type::count)];
size_t get_threshold() { return _threshold[static_cast<size_t>(compression_type::any)]; }
void update_threshold() {
_threshold[static_cast<size_t>(compression_type::none)] = std::numeric_limits<uint32_t>::max();
_threshold[static_cast<size_t>(compression_type::any)] = std::numeric_limits<uint32_t>::max();
uint32_t gzip = cfg.alternator_response_gzip_compression_level() <= 0 ? std::numeric_limits<uint32_t>::max()
: cfg.alternator_response_compression_threshold_in_bytes();
_threshold[static_cast<size_t>(compression_type::gzip)] = gzip;
_threshold[static_cast<size_t>(compression_type::deflate)] = gzip;
for (size_t i = 0; i < static_cast<size_t>(compression_type::compressions_count); ++i) {
if (_threshold[i] < _threshold[static_cast<size_t>(compression_type::any)]) {
_threshold[static_cast<size_t>(compression_type::any)] = _threshold[i];
}
}
}
public:
future<std::unique_ptr<http::reply>> generate_reply(std::unique_ptr<http::reply> rep,
sstring accept_encoding, const char* content_type, std::string&& response_body);
future<std::unique_ptr<http::reply>> generate_reply(std::unique_ptr<http::reply> rep,
sstring accept_encoding, const char* content_type, body_writer&& body_writer);
};
}

View File

@@ -1,109 +0,0 @@
/*
* Copyright 2025-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
*/
#include "expressions.hh"
#include "utils/log.hh"
#include "utils/lru_string_map.hh"
#include <variant>
static logging::logger logger_("parsed-expression-cache");
namespace alternator::parsed {
struct expression_cache_impl {
stats& _stats;
using cached_expressions_types = std::variant<
update_expression,
condition_expression,
std::vector<path>
>;
sized_lru_string_map<cached_expressions_types> _cached_entries;
utils::observable<uint32_t>::observer _max_cache_entries_observer;
expression_cache_impl(expression_cache::config cfg, stats& stats);
// to define the specialized return type of `get_or_create()`
template <typename Func, typename... Args>
using ParseResult = std::invoke_result_t<Func, std::string_view, Args...>;
// Caching layer for parsed expressions
// The expression type is determined by the type of the parsing function passed as a parameter,
// and the return type is exactly the same as the return type of this parsing function.
// StatsType is used only to update appropriate statistics - currently it is aligned with the expression type,
// but it could be extended in the future if needed, e.g. split per operation.
template <stats::expression_types StatsType, typename Func, typename... Args>
ParseResult<Func, Args...> get_or_create(std::string_view query, Func&& parse_func, Args&&... other_args) {
if (_cached_entries.disabled()) {
return parse_func(query, std::forward<Args>(other_args)...);
}
if (!_cached_entries.sanity_check()) {
_stats.expression_cache.requests[StatsType].misses++;
return parse_func(query, std::forward<Args>(other_args)...);
}
auto value = _cached_entries.find(query);
if (value) {
logger_.trace("Cache hit for query: {}", query);
_stats.expression_cache.requests[StatsType].hits++;
try {
return std::get<ParseResult<Func, Args...>>(value->get());
} catch (const std::bad_variant_access&) {
// User can reach this code, by sending the same query string as a different expression type.
// In practice valid queries are different enough to not collide.
// Entries in cache are only valid queries.
// This request will fail at parsing below.
// If, by any chance this is a valid query, it will be updated below with the new value.
logger_.trace("Cache hit for '{}', but type mismatch.", query);
_stats.expression_cache.requests[StatsType].hits--;
}
} else {
logger_.trace("Cache miss for query: {}", query);
}
ParseResult<Func, Args...> expr = parse_func(query, std::forward<Args>(other_args)...);
// Invalid query will throw here ^
_stats.expression_cache.requests[StatsType].misses++;
if (value) [[unlikely]] {
value->get() = cached_expressions_types{expr};
} else {
_cached_entries.insert(query, cached_expressions_types{expr});
}
return expr;
}
};
expression_cache_impl::expression_cache_impl(expression_cache::config cfg, stats& stats) :
_stats(stats), _cached_entries(logger_, _stats.expression_cache.evictions),
_max_cache_entries_observer(cfg.max_cache_entries.observe([this] (uint32_t max_value) {
_cached_entries.set_max_size(max_value);
})) {
_cached_entries.set_max_size(cfg.max_cache_entries());
}
expression_cache::expression_cache(expression_cache::config cfg, stats& stats) :
_impl(std::make_unique<expression_cache_impl>(std::move(cfg), stats)) {
}
expression_cache::~expression_cache() = default;
future<> expression_cache::stop() {
return _impl->_cached_entries.stop();
}
update_expression expression_cache::parse_update_expression(std::string_view query) {
return _impl->get_or_create<stats::expression_types::UPDATE_EXPRESSION>(query, alternator::parse_update_expression);
}
std::vector<path> expression_cache::parse_projection_expression(std::string_view query) {
return _impl->get_or_create<stats::expression_types::PROJECTION_EXPRESSION>(query, alternator::parse_projection_expression);
}
condition_expression expression_cache::parse_condition_expression(std::string_view query, const char* caller) {
return _impl->get_or_create<stats::expression_types::CONDITION_EXPRESSION>(query, alternator::parse_condition_expression, caller);
}
} // namespace alternator::parsed

View File

@@ -3,21 +3,18 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
#include "cdc/cdc_options.hh"
#include "cdc/log.hh"
#include "seastarx.hh"
#include "service/paxos/cas_request.hh"
#include "service/cas_shard.hh"
#include "utils/rjson.hh"
#include "consumed_capacity.hh"
#include "executor.hh"
#include "tracing/trace_state.hh"
#include "keys/keys.hh"
#include "keys.hh"
namespace alternator {
@@ -58,7 +55,7 @@ public:
static write_isolation get_write_isolation_for_schema(schema_ptr schema);
static write_isolation default_write_isolation;
public:
static void set_default_write_isolation(std::string_view mode);
protected:
@@ -109,16 +106,14 @@ public:
// violating this). We mark apply() "const" to let the compiler validate
// this for us. The output-only field _return_attributes is marked
// "mutable" above so that apply() can still write to it.
virtual std::optional<mutation> apply(std::unique_ptr<rjson::value> previous_item, api::timestamp_type ts, cdc::per_request_options& cdc_opts) const = 0;
virtual std::optional<mutation> apply(std::unique_ptr<rjson::value> previous_item, api::timestamp_type ts) const = 0;
// Convert the above apply() into the signature needed by cas_request:
virtual std::optional<mutation> apply(foreign_ptr<lw_shared_ptr<query::result>> qr, const query::partition_slice& slice, api::timestamp_type ts, cdc::per_request_options& cdc_opts) override;
virtual std::optional<mutation> apply(foreign_ptr<lw_shared_ptr<query::result>> qr, const query::partition_slice& slice, api::timestamp_type ts) override;
virtual ~rmw_operation() = default;
const wcu_consumed_capacity_counter& consumed_capacity() const noexcept { return _consumed_capacity; }
schema_ptr schema() const { return _schema; }
const rjson::value& request() const { return _request; }
rjson::value&& move_request() && { return std::move(_request); }
future<executor::request_return_type> execute(service::storage_proxy& proxy,
std::optional<service::cas_shard> cas_shard,
service::client_state& client_state,
tracing::trace_state_ptr trace_state,
service_permit permit,
@@ -126,10 +121,7 @@ public:
stats& global_stats,
stats& per_table_stats,
uint64_t& wcu_total);
std::optional<service::cas_shard> shard_for_execute(bool needs_read_before_write);
private:
inline bool should_fill_preimage() const { return _schema->cdc_options().enabled(); }
std::optional<shard_id> shard_for_execute(bool needs_read_before_write);
};
} // namespace alternator

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "utils/base64.hh"
@@ -11,15 +11,15 @@
#include "utils/log.hh"
#include "serialization.hh"
#include "error.hh"
#include "types/concrete_types.hh"
#include "types/json_utils.hh"
#include "concrete_types.hh"
#include "cql3/type_json.hh"
#include "mutation/position_in_partition.hh"
#include "alternator/executor_util.hh"
static logging::logger slogger("alternator-serialization");
namespace alternator {
bool is_alternator_keyspace(const sstring& ks_name);
type_info type_info_from_string(std::string_view type) {
static thread_local const std::unordered_map<std::string_view, type_info> type_infos = {
@@ -282,23 +282,15 @@ std::string type_to_string(data_type type) {
return it->second;
}
std::optional<bytes> try_get_key_column_value(const rjson::value& item, const column_definition& column) {
bytes get_key_column_value(const rjson::value& item, const column_definition& column) {
std::string column_name = column.name_as_text();
const rjson::value* key_typed_value = rjson::find(item, column_name);
if (!key_typed_value) {
return std::nullopt;
throw api_error::validation(fmt::format("Key column {} not found", column_name));
}
return get_key_from_typed_value(*key_typed_value, column);
}
bytes get_key_column_value(const rjson::value& item, const column_definition& column) {
auto value = try_get_key_column_value(item, column);
if (!value) {
throw api_error::validation(fmt::format("Key column {} not found", column.name_as_text()));
}
return std::move(*value);
}
// Parses the JSON encoding for a key value, which is a map with a single
// entry whose key is the type and the value is the encoded value.
// If this type does not match the desired "type_str", an api_error::validation
@@ -388,38 +380,20 @@ clustering_key ck_from_json(const rjson::value& item, schema_ptr schema) {
return clustering_key::make_empty();
}
std::vector<bytes> raw_ck;
// Note: it's possible to get more than one clustering column here, as
// Alternator can be used to read scylla internal tables.
// FIXME: this is a loop, but we really allow only one clustering key column.
for (const column_definition& cdef : schema->clustering_key_columns()) {
auto raw_value = get_key_column_value(item, cdef);
bytes raw_value = get_key_column_value(item, cdef);
raw_ck.push_back(std::move(raw_value));
}
return clustering_key::from_exploded(raw_ck);
}
clustering_key_prefix ck_prefix_from_json(const rjson::value& item, schema_ptr schema) {
if (schema->clustering_key_size() == 0) {
return clustering_key_prefix::make_empty();
}
std::vector<bytes> raw_ck;
for (const column_definition& cdef : schema->clustering_key_columns()) {
auto raw_value = try_get_key_column_value(item, cdef);
if (!raw_value) {
break;
}
raw_ck.push_back(std::move(*raw_value));
}
return clustering_key_prefix::from_exploded(raw_ck);
}
position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema) {
const bool is_alternator_ks = is_alternator_keyspace(schema->ks_name());
if (is_alternator_ks) {
return position_in_partition::for_key(ck_from_json(item, schema));
auto ck = ck_from_json(item, schema);
if (is_alternator_keyspace(schema->ks_name())) {
return position_in_partition::for_key(std::move(ck));
}
const auto region_item = rjson::find(item, scylla_paging_region);
const auto weight_item = rjson::find(item, scylla_paging_weight);
if (bool(region_item) != bool(weight_item)) {
@@ -439,9 +413,8 @@ position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema)
} else {
throw std::runtime_error(fmt::format("Invalid value for weight: {}", weight_view));
}
return position_in_partition(region, weight, region == partition_region::clustered ? std::optional(ck_prefix_from_json(item, schema)) : std::nullopt);
return position_in_partition(region, weight, region == partition_region::clustered ? std::optional(std::move(ck)) : std::nullopt);
}
auto ck = ck_from_json(item, schema);
if (ck.is_empty()) {
return position_in_partition::for_partition_start();
}
@@ -496,7 +469,7 @@ const std::pair<std::string, const rjson::value*> unwrap_set(const rjson::value&
return {"", nullptr};
}
auto it = v.MemberBegin();
const std::string it_key = rjson::to_string(it->name);
const std::string it_key = it->name.GetString();
if (it_key != "SS" && it_key != "BS" && it_key != "NS") {
return {std::move(it_key), nullptr};
}

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
@@ -13,7 +13,7 @@
#include <optional>
#include "types/types.hh"
#include "schema/schema_fwd.hh"
#include "keys/keys.hh"
#include "keys.hh"
#include "utils/rjson.hh"
#include "utils/big_decimal.hh"
@@ -55,7 +55,7 @@ partition_key pk_from_json(const rjson::value& item, schema_ptr schema);
clustering_key ck_from_json(const rjson::value& item, schema_ptr schema);
position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema);
// If v encodes a number (i.e., it is a {"N": [...]}), returns an object representing it. Otherwise,
// If v encodes a number (i.e., it is a {"N": [...]}, returns an object representing it. Otherwise,
// raises ValidationException with diagnostic.
big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic);

View File

@@ -3,19 +3,16 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "alternator/server.hh"
#include "audit/audit.hh"
#include "alternator/executor_util.hh"
#include "gms/application_state.hh"
#include "utils/log.hh"
#include <fmt/ranges.h>
#include <seastar/http/function_handlers.hh>
#include <seastar/http/short_streams.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/maybe_yield.hh>
#include <seastar/util/defer.hh>
#include <seastar/util/short_streams.hh>
#include "seastarx.hh"
@@ -33,10 +30,6 @@
#include "gms/gossiper.hh"
#include "utils/overloaded_functor.hh"
#include "utils/aws_sigv4.hh"
#include "client_data.hh"
#include "utils/updateable_value.hh"
#include <zlib.h>
#include "alternator/http_compression.hh"
static logging::logger slogger("alternator-server");
@@ -106,20 +99,10 @@ static void handle_CORS(const request& req, reply& rep, bool preflight) {
// the user directly. Other exceptions are unexpected, and reported as
// Internal Server Error.
class api_handler : public handler_base {
// Although the the DynamoDB API responses are JSON, additional
// conventions apply to these responses. For this reason, DynamoDB uses
// the content type "application/x-amz-json-1.0" instead of the standard
// "application/json". Some other AWS services use later versions instead
// of "1.0", but DynamoDB currently uses "1.0". Note that this content
// type applies to all replies, both success and error.
static constexpr const char* REPLY_CONTENT_TYPE = "application/x-amz-json-1.0";
public:
api_handler(const std::function<future<executor::request_return_type>(std::unique_ptr<request> req)>& _handle,
const db::config& config) : _response_compressor(config), _f_handle(
api_handler(const std::function<future<executor::request_return_type>(std::unique_ptr<request> req)>& _handle) : _f_handle(
[this, _handle](std::unique_ptr<request> req, std::unique_ptr<reply> rep) {
sstring accept_encoding = _response_compressor.get_accepted_encoding(*req);
return seastar::futurize_invoke(_handle, std::move(req)).then_wrapped(
[this, rep = std::move(rep), accept_encoding=std::move(accept_encoding)](future<executor::request_return_type> resf) mutable {
return seastar::futurize_invoke(_handle, std::move(req)).then_wrapped([this, rep = std::move(rep)](future<executor::request_return_type> resf) mutable {
if (resf.failed()) {
// Exceptions of type api_error are wrapped as JSON and
// returned to the client as expected. Other types of
@@ -139,20 +122,25 @@ public:
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
auto res = resf.get();
return std::visit(overloaded_functor {
std::visit(overloaded_functor {
[&] (std::string&& str) {
return _response_compressor.generate_reply(std::move(rep), std::move(accept_encoding),
REPLY_CONTENT_TYPE, std::move(str));
// Note that despite the move, there is a copy here -
// as str is std::string and rep->_content is sstring.
rep->_content = std::move(str);
},
[&] (body_writer&& body_writer) {
return _response_compressor.generate_reply(std::move(rep), std::move(accept_encoding),
REPLY_CONTENT_TYPE, std::move(body_writer));
[&] (executor::body_writer&& body_writer) {
// Unfortunately, write_body() forces us to choose
// from a fixed and irrelevant list of "mime-types"
// at this point. But we'll override it with the
// correct one (application/x-amz-json-1.0) below.
rep->write_body("json", std::move(body_writer));
},
[&] (const api_error& err) {
generate_error_reply(*rep, err);
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
}
}, std::move(res));
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
});
}) { }
@@ -162,6 +150,7 @@ public:
handle_CORS(*req, *rep, false);
return _f_handle(std::move(req), std::move(rep)).then(
[](std::unique_ptr<reply> rep) {
rep->set_mime_type("application/x-amz-json-1.0");
rep->done();
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
});
@@ -177,11 +166,9 @@ protected:
rjson::add(results, "message", err._msg);
rep._content = rjson::print(std::move(results));
rep._status = err._http_code;
rep.set_content_type(REPLY_CONTENT_TYPE);
slogger.trace("api_handler error case: {}", rep._content);
}
response_compressor _response_compressor;
future_handler_function _f_handle;
};
@@ -278,57 +265,24 @@ protected:
}
};
// This function increments the authentication_failures counter, and may also
// log a warn-level message and/or throw an exception, depending on what
// enforce_authorization and warn_authorization are set to.
// The username and client address are only used for logging purposes -
// they are not included in the error message returned to the client, since
// the client knows who it is.
// Note that if enforce_authorization is false, this function will return
// without throwing. So a caller that doesn't want to continue after an
// authentication_error must explicitly return after calling this function.
template<typename Exception>
static void authentication_error(alternator::stats& stats, bool enforce_authorization, bool warn_authorization, Exception&& e, std::string_view user, gms::inet_address client_address) {
stats.authentication_failures++;
if (enforce_authorization) {
if (warn_authorization) {
slogger.warn("alternator_warn_authorization=true: {} for user {}, client address {}", e.what(), user, client_address);
}
throw std::move(e);
} else {
if (warn_authorization) {
slogger.warn("If you set alternator_enforce_authorization=true the following will be enforced: {} for user {}, client address {}", e.what(), user, client_address);
}
}
}
future<std::string> server::verify_signature(const request& req, const chunked_content& content) {
if (!_enforce_authorization.get() && !_warn_authorization.get()) {
if (!_enforce_authorization) {
slogger.debug("Skipping authorization");
return make_ready_future<std::string>();
}
auto host_it = req._headers.find("Host");
if (host_it == req._headers.end()) {
authentication_error(_executor._stats, _enforce_authorization.get(), _warn_authorization.get(),
api_error::invalid_signature("Host header is mandatory for signature verification"),
"", req.get_client_address());
return make_ready_future<std::string>();
throw api_error::invalid_signature("Host header is mandatory for signature verification");
}
auto authorization_it = req._headers.find("Authorization");
if (authorization_it == req._headers.end()) {
authentication_error(_executor._stats, _enforce_authorization.get(), _warn_authorization.get(),
api_error::missing_authentication_token("Authorization header is mandatory for signature verification"),
"", req.get_client_address());
return make_ready_future<std::string>();
throw api_error::missing_authentication_token("Authorization header is mandatory for signature verification");
}
std::string host = host_it->second;
std::string_view authorization_header = authorization_it->second;
auto pos = authorization_header.find_first_of(' ');
if (pos == std::string_view::npos || authorization_header.substr(0, pos) != "AWS4-HMAC-SHA256") {
authentication_error(_executor._stats, _enforce_authorization.get(), _warn_authorization.get(),
api_error::invalid_signature(fmt::format("Authorization header must use AWS4-HMAC-SHA256 algorithm: {}", authorization_header)),
"", req.get_client_address());
return make_ready_future<std::string>();
throw api_error::invalid_signature(fmt::format("Authorization header must use AWS4-HMAC-SHA256 algorithm: {}", authorization_header));
}
authorization_header.remove_prefix(pos+1);
std::string credential;
@@ -363,9 +317,7 @@ future<std::string> server::verify_signature(const request& req, const chunked_c
std::vector<std::string_view> credential_split = split(credential, '/');
if (credential_split.size() != 5) {
authentication_error(_executor._stats, _enforce_authorization.get(), _warn_authorization.get(),
api_error::validation(fmt::format("Incorrect credential information format: {}", credential)), "", req.get_client_address());
return make_ready_future<std::string>();
throw api_error::validation(fmt::format("Incorrect credential information format: {}", credential));
}
std::string user(credential_split[0]);
std::string datestamp(credential_split[1]);
@@ -376,81 +328,39 @@ future<std::string> server::verify_signature(const request& req, const chunked_c
for (const auto& header : signed_headers) {
signed_headers_map.emplace(header, std::string_view());
}
std::vector<std::string> modified_values;
for (auto& header : req._headers) {
std::string header_str;
header_str.resize(header.first.size());
std::transform(header.first.begin(), header.first.end(), header_str.begin(), ::tolower);
auto it = signed_headers_map.find(header_str);
if (it != signed_headers_map.end()) {
// replace multiple spaces in the header value header.second with
// a single space, as required by AWS SigV4 header canonization.
// If we modify the value, we need to save it in modified_values
// to keep it alive.
std::string value;
value.reserve(header.second.size());
bool prev_space = false;
bool modified = false;
for (char ch : header.second) {
if (ch == ' ') {
if (!prev_space) {
value += ch;
prev_space = true;
} else {
modified = true; // skip a space
}
} else {
value += ch;
prev_space = false;
}
}
if (modified) {
modified_values.emplace_back(std::move(value));
it->second = std::string_view(modified_values.back());
} else {
it->second = std::string_view(header.second);
}
it->second = std::string_view(header.second);
}
}
auto cache_getter = [&proxy = _proxy] (std::string username) {
return get_key_from_roles(proxy, std::move(username));
auto cache_getter = [&proxy = _proxy, &as = _auth_service] (std::string username) {
return get_key_from_roles(proxy, as, std::move(username));
};
return _key_cache.get_ptr(user, cache_getter).then_wrapped([this, &req, &content,
return _key_cache.get_ptr(user, cache_getter).then([this, &req, &content,
user = std::move(user),
host = std::move(host),
datestamp = std::move(datestamp),
signed_headers_str = std::move(signed_headers_str),
signed_headers_map = std::move(signed_headers_map),
modified_values = std::move(modified_values),
region = std::move(region),
service = std::move(service),
user_signature = std::move(user_signature)] (future<key_cache::value_ptr> key_ptr_fut) {
key_cache::value_ptr key_ptr(nullptr);
try {
key_ptr = key_ptr_fut.get();
} catch (const api_error& e) {
authentication_error(_executor._stats, _enforce_authorization.get(), _warn_authorization.get(),
e, user, req.get_client_address());
return std::string();
}
user_signature = std::move(user_signature)] (key_cache::value_ptr key_ptr) {
std::string signature;
try {
signature = utils::aws::get_signature(user, *key_ptr, std::string_view(host), "/", req._method,
datestamp, signed_headers_str, signed_headers_map, &content, region, service, "");
} catch (const std::exception& e) {
authentication_error(_executor._stats, _enforce_authorization.get(), _warn_authorization.get(),
api_error::invalid_signature(fmt::format("invalid signature: {}", e.what())),
user, req.get_client_address());
return std::string();
throw api_error::invalid_signature(e.what());
}
if (signature != std::string_view(user_signature)) {
_key_cache.remove(user);
authentication_error(_executor._stats, _enforce_authorization.get(), _warn_authorization.get(),
api_error::unrecognized_client("wrong signature"),
user, req.get_client_address());
return std::string();
throw api_error::unrecognized_client("The security token included in the request is invalid.");
}
return user;
});
@@ -463,82 +373,35 @@ static tracing::trace_state_ptr create_tracing_session(tracing::tracing& tracing
return tracing_instance.create_session(tracing::trace_type::QUERY, props);
}
// A helper class to represent a potentially truncated view of a chunked_content.
// If the content is short enough and single chunked, it just holds a view into the content.
// Otherwise it will be copied into an internal buffer, possibly truncated (depending on maximum allowed size passed in),
// and the view will point into that buffer.
// `as_view()` method will return the view.
// `take_as_sstring()` will either move out the internal buffer (if any), or create a new sstring from the view.
// You should consider `as_view()` valid as long both the original chunked_content and the truncated_content object are alive.
class truncated_content {
std::string_view _view;
sstring _content_maybe;
void copy_from_content(const chunked_content& content) {
size_t offset = 0;
for(auto &tmp : content) {
size_t to_copy = std::min(tmp.size(), _content_maybe.size() - offset);
std::copy(tmp.get(), tmp.get() + to_copy, _content_maybe.data() + offset);
offset += to_copy;
if (offset >= _content_maybe.size()) {
break;
}
}
// truncated_content_view() prints a potentially long chunked_content for
// debugging purposes. In the common case when the content is not excessively
// long, it just returns a view into the given content, without any copying.
// But when the content is very long, it is truncated after some arbitrary
// max_len (or one chunk, whichever comes first), with "<truncated>" added at
// the end. To do this modification to the string, we need to create a new
// std::string, so the caller must pass us a reference to one, "buf", where
// we can store the content. The returned view is only alive for as long this
// buf is kept alive.
static std::string_view truncated_content_view(const chunked_content& content, std::string& buf) {
constexpr size_t max_len = 1024;
if (content.empty()) {
return std::string_view();
} else if (content.size() == 1 && content.begin()->size() <= max_len) {
return std::string_view(content.begin()->get(), content.begin()->size());
} else {
buf = std::string(content.begin()->get(), std::min(content.begin()->size(), max_len)) + "<truncated>";
return std::string_view(buf);
}
public:
truncated_content(const chunked_content& content, size_t max_len = std::numeric_limits<size_t>::max()) {
if (content.empty()) return;
if (content.size() == 1 && content.begin()->size() <= max_len) {
_view = std::string_view(content.begin()->get(), content.begin()->size());
return;
}
constexpr std::string_view truncated_text = "<truncated>";
size_t content_size = 0;
for(auto &tmp : content) {
content_size += tmp.size();
}
if (content_size <= max_len) {
_content_maybe = sstring{ sstring::initialized_later{}, content_size };
copy_from_content(content);
}
else {
_content_maybe = sstring{ sstring::initialized_later{}, max_len + truncated_text.size() };
copy_from_content(content);
std::copy(truncated_text.begin(), truncated_text.end(), _content_maybe.data() + _content_maybe.size() - truncated_text.size());
}
_view = std::string_view(_content_maybe);
}
std::string_view as_view() const { return _view; }
sstring take_as_sstring() && {
if (_content_maybe.empty() && !_view.empty()) {
return sstring{_view};
}
return std::move(_content_maybe);
}
};
// `truncated_content_view` will produce an object representing a view to a passed content
// possibly truncated at some length. The value returned is used in two ways:
// - to print it in logs (use `as_view()` method for this)
// - to pass it to tracing object, where it will be stored and used later
// (use `take_as_sstring()` method as this produces a copy in form of a sstring)
// `truncated_content` delays constructing `sstring` object until it's actually needed.
// `truncated_content` is valid as long as passed `content` is alive.
// if the content is truncated, `<truncated>` will be appended at the maximum size limit
// and total size will be `max_users_query_size_in_trace_output() + strlen("<truncated>")`.
static truncated_content truncated_content_view(const chunked_content& content, size_t max_size) {
return truncated_content{content, max_size};
}
static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_state, std::string_view username, std::string_view op, const chunked_content& query, size_t max_users_query_size_in_trace_output) {
static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_state, std::string_view username, std::string_view op, const chunked_content& query) {
tracing::trace_state_ptr trace_state;
tracing::tracing& tracing_instance = tracing::tracing::get_local_tracing_instance();
if (tracing_instance.trace_next_query() || tracing_instance.slow_query_tracing_enabled()) {
trace_state = create_tracing_session(tracing_instance);
std::string buf;
tracing::add_session_param(trace_state, "alternator_op", op);
tracing::add_query(trace_state, truncated_content_view(query, max_users_query_size_in_trace_output).take_as_sstring());
tracing::add_query(trace_state, truncated_content_view(query, buf));
tracing::begin(trace_state, seastar::format("Alternator {}", op), client_state.get_client_address());
if (!username.empty()) {
tracing::set_username(trace_state, auth::authenticated_user(username));
@@ -547,265 +410,62 @@ static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_
return trace_state;
}
// This read_entire_stream() is similar to Seastar's read_entire_stream()
// which reads the given content_stream until its end into non-contiguous
// memory. The difference is that this implementation takes an extra length
// limit, and throws an error if we read more than this limit.
// This length-limited variant would not have been needed if Seastar's HTTP
// server's set_content_length_limit() worked in every case, but unfortunately
// it does not - it only works if the request has a Content-Length header (see
// issue #8196). In contrast this function can limit the request's length no
// matter how it's encoded. We need this limit to protect Alternator from
// oversized requests that can deplete memory.
static future<chunked_content>
read_entire_stream(input_stream<char>& inp, size_t length_limit) {
chunked_content ret;
// We try to read length_limit + 1 bytes, so that we can throw an
// exception if we managed to read more than length_limit.
ssize_t remain = length_limit + 1;
do {
temporary_buffer<char> buf = co_await inp.read_up_to(remain);
if (buf.empty()) {
break;
}
remain -= buf.size();
ret.push_back(std::move(buf));
} while (remain > 0);
// If we read the full length_limit + 1 bytes, we went over the limit:
if (remain <= 0) {
// By throwing here an error, we may send a reply (the error message)
// without having read the full request body. Seastar's httpd will
// realize that we have not read the entire content stream, and
// correctly mark the connection unreusable, i.e., close it.
// This means we are currently exposed to issue #12166 caused by
// Seastar issue 1325), where the client may get an RST instead of
// a FIN, and may rarely get a "Connection reset by peer" before
// reading the error we send.
throw api_error::payload_too_large(fmt::format("Request content length limit of {} bytes exceeded", length_limit));
}
co_return ret;
}
// safe_gzip_stream is an exception-safe wrapper for zlib's z_stream.
// The "z_stream" struct is used by zlib to hold state while decompressing a
// stream of data. It allocates memory which must be freed with inflateEnd(),
// which the destructor of this class does.
class safe_gzip_zstream {
z_stream _zs;
public:
// If gzip is true, decode a gzip header (for "Content-Encoding: gzip").
// Otherwise, a zlib header (for "Content-Encoding: deflate").
safe_gzip_zstream(bool gzip = true) {
memset(&_zs, 0, sizeof(_zs));
if (inflateInit2(&_zs, gzip ? 16 + MAX_WBITS : MAX_WBITS) != Z_OK) {
// Should only happen if memory allocation fails
throw std::bad_alloc();
}
}
~safe_gzip_zstream() {
inflateEnd(&_zs);
}
z_stream* operator->() {
return &_zs;
}
z_stream* get() {
return &_zs;
}
void reset() {
inflateReset(&_zs);
}
};
// ungzip() takes a chunked_content of a compressed request body, and returns
// the uncompressed content as a chunked_content. If gzip is true, we expect
// gzip header (for "Content-Encoding: gzip"), if gzip is false, we expect a
// zlib header (for "Content-Encoding: deflate").
// If the uncompressed content exceeds length_limit, an error is thrown.
static future<chunked_content>
ungzip(chunked_content&& compressed_body, size_t length_limit, bool gzip = true) {
chunked_content ret;
// output_buf can be any size - when uncompressing input_buf, it doesn't
// need to fit in a single output_buf, we'll use multiple output_buf for
// a single input_buf if needed.
constexpr size_t OUTPUT_BUF_SIZE = 4096;
temporary_buffer<char> output_buf;
safe_gzip_zstream strm(gzip);
bool complete_stream = false; // empty input is not a valid gzip/deflate
size_t total_out_bytes = 0;
for (const temporary_buffer<char>& input_buf : compressed_body) {
if (input_buf.empty()) {
continue;
}
complete_stream = false;
strm->next_in = (Bytef*) input_buf.get();
strm->avail_in = (uInt) input_buf.size();
do {
co_await coroutine::maybe_yield();
if (output_buf.empty()) {
output_buf = temporary_buffer<char>(OUTPUT_BUF_SIZE);
}
strm->next_out = (Bytef*) output_buf.get();
strm->avail_out = OUTPUT_BUF_SIZE;
int e = inflate(strm.get(), Z_NO_FLUSH);
size_t out_bytes = OUTPUT_BUF_SIZE - strm->avail_out;
if (out_bytes > 0) {
// If output_buf is nearly full, we save it as-is in ret. But
// if it only has little data, better copy to a small buffer.
if (out_bytes > OUTPUT_BUF_SIZE/2) {
ret.push_back(std::move(output_buf).prefix(out_bytes));
// output_buf is now empty. if this loop finds more input,
// we'll allocate a new output buffer.
} else {
ret.push_back(temporary_buffer<char>(output_buf.get(), out_bytes));
}
total_out_bytes += out_bytes;
if (total_out_bytes > length_limit) {
throw api_error::payload_too_large(fmt::format("Request content length limit of {} bytes exceeded", length_limit));
}
}
if (e == Z_STREAM_END) {
// There may be more input after the first gzip stream - in
// either this input_buf or the next one. The additional input
// should be a second concatenated gzip. We need to allow that
// by resetting the gzip stream and continuing the input loop
// until there's no more input.
strm.reset();
if (strm->avail_in == 0) {
complete_stream = true;
break;
}
} else if (e != Z_OK && e != Z_BUF_ERROR) {
// DynamoDB returns an InternalServerError when given a bad
// gzip request body. See test test_broken_gzip_content
throw api_error::internal("Error during gzip decompression of request body");
}
} while (strm->avail_in > 0 || strm->avail_out == 0);
}
if (!complete_stream) {
// The gzip stream was not properly finished with Z_STREAM_END
throw api_error::internal("Truncated gzip in request body");
}
co_return ret;
}
future<executor::request_return_type> server::handle_api_request(std::unique_ptr<request> req) {
_executor._stats.total_operations++;
sstring target = req->get_header("X-Amz-Target");
// target is DynamoDB API version followed by a dot '.' and operation type (e.g. CreateTable)
auto dot = target.find('.');
std::string_view op = (dot == sstring::npos) ? std::string_view() : std::string_view(target).substr(dot+1);
if (req->content_length > request_content_length_limit) {
// If we have a Content-Length header and know the request will be too
// long, we don't need to wait for read_entire_stream() below to
// discover it. And we definitely mustn't try to get_units() below for
// for such a size.
co_return api_error::payload_too_large(fmt::format("Request content length limit of {} bytes exceeded", request_content_length_limit));
}
// Check the concurrency limit early, before acquiring memory and
// reading the request body, to avoid piling up memory from excess
// requests that will be rejected anyway. This mirrors the CQL
// transport which also checks concurrency before memory acquisition
// (transport/server.cc).
if (_pending_requests.get_count() >= _max_concurrent_requests) {
_executor._stats.requests_shed++;
co_return api_error::request_limit_exceeded(format("too many in-flight requests (configured via max_concurrent_requests_per_shard): {}", _pending_requests.get_count()));
}
_pending_requests.enter();
auto leave = defer([this] () noexcept { _pending_requests.leave(); });
// JSON parsing can allocate up to roughly 2x the size of the raw
// document, + a couple of bytes for maintenance.
// If the Content-Length of the request is not available, we assume
// the largest possible request (request_content_length_limit, i.e., 16 MB)
// and after reading the request we return_units() the excess.
size_t mem_estimate = (req->content_length ? req->content_length : request_content_length_limit) * 2 + 8000;
// TODO: consider the case where req->content_length is missing. Maybe
// we need to take the content_length_limit and return some of the units
// when we finish read_content_and_verify_signature?
size_t mem_estimate = req->content_length * 2 + 8000;
auto units_fut = get_units(*_memory_limiter, mem_estimate);
if (_memory_limiter->waiters()) {
++_executor._stats.requests_blocked_memory;
}
auto units = co_await std::move(units_fut);
throwing_assert(req->content_stream);
chunked_content content = co_await read_entire_stream(*req->content_stream, request_content_length_limit);
// If the request had no Content-Length, we reserved too many units
// so need to return some
if (req->content_length == 0) {
size_t content_length = 0;
for (const auto& chunk : content) {
content_length += chunk.size();
}
size_t new_mem_estimate = content_length * 2 + 8000;
units.return_units(mem_estimate - new_mem_estimate);
}
SCYLLA_ASSERT(req->content_stream);
chunked_content content = co_await util::read_entire_stream(*req->content_stream);
auto username = co_await verify_signature(*req, content);
// If the request is compressed, uncompress it now, after we checked
// the signature (the signature is computed on the compressed content).
// We apply the request_content_length_limit again to the uncompressed
// content - we don't want to allow a tiny compressed request to
// expand to a huge uncompressed request.
sstring content_encoding = req->get_header("Content-Encoding");
if (content_encoding == "gzip") {
content = co_await ungzip(std::move(content), request_content_length_limit);
} else if (content_encoding == "deflate") {
content = co_await ungzip(std::move(content), request_content_length_limit, false);
} else if (!content_encoding.empty()) {
// DynamoDB returns a 500 error for unsupported Content-Encoding.
// I'm not sure if this is the best error code, but let's do it too.
// See the test test_garbage_content_encoding confirming this case.
co_return api_error::internal("Unsupported Content-Encoding");
}
// As long as the system_clients_entry object is alive, this request will
// be visible in the "system.clients" virtual table. When requested, this
// entry will be formatted by server::ongoing_request::make_client_data().
auto user_agent_header = co_await _connection_options_keys_and_values.get_or_load(req->get_header("User-Agent"), [] (const client_options_cache_key_type&) {
return make_ready_future<options_cache_value_type>(options_cache_value_type{});
});
auto system_clients_entry = _ongoing_requests.emplace(
req->get_client_address(), std::move(user_agent_header),
username, current_scheduling_group(),
req->get_protocol_name() == "https");
if (slogger.is_enabled(log_level::trace)) {
slogger.trace("Request: {} {} {}", op, truncated_content_view(content, _max_users_query_size_in_trace_output).as_view(), req->_headers);
std::string buf;
slogger.trace("Request: {} {} {}", op, truncated_content_view(content, buf), req->_headers);
}
auto callback_it = _callbacks.find(op);
if (callback_it == _callbacks.end()) {
_executor._stats.unsupported_operations++;
co_return api_error::unknown_operation(fmt::format("Unsupported operation {}", op));
}
if (_pending_requests.get_count() >= _max_concurrent_requests) {
_executor._stats.requests_shed++;
co_return api_error::request_limit_exceeded(format("too many in-flight requests (configured via max_concurrent_requests_per_shard): {}", _pending_requests.get_count()));
}
_pending_requests.enter();
auto leave = defer([this] () noexcept { _pending_requests.leave(); });
executor::client_state client_state(service::client_state::external_tag(),
_auth_service, &_sl_controller, _timeout_config.current_values(), req->get_client_address());
if (!username.empty()) {
client_state.set_login(auth::authenticated_user(username));
}
client_state.maybe_update_per_service_level_params();
co_await client_state.maybe_update_per_service_level_params();
tracing::trace_state_ptr trace_state = maybe_trace_query(client_state, username, op, content, _max_users_query_size_in_trace_output.get());
tracing::trace_state_ptr trace_state = maybe_trace_query(client_state, username, op, content);
tracing::trace(trace_state, "{}", op);
auto user = client_state.user();
auto f = [this, content = std::move(content), &callback = callback_it->second,
client_state = std::move(client_state), trace_state = std::move(trace_state),
units = std::move(units), req = std::move(req)] () mutable -> future<executor::request_return_type> {
rjson::value json_request = co_await _json_parser.parse(std::move(content));
if (!json_request.IsObject()) {
co_return api_error::validation("Request content must be an object");
}
std::unique_ptr<audit::audit_info_alternator> audit_info;
std::exception_ptr ex = {};
executor::request_return_type ret;
try {
ret = co_await callback(_executor, client_state, trace_state, make_service_permit(std::move(units)), std::move(json_request), std::move(req), audit_info);
} catch (...) {
ex = std::current_exception();
}
if (audit_info) {
co_await audit::inspect(*audit_info, client_state, ex != nullptr);
}
if (ex) {
co_return coroutine::exception(std::move(ex));
}
co_return ret;
rjson::value json_request = co_await _json_parser.parse(std::move(content));
if (!json_request.IsObject()) {
co_return api_error::validation("Request content must be an object");
}
co_return co_await callback(_executor, client_state, trace_state,
make_service_permit(std::move(units)), std::move(json_request), std::move(req));
};
co_return co_await _sl_controller.with_user_service_level(user, std::ref(f));
}
@@ -813,7 +473,7 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr
void server::set_routes(routes& r) {
api_handler* req_handler = new api_handler([this] (std::unique_ptr<request> req) mutable {
return handle_api_request(std::move(req));
}, _proxy.data_dictionary().get_config());
});
r.put(operation_type::POST, "/", req_handler);
r.put(operation_type::GET, "/", new health_handler(_pending_requests));
@@ -844,119 +504,108 @@ server::server(executor& exec, service::storage_proxy& proxy, gms::gossiper& gos
, _auth_service(auth_service)
, _sl_controller(sl_controller)
, _key_cache(1024, 1min, slogger)
, _max_users_query_size_in_trace_output(1024)
, _enforce_authorization(false)
, _enabled_servers{}
, _pending_requests("alternator::server::pending_requests")
, _timeout_config(_proxy.data_dictionary().get_config())
, _callbacks{
{"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.create_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
{"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.create_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"DescribeTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.describe_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
{"DescribeTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.describe_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"DeleteTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.delete_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
{"DeleteTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.delete_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"UpdateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.update_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
{"UpdateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.update_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"PutItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.put_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
{"PutItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.put_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"UpdateItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.update_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
{"UpdateItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.update_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"GetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.get_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
{"GetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.get_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"DeleteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.delete_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
{"DeleteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.delete_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"ListTables", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.list_tables(client_state, std::move(permit), std::move(json_request), audit_info);
{"ListTables", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.list_tables(client_state, std::move(permit), std::move(json_request));
}},
{"Scan", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.scan(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
{"Scan", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.scan(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.describe_endpoints(client_state, std::move(permit), std::move(json_request), req->get_header("Host"), audit_info);
{"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.describe_endpoints(client_state, std::move(permit), std::move(json_request), req->get_header("Host"));
}},
{"BatchWriteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.batch_write_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
{"BatchWriteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.batch_write_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"BatchGetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.batch_get_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
{"BatchGetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.batch_get_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"Query", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.query(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
{"Query", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.query(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"TagResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.tag_resource(client_state, std::move(permit), std::move(json_request), audit_info);
{"TagResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.tag_resource(client_state, std::move(permit), std::move(json_request));
}},
{"UntagResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.untag_resource(client_state, std::move(permit), std::move(json_request), audit_info);
{"UntagResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.untag_resource(client_state, std::move(permit), std::move(json_request));
}},
{"ListTagsOfResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.list_tags_of_resource(client_state, std::move(permit), std::move(json_request), audit_info);
{"ListTagsOfResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.list_tags_of_resource(client_state, std::move(permit), std::move(json_request));
}},
{"UpdateTimeToLive", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.update_time_to_live(client_state, std::move(permit), std::move(json_request), audit_info);
{"UpdateTimeToLive", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.update_time_to_live(client_state, std::move(permit), std::move(json_request));
}},
{"DescribeTimeToLive", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.describe_time_to_live(client_state, std::move(permit), std::move(json_request), audit_info);
{"DescribeTimeToLive", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.describe_time_to_live(client_state, std::move(permit), std::move(json_request));
}},
{"ListStreams", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.list_streams(client_state, std::move(permit), std::move(json_request), audit_info);
{"ListStreams", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.list_streams(client_state, std::move(permit), std::move(json_request));
}},
{"DescribeStream", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.describe_stream(client_state, std::move(permit), std::move(json_request), audit_info);
{"DescribeStream", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.describe_stream(client_state, std::move(permit), std::move(json_request));
}},
{"GetShardIterator", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.get_shard_iterator(client_state, std::move(permit), std::move(json_request), audit_info);
{"GetShardIterator", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.get_shard_iterator(client_state, std::move(permit), std::move(json_request));
}},
{"GetRecords", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.get_records(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
{"GetRecords", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.get_records(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
}},
{"DescribeContinuousBackups", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
return e.describe_continuous_backups(client_state, std::move(permit), std::move(json_request), audit_info);
{"DescribeContinuousBackups", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
return e.describe_continuous_backups(client_state, std::move(permit), std::move(json_request));
}},
} {
}
future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port,
std::optional<uint16_t> port_proxy_protocol, std::optional<uint16_t> https_port_proxy_protocol,
std::optional<tls::credentials_builder> creds,
utils::updateable_value<bool> enforce_authorization, utils::updateable_value<bool> warn_authorization, utils::updateable_value<uint64_t> max_users_query_size_in_trace_output,
semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests) {
future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,
utils::updateable_value<bool> enforce_authorization, semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests) {
_memory_limiter = memory_limiter;
_enforce_authorization = std::move(enforce_authorization);
_warn_authorization = std::move(warn_authorization);
_max_concurrent_requests = std::move(max_concurrent_requests);
_max_users_query_size_in_trace_output = std::move(max_users_query_size_in_trace_output);
if (!port && !https_port && !port_proxy_protocol && !https_port_proxy_protocol) {
if (!port && !https_port) {
return make_exception_future<>(std::runtime_error("Either regular port or TLS port"
" must be specified in order to init an alternator HTTP server instance"));
}
return seastar::async([this, addr, port, https_port, port_proxy_protocol, https_port_proxy_protocol, creds] {
return seastar::async([this, addr, port, https_port, creds] {
_executor.start().get();
if (port || port_proxy_protocol) {
if (port) {
set_routes(_http_server._routes);
_http_server.set_content_length_limit(server::content_length_limit);
_http_server.set_content_streaming(true);
if (port) {
_http_server.listen(socket_address{addr, *port}).get();
}
if (port_proxy_protocol) {
listen_options lo;
lo.reuse_address = true;
lo.proxy_protocol = true;
_http_server.listen(socket_address{addr, *port_proxy_protocol}, lo).get();
}
_http_server.listen(socket_address{addr, *port}).get();
_enabled_servers.push_back(std::ref(_http_server));
}
if (https_port || https_port_proxy_protocol) {
if (https_port) {
set_routes(_https_server._routes);
_https_server.set_content_length_limit(server::content_length_limit);
_https_server.set_content_streaming(true);
if (this_shard_id() == 0) {
@@ -975,15 +624,7 @@ future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std:
} else {
_credentials = creds->build_server_credentials();
}
if (https_port) {
_https_server.listen(socket_address{addr, *https_port}, _credentials).get();
}
if (https_port_proxy_protocol) {
listen_options lo;
lo.reuse_address = true;
lo.proxy_protocol = true;
_https_server.listen(socket_address{addr, *https_port_proxy_protocol}, lo, _credentials).get();
}
_https_server.listen(socket_address{addr, *https_port}, _credentials).get();
_enabled_servers.push_back(std::ref(_https_server));
}
});
@@ -1039,36 +680,6 @@ future<> server::json_parser::stop() {
return std::move(_run_parse_json_thread);
}
// Convert an entry in the server's list of ongoing Alternator requests
// (_ongoing_requests) into a client_data object. This client_data object
// will then be used to produce a row for the "system.clients" virtual table.
client_data server::ongoing_request::make_client_data() const {
client_data cd;
cd.ct = client_type::alternator;
cd.ip = _client_address.addr();
cd.port = _client_address.port();
cd.shard_id = this_shard_id();
cd.connection_stage = client_connection_stage::established;
cd.username = _username;
cd.scheduling_group_name = _scheduling_group.name();
cd.ssl_enabled = _is_https;
// For now, we save the full User-Agent header as the "driver name"
// and keep "driver_version" unset.
cd.driver_name = _user_agent;
// Leave "protocol_version" unset, it has no meaning in Alternator.
// Leave "hostname", "ssl_protocol" and "ssl_cipher_suite" unset for Alternator.
// Note: CQL sets ssl_protocol and ssl_cipher_suite via generic_server::connection base class.
return cd;
}
future<utils::chunked_vector<foreign_ptr<std::unique_ptr<client_data>>>> server::get_client_data() {
utils::chunked_vector<foreign_ptr<std::unique_ptr<client_data>>> ret;
co_await _ongoing_requests.for_each_gently([&ret] (const ongoing_request& r) {
ret.emplace_back(make_foreign(std::make_unique<client_data>(r.make_client_data())));
});
co_return ret;
}
const char* api_error::what() const noexcept {
if (_what_string.empty()) {
_what_string = fmt::format("{} {}: {}", std::to_underlying(_http_code), _type, _msg);

View File

@@ -3,13 +3,12 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
#include "alternator/executor.hh"
#include "utils/scoped_item_list.hh"
#include <seastar/core/future.hh>
#include <seastar/core/condition-variable.hh>
#include <seastar/http/httpd.hh>
@@ -21,20 +20,14 @@
#include "utils/updateable_value.hh"
#include <seastar/core/units.hh>
struct client_data;
namespace alternator {
using chunked_content = rjson::chunked_content;
class server : public peering_sharded_service<server> {
// The maximum size of a request body that Alternator will accept,
// in bytes. This is a safety measure to prevent Alternator from
// running out of memory when a client sends a very large request.
// DynamoDB also has the same limit set to 16 MB.
static constexpr size_t request_content_length_limit = 16*MB;
static constexpr size_t content_length_limit = 16*MB;
using alternator_callback = std::function<future<executor::request_return_type>(executor&, executor::client_state&,
tracing::trace_state_ptr, service_permit, rjson::value, std::unique_ptr<http::request>, std::unique_ptr<audit::audit_info_alternator>&)>;
tracing::trace_state_ptr, service_permit, rjson::value, std::unique_ptr<http::request>)>;
using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;
httpd::http_server _http_server;
@@ -47,15 +40,12 @@ class server : public peering_sharded_service<server> {
key_cache _key_cache;
utils::updateable_value<bool> _enforce_authorization;
utils::updateable_value<bool> _warn_authorization;
utils::updateable_value<uint64_t> _max_users_query_size_in_trace_output;
utils::small_vector<std::reference_wrapper<seastar::httpd::http_server>, 2> _enabled_servers;
named_gate _pending_requests;
// In some places we will need a CQL updateable_timeout_config object even
// though it isn't really relevant for Alternator which defines its own
// timeouts separately. We can create this object only once.
updateable_timeout_config _timeout_config;
client_options_cache_type _connection_options_keys_and_values;
alternator_callbacks_map _callbacks;
@@ -84,33 +74,12 @@ class server : public peering_sharded_service<server> {
};
json_parser _json_parser;
// The server maintains a list of ongoing requests, that are being handled
// by handle_api_request(). It uses this list in get_client_data(), which
// is called when reading the "system.clients" virtual table.
struct ongoing_request {
socket_address _client_address;
client_options_cache_entry_type _user_agent;
sstring _username;
scheduling_group _scheduling_group;
bool _is_https;
client_data make_client_data() const;
};
utils::scoped_item_list<ongoing_request> _ongoing_requests;
public:
server(executor& executor, service::storage_proxy& proxy, gms::gossiper& gossiper, auth::service& service, qos::service_level_controller& sl_controller);
future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port,
std::optional<uint16_t> port_proxy_protocol, std::optional<uint16_t> https_port_proxy_protocol,
std::optional<tls::credentials_builder> creds,
utils::updateable_value<bool> enforce_authorization, utils::updateable_value<bool> warn_authorization, utils::updateable_value<uint64_t> max_users_query_size_in_trace_output,
semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests);
future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,
utils::updateable_value<bool> enforce_authorization, semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests);
future<> stop();
// get_client_data() is called (on each shard separately) when the virtual
// table "system.clients" is read. It is expected to generate a list of
// clients connected to this server (on this shard). This function is
// called by alternator::controller::get_client_data().
future<utils::chunked_vector<foreign_ptr<std::unique_ptr<client_data>>>> get_client_data();
private:
void set_routes(seastar::httpd::routes& r);
// If verification succeeds, returns the authenticated user's username

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "stats.hh"
@@ -14,6 +14,20 @@
namespace alternator {
const char* ALTERNATOR_METRICS = "alternator";
static seastar::metrics::histogram estimated_histogram_to_metrics(const utils::estimated_histogram& histogram) {
seastar::metrics::histogram res;
res.buckets.resize(histogram.bucket_offsets.size());
uint64_t cumulative_count = 0;
res.sample_count = histogram._count;
res.sample_sum = histogram._sample_sum;
for (size_t i = 0; i < res.buckets.size(); i++) {
auto& v = res.buckets[i];
v.upper_bound = histogram.bucket_offsets[i];
cumulative_count += histogram.buckets[i];
v.count = cumulative_count;
}
return res;
}
static seastar::metrics::label column_family_label("cf");
static seastar::metrics::label keyspace_label("ks");
@@ -137,53 +151,10 @@ static void register_metrics_with_optional_table(seastar::metrics::metric_groups
seastar::metrics::make_counter("batch_item_count", seastar::metrics::description("The total number of items processed across all batches"), labels,
stats.api_operations.batch_get_item_batch_total)(op("BatchGetItem")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_histogram("batch_item_count_histogram", seastar::metrics::description("Histogram of the number of items in a batch request"), labels,
[&stats]{ return to_metrics_histogram(stats.api_operations.batch_get_item_histogram);})(op("BatchGetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
[&stats]{ return estimated_histogram_to_metrics(stats.api_operations.batch_get_item_histogram);})(op("BatchGetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_histogram("batch_item_count_histogram", seastar::metrics::description("Histogram of the number of items in a batch request"), labels,
[&stats]{ return to_metrics_histogram(stats.api_operations.batch_write_item_histogram);})(op("BatchWriteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
[&stats]{ return to_metrics_histogram(stats.operation_sizes.get_item_op_size_kb);})(op("GetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
[&stats]{ return to_metrics_histogram(stats.operation_sizes.put_item_op_size_kb);})(op("PutItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
[&stats]{ return to_metrics_histogram(stats.operation_sizes.delete_item_op_size_kb);})(op("DeleteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
[&stats]{ return to_metrics_histogram(stats.operation_sizes.update_item_op_size_kb);})(op("UpdateItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
[&stats]{ return to_metrics_histogram(stats.operation_sizes.batch_get_item_op_size_kb);})(op("BatchGetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
[&stats]{ return to_metrics_histogram(stats.operation_sizes.batch_write_item_op_size_kb);})(op("BatchWriteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
[&stats]{ return estimated_histogram_to_metrics(stats.api_operations.batch_write_item_histogram);})(op("BatchWriteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
});
seastar::metrics::label expression_label("expression");
metrics.add_group(group_name, {
seastar::metrics::make_total_operations("expression_cache_evictions", stats.expression_cache.evictions,
seastar::metrics::description("Counts number of entries evicted from expressions cache"), labels).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("expression_cache_hits", stats.expression_cache.requests[stats::expression_types::UPDATE_EXPRESSION].hits,
seastar::metrics::description("Counts number of hits of cached expressions"), labels)(expression_label("UpdateExpression")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("expression_cache_misses", stats.expression_cache.requests[stats::expression_types::UPDATE_EXPRESSION].misses,
seastar::metrics::description("Counts number of misses of cached expressions"), labels)(expression_label("UpdateExpression")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("expression_cache_hits", stats.expression_cache.requests[stats::expression_types::CONDITION_EXPRESSION].hits,
seastar::metrics::description("Counts number of hits of cached expressions"), labels)(expression_label("ConditionExpression")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("expression_cache_misses", stats.expression_cache.requests[stats::expression_types::CONDITION_EXPRESSION].misses,
seastar::metrics::description("Counts number of misses of cached expressions"), labels)(expression_label("ConditionExpression")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("expression_cache_hits", stats.expression_cache.requests[stats::expression_types::PROJECTION_EXPRESSION].hits,
seastar::metrics::description("Counts number of hits of cached expressions"), labels)(expression_label("ProjectionExpression")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("expression_cache_misses", stats.expression_cache.requests[stats::expression_types::PROJECTION_EXPRESSION].misses,
seastar::metrics::description("Counts number of misses of cached expressions"), labels)(expression_label("ProjectionExpression")).aggregate(aggregate_labels).set_skip_when_empty()
});
// Only register the following metrics for the global metrics, not per-table
if (!has_table) {
metrics.add_group("alternator", {
seastar::metrics::make_counter("authentication_failures", stats.authentication_failures,
seastar::metrics::description("total number of authentication failures"), labels).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_counter("authorization_failures", stats.authorization_failures,
seastar::metrics::description("total number of authorization failures"), labels).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
});
}
}
void register_metrics(seastar::metrics::metric_groups& metrics, const stats& stats) {

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
@@ -16,8 +16,6 @@
#include "cql3/stats.hh"
namespace alternator {
using batch_histogram = utils::estimated_histogram_with_max<128>;
using op_size_histogram = utils::estimated_histogram_with_max<512>;
// Object holding per-shard statistics related to Alternator.
// While this object is alive, these metrics are also registered to be
@@ -78,46 +76,9 @@ public:
utils::timed_rate_moving_average_summary_and_histogram batch_get_item_latency;
utils::timed_rate_moving_average_summary_and_histogram get_records_latency;
batch_histogram batch_get_item_histogram;
batch_histogram batch_write_item_histogram;
utils::estimated_histogram batch_get_item_histogram{22}; // a histogram that covers the range 1 - 100
utils::estimated_histogram batch_write_item_histogram{22}; // a histogram that covers the range 1 - 100
} api_operations;
// Operation size metrics
struct {
// Item size statistics collected per table and aggregated per node.
// Each histogram covers the range 0 - 512. Resolves #25143.
// A size is the retrieved item's size.
op_size_histogram get_item_op_size_kb;
// A size is the maximum of the new item's size and the old item's size.
op_size_histogram put_item_op_size_kb;
// A size is the deleted item's size. If the deleted item's size is
// unknown (i.e. read-before-write wasn't necessary and it wasn't
// forced by a configuration option), it won't be recorded on the
// histogram.
op_size_histogram delete_item_op_size_kb;
// A size is the maximum of existing item's size and the estimated size
// of the update. This will be changed to the maximum of the existing item's
// size and the new item's size in a subsequent PR.
op_size_histogram update_item_op_size_kb;
// A size is the sum of the sizes of all items per table. This means
// that a single BatchGetItem / BatchWriteItem updates the histogram
// for each table that it has items in.
// The sizes are the retrieved items' sizes grouped per table.
op_size_histogram batch_get_item_op_size_kb;
// The sizes are the the written items' sizes grouped per table.
op_size_histogram batch_write_item_op_size_kb;
} operation_sizes;
// Count of authentication and authorization failures, counted if either
// alternator_enforce_authorization or alternator_warn_authorization are
// set to true. If both are false, no authentication or authorization
// checks are performed, so failures are not recognized or counted.
// "authentication" failure means the request was not signed with a valid
// user and key combination. "authorization" failure means the request was
// authenticated to a valid user - but this user did not have permissions
// to perform the operation (considering RBAC settings and the user's
// superuser status).
uint64_t authentication_failures = 0;
uint64_t authorization_failures = 0;
// Miscellaneous event counters
uint64_t total_operations = 0;
uint64_t unsupported_operations = 0;
@@ -140,22 +101,6 @@ public:
uint64_t wcu_total[NUM_TYPES] = {0};
// CQL-derived stats
cql3::cql_stats cql_stats;
// Enumeration of expression types only for stats
// if needed it can be extended e.g. per operation
enum expression_types {
UPDATE_EXPRESSION,
CONDITION_EXPRESSION,
PROJECTION_EXPRESSION,
NUM_EXPRESSION_TYPES
};
struct {
struct {
uint64_t hits = 0;
uint64_t misses = 0;
} requests[NUM_EXPRESSION_TYPES];
uint64_t evictions = 0;
} expression_cache;
};
struct table_stats {
@@ -165,8 +110,4 @@ struct table_stats {
};
void register_metrics(seastar::metrics::metric_groups& metrics, const stats& stats);
inline uint64_t bytes_to_kb_ceil(uint64_t bytes) {
return (bytes) / 1024;
}
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,62 +0,0 @@
/*
* Copyright 2026-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
*/
#pragma once
#include "utils/chunked_vector.hh"
#include "cdc/generation.hh"
#include <generator>
namespace cdc {
class stream_id;
}
namespace alternator {
class stream_id_range {
// helper class for manipulating (possibly wrapped around) range of stream_ids
// it holds one or two ranges [lo1, end1) and [lo2, end2)
// if the range doesn't wrap around, then lo2 == end2 == items.end()
// if the range wraps around, then
// `lo1 == items.begin() and end2 == items.end()` must be true
// the object doesn't own `items`, but it does manipulate it - it will
// reorder elements (so both ranges were next to each other) and sort them by unsigned comparison
// usage - create an object with needed ranges. before iteration call `prepare_for_iterating` method -
// it will reorder elements of `items` array to what is needed and then call begin / end pair.
// note - `items` array will be modified - elements will be reordered, but no elements will be added or removed.
// `items` array must stay intact as long as iteration is in progress.
utils::chunked_vector<cdc::stream_id>::iterator _lo1 = {}, _end1 = {}, _lo2 = {}, _end2 = {};
const cdc::stream_id* _skip_to = nullptr;
bool _prepared = false;
public:
stream_id_range(
utils::chunked_vector<cdc::stream_id> &items,
utils::chunked_vector<cdc::stream_id>::iterator lo1,
utils::chunked_vector<cdc::stream_id>::iterator end1);
stream_id_range(
utils::chunked_vector<cdc::stream_id> &items,
utils::chunked_vector<cdc::stream_id>::iterator lo1,
utils::chunked_vector<cdc::stream_id>::iterator end1,
utils::chunked_vector<cdc::stream_id>::iterator lo2,
utils::chunked_vector<cdc::stream_id>::iterator end2);
void set_starting_position(const cdc::stream_id &update_to);
// Must be called after construction and after set_starting_position()
// (if used), but before begin()/end() iteration.
void prepare_for_iterating();
utils::chunked_vector<cdc::stream_id>::iterator begin() const { return _lo1; }
utils::chunked_vector<cdc::stream_id>::iterator end() const { return _end1; }
};
stream_id_range find_children_range_from_parent_token(
const utils::chunked_vector<cdc::stream_id>& parent_streams,
utils::chunked_vector<cdc::stream_id>& current_streams,
cdc::stream_id parent,
bool uses_tablets
);
}

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include <chrono>
@@ -17,7 +17,6 @@
#include <seastar/core/lowres_clock.hh>
#include <seastar/coroutine/maybe_yield.hh>
#include "cdc/log.hh"
#include "exceptions/exceptions.hh"
#include "gms/gossiper.hh"
#include "gms/inet_address.hh"
@@ -28,7 +27,7 @@
#include "replica/database.hh"
#include "service/client_state.hh"
#include "service_permit.hh"
#include "mutation/timestamp.hh"
#include "timestamp.hh"
#include "service/storage_proxy.hh"
#include "service/pager/paging_state.hh"
#include "service/pager/query_pagers.hh"
@@ -44,10 +43,8 @@
#include "cql3/query_options.hh"
#include "cql3/column_identifier.hh"
#include "alternator/executor.hh"
#include "alternator/executor_util.hh"
#include "alternator/controller.hh"
#include "alternator/serialization.hh"
#include "alternator/ttl_tag.hh"
#include "dht/sharder.hh"
#include "db/config.hh"
#include "db/tags/utils.hh"
@@ -59,17 +56,22 @@ static logging::logger tlogger("alternator_ttl");
namespace alternator {
future<executor::request_return_type> executor::update_time_to_live(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
// We write the expiration-time attribute enabled on a table using a
// tag TTL_TAG_KEY.
// Currently, the *value* of this tag is simply the name of the attribute,
// and the expiration scanner interprets it as an Alternator attribute name -
// It can refer to a real column or if that doesn't exist, to a member of
// the ":attrs" map column. Although this is designed for Alternator, it may
// be good enough for CQL as well (there, the ":attrs" column won't exist).
static const sstring TTL_TAG_KEY("system:ttl_attribute");
future<executor::request_return_type> executor::update_time_to_live(client_state& client_state, service_permit permit, rjson::value request) {
_stats.api_operations.update_time_to_live++;
if (!_proxy.features().alternator_ttl) {
co_return api_error::unknown_operation("UpdateTimeToLive not yet supported. Upgrade all nodes to a version that supports it.");
if (!_proxy.data_dictionary().features().alternator_ttl) {
co_return api_error::unknown_operation("UpdateTimeToLive not yet supported. Experimental support is available if the 'alternator-ttl' experimental feature is enabled on all nodes.");
}
schema_ptr schema = get_table(_proxy, request);
maybe_audit(audit_info, audit::statement_category::DDL,
schema->ks_name(), schema->cf_name(), "UpdateTimeToLive", request);
rjson::value* spec = rjson::find(request, "TimeToLiveSpecification");
if (!spec || !spec->IsObject()) {
co_return api_error::validation("UpdateTimeToLive missing mandatory TimeToLiveSpecification");
@@ -90,9 +92,9 @@ future<executor::request_return_type> executor::update_time_to_live(client_state
if (v->GetStringLength() < 1 || v->GetStringLength() > 255) {
co_return api_error::validation("The length of AttributeName must be between 1 and 255");
}
sstring attribute_name = rjson::to_sstring(*v);
sstring attribute_name(v->GetString(), v->GetStringLength());
co_await verify_permission(_enforce_authorization, _warn_authorization, client_state, schema, auth::permission::ALTER, _stats);
co_await verify_permission(_enforce_authorization, client_state, schema, auth::permission::ALTER);
co_await db::modify_tags(_mm, schema->ks_name(), schema->cf_name(), [&](std::map<sstring, sstring>& tags_map) {
if (enabled) {
if (tags_map.contains(TTL_TAG_KEY)) {
@@ -119,13 +121,9 @@ future<executor::request_return_type> executor::update_time_to_live(client_state
co_return rjson::print(std::move(response));
}
future<executor::request_return_type> executor::describe_time_to_live(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
future<executor::request_return_type> executor::describe_time_to_live(client_state& client_state, service_permit permit, rjson::value request) {
_stats.api_operations.describe_time_to_live++;
schema_ptr schema = get_table(_proxy, request);
maybe_audit(audit_info, audit::statement_category::QUERY,
schema->ks_name(), schema->cf_name(), "DescribeTimeToLive", request);
std::map<sstring, sstring> tags_map = get_tags_of_table_or_throw(schema);
rjson::value desc = rjson::empty_object();
auto i = tags_map.find(TTL_TAG_KEY);
@@ -142,7 +140,7 @@ future<executor::request_return_type> executor::describe_time_to_live(client_sta
// expiration_service is a sharded service responsible for cleaning up expired
// items in all tables with per-item expiration enabled. Currently, this means
// Alternator tables with TTL configured via an UpdateTimeToLive request.
// Alternator tables with TTL configured via a UpdateTimeToLive request.
//
// Here is a brief overview of how the expiration service works:
//
@@ -288,18 +286,13 @@ static future<> expire_item(service::storage_proxy& proxy,
auto ck = clustering_key::from_exploded(exploded_ck);
m.partition().clustered_row(*schema, ck).apply(tombstone(ts, gc_clock::now()));
}
utils::chunked_vector<mutation> mutations;
std::vector<mutation> mutations;
mutations.push_back(std::move(m));
return proxy.mutate(std::move(mutations),
db::consistency_level::LOCAL_QUORUM,
executor::default_timeout(), // FIXME - which timeout?
qs.get_trace_state(), qs.get_permit(),
db::allow_per_partition_rate_limit::no,
false,
cdc::per_request_options{
.is_system_originated = true,
}
);
db::allow_per_partition_rate_limit::no);
}
static size_t random_offset(size_t min, size_t max) {
@@ -320,16 +313,17 @@ static size_t random_offset(size_t min, size_t max) {
//
// The function is to be used with vnodes only
static future<std::vector<std::pair<dht::token_range, locator::host_id>>> get_secondary_ranges(
const locator::effective_replication_map* erm,
const locator::effective_replication_map_ptr& erm,
locator::host_id ep) {
const auto& tm = *erm->get_token_metadata_ptr();
const auto& sorted_tokens = tm.sorted_tokens();
std::vector<std::pair<dht::token_range, locator::host_id>> ret;
throwing_assert(!sorted_tokens.empty());
if (sorted_tokens.empty()) {
on_internal_error(tlogger, "Token metadata is empty");
}
auto prev_tok = sorted_tokens.back();
for (const auto& tok : sorted_tokens) {
co_await coroutine::maybe_yield();
// FIXME: pass is_vnode=true to get_natural_replicas since the token is in tm.sorted_tokens()
host_id_vector_replica_set eps = erm->get_natural_replicas(tok);
if (eps.size() <= 1 || eps[1] != ep) {
prev_tok = tok;
@@ -399,7 +393,7 @@ class ranges_holder_primary {
dht::token_range_vector _token_ranges;
public:
explicit ranges_holder_primary(dht::token_range_vector token_ranges) : _token_ranges(std::move(token_ranges)) {}
static future<ranges_holder_primary> make(const locator::vnode_effective_replication_map* erm, locator::host_id ep) {
static future<ranges_holder_primary> make(const locator::vnode_effective_replication_map_ptr& erm, locator::host_id ep) {
co_return ranges_holder_primary(co_await erm->get_primary_ranges(ep));
}
std::size_t size() const { return _token_ranges.size(); }
@@ -419,7 +413,7 @@ public:
explicit ranges_holder_secondary(std::vector<std::pair<dht::token_range, locator::host_id>> token_ranges, const gms::gossiper& g)
: _token_ranges(std::move(token_ranges))
, _gossiper(g) {}
static future<ranges_holder_secondary> make(const locator::vnode_effective_replication_map* erm, locator::host_id ep, const gms::gossiper& g) {
static future<ranges_holder_secondary> make(const locator::effective_replication_map_ptr& erm, locator::host_id ep, const gms::gossiper& g) {
co_return ranges_holder_secondary(co_await get_secondary_ranges(erm, ep), g);
}
std::size_t size() const { return _token_ranges.size(); }
@@ -527,7 +521,7 @@ struct scan_ranges_context {
// should be possible (and a must for issue #7751!).
lw_shared_ptr<service::pager::paging_state> paging_state = nullptr;
auto regular_columns =
s->regular_columns() | std::views::transform(&column_definition::id)
s->regular_columns() | std::views::transform([] (const column_definition& cdef) { return cdef.id; })
| std::ranges::to<query::column_id_vector>();
selection = cql3::selection::selection::wildcard(s);
query::partition_slice::option_set opts = selection->get_query_options();
@@ -562,7 +556,7 @@ static future<> scan_table_ranges(
expiration_service::stats& expiration_stats)
{
const schema_ptr& s = scan_ctx.s;
throwing_assert(partition_ranges.size() == 1); // otherwise issue #9167 will cause incorrect results.
SCYLLA_ASSERT (partition_ranges.size() == 1); // otherwise issue #9167 will cause incorrect results.
auto p = service::pager::query_pagers::pager(proxy, s, scan_ctx.selection, *scan_ctx.query_state_ptr,
*scan_ctx.query_options, scan_ctx.command, std::move(partition_ranges), nullptr);
while (!p->is_exhausted()) {
@@ -592,7 +586,7 @@ static future<> scan_table_ranges(
if (retries >= 10) {
// Don't get stuck forever asking the same page, maybe there's
// a bug or a real problem in several replicas. Give up on
// this scan and retry the scan from a random position later,
// this scan an retry the scan from a random position later,
// in the next scan period.
throw runtime_exception("scanner thread failed after too many timeouts for the same page");
}
@@ -639,38 +633,13 @@ static future<> scan_table_ranges(
}
} else {
// For a real column to contain an expiration time, it
// must be a numeric type. We currently support decimal
// (used by Alternator TTL) as well as bigint, int and
// timestamp (used by CQL per-row TTL).
switch (meta[*expiration_column]->type->get_kind()) {
case abstract_type::kind::decimal:
// Used by Alternator TTL for key columns not stored
// in the map. The value is in seconds, fractional
// part is ignored.
expired = is_expired(value_cast<big_decimal>(v), now);
break;
case abstract_type::kind::long_kind:
// Used by CQL per-row TTL. The value is in seconds.
expired = is_expired(gc_clock::time_point(std::chrono::seconds(value_cast<int64_t>(v))), now);
break;
case abstract_type::kind::int32:
// Used by CQL per-row TTL. The value is in seconds.
// Using int type is not recommended because it will
// overflow in 2038, but we support it to allow users
// to use existing int columns for expiration.
expired = is_expired(gc_clock::time_point(std::chrono::seconds(value_cast<int32_t>(v))), now);
break;
case abstract_type::kind::timestamp:
// Used by CQL per-row TTL. The value is in milliseconds
// but we truncate it to gc_clock's precision (whole seconds).
expired = is_expired(gc_clock::time_point(std::chrono::duration_cast<gc_clock::duration>(value_cast<db_clock::time_point>(v).time_since_epoch())), now);
break;
default:
// Should never happen - we verified the column's type
// before starting the scan.
[[unlikely]]
on_internal_error(tlogger, format("expiration scanner value of unsupported type {} in column {}", meta[*expiration_column]->type->cql3_type_name(), scan_ctx.column_name) );
}
// must be a numeric type.
// FIXME: Currently we only support decimal_type (which is
// what Alternator uses), but other numeric types can be
// supported as well to make this feature more useful in CQL.
// Note that kind::decimal is also checked above.
big_decimal n = value_cast<big_decimal>(v);
expired = is_expired(n, now);
}
if (expired) {
expiration_stats.items_deleted++;
@@ -732,12 +701,16 @@ static future<bool> scan_table(
co_return false;
}
// attribute_name may be one of the schema's columns (in Alternator, this
// means a key column, in CQL it's a regular column), or an element in
// Alternator's attrs map encoded in Alternator's JSON encoding (which we
// decode). If attribute_name is a real column, in Alternator it will have
// the type decimal, counting seconds since the UNIX epoch, while in CQL
// it will one of the types bigint or int (counting seconds) or timestamp
// (counting milliseconds).
// means it's a key column), or an element in Alternator's attrs map
// encoded in Alternator's JSON encoding.
// FIXME: To make this less Alternators-specific, we should encode in the
// single key's value three things:
// 1. The name of a column
// 2. Optionally if column is a map, a member in the map
// 3. The deserializer for the value: CQL or Alternator (JSON).
// The deserializer can be guessed: If the given column or map item is
// numeric, it can be used directly. If it is a "bytes" type, it needs to
// be deserialized using Alternator's deserializer.
bytes column_name = to_bytes(*attribute_name);
const column_definition *cd = s->get_column_definition(column_name);
std::optional<std::string> member;
@@ -756,14 +729,11 @@ static future<bool> scan_table(
data_type column_type = cd->type;
// Verify that the column has the right type: If "member" exists
// the column must be a map, and if it doesn't, the column must
// be decimal_type (Alternator), bigint, int or timestamp (CQL).
// If the column has the wrong type nothing can get expired in
// this table, and it's pointless to scan it.
// (currently) be a decimal_type. If the column has the wrong type
// nothing can get expired in this table, and it's pointless to
// scan it.
if ((member && column_type->get_kind() != abstract_type::kind::map) ||
(!member && column_type->get_kind() != abstract_type::kind::decimal &&
column_type->get_kind() != abstract_type::kind::long_kind &&
column_type->get_kind() != abstract_type::kind::int32 &&
column_type->get_kind() != abstract_type::kind::timestamp)) {
(!member && column_type->get_kind() != abstract_type::kind::decimal)) {
tlogger.info("table {} TTL column has unsupported type, not scanning", s->cf_name());
co_return false;
}
@@ -776,7 +746,7 @@ static future<bool> scan_table(
auto my_host_id = erm->get_topology().my_host_id();
const auto &tablet_map = erm->get_token_metadata().tablets().get_tablet_map(s->id());
for (std::optional tablet = tablet_map.first_tablet(); tablet; tablet = tablet_map.next_tablet(*tablet)) {
auto tablet_primary_replica = tablet_map.get_primary_replica(*tablet, erm->get_topology());
auto tablet_primary_replica = tablet_map.get_primary_replica(*tablet);
// check if this is the primary replica for the current tablet
if (tablet_primary_replica.host == my_host_id && tablet_primary_replica.shard == this_shard_id()) {
co_await scan_tablet(*tablet, proxy, abort_source, page_sem, expiration_stats, scan_ctx, tablet_map);
@@ -790,7 +760,7 @@ static future<bool> scan_table(
// by tasking another node to take over scanning of the dead node's primary
// ranges. What we do here is that this node will also check expiration
// on its *secondary* ranges - but only those whose primary owner is down.
auto tablet_secondary_replica = tablet_map.get_secondary_replica(*tablet, erm->get_topology()); // throws if no secondary replica
auto tablet_secondary_replica = tablet_map.get_secondary_replica(*tablet); // throws if no secondary replica
if (tablet_secondary_replica.host == my_host_id && tablet_secondary_replica.shard == this_shard_id()) {
if (!gossiper.is_alive(tablet_primary_replica.host)) {
co_await scan_tablet(*tablet, proxy, abort_source, page_sem, expiration_stats, scan_ctx, tablet_map);
@@ -799,12 +769,8 @@ static future<bool> scan_table(
}
}
} else { // VNodes
locator::static_effective_replication_map_ptr ermp =
db.real_database().find_keyspace(s->ks_name()).get_static_effective_replication_map();
auto* erm = ermp->maybe_as_vnode_effective_replication_map();
if (!erm) {
on_internal_error(tlogger, format("Keyspace {} is local", s->ks_name()));
}
locator::vnode_effective_replication_map_ptr erm =
db.real_database().find_keyspace(s->ks_name()).get_vnode_effective_replication_map();
auto my_host_id = erm->get_topology().my_host_id();
token_ranges_owned_by_this_shard my_ranges(s, co_await ranges_holder_primary::make(erm, my_host_id));
while (std::optional<dht::partition_range> range = my_ranges.next_partition_range()) {
@@ -901,10 +867,12 @@ future<> expiration_service::run() {
future<> expiration_service::start() {
// Called by main() on each shard to start the expiration-service
// thread. Just runs run() in the background and allows stop().
if (!shutting_down()) {
_end = run().handle_exception([] (std::exception_ptr ep) {
tlogger.error("expiration_service failed: {}", ep);
});
if (_db.features().alternator_ttl) {
if (!shutting_down()) {
_end = run().handle_exception([] (std::exception_ptr ep) {
tlogger.error("expiration_service failed: {}", ep);
});
}
}
return make_ready_future<>();
}

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
@@ -30,7 +30,7 @@ namespace alternator {
// expiration_service is a sharded service responsible for cleaning up expired
// items in all tables with per-item expiration enabled. Currently, this means
// Alternator tables with TTL configured via an UpdateTimeToLive request.
// Alternator tables with TTL configured via a UpdateTimeToLeave request.
class expiration_service final : public seastar::peering_sharded_service<expiration_service> {
public:
// Object holding per-shard statistics related to the expiration service.
@@ -52,7 +52,7 @@ private:
data_dictionary::database _db;
service::storage_proxy& _proxy;
gms::gossiper& _gossiper;
// _end is set by start(), and resolves when the background service
// _end is set by start(), and resolves when the the background service
// started by it ends. To ask the background service to end, _abort_source
// should be triggered. stop() below uses both _abort_source and _end.
std::optional<future<>> _end;

View File

@@ -1,26 +0,0 @@
/*
* Copyright 2026-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
*/
#pragma once
#include "seastarx.hh"
#include <seastar/core/sstring.hh>
namespace alternator {
// We use the table tag TTL_TAG_KEY ("system:ttl_attribute") to remember
// which attribute was chosen as the expiration-time attribute for
// Alternator's TTL and CQL's per-row TTL features.
// Currently, the *value* of this tag is simply the name of the attribute:
// It can refer to a real column or if that doesn't exist, to a member of
// the ":attrs" map column (which Alternator uses).
extern const sstring TTL_TAG_KEY;
} // namespace alternator
// let users use TTL_TAG_KEY without the "alternator::" prefix,
// to make it easier to move it to a different namespace later.
using alternator::TTL_TAG_KEY;

View File

@@ -31,7 +31,6 @@ set(swagger_files
api-doc/column_family.json
api-doc/commitlog.json
api-doc/compaction_manager.json
api-doc/client_routes.json
api-doc/config.json
api-doc/cql_server_test.json
api-doc/endpoint_snitch_info.json
@@ -69,7 +68,6 @@ target_sources(api
PRIVATE
api.cc
cache_service.cc
client_routes.cc
collectd.cc
column_family.cc
commitlog.cc
@@ -108,8 +106,5 @@ target_link_libraries(api
wasmtime_bindings
absl::headers)
if (Scylla_USE_PRECOMPILED_HEADER_USE)
target_precompile_headers(api REUSE_FROM scylla-precompiled-header)
endif()
check_headers(check-headers api
GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)

View File

@@ -12,7 +12,7 @@
"operations":[
{
"method":"POST",
"summary":"Resets authorized prepared statements cache",
"summary":"Reset cache",
"type":"void",
"nickname":"authorization_cache_reset",
"produces":[

View File

@@ -1,23 +0,0 @@
, "client_routes_entry": {
"id": "client_routes_entry",
"summary": "An entry storing client routes",
"properties": {
"connection_id": {"type": "string"},
"host_id": {"type": "string", "format": "uuid"},
"address": {"type": "string"},
"port": {"type": "integer"},
"tls_port": {"type": "integer"},
"alternator_port": {"type": "integer"},
"alternator_https_port": {"type": "integer"}
},
"required": ["connection_id", "host_id", "address"]
}
, "client_routes_key": {
"id": "client_routes_key",
"summary": "A key of client_routes_entry",
"properties": {
"connection_id": {"type": "string"},
"host_id": {"type": "string", "format": "uuid"}
}
}

View File

@@ -1,74 +0,0 @@
, "/v2/client-routes":{
"get": {
"description":"List all client route entries",
"operationId":"get_client_routes",
"tags":["client_routes"],
"produces":[
"application/json"
],
"parameters":[],
"responses":{
"200":{
"schema":{
"type":"array",
"items":{ "$ref":"#/definitions/client_routes_entry" }
}
},
"default":{
"description":"unexpected error",
"schema":{"$ref":"#/definitions/ErrorModel"}
}
}
},
"post": {
"description":"Upsert one or more client route entries",
"operationId":"set_client_routes",
"tags":["client_routes"],
"parameters":[
{
"name":"body",
"in":"body",
"required":true,
"schema":{
"type":"array",
"items":{ "$ref":"#/definitions/client_routes_entry" }
}
}
],
"responses":{
"200":{ "description": "OK" },
"default":{
"description":"unexpected error",
"schema":{ "$ref":"#/definitions/ErrorModel" }
}
}
},
"delete": {
"description":"Delete one or more client route entries",
"operationId":"delete_client_routes",
"tags":["client_routes"],
"parameters":[
{
"name":"body",
"in":"body",
"required":true,
"schema":{
"type":"array",
"items":{ "$ref":"#/definitions/client_routes_key" }
}
}
],
"responses":{
"200":{
"description": "OK"
},
"default":{
"description":"unexpected error",
"schema":{
"$ref":"#/definitions/ErrorModel"
}
}
}
}
}

View File

@@ -243,7 +243,7 @@
"GOSSIP_DIGEST_SYN",
"GOSSIP_DIGEST_ACK2",
"GOSSIP_SHUTDOWN",
"UNUSED__DEFINITIONS_UPDATE",
"DEFINITIONS_UPDATE",
"TRUNCATE",
"UNUSED__REPLICATION_FINISHED",
"MIGRATION_REQUEST",

View File

@@ -220,25 +220,6 @@
}
]
},
{
"path":"/storage_service/nodes/excluded",
"operations":[
{
"method":"GET",
"summary":"Retrieve host ids of nodes which are marked as excluded",
"type":"array",
"items":{
"type":"string"
},
"nickname":"get_excluded_nodes",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_service/nodes/joining",
"operations":[
@@ -613,50 +594,6 @@
}
]
},
{
"path": "/storage_service/natural_endpoints/v2/{keyspace}",
"operations": [
{
"method": "GET",
"summary":"This method returns the N endpoints that are responsible for storing the specified key i.e for replication. the endpoint responsible for this key",
"type": "array",
"items": {
"type": "string"
},
"nickname": "get_natural_endpoints_v2",
"produces": [
"application/json"
],
"parameters": [
{
"name": "keyspace",
"description": "The keyspace to query about.",
"required": true,
"allowMultiple": false,
"type": "string",
"paramType": "path"
},
{
"name": "cf",
"description": "Column family name.",
"required": true,
"allowMultiple": false,
"type": "string",
"paramType": "query"
},
{
"name": "key_component",
"description": "Each component of the key for which we need to find the endpoint (e.g. ?key_component=part1&key_component=part2).",
"required": true,
"allowMultiple": true,
"type": "string",
"paramType": "query"
}
]
}
]
},
{
"path":"/storage_service/cdc_streams_check_and_repair",
"operations":[
@@ -743,7 +680,7 @@
"parameters":[
{
"name":"tag",
"description":"The snapshot tag to delete. If omitted, all snapshots are removed.",
"description":"the tag given to the snapshot",
"required":false,
"allowMultiple":false,
"type":"string",
@@ -751,7 +688,7 @@
},
{
"name":"kn",
"description":"Comma-separated list of keyspace names to delete snapshots from. If omitted, snapshots are deleted from all keyspaces.",
"description":"Comma-separated keyspaces name that their snapshot will be deleted",
"required":false,
"allowMultiple":false,
"type":"string",
@@ -759,7 +696,7 @@
},
{
"name":"cf",
"description":"A table name used to filter which table's snapshots are deleted. If omitted or empty, snapshots for all tables are eligible. When provided together with 'kn', the table is looked up in each listed keyspace independently. For secondary indexes, the logical index name (e.g. 'myindex') can be used and is resolved automatically.",
"description":"an optional table name that its snapshot will be deleted",
"required":false,
"allowMultiple":false,
"type":"string",
@@ -961,14 +898,6 @@
"type":"string",
"paramType":"query",
"enum": ["all", "dc", "rack", "node"]
},
{
"name":"primary_replica_only",
"description":"Load the sstables and stream to the primary replica node within the scope, if one is specified. If not, stream to the global primary replica.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
@@ -1055,7 +984,7 @@
]
},
{
"path":"/storage_service/cleanup_all/",
"path":"/storage_service/cleanup_all",
"operations":[
{
"method":"POST",
@@ -1065,30 +994,6 @@
"produces":[
"application/json"
],
"parameters":[
{
"name":"global",
"description":"true if cleanup of entire cluster is requested",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/mark_node_as_clean",
"operations":[
{
"method":"POST",
"summary":"Mark the node as clean. After that the node will not be considered as needing cleanup during automatic cleanup which is triggered by some topology operations",
"type":"void",
"nickname":"reset_cleanup_needed",
"produces":[
"application/json"
],
"parameters":[]
}
]
@@ -1195,14 +1100,6 @@
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name": "drop_unfixable_sstables",
"description": "When set to true, drop unfixable sstables. Applies only to scrub mode SEGREGATE.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
@@ -1295,45 +1192,6 @@
}
]
},
{
"path":"/storage_service/logstor_compaction",
"operations":[
{
"method":"POST",
"summary":"Trigger compaction of the key-value storage",
"type":"void",
"nickname":"logstor_compaction",
"produces":[
"application/json"
],
"parameters":[
{
"name":"major",
"description":"When true, perform a major compaction",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/logstor_flush",
"operations":[
{
"method":"POST",
"summary":"Trigger flush of logstor storage",
"type":"void",
"nickname":"logstor_flush",
"produces":[
"application/json"
],
"parameters":[]
}
]
},
{
"path":"/storage_service/active_repair/",
"operations":[
@@ -1661,30 +1519,6 @@
}
]
},
{
"path":"/storage_service/exclude_node",
"operations":[
{
"method":"POST",
"summary":"Marks the node as permanently down (excluded).",
"type":"void",
"nickname":"exclude_node",
"produces":[
"application/json"
],
"parameters":[
{
"name":"hosts",
"description":"Comma-separated list of host ids to exclude",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/removal_status",
"operations":[
@@ -3087,14 +2921,6 @@
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"incremental_mode",
"description":"Set the incremental repair mode. Can be 'disabled', 'incremental', or 'full'. 'incremental': The incremental repair logic is enabled. Unrepaired sstables will be included for repair. Repaired sstables will be skipped. The incremental repair states will be updated after repair. 'full': The incremental repair logic is enabled. Both repaired and unrepaired sstables will be included for repair. The incremental repair states will be updated after repair. 'disabled': The incremental repair logic is disabled completely. The incremental repair states, e.g., repaired_at in sstables and sstables_repaired_at in the system.tablets table, will not be updated after repair. When the option is not provided, it defaults to incremental mode.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
@@ -3124,125 +2950,6 @@
}
]
},
{
"path":"/storage_service/tablets/snapshots",
"operations":[
{
"method":"POST",
"summary":"Takes the snapshot for the given keyspaces/tables. A snapshot name must be specified.",
"type":"void",
"nickname":"take_cluster_snapshot",
"produces":[
"application/json"
],
"parameters":[
{
"name":"tag",
"description":"the tag given to the snapshot",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"keyspace",
"description":"Keyspace(s) to snapshot. Multiple keyspaces can be provided using a comma-separated list. If omitted, snapshot all keyspaces.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"Table(s) to snapshot. Multiple tables (in a single keyspace) can be provided using a comma-separated list. If omitted, snapshot all tables in the given keyspace(s).",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/vnode_tablet_migrations/keyspaces/{keyspace}",
"operations":[{
"method":"POST",
"summary":"Start vnodes-to-tablets migration for all tables in a keyspace",
"type":"void",
"nickname":"create_vnode_tablet_migration",
"produces":["application/json"],
"parameters":[
{
"name":"keyspace",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
},
{
"method":"GET",
"summary":"Get a keyspace's vnodes-to-tablets migration status",
"type":"vnode_tablet_migration_status",
"nickname":"get_vnode_tablet_migration",
"produces":["application/json"],
"parameters":[
{
"name":"keyspace",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}]
},
{
"path":"/storage_service/vnode_tablet_migrations/node/storage_mode",
"operations":[{
"method":"PUT",
"summary":"Set the intended storage mode for this node during vnodes-to-tablets migration",
"type":"void",
"nickname":"set_vnode_tablet_migration_node_storage_mode",
"produces":["application/json"],
"parameters":[
{
"name":"intended_mode",
"description":"Intended storage mode (tablets or vnodes)",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}]
},
{
"path":"/storage_service/vnode_tablet_migrations/keyspaces/{keyspace}/finalization",
"operations":[{
"method":"POST",
"summary":"Finalize vnodes-to-tablets migration for all tables in a keyspace",
"type":"void",
"nickname":"finalize_vnode_tablet_migration",
"produces":["application/json"],
"parameters":[
{
"name":"keyspace",
"description":"Keyspace name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"path"
}
]
}]
},
{
"path":"/storage_service/quiesce_topology",
"operations":[
@@ -3345,38 +3052,6 @@
}
]
},
{
"path":"/storage_service/logstor_info",
"operations":[
{
"method":"GET",
"summary":"Logstor segment information for one table",
"type":"table_logstor_info",
"nickname":"logstor_info",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"table",
"description":"table name",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/retrain_dict",
"operations":[
@@ -3502,38 +3177,6 @@
]
}
]
},
{
"path":"/storage_service/drop_quarantined_sstables",
"operations":[
{
"method":"POST",
"summary":"Drops all quarantined sstables in all keyspaces or specified keyspace and tables",
"type":"void",
"nickname":"drop_quarantined_sstables",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace name to drop quarantined sstables from.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"tables",
"description":"Comma-separated table names to drop quarantined sstables from.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
}
],
"models":{
@@ -3747,7 +3390,7 @@
"version":{
"type":"string",
"enum":[
"ka", "la", "mc", "md", "me", "ms"
"ka", "la", "mc", "md", "me"
],
"description":"SSTable version"
},
@@ -3785,47 +3428,6 @@
}
}
},
"logstor_hist_bucket":{
"id":"logstor_hist_bucket",
"properties":{
"bucket":{
"type":"long"
},
"count":{
"type":"long"
},
"min_data_size":{
"type":"long"
},
"max_data_size":{
"type":"long"
}
}
},
"table_logstor_info":{
"id":"table_logstor_info",
"description":"Per-table logstor segment distribution",
"properties":{
"keyspace":{
"type":"string"
},
"table":{
"type":"string"
},
"compaction_groups":{
"type":"long"
},
"segments":{
"type":"long"
},
"data_size_histogram":{
"type":"array",
"items":{
"$ref":"logstor_hist_bucket"
}
}
}
},
"tablet_repair_result":{
"id":"tablet_repair_result",
"description":"Tablet repair result",
@@ -3860,45 +3462,6 @@
"description":"The resulting compression ratio (estimated on a random sample of files)"
}
}
},
"vnode_tablet_migration_node_status":{
"id":"vnode_tablet_migration_node_status",
"description":"Node storage mode info during vnodes-to-tablets migration",
"properties":{
"host_id":{
"type":"string",
"description":"The host ID"
},
"current_mode":{
"type":"string",
"description":"The current storage mode: `vnodes` or `tablets`"
},
"intended_mode":{
"type":"string",
"description":"The intended storage mode: `vnodes` or `tablets`"
}
}
},
"vnode_tablet_migration_status":{
"id":"vnode_tablet_migration_status",
"description":"Vnodes-to-tablets migration status for a keyspace",
"properties":{
"keyspace":{
"type":"string",
"description":"The keyspace name"
},
"status":{
"type":"string",
"description":"The migration status: `vnodes` (not started), `migrating_to_tablets` (in progress), or `tablets` (complete)"
},
"nodes":{
"type":"array",
"items":{
"$ref":"vnode_tablet_migration_node_status"
},
"description":"Per-node storage mode information. Empty if the keyspace is not being migrated."
}
}
}
}
}

View File

@@ -209,21 +209,6 @@
"parameters":[]
}
]
},
{
"path":"/system/chosen_sstable_version",
"operations":[
{
"method":"GET",
"summary":"Get sstable version currently chosen for use in new sstables",
"type":"string",
"nickname":"get_chosen_sstable_version",
"produces":[
"application/json"
],
"parameters":[]
}
]
}
]
}

View File

@@ -447,7 +447,7 @@
"description":"The number of units completed so far"
},
"children_ids":{
"type":"chunked_array",
"type":"array",
"items":{
"type":"task_identity"
},

View File

@@ -42,14 +42,6 @@
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"consider_only_existing_data",
"description":"Set to \"true\" to flush all memtables and force tombstone garbage collection to check only the sstables being compacted (false by default). The memtable, commitlog and other uncompacted sstables will not be checked during tombstone garbage collection.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "api.hh"
@@ -37,7 +37,6 @@
#include "raft.hh"
#include "gms/gossip_address_map.hh"
#include "service_levels.hh"
#include "client_routes.hh"
logging::logger apilog("api");
@@ -68,11 +67,9 @@ future<> set_server_init(http_context& ctx) {
rb02->set_api_doc(r);
rb02->register_api_file(r, "swagger20_header");
rb02->register_api_file(r, "metrics");
rb02->register_api_file(r, "client_routes");
rb->register_function(r, "system",
"The system related API");
rb02->add_definitions_file(r, "metrics");
rb02->add_definitions_file(r, "client_routes");
set_system(ctx, r);
rb->register_function(r, "error_injection",
"The error injection API");
@@ -122,9 +119,9 @@ future<> unset_thrift_controller(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_thrift_controller(ctx, r); });
}
future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& ssc, service::raft_group0_client& group0_client) {
return ctx.http_server.set_routes([&ctx, &ss, &ssc, &group0_client] (routes& r) {
set_storage_service(ctx, r, ss, ssc, group0_client);
future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, service::raft_group0_client& group0_client) {
return ctx.http_server.set_routes([&ctx, &ss, &group0_client] (routes& r) {
set_storage_service(ctx, r, ss, group0_client);
});
}
@@ -132,16 +129,6 @@ future<> unset_server_storage_service(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_storage_service(ctx, r); });
}
future<> set_server_client_routes(http_context& ctx, sharded<service::client_routes_service>& cr) {
return ctx.http_server.set_routes([&ctx, &cr] (routes& r) {
set_client_routes(ctx, r, cr);
});
}
future<> unset_server_client_routes(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_client_routes(ctx, r); });
}
future<> set_load_meter(http_context& ctx, service::load_meter& lm) {
return ctx.http_server.set_routes([&ctx, &lm] (routes& r) { set_load_meter(ctx, r, lm); });
}
@@ -150,6 +137,14 @@ future<> unset_load_meter(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_load_meter(ctx, r); });
}
future<> set_format_selector(http_context& ctx, db::sstables_format_selector& sel) {
return ctx.http_server.set_routes([&ctx, &sel] (routes& r) { set_format_selector(ctx, r, sel); });
}
future<> unset_format_selector(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_format_selector(ctx, r); });
}
future<> set_server_sstables_loader(http_context& ctx, sharded<sstables_loader>& sst_loader) {
return ctx.http_server.set_routes([&ctx, &sst_loader] (routes& r) { set_sstables_loader(ctx, r, sst_loader); });
}
@@ -229,22 +224,15 @@ future<> unset_server_gossip(http_context& ctx) {
});
}
future<> set_server_column_family(http_context& ctx, sharded<replica::database>& db) {
co_await register_api(ctx, "column_family",
"The column family API", [&db] (http_context& ctx, routes& r) {
set_column_family(ctx, r, db);
});
co_await register_api(ctx, "cache_service",
"The cache service API", [&db] (http_context& ctx, routes& r) {
set_cache_service(ctx, db, r);
future<> set_server_column_family(http_context& ctx, sharded<db::system_keyspace>& sys_ks) {
return register_api(ctx, "column_family",
"The column family API", [&sys_ks] (http_context& ctx, routes& r) {
set_column_family(ctx, r, sys_ks);
});
}
future<> unset_server_column_family(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) {
unset_column_family(ctx, r);
unset_cache_service(ctx, r);
});
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_column_family(ctx, r); });
}
future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms) {
@@ -276,6 +264,15 @@ future<> unset_server_stream_manager(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_stream_manager(ctx, r); });
}
future<> set_server_cache(http_context& ctx) {
return register_api(ctx, "cache_service",
"The cache service API", set_cache_service);
}
future<> unset_server_cache(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_cache_service(ctx, r); });
}
future<> set_hinted_handoff(http_context& ctx, sharded<service::storage_proxy>& proxy, sharded<gms::gossiper>& g) {
return register_api(ctx, "hinted_handoff",
"The hinted handoff API", [&proxy, &g] (http_context& ctx, routes& r) {
@@ -287,7 +284,7 @@ future<> unset_hinted_handoff(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_hinted_handoff(ctx, r); });
}
future<> set_server_compaction_manager(http_context& ctx, sharded<compaction::compaction_manager>& cm) {
future<> set_server_compaction_manager(http_context& ctx, sharded<compaction_manager>& cm) {
return register_api(ctx, "compaction_manager", "The Compaction manager API", [&cm] (http_context& ctx, routes& r) {
set_compaction_manager(ctx, r, cm);
});

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
@@ -23,6 +23,31 @@
namespace api {
template<class T>
std::vector<T> map_to_key_value(const std::map<sstring, sstring>& map) {
std::vector<T> res;
res.reserve(map.size());
for (const auto& [key, value] : map) {
res.push_back(T());
res.back().key = key;
res.back().value = value;
}
return res;
}
template<class T, class MAP>
std::vector<T>& map_to_key_value(const MAP& map, std::vector<T>& res) {
res.reserve(res.size() + std::size(map));
for (const auto& [key, value] : map) {
T val;
val.key = fmt::to_string(key);
val.value = fmt::to_string(value);
res.push_back(val);
}
return res;
}
template <typename T, typename S = T>
T map_sum(T&& dest, const S& src) {
for (const auto& i : src) {
@@ -48,7 +73,7 @@ inline std::vector<sstring> split(const sstring& text, const char* separator) {
*
*/
template<class T, class F, class V>
future<json::json_return_type> sum_stats(sharded<T>& d, V F::*f) {
future<json::json_return_type> sum_stats(distributed<T>& d, V F::*f) {
return d.map_reduce0([f](const T& p) {return p.get_stats().*f;}, 0,
std::plus<V>()).then([](V val) {
return make_ready_future<json::json_return_type>(val);
@@ -81,7 +106,7 @@ httpd::utils_json::rate_moving_average_and_histogram timer_to_json(const utils::
}
template<class T, class F>
future<json::json_return_type> sum_histogram_stats(sharded<T>& d, utils::timed_rate_moving_average_and_histogram F::*f) {
future<json::json_return_type> sum_histogram_stats(distributed<T>& d, utils::timed_rate_moving_average_and_histogram F::*f) {
return d.map_reduce0([f](const T& p) {return (p.get_stats().*f).hist;}, utils::ihistogram(),
std::plus<utils::ihistogram>()).then([](const utils::ihistogram& val) {
@@ -90,7 +115,7 @@ future<json::json_return_type> sum_histogram_stats(sharded<T>& d, utils::timed_
}
template<class T, class F>
future<json::json_return_type> sum_timer_stats(sharded<T>& d, utils::timed_rate_moving_average_and_histogram F::*f) {
future<json::json_return_type> sum_timer_stats(distributed<T>& d, utils::timed_rate_moving_average_and_histogram F::*f) {
return d.map_reduce0([f](const T& p) {return (p.get_stats().*f).rate();}, utils::rate_moving_average_and_histogram(),
std::plus<utils::rate_moving_average_and_histogram>()).then([](const utils::rate_moving_average_and_histogram& val) {
@@ -99,7 +124,7 @@ future<json::json_return_type> sum_timer_stats(sharded<T>& d, utils::timed_rate
}
template<class T, class F>
future<json::json_return_type> sum_timer_stats(sharded<T>& d, utils::timed_rate_moving_average_summary_and_histogram F::*f) {
future<json::json_return_type> sum_timer_stats(distributed<T>& d, utils::timed_rate_moving_average_summary_and_histogram F::*f) {
return d.map_reduce0([f](const T& p) {return (p.get_stats().*f).rate();}, utils::rate_moving_average_and_histogram(),
std::plus<utils::rate_moving_average_and_histogram>()).then([](const utils::rate_moving_average_and_histogram& val) {
return make_ready_future<json::json_return_type>(timer_to_json(val));

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
@@ -18,9 +18,7 @@
using request = http::request;
using reply = http::reply;
namespace compaction {
class compaction_manager;
}
namespace service {
@@ -29,7 +27,6 @@ class storage_proxy;
class storage_service;
class raft_group0_client;
class raft_group_registry;
class client_routes_service;
} // namespace service
@@ -59,6 +56,7 @@ class sstables_format_selector;
namespace view {
class view_builder;
}
class system_keyspace;
}
namespace netw { class messaging_service; }
class repair_service;
@@ -85,9 +83,9 @@ struct http_context {
sstring api_dir;
sstring api_doc;
httpd::http_server_control http_server;
sharded<replica::database>& db;
distributed<replica::database>& db;
http_context(sharded<replica::database>& _db)
http_context(distributed<replica::database>& _db)
: db(_db)
{
}
@@ -98,10 +96,8 @@ future<> set_server_config(http_context& ctx, db::config& cfg);
future<> unset_server_config(http_context& ctx);
future<> set_server_snitch(http_context& ctx, sharded<locator::snitch_ptr>& snitch);
future<> unset_server_snitch(http_context& ctx);
future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>&, service::raft_group0_client&);
future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, service::raft_group0_client&);
future<> unset_server_storage_service(http_context& ctx);
future<> set_server_client_routes(http_context& ctx, sharded<service::client_routes_service>& cr);
future<> unset_server_client_routes(http_context& ctx);
future<> set_server_sstables_loader(http_context& ctx, sharded<sstables_loader>& sst_loader);
future<> unset_server_sstables_loader(http_context& ctx);
future<> set_server_view_builder(http_context& ctx, sharded<db::view::view_builder>& vb, sharded<gms::gossiper>& g);
@@ -120,7 +116,7 @@ future<> set_server_token_metadata(http_context& ctx, sharded<locator::shared_to
future<> unset_server_token_metadata(http_context& ctx);
future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g);
future<> unset_server_gossip(http_context& ctx);
future<> set_server_column_family(http_context& ctx, sharded<replica::database>& db);
future<> set_server_column_family(http_context& ctx, sharded<db::system_keyspace>& sys_ks);
future<> unset_server_column_family(http_context& ctx);
future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms);
future<> unset_server_messaging_service(http_context& ctx);
@@ -130,7 +126,9 @@ future<> set_server_stream_manager(http_context& ctx, sharded<streaming::stream_
future<> unset_server_stream_manager(http_context& ctx);
future<> set_hinted_handoff(http_context& ctx, sharded<service::storage_proxy>& p, sharded<gms::gossiper>& g);
future<> unset_hinted_handoff(http_context& ctx);
future<> set_server_compaction_manager(http_context& ctx, sharded<compaction::compaction_manager>& cm);
future<> set_server_cache(http_context& ctx);
future<> unset_server_cache(http_context& ctx);
future<> set_server_compaction_manager(http_context& ctx, sharded<compaction_manager>& cm);
future<> unset_server_compaction_manager(http_context& ctx);
future<> set_server_done(http_context& ctx);
future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg, sharded<gms::gossiper>& gossiper);
@@ -143,6 +141,8 @@ future<> set_server_raft(http_context&, sharded<service::raft_group_registry>&);
future<> unset_server_raft(http_context&);
future<> set_load_meter(http_context& ctx, service::load_meter& lm);
future<> unset_load_meter(http_context& ctx);
future<> set_format_selector(http_context& ctx, db::sstables_format_selector& sel);
future<> unset_format_selector(http_context& ctx);
future<> set_server_cql_server_test(http_context& ctx, cql_transport::controller& ctl);
future<> unset_server_cql_server_test(http_context& ctx);
future<> set_server_service_levels(http_context& ctx, cql_transport::controller& ctl, sharded<cql3::query_processor>& qp);

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "api/api-doc/authorization_cache.json.hh"

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once

View File

@@ -3,7 +3,7 @@
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "cache_service.hh"
@@ -16,7 +16,7 @@ using namespace json;
using namespace seastar::httpd;
namespace cs = httpd::cache_service_json;
void set_cache_service(http_context& ctx, sharded<replica::database>& db, routes& r) {
void set_cache_service(http_context& ctx, routes& r) {
cs::get_row_cache_save_period_in_seconds.set(r, [](std::unique_ptr<http::request> req) {
// We never save the cache
// Origin uses 0 for never
@@ -204,53 +204,53 @@ void set_cache_service(http_context& ctx, sharded<replica::database>& db, routes
});
});
cs::get_row_hits.set(r, [&db] (std::unique_ptr<http::request> req) {
return map_reduce_cf(db, uint64_t(0), [](const replica::column_family& cf) {
cs::get_row_hits.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, uint64_t(0), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.count();
}, std::plus<uint64_t>());
});
cs::get_row_requests.set(r, [&db] (std::unique_ptr<http::request> req) {
return map_reduce_cf(db, uint64_t(0), [](const replica::column_family& cf) {
cs::get_row_requests.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, uint64_t(0), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.count() + cf.get_row_cache().stats().misses.count();
}, std::plus<uint64_t>());
});
cs::get_row_hit_rate.set(r, [&db] (std::unique_ptr<http::request> req) {
return map_reduce_cf(db, ratio_holder(), [](const replica::column_family& cf) {
cs::get_row_hit_rate.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, ratio_holder(), [](const replica::column_family& cf) {
return ratio_holder(cf.get_row_cache().stats().hits.count() + cf.get_row_cache().stats().misses.count(),
cf.get_row_cache().stats().hits.count());
}, std::plus<ratio_holder>());
});
cs::get_row_hits_moving_avrage.set(r, [&db] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(db, utils::rate_moving_average(), [](const replica::column_family& cf) {
cs::get_row_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
cs::get_row_requests_moving_avrage.set(r, [&db] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(db, utils::rate_moving_average(), [](const replica::column_family& cf) {
cs::get_row_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.rate() + cf.get_row_cache().stats().misses.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
cs::get_row_size.set(r, [&db] (std::unique_ptr<http::request> req) {
cs::get_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
// In origin row size is the weighted size.
// We currently do not support weights, so we use raw size in bytes instead
return db.map_reduce0([](replica::database& db) -> uint64_t {
return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {
return db.row_cache_tracker().region().occupancy().used_space();
}, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {
return make_ready_future<json::json_return_type>(res);
});
});
cs::get_row_entries.set(r, [&db] (std::unique_ptr<http::request> req) {
return db.map_reduce0([](replica::database& db) -> uint64_t {
cs::get_row_entries.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {
return db.row_cache_tracker().partitions();
}, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {
return make_ready_future<json::json_return_type>(res);

Some files were not shown because too many files have changed in this diff Show More