Commit Graph

982 Commits

Author SHA1 Message Date
Pavel Emelyanov
02513ac2b8 alternator: Get feature service from proxy directly
The executor::add_stream_options() obtains local database reference from
proxy just to get feature service from it.

Similar chain is used in executor::update_time_to_live().

It's shorter to get features from proxy itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#26973
2025-11-18 08:17:16 +02:00
Pavel Emelyanov
f47f2db710 Merge 'Support local primary-replica-only for native restore' from Robert Bindar
This PR extends the restore API so that it accepts primary_replica_only as parameter and it combines the concepts of primary-replica-only with scoped streaming so that with:
- `scope=all primary_replica_only=true` The restoring node will stream to the global primary replica only
- `scope=dc primary_replica_only=true` The restoring node will stream to the local primary replica only.
- `scope=rack primary_replica_only=true` The restoring node will stream only to the primary replica from within its own rack (with rf=#racks, the restoring node will stream only to itself)
- `scope=node primary_replica_only=true` is not allowed, the restoring node will always stream only to itself so the primary_replica_only parameter wouldn't make sense.

The PR also adjusts the `nodetool refresh` restriction on running restore with both primary_replica_only and scope, it adds primary_replica_only to `nodetool restore` and it adds cluster tests for primary replica within scope.

Fixes #26584

Closes scylladb/scylladb#26609

* github.com:scylladb/scylladb:
  Add cluster tests for checking scoped primary_replica_only streaming
  Improve choice distribution for primary replica
  Refactor cluster/object_store/test_backup
  nodetool restore: add primary-replica-only option
  nodetool refresh: Enable scope={all,dc,rack} with primary_replica_only
  Enable scoped primary replica only streaming
  Support primary_replica_only for native restore API
2025-11-13 12:11:18 +03:00
Robert Bindar
817fdadd49 Improve choice distribution for primary replica
I noticed during tests that `maybe_get_primary_replica`
would not distribute uniformly the choice of primary replica
because `info.replicas` on some shards would have an order whilst
on others it'd be ordered differently, thus making the function choose
a node as primary replica multiple times when it clearly could've
chosen a different nodes.

This patch sorts the replica set before passing it through the
scope filter.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2025-11-11 09:18:01 +02:00
Nadav Har'El
c03081eb12 alternator: improve error in tablets_mode_for_new_keyspaces=enforced
When in tablets_mode_for_new_keyspaces=enforced mode, Alternator is
supposed to fail when CreateTable asks explicitly for vnodes. Before
this patch, this error was an ugly "Internal Server Error" (an
exception thrown from deep inside the implementation), this patch
checks for this case in the right place, to generate a proper
ValidationException with a proper error message.

We also enable the test test_tablets_tag_vs_config which should have
caught this error, but didn't because it was marked xfail because
tablets_mode_for_new_keyspaces had not been live-updatable. Now that
it is, we can enable the test. I also improved the test to be slightly
faster (no need to change the configuration so many times) and also
check the ordinary case - where the schema doesn't choose neither
vnodes nor tablets explicitly and we should just use the default.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-11-09 12:52:29 +02:00
Nadav Har'El
b34f28dae2 alternator: improve comment about non-hidden system tags
The previous patches added a somewhat misleading comment in front of
system:initial_tablets, which this patch improves.

That tag is NOT where Alternator "stores" table properties like the
existing comment claimed. In fact, the whole point is that it's the
opposite - Alternator never writes to this tag - it's a user-writable
tag which Alternator *reads*, to configure the new table. And this is
why it obviously can't be hidden from the user.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-11-09 12:52:29 +02:00
Piotr Szymaniak
63897370cb alternator: Fix tag name to request vnodes
The tag was lately renamed from `experimental:initial_tablets` to
`system::initial_tablets`. This commit fixes both the tests as well as
the exceptions sent to the user instructing how to create table with
vnodes.
2025-11-09 12:52:29 +02:00
Piotr Szymaniak
376a2f2109 alternator: Support tablets_mode_for_new_keyspaces config flag
Until now, tablets in Alternator were experimental feature enabled only
when a TAG "experimental:initial_tablets" was present when creating a
table and associated with a numeric value.

After this patch, Alternator honours the value of
`tablets_mode_for_new_keyspaces` config flag.

Each table can be overriden to use tablets or not by supplying a new TAG
"system:initial_tablets". The rules stay the same as with the earlier,
experimental tag: when supplied with a numeric value, the table will use
tablets (as long as they are supported). When supplied with something
else (like a string "none"), the table will use vnodes, provided that
tablets are not `enforced` by the config flag.

Fixes #22463
2025-11-09 12:52:17 +02:00
Pavel Emelyanov
59019bc9a9 Merge 'Alternator: allow warning on auth errors before enabling enforcement' from Nadav Har'El
An Alternator user was recently "bit" when switching `alternator_enforce_authorization` from "false" to "true": ְְְAfter the configuration change, all application requests suddenly failed because unbeknownst to the user, their application used incorrect secret keys.

This series introduces a solution for users who want to **safely** switch `alternator_enforce_authorization`  from "false" to "true": Before switching from "false" to "true", the user can temporarily switch a new option, `alternator_warn_authorization`, to true. In this "warn" mode, authentication and authorization errors are counted in metrics (`scylla_alternator_authentication_failures` and `scylla_alternator_authorization_failures`) and logged as WARNings, but the user's application continues to work. The user can use these metrics or log messages to learn of errors in their application's setup, fix them, and only do the switch of `alternator_enforce_authorization` when the metrics or log messages show there are no more errors.

The first patch is the implementation of the the feature - the new configuration option, the metrics and the log messages,  the second patch is a test for the new feature, and the third patch is documentation recommending how to use the warn mode and the associated metrics or log messages to safely switch `alternaor_enforce_authorization` from false to true.

Fixes #25308

This is a feature that users need, so it should probably be backported to live branches.

Closes scylladb/scylladb#25457

* github.com:scylladb/scylladb:
  docs/alternator: explain alternator_warn_authorization
  test/alternator: tests for new auth failure metrics and log messages
  alternator: add alternator_warn_authorization config
2025-11-05 10:45:17 +03:00
Avi Kivity
04a289cae6 Merge 'Auto expand to rack list' from Tomasz Grabiec
We want to move towards rack-list based replication factor for tablets being the default mode, and in the future the only supported mode. This PR is a step towards that. We auto-expand numeric RF to rack list on keyspace creation and ALTER when rf_rack_valid_keyspaces option is enabled.

The PR is mostly about adjusting tests. The main logic change is in the last patch, which modifies option post-processing in ks_prop_defs.

Fixes #26397

Closes scylladb/scylladb#26692

* github.com:scylladb/scylladb:
  cql3: ks_prop_defs: Expand numeric RF to rack list
  locator: Move rack_list to topology.hh
  alternator: Do not set RF for zero-token DCs
  alternator: Switch keyspace creation to use ks_prop_defs
  test: alternator: Adjust for rack lists
  cql3: Move validation of invalid ALTER KEYSPACE earlier, to ks_prop_defs
  test: cqlpy: Mark tests using rack lists as scylla-only
  test: Switch to rack-list based RF
  test: Generalize tests to work with both numeric RF and rack lists
  test: cluster: test_zero_token_nodes_multidc: Adjust to rack list RF
  test: Prepare for handling errors specific to rack list path
  test: cluster: dtest: alternator: Force RF=1 in test_putitem_contention
  test: Create cluster with multiple racks in multi-dc setups
  test: boost: network_topology_strategy_test: Adjust to rack-list RF
  test: tablets: Adjust to rack list
  test: cluster: test_group0_schema_versioning: Use smaller RF to respect rf-rack-validness
  test: tablets_test: Convert test_per_shard_goal_mixed_dc_rf to be rack-valid
  test: object_store: test_backup: Adjust for rack lists
  test: cluster: tablets: Do not move tablet across racks in test_tablet_transition_sanity
  test: cluster: mv: Do not move tablets across racks
  test: cluster: util: Fix docstring for parse_replication_options()
  tablets, topology_coordinator: Skip tablet draining on replace
2025-10-30 21:54:08 +02:00
Tomasz Grabiec
f6dfea2fb1 alternator: Do not set RF for zero-token DCs
That will fail with tablets because it won't be able to allocate
replicas.
2025-10-29 23:32:58 +01:00
Tomasz Grabiec
21db21af7e alternator: Switch keyspace creation to use ks_prop_defs
So that we get the same validation and option post-processing as
during regular keyspace creation.
RF auto-expansion logic happens in ks_prop_defs, and we want that
for tablets.
2025-10-29 23:32:58 +01:00
Nadav Har'El
aa34f0b875 alternator: fix CDC events for TTL expiration
In commit a3ec6c7d1d we supposedly
implemented the feature of telling TTL experation events from regular
user-sent deletions. However, that implementation did not actually work
at all... It had two bugs:

 1. It created an null rjson::value() instead of an empty dictionary
    rjson::empty_object(), so GetRecords failed every time such a
    TTL expiration event was generated.
 2. In the output, it used lowercase field names "type" and "principalId"
    instead of the uppercase "Type" and "PrincipalId". This is not the
    correct capitalization, and when boto3 recieves such incorrect
    fields it silently deletes them and never passes them to the user's
    get_records() call.

This patch fixes those two bugs, and importantly - enables a test for
this feature. We did already have such a test but it was marked as
"veryslow" so doesn't run in CI and apparently not even run once to
check the new feature. This test is not actually very long on Alternator
when the TTL period is set very low (as we do in our tests), so I replaced
the "veryslow" marker by "waits_for_expiration". The latter marker means
that the test is still very slow - as much as half an hour - on DynamoDB -
but runs quickly on Scylla in our test setup, and enabled in CI by
default.

The enabled test failed badly before this patch (a server error during
GetRecords), and passes with this patch.

Also, the aforementioned commit forgot to remove the paragraph in
Alternator's compatibility.md that claims we don't have that feature yet.
So we do it now.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#26633
2025-10-29 17:08:20 +01:00
Nadav Har'El
51186b2f2c alternator: add alternator_warn_authorization config
Before this patch, the configuration alternator_enforce_authorization
is a boolean: true means enforce authentication checks (i.e., each
request is signed by a valid user) and authorization checks (the user
who signed the request is allowed by RBAC to perform this request).

This patch adds a second boolean configuration option,
alternator_warn_authorization. When alternator_enforce_authorization
is false but alternator_warn_authorization is true, authentication and
authorization checks are performed as in enforce mode, but failures
are ignored and counted in two new metrics:

    scylla_alternator_authentication_failures
    scylla_alternator_authorization_failures

additionally,also each authentication or authorization error is logged as
a WARN-level log message. Some users prefer those log messages over
metrics, as the log messages contain additional information about the
failure that can be useful - such as the address of the misconfigured
client, or the username attempted in the request.

All combinations of the two configuration options are allowed:
 * If just "enforce" is true, auth failures cause a request failure.
   The failures are counted, but not logged.
 * If both "enforce" and "warn" are true, auth failures cause a request
   failure. The failures are both counted and logged.
 * If just "warn" is true, auth failures are ignored (the request
   is allowed to compelete) but are counted and logged.
 * If neither "enforce" nor "warn" are true, no authentication or
   authorization check are done at all. So we don't know about failures,
   so naturally we don't count them and don't log them.

This patch is fairly straightforward, doing mainly the following
things:

1. Add an alternator_warn_authorization config parameter.

2. Make sure alternator_enforce_authorization is live-updatable (we'll
   use this in a test in the next patch). It "almost" was, but a typo
   prevented the live update from working properly.

3. Add the two new metrics, and increment them in every type of
   authentication or authorization error.
   Some code that needs to increment these new metrics didn't have
   access to the "stats" object, so we had to pass it around more.

4. Add log messages when alternator_warn_authorization is true.

5. If alternator_enforce_authorization is false, allow the auth check
   to allow the request to proceed (after having counted and/or logged
   the auth error).

A separate patch will follow and add documentation suggesting to users
how to use the new "warn" options to safely switch between non-enforcing
to enforcing mode. Another patch will add tests for the new configuration
options, new metrics and new log messages.

Fixes #25308.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-10-29 11:16:26 +02:00
Nadav Har'El
c3593462a4 alternator: improve protection against oversized requests
Following DynamoDB, Alternator also places a 16 MB limit on the size of
a request. Such a limit is necessary to avoid running out of memory -
because the AWS message authentication protocol requires reading the
entire request into memory before its signature can be verified.

Our implementation for this limit used Seastar's HTTP server's
content_length_limit feature. However, this Seastar feature is
incomplete - it only works when the request uses the Content-Length
header, and doesn't do anything if the request doesn't have a
Content-Length (it may use chunked encoding, or have no length at all).
So malicious users can cause Scylla to OOM by sending a huge request
without a Content-Length.

So in this patch we stop using the incomplete Seastar feature, and
implement the length limit in Scylla in a way that works correctly with
or without Content-Length: We read from the input stream and if we go
over 16MB, we generate an error.

Because we dropped Seastar's protection against a long Content-Length,
we also need to fix a piece of code which used Content-Length to reserve
some semaphore units to prevent reading many large requests in parallel.
We fix two problems in the code:
1. If Content-Length is over the limit, we shouldn't attempt to reserve
   semaphore units - this should just be a Payload Too Large error.
2. If Content-Length is missing, the existing code did nothing and had
   a TODO that we should. In this patch we implement what was suggested
   in that TODO: We temporarily reserve the whole 16 MB limit, and
   after reading the actual request, we return part of the reservation
   according to the real request size.

That last fix is important, because typically the largest requests will be
BatchWriteItem where a well-written client would want to use chunked
encoding, not Content-Length, to avoid materializing the entire request
up-front. For such clients, the memory use semaphore did nothing, and
now it does the right thing.

Note that this patch does *not* solve the problem #12166 that existed
with Seastar's length-limiting implementation but still exists in the
new in-Scylla length-limiting implementation: The fact we send an
error response in the middle of the request and then close the
connection, while the client continues to send the request, can lead
to an RST being sent by the server kernel. Usually this will be fine -
well-written client libraries will be able to read the response before
the RST. But even with a well-written library in some rare timings
the client may get the RST before the response, and will miss the
response, and get an empty or partial response or "connection reset
by peer". This issue existed before this patch, and still exists, but
is probably of minor impact.

Fixes #8196

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23434
2025-10-28 15:24:46 +03:00
Radosław Cybulski
ea6b22f461 Add max trace size output configuration variable
In #24031 users complained, that trace message is truncated, namely it's
no longer json parsable and table name might not be part of the output.
This path enables users to configure maximum size of trace message.
In case user wanted `table` name, but didn't care about message size,
 #26634 will help.

- add configuration varable `alternator_max_users_query_size_in_trace_output`
   with default value of 4096 (4 times old default value).
- modify `truncated_content_view` function to use new configuration
  variable for truncation limit
- update `truncated_content_view` to consistently truncate at given
  size, previously trunctation would also happen when data arrived in
  more than one chunk
- update `truncated_content_view` to better handle truncated value
  (limit number of copies)
- fix `scylla_config_read` call - call to `query` for a configuration
  name that is not existing will return `Items` array empty
  (but present) - this would raise array access exception few lines
  below.
- add test

Refs #26634
Refs #24031

Closes scylladb/scylladb#26618
2025-10-28 13:29:15 +03:00
Nadav Har'El
7c9f5ef59e Merge 'alternator/executor: instantly mark view as built when creating it with base table' from Michał Jadwiszczak
`CreateTable` request creates GSI/LSI together with the base table,
the base table is empty and we don't need to actually build the view.

In tablet-based keyspaces we can just don't create view building tasks
and mark the view build status as SUCCESS on all nodes. Then, the view
building worker on each node will mark the view as built in
`system.built_views` (`view_building_worker::update_built_views()`).

Vnode-based keyspaces will use the "old" logic of view builder, which
will process the view and mark it as built.

Fixes scylladb/scylladb#26615

This fix should be backported to 2025.4.

Closes scylladb/scylladb#26657

* github.com:scylladb/scylladb:
  test/alternator/test_tablets: add test for GSI backfill with tablets
  test/alternator/test_tablets: add reproducer for GSI with tablets
  alternator/executor: instantly mark view as built when creating it with base table
2025-10-22 10:44:28 +03:00
Michał Jadwiszczak
8fbf122277 alternator/executor: instantly mark view as built when creating it with base table
`CreateTable` request creates GSI/LSI together with the base table,
the base table is empty and we don't need to actually build the view.

In tablet-based keyspaces we can just don't create view building tasks
and mark the view build status as SUCCESS on all nodes. Then, the view
building worker on each node will mark the view as built in
`system.built_views` (`view_building_worker::update_built_views()`).

Vnode-based keyspaces will use the "old" logic of view builder, which
will process the view and mark it as built.

Fixes scylladb/scylladb#26615
2025-10-22 00:05:40 +02:00
Michał Hudobski
5c957e83cb vector_search: remove dependence on cql3
This patch removes the dependence of vector search module
on the cql3 module by moving the contents of cql3/type_json.hh
to types/json_utils.hh and removing the usage of cql3 primary_key
object in vector_store_client. We also make the needed adjustments
to files that were previously using the afformentioned type_json.hh
file.

This fixes the circular dependency cql3 <-> vector_search.

Closes scylladb/scylladb#26482
2025-10-21 17:41:55 +03:00
Piotr Wieczorek
a3ec6c7d1d alternator/streams: Support userIdentity field for TTL deletions
UserIdentity is a map of two fields in GetRecords responses, which
always has the same value. It may be missing, or contain a constant
object with value `{"type": "Service", "principalId":
"dynamodb.amazonaws.com"}`. Currently, the latter is set only for
`REMOVE`s triggered by TTL.

This commit introduces two new CDC operation types: `service_row_delete`
and `service_partition_delete`, emitted in place of `row_delete` and
`partition_delete`. Alternator Streams treats them as regular `REMOVE`s,
but in addition adds the `userIdentity` field to the record.

This change may break existing Scylla libraries for reading raw CDC
tables, but we doubt that anybody has this use case.

Refs https://github.com/scylladb/scylladb/pull/26149
Refs https://github.com/scylladb/scylladb/pull/26121
Fixes https://github.com/scylladb/scylladb/issues/11523

Closes scylladb/scylladb#26460
2025-10-20 17:15:59 +02:00
Piotr Dulikowski
a716fab125 Merge 'alternator/metrics: Log operation sizes to histograms' from Piotr Wieczorek
This PR adds operation per-table histograms to Alternator with item sizes involved in an operation, for each of the operations: `GetItem`, `PutItem`, `DeleteItem`, `UpdateItem`, `BatchGetItem`, `BatchWriteItem`. If read-before-write wasn't performed (i.e. it was not needed by the operation and the flag `alternator_force_read_before_write` was disabled), then we log sizes of the items that are in the request. Also, `UpdateItem` logs the maximum of the update size and the existing item size. We'll change it in a next PR.

Fixes: #25143

Closes scylladb/scylladb#25529

* github.com:scylladb/scylladb:
  alternator: Add UpdateItem and BatchWriteItem response size metrics
  alternator: Add PutItem and DeleteItem response size metrics
  alternator: Add BatchGetItem response size metrics
  alternator: Add GetItem response size metrics
  alternator/test: Add more context to test_metrics.py asserts
2025-10-20 10:03:31 +03:00
Piotr Wieczorek
a2b9d7eed5 alternator: Split update_item_operation::apply into smaller methods
This is a minor refactoring aimed at reducing cognitive complexity of
`update_item_operation::apply`. The logic remains unchanged.

Closes scylladb/scylladb#25887
2025-10-17 09:51:05 +02:00
Tomasz Grabiec
c4a87453a2 Merge 'Add experimental feature flag for strongly consistent tables and extend kesypace creation syntax to allow specifying consistency mode.' from Gleb Natapov
The series adds an experimental flag for strongly consistent tables  and extends "CREATE KEYSPACE" ddl with `consistency` option that allows specifying the consistency mode for the keyspace.

Closes scylladb/scylladb#26116

* github.com:scylladb/scylladb:
  schema: Allow configuring consistency setting for a keyspace
  db: experimental consistent-tablets option
2025-10-16 21:48:06 +02:00
Piotr Wieczorek
caa522a29d alternator: Add UpdateItem and BatchWriteItem response size metrics
This commit bundle introduces metrics on item sizes for Alternator operations.

The new metrics are:
- `operation_size_kib op=UpdateItem`: Tracks the size of an `UpdateItem`
  operation. This is calculated as the sum of the existing item's size
  plus the estimated size of the updated fields.
- `operation_size_kib op=BatchWriteItem`: Tracks the total size of items
  within a `BatchWriteItem` request, aggregated on a per-table basis. If
  an item already exists, the logged size is the maximum of the old and
  the new item size.

NOTE: Both metrics rely on read-before-write, so if the
`alternator_force_read_before_write` option is disabled, these metrics
may be incomplete and report inaccurate sizes.
2025-10-16 19:17:27 +02:00
Piotr Wieczorek
5ca42b3baf alternator: Add PutItem and DeleteItem response size metrics
This commit bundle introduces metrics on item sizes for Alternator
operations. Specifically, this commit adds `operation_size_kb`
histograms for sizes of items created or replaced by the `PutItem`
operation, and sizes of items deleted by `DeleteItem` requests. The
latter needs a read-before-write, so the metrics may be incomplete if
`alternator_force_read_before_write` is disabled.
2025-10-16 19:17:26 +02:00
Piotr Wieczorek
5c72fd9ea3 alternator: Add BatchGetItem response size metrics
This commit bundle introduces metrics on item sizes for Alternator
operations. Specifically, this commit adds a `operation_size_kb`
per-table histogram, which contains item sizes in BatchGetItem requests.

A size of a BatchGetItem is the sum of the sizes of all items in the
operation grouped by table. In other words, a single BatchGetItem, and
BatchWriteItem for that matter, updates the histograms for each table
that it has items in.
2025-10-16 19:16:57 +02:00
Piotr Wieczorek
1aa3819b57 alternator: Add GetItem response size metrics
This commit bundle introduces metrics on item sizes for Alternator
operations. Specifically, this commit adds a per-table
`operation_size_kb` histogram, recording the sizes of the items
contained in GetItem responses.
2025-10-16 19:04:55 +02:00
Gleb Natapov
c255740989 schema: Allow configuring consistency setting for a keyspace
We want to add strongly consistent tables as an option. We will have
two kind of strongly consistent tables: globally consistent and locally
consistent. The former means that requests from all DCs will be globally
linearisable while the later - only requests to the same DCs will be
linearisable.  To allow configuring all the possibilities the patch
adds new parameter to a keyspace definition "consistency" that can be
configured to be `eventual`, `global` or `local`. Non eventual setting
is supported for tablets enabled keyspaces only. Since we want to start
with implementing local consistency configuring global consistency will
result in an error for now.
2025-10-16 13:34:49 +03:00
Piotr Dulikowski
61662bc562 Merge 'alternator: Make CDC use preimages from LWT for Alternator' from Piotr Wieczorek
This patch adds a struct `per_request_options` used to communicate between CDC and upper abstraction layers. We need this for better compatibility with DynamoDB Streams in Alternator (https://github.com/scylladb/scylladb/issues/6918) to change operation types of log rows. This patch also adds a way to conditionally forward the item read by LWT to CDC and use it as a preimage. For now, only Alternator uses this feature.

The main changes are:
- add a struct `cdc::per_request_options` to pass information between CDC and upper abstraction layers,
- add the struct to `cas_request::apply`'s signature,
- add a possibility to provide a preimage fetched by an upper abstraction layer (to propagate a row read by Alternator to CDC's preimage). This reduces the number of reads-before-write by 1 for some **Alternator** requests and it is always safe. It's possible to use this feature also in CQL.

No backport, it's a feature.

Refs https://github.com/scylladb/scylladb/issues/6918
Refs https://github.com/scylladb/scylladb/pull/26121

Closes scylladb/scylladb#26149

* github.com:scylladb/scylladb:
  alternator, cdc: Re-use the row read by LWT as a CDC preimage
  cdc: Support prefetched preimages
  storage: Add cdc options to cas_request::apply
  cdc, storage: Add a struct to pass per-mutation options to CDC
  cdc: Move operations enum to the top of the namespace
2025-10-15 12:30:29 +02:00
Piotr Wieczorek
28eda0203e alternator: Small cleanup, removing unnecessary statements, etc.
Tiny code cleanup to improve readability without changing behavior.

Changes:
- remove unused variables and imports,
- remove redundant whitespaces, and a duplicated `public:` access
  specifier,
- use `is_aws` function to check if running in AWS
  test/alternator/test_metrics.py,
- other trivial changes.

Closes scylladb/scylladb#26423
2025-10-15 12:05:20 +02:00
Piotr Wieczorek
5ff2d2d6ab alternator, cdc: Re-use the row read by LWT as a CDC preimage
Propagates the row read by CAS to CDC's preimage to save one
read-before-write.

As of now, a preimage in Alternator Streams always contains the entire
item (see previous_item_read_command in executor.cc), so the resulting
preimage should stay the same. In other words, this change should be
transparent to users.
2025-10-14 07:52:40 +02:00
Piotr Wieczorek
a55c5e9ec7 alternator: Correct RCU undercount in BatchGetItem
The `describe_multi_item` function treated the last reference-captured
argument as the number of used RCU half units. The caller
`batch_get_item`, however, expected this parameter to hold an item size.
This RCU value was then passed to
`rcu_consumed_capacity_counter::get_half_units`, treating the
already-calculated RCU integer as if it were a size in bytes.

This caused a second conversion that undercounted the true RCU. During
conversion, the number of bytes is divided by `RCU_BLOCK_SIZE_LENGTH`
(=4KB), so the double conversion divided the number of bytes by 16 MB.

The fix removes the second conversion in `describe_multi_item` and
changes the API of `describe_multi_item`.

Fixes: https://github.com/scylladb/scylladb/pull/25847

Closes scylladb/scylladb#25842
2025-10-12 10:42:32 +03:00
Piotr Wieczorek
b54ad9e22f storage: Add cdc options to cas_request::apply 2025-10-09 12:28:10 +02:00
Tomasz Grabiec
91e51a5dd1 cql3, locator: Use type aliases for option maps
In preparation for changing their structure.

1) std::map<sstring, sstring> -> replication_strategy_config_options

  Parsed options. Values will become std::variant<sstring, rack_list>

2) std::map<sstring, sstring> -> property_definitions::map_type

  Flattened map of options, as stored system tables.
2025-10-01 16:06:51 +02:00
Benny Halevy
da6e2fdb1b locator: Pass topology to replication strategy constructor 2025-10-01 16:06:28 +02:00
Piotr Wieczorek
4be0bdbc07 alternator: Don't emit a redundant REMOVE event in Alternator Streams for PutItem calls
Until now, every PutItem operation appeared in the Alternator Streams as
two events - a REMOVE and a MODIFY. DynamoDB Streams emits only INSERT
or MODIFY, depending on whether a row was replaced, or created anew. A
related issue scylladb#6918 concerns distinguishing the mutation type properly.

This was because each call to PutItem emitted the two CDC rows, returned
by GetRecords. Since this patch, we use a collection tombstone for the
`:attrs` column, and a separate tombstone for each regular column in the
table's schema. We don't expect that new tables would have any other
regular column, except for the `:attrs` and keys, but we may encounter
them in in upgraded tables which had old GSIs or LSIs.

Fixes: scylladb#6930.

Closes scylladb/scylladb#24991
2025-09-30 13:12:16 +03:00
Ernest Zaslavsky
debc756794 treewide: Move transport related files to a transport directory As requested in #22112, moved the files and fixed other includes and build system.
Moved files:
- generic_server.hh
- generic_server.cc
- protocol_server.hh

Fixes: #22112

This is a cleanup, no need to backport

Closes scylladb/scylladb#25090
2025-09-29 11:46:06 +03:00
Nadav Har'El
69672a5863 alternator: fix deprecation warning
Until recently, Seastar's HTTP server's reply::write_body() only
supported a few "well-known" content types. But Alternator uses a
lesser known one - "application/x-amz-json-1.0" - so it was forced to
use a *wrong* (but legal) content type, and later override it with the
correct one. This was really ugly and we had a comment that once this
feature was fixed in Seastar, we should remove the ugly workaround.

Well, the time has finally come. We can now finally pass the correct
content type to write_body(), and don't need to call the deprecated
type-changing function later.

The new implementation is less awkward, but actually longer - whereas
previously we only set the content type in one place - just before
the done(), after this patch we actually need to do it in three places
where we write the body (string response, streaming response and
error response). But I think this is actually better - there is no
inherent reason why, for example, error messages and success messages
needed to use the same content type. We use a new constant
REPLY_CONSTANT_TYPE so that we don't need to repeat it three times.

We already have a regression test for the content-type returned by
Alternator, test_manual_requests.py::test_content_type, and this
test continues to pass after the patch. But this test only checked
the short response path, so we add additional tests for the streaming
response path and for the error response path. As usual, the new
tests pass on DynamoDB as well.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#26268
2025-09-29 11:40:46 +03:00
Szymon Malewski
6ce7843774 alternator: use expression caching
Before this patch, every expression in Alternator's requests was parsed from string to adequate structure.
This patch enables caching - all calls to parse an expression (all types) are proxied through the cache.
New expression is added to the cache, the least recently used entry (above cache size) is removed.
For existing entries the copy of the template is returned - individual instances still need to be resolved (placeholders substituted with names and values).
The cache is per shard - shared for all operations, expression types, tables, users.
Default cache size is 2000 entries per shard and it has configuration option `alternator_max_expression_cache_entries_per_shard` (0 means cache disabled).
Added Python tests are based on metrics.
2025-09-28 04:27:44 +02:00
Szymon Malewski
b75c6c9ef7 alternator: adds expression cache implementation
Every expression in Alternator's requests is parsed from string to adequate structure.
This patch implements a caching structure (input expression strings mapping to parsed 'template' structures), which will be used for handling requests in following commits.
If the reqested expression is valid (parsable) the cache will always return a value - if it is not already in the cache it will be created and stored.
The cache has limited (live configurable) size - when it is reached, the least recently used entry is removed.
The copy of the template in cache is returned - individual instances still need to be resolved (placeholders substituted with names and values).
Invalid requests will have no effect on the cache - the parser throws an exception.
Caching is implemented for all expression types. Internally it is based on helper structure `lru_string_map`.
Basic metrics (total count of hits and misses for each expression type and number of evictions) are implemented.
Metrics are used in boost unit tests.
2025-09-28 04:27:44 +02:00
Szymon Malewski
65c95d3a93 alternator/expressions: error on parsing empty update expression
With this patch empty update expression is no longer accepted by the parser.
So far it was rejected only after resolving, however it could pollute the expression cache.
2025-09-28 04:06:00 +02:00
Szymon Malewski
be159acc03 alternator/expressions: fix single value condition expression parsing
Primitive conditions usually use operator with two or more values.
The only case of a "single value" condition is a function call -
DynamoDB does not accept other general values (i.e., attribute or value references).
In Alternator single general value was parsed as correct and only failed
later when the calculated value ended up to not be a boolean.
This works, but not when attribute or value actually is boolean.

What is more, when a parsed (but not resolved) expression is cached, this invalid expression could pollute cache.
This would be also the only case where the same string can be parsed both as a condition and a projection expression.

The issue is fixed by explicitly checking this case at primitive condition parsing.
Updated test confirms consistence between Alternator and DynamoDB.

Fixes #25855.
2025-09-28 04:06:00 +02:00
Nadav Har'El
f65998cd39 alternator: improve error handling of incorrect ARN
Before this patch, if an ARN that is passed to Alternator requests
like TagResource is well-formatted but points to non-existent table,
Alternator returns the unhelpful error:

  (AccessDeniedException) when calling the TagResource operation:
  Incorrect resource identifier

This patch modifies this error to be:

  (ResourceNotFoundException) when calling the TagResource operation:
  ResourceArn 'arn:scylla:alternator:alternator_alternator_Test_
  1758532308880:scylla:table/alternator_Test_1758532308880x' not found

This is the same error type (ResourceNotFoundException) that DynamoDB
returns in that case - and a more helpful error message.

This patch also includes a regression test that checks the error
type in this case. The new test fails on Alternator before this
patch, and passes afterwards (and also passes on DyanamoDB).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#26179
2025-09-25 11:51:17 +03:00
Botond Dénes
50038ef2cc Merge 'alternator: update references to alternator streams issue' from Michael Litvak
update all the references about the issue of tablets support for
alternator streams to issue https://github.com/scylladb/scylladb/issues/23838 instead of https://github.com/scylladb/scylladb/issues/16317.

The issue https://github.com/scylladb/scylladb/issues/16317 is about support of CDC with tablets, but it is now
closed and it didn't address alternator streams. the remaining issues
about alternator streams should be addressed as part of https://github.com/scylladb/scylladb/issues/23838, so fix
the references in order for them not to be missed.

backport is not needed

Closes scylladb/scylladb#26178

* github.com:scylladb/scylladb:
  test/cqlpy/test_permissions: unskip test for tablets
  alternator: update references to alternator streams issue
2025-09-25 11:05:52 +03:00
Ernest Zaslavsky
5ba5aec1f8 treewide: Move mutation related files to a mutation directory
As requested in #22104, moved the files and fixed other includes and build system.

Moved files:
 - combine.hh
 - collection_mutation.hh
 - collection_mutation.cc
 - converting_mutation_partition_applier.hh
 - converting_mutation_partition_applier.cc
 - counters.hh
 - counters.cc
 - timestamp.hh

Fixes: #22104

This is a cleanup, no need to backport

Closes scylladb/scylladb#25085
2025-09-24 13:23:38 +03:00
Michael Litvak
65351fda29 alternator: update references to alternator streams issue
update all the references about the issue of tablets support for
alternator streams to issue #23838 instead of #16317.

The issue #16317 is about support of CDC with tablets, but it is now
closed and it didn't address alternator streams. the remaining issues
about alternator streams should be addressed as part of #23838, so fix
the references in order for them not to be missed.
2025-09-22 09:56:23 +02:00
Nadav Har'El
7be5454db1 Merge 'alternator: Store LSI keys in :attrs for newly created tables' from Piotr Wieczorek
Previously, LSI keys were stored as separate, top-level columns in the base table. This patch changes this behavior for newly created tables, so that the key columns are stored inside the `:attrs` map. Then, we use top-level computed columns instead of regular ones.

This makes LSI storage consistent with GSIs and allows the use of a collection tombstone on `:attrs` to delete all attributes in a row except for keys in new tables.

Refs https://github.com/scylladb/scylladb/pull/24991
Refs https://github.com/scylladb/scylladb/issues/6930

Closes scylladb/scylladb#25796

* github.com:scylladb/scylladb:
  alternator: Store LSI keys in :attrs for newly created tables
  alternator/test: Add LSI tests based mostly on the existing GSI tests
2025-09-18 21:48:43 +03:00
Radosław Cybulski
c0db278c03 Don't report spurious keys in DescribeTable
Alternator, when creating gsi, adds artificially columns, that user
had not ask for. This patch prevents those columns from showing up in
DescribeTable's output.

Fixes #5320

Closes scylladb/scylladb#25978
2025-09-18 19:34:39 +03:00
Ernest Zaslavsky
54aa552af7 treewide: Move type related files to a type directory As requested in #22110, moved the files and fixed other includes and build system.
Moved files:
- duration.hh
- duration.cc
- concrete_types.hh

Fixes: #22110

This is a cleanup, no need to backport

Closes scylladb/scylladb#25088
2025-09-17 17:32:19 +03:00
Ernest Zaslavsky
a1f18a8883 treewide: Move schema related files to a schema directory As requested in #22111, moved the files and fixed other includes and build system.
Moved files:
- frozen_schema.hh
- frozen_schema.cc
- schema_mutations.hh
- schema_mutations.cc
- column_computation.hh

Fixes: #22111

Closes scylladb/scylladb#25089
2025-09-17 17:31:05 +03:00
Nadav Har'El
3c0032deb4 alternator: fix bug in combination of AttributeUpdates + ReturnValues
In test/alternator/test_returnvalues.py we had tests for the
ReturnValues feature on UpdateItem requests - but we only tested
UpdateItem requests with the "modern" UpdateExpression, and forgot to
test the combination of ReturnValues with the old AttributeUpdates API.

It turns out this combination is buggy: when both ReturnValues=ALL_OLD
and AttributeUpdates need the previous value of the item, we may wrongly
std::move() the value out, and the operation will fail with a strange
error:

    An error occurred (ValidationException) when calling the UpdateItem
    operation: JSON assert failed on condition 'IsObject()'

The fix in this patch is trivial - just move the std::move() to the
correct place, after both UpdateExpression and AttributeUpdates
handling is done.

This patch also includes a reproducing test, which fails before this
patch and passes with it - and of course passes on DynamoDB. This
test reproduces two cases where the bug happened, as well as one
case where it didn't (to make sure we don't regress in what already
worked).

Fixes #25894

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#25900
2025-09-17 16:04:01 +03:00