scylladb

Author	SHA1	Message	Date
Michael Litvak	b9ec1180f5	alternator: require rf_rack_valid_keyspaces when creating index When creating an alternator table with tablets, if it has an index, LSI or GSI, require the config option rf_rack_valid_keyspaces to be enabled. The option is required for materialized views in tablets keyspaces to function properly and avoid consistency issues that could happen due to cross-rack migrations and pairing switches when RF-rack validity is not enforced. Currently the option is validated when creating a materialized view via the CQL interface, but it's missing from the alternator interface. Since alternator indexes are based on materialized views, the same check should be added there as well. Fixes scylladb/scylladb#27612 Closes scylladb/scylladb#27622	2025-12-15 10:36:57 +02:00
Nadav Har'El	0c64e3be9a	Merge 'Unify and fix rjson string and string_view conversions' from Marcin Maliszkiewicz This patch-set consolidates and corrects rjson string conversion handling. It removes unnecessary string copies, ensures proper length usage and replaces ad-hoc conversions with consistent helper functions. Overall, the changes make rjson string handling safer, faster, and more uniform across the codebase. Backport: no, it's a refactor Closes scylladb/scylladb#27394 * github.com:scylladb/scylladb: fix rjson::value to bytes conversion with missing GetStringLength call alternator: change type from string to string_view in should_add_capacity fix rjson::value to string_view conversion with missing GetStringLength call use rjson::to_string_view when rjson::value gets converted using GetStringLength use rjson::to_sstring and rjson::to_string for various string conversions utils: use rjson document wrapper in instance_profile_credentials_provider::parse_creds utils: move rjson::to_string_view func to string related place utils: add to_sstring and to_string rjson helper	2025-12-11 12:05:41 +02:00
Marcin Maliszkiewicz	be9992cfb3	fix rjson::value to bytes conversion with missing GetStringLength call	2025-12-09 19:27:22 +01:00
Marcin Maliszkiewicz	62962f33bb	fix rjson::value to string_view conversion with missing GetStringLength call In some cases we unnecessarily convert to string, which causes a copy. In others we convert without calling GetStringLength, which causes iteration to determine a length that is already known. In some cases we even do both. This commit fixes that.	2025-12-09 19:27:21 +01:00
Marcin Maliszkiewicz	060c2f7c0d	use rjson::to_string_view when rjson::value gets converted using GetStringLength This commit is only cosmetics, changes calls to GetStringLength into rjson::to_string_view with the same underlying implementation.	2025-12-09 19:27:21 +01:00
Marcin Maliszkiewicz	64149b57c3	use rjson::to_sstring and rjson::to_string for various string conversions In some cases we omit size checking, which is wrong because, according to the RapidJSON documentation, strings may contain a \0 byte in the middle.	2025-12-09 19:27:21 +01:00
Petr Gusev	608eee0357	alternator/executor.cc: eliminate redundant dk copy A small refactoring/optimization.	2025-12-09 10:21:06 +01:00
Petr Gusev	0bcc2977bb	alternator/executor.cc: release cas_shard on the original shard Before this series, we kept the cas_shard on the original shard to guard against tablet movements running in parallel with storage_proxy::cas. The bug addressed by this PR shows that this approach is flawed: keeping the cas_shard on the original shard does not guarantee that a new cas_shard acquired on the target shard won’t require another jump. We fixed this in the previous commit by checking cas_shard.this_shard() on the target shard and continuing to jump to another shard if necessary. Once cas_shard.this_shard() on the target shard returns true, the storage_proxy::cas invariants are satisfied, and no other cas_shard instances need to remain alive except the one passed into storage_proxy::cas.	2025-12-09 10:21:06 +01:00
Petr Gusev	3a865fe991	alternator/executor.cc: move shard check into cas_write This change ensures that if cas_shard points to a different shard, the executor will continue issuing shard jumps until cas_shard.this_shard() returns true. The commit simply moves the this_shard() check from the parallel_for_each lambda into cas_write, with minimal functional changes. We enable test_alternator_invalid_shard_for_lwt since now it should pass. Fixes scylladb/scylladb#27353	2025-12-09 10:21:01 +01:00
Petr Gusev	c6eec4eeef	alternator/executor.cc: make cas_write a private method We will need to access the executor::_stats field from cas_write. We could pass it as a parameter, but it seems simpler to just make cas_write an instance method too.	2025-12-08 10:29:54 +01:00
Petr Gusev	9bef142328	alternator/executor.cc: make do_batch_write a private method We will need to access executor::_stats field on other shards.	2025-12-08 10:29:54 +01:00
Petr Gusev	74bf24a4a7	alternator/executor.cc: fix indent	2025-12-08 10:29:28 +01:00
Petr Gusev	e60bcd0011	test_alternator: add test_alternator_invalid_shard_for_lwt This test reproduces scylladb/scylladb#27353 using two injection points. First, the test triggers an intra-node tablet migration and suspends it at the streaming stage using the intranode_migration_streaming_wait injection. Next, it enables the alternator_executor_batch_write_wait injection, which suspends a batch write after its cas_shard has already been created. The test then issues several batch writes and waits until one of them hits this injection on the destination shard. At this point, the cas_shard.erm for that write is still in the streaming state, meaning the executor would need to jump back to the source shard. The test then resumes the suspended tablet migration, allowing it to update the ERM on the source shard to write_both_read_new. After that, the test releases the suspended batch write and expects it to perform two shard jumps: first from the destination to the source shard, and then again back to the source shard. This commit adds the alternator_executor_batch_write_wait injection to alternator/executor.cc. Coroutines are intentionally avoided in the parallel_for_each lambda to prevent unnecessary coroutine-frame allocations.	2025-12-08 10:29:28 +01:00
Petr Gusev	f00f7976c1	alternator/executor.cc: avoid cross-shard free This commit is an optimization: avoiding destruction of foreign objects on the wrong shard. Releasing objects allocated on a different shard causes their ::free calls to be executed remotely, which adds unnecessary load to the SMP subsystem. Before this patch, a std::vector could be moved to another shard. When the vector was eventually destroyed, its ::free had to be marshalled back to the shard where the memory had originally been allocated. This change avoids that overhead by passing the vector by const reference instead. The reasoning for the lifetime correctness of the referenced objects: * the put_or_delete_item refs used in put_or_delete_item_cas_request are bound to its lifetime * the cas_request lifetime is bound to the storage_proxy::cas future * we don't release put_or_delete_item-s until all storage_proxy::cas calls are done.	2025-12-07 16:14:56 +01:00
Petr Gusev	c428645d16	storage_proxy: cas: take cas_request by raw reference In the next commit we want to add an optimization that relies on precise control over the lifetime of cas_request. In particular, we want the implementation of this interface in Alternator to operate on raw references that are guaranteed to remain valid only until the cas() future is resolved. We already depend on the same lifetime assumptions in cas_request when used by modification_statement. However, these assumptions are not clearly expressed in the current interface: cas_request is taken by shared_ptr, and nothing prevents cas() from storing that pointer inside paxos_response_handler, which may outlive the cas() future. This commit fixes that by taking cas_request by raw reference. This makes it explicit that cas() does not assume ownership of the object. Callers must ensure that the referenced object remains valid until the returned future is resolved.	2025-12-07 16:14:56 +01:00
Nadav Har'El	350cbd1d66	alternator: fix typo of BatchWriteItem in comments The DynamoDB API's "BatchWriteItem" operation is spelled like this, in singular. Some comments incorrectly referred to as BatchWriteItems - in plural. This patch fixes those mistakes. There are no functional changes here or changes to user-facing documents - these mistakes were only in code comments. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27446	2025-12-05 15:08:58 +02:00
Piotr Dulikowski	44c605e59c	Merge 'Fix the types of change events in Alternator Streams' from Piotr Wieczorek This patch increases the compatibility with DynamoDB Streams by integrating the DynamoDB's event type rules (described in https://github.com/scylladb/scylladb/issues/6918) into Alternator. The main changes are: - introduce a new flag `alternator_streams_strict_compatibility`, meant as a guard of performance-intensive operations that increase the compatibility with DynamoDB Streams. If enabled, Alternator always performs a RBW before a data-modifying operation, and propagates its result to CDC. Then, the old item is compared to the new one, to determine the mutation type (INSERT vs MODIFY). This option is a no-op for tables with disabled Alternator Streams, - reduce splitting of simple Alternator mutations, - correctly distinguish event types described in #6918, except for item deletes. Deleting a missing item with DeleteItem, BatchWriteItem, or a missing field with UpdateItem still emit REMOVEs. To summarize, the emitted events of the data manipulation operations should be as follows: - DeleteItem/BatchWriteItem.DeleteItem of existing item: REMOVE (OK) - DeleteItem of nonexistent item: nothing (OK) - BatchWriteItem.DeleteItem of nonexistent item: nothing (OK) - PutItem/UpdateItem/BatchWriteItem.PutItem of existing and not equal item: MODIFY (OK) - PutItem/UpdateItem/BatchWriteItem.PutItem of existing and equal item: nothing (OK) - PutItem/UpdateItem/BatchWriteItem.PutItem of nonexistent item: INSERT (OK) No backport is necessary. Refs https://github.com/scylladb/scylladb/pull/26149 Refs https://github.com/scylladb/scylladb/pull/26396 Refs https://github.com/scylladb/scylladb/issues/26382 Fixes https://github.com/scylladb/scylladb/issues/6918 Closes scylladb/scylladb#26121 * github.com:scylladb/scylladb: test/alternator: Enable the tests failing because of #6918 alternator, cdc: Don't emit events for no-op removes alternator, cdc: Don't emit an event for equal items alternator/streams, cdc: Differentiate item replace and item update in CDC alternator: Change the return type of rmw_operation_return config: Add alternator_streams_strict_compatibility flag cdc: Don't split a row marker away from row cells	2025-11-30 07:20:22 +01:00
Radosław Cybulski	b54a9f4613	Fix use-after-free in encode_paging_state in Alternator Fix unlikely use-after-free in `encode_paging_state`. The function incorrectly assumes that the current position to encode will always have data for all clustering columns the schema defines. It's possible to encounter a current position with fewer than all columns specified, for example in the case of a range tombstone. Those don't happen in Alternator tables, as DynamoDB doesn't allow range deletions and the clustering key might be of size at most 1. However, the Alternator API can be used to read Scylla system tables, and those do have range tombstones with more than a single clustering column. The fix is to stop trying to encode columns that don't have a value - they are not needed anyway, as there's no possible position with those values (the range tombstone made sure of that). Fixes #27001 Fixes #27125 Closes scylladb/scylladb#26960	2025-11-28 16:51:15 +03:00
Wojciech Mitros	3c376d1b64	alternator: use storage_proxy from the correct shard in executor::delete_table When we delete a table in alternator, the schema change is performed on shard 0. However, we actually use the storage_proxy from the shard that is handling the delete_table command. This can lead to problems because some information is stored only on shard 0 and using storage_proxy from another shard may make us miss it. In this patch we fix this by using the storage_proxy from shard 0 instead. Fixes https://github.com/scylladb/scylladb/issues/27223 Closes scylladb/scylladb#27224	2025-11-25 18:56:31 +01:00
Nadav Har'El	64a075533b	alternator: fix update of stats from wrong shard In commit `51186b2` (PR #25457) we introduced new statistics for authentication errors, and among other places we modified executor::create_table() to update them when necessary. This function runs its real work (create_table_on_shard0()) on shard 0, but incorrectly updates "_stats" from the original shard. It doesn't really matter which shard's stats we update - but it does matter that code running on shard 0 shouldn't touch some other shard's objects. Since all we do on these stats is to increment an integer, the risk of updating it on the wrong shard is minimal to nonexistent, but it's still wrong and can cause bigger trouble in the future as the code continues to evolve. The fix is simple - we should pass to create_table_on_shard0() the _stats object from the actual shard running it (shard 0). Fixes #26942 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#26944	2025-11-21 11:53:06 +02:00
Radosław Cybulski	ce8db6e19e	Add table name to tracing in alternator Add a table name to Alternator's tracing output, as some clients would like to consistently receive this information. - add missing `tracing::add_table_name` in `executor::scan` - add emitting of tables' names in `trace_state::build_parameters_map` - update tests, so that when tracing is looked for it is filtered by the table's name, which confirms the table name is being output. - change `struct one_session_records` declaration to `class one_session_records`, as `one_session_records` is later defined as class. Refs #26618 Fixes #24031 Closes scylladb/scylladb#26634	2025-11-21 09:33:40 +02:00
Nadav Har'El	c03081eb12	alternator: improve error in tablets_mode_for_new_keyspaces=enforced When in tablets_mode_for_new_keyspaces=enforced mode, Alternator is supposed to fail when CreateTable asks explicitly for vnodes. Before this patch, this error was an ugly "Internal Server Error" (an exception thrown from deep inside the implementation). This patch checks for this case in the right place, to generate a proper ValidationException with a proper error message. We also enable the test test_tablets_tag_vs_config, which should have caught this error but didn't, because it was marked xfail since tablets_mode_for_new_keyspaces had not been live-updatable. Now that it is, we can enable the test. I also improved the test to be slightly faster (no need to change the configuration so many times) and to also check the ordinary case - where the schema chooses neither vnodes nor tablets explicitly and we should just use the default. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-11-09 12:52:29 +02:00
Nadav Har'El	b34f28dae2	alternator: improve comment about non-hidden system tags The previous patches added a somewhat misleading comment in front of system:initial_tablets, which this patch improves. That tag is NOT where Alternator "stores" table properties like the existing comment claimed. In fact, the whole point is that it's the opposite - Alternator never writes to this tag - it's a user-writable tag which Alternator reads, to configure the new table. And this is why it obviously can't be hidden from the user. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-11-09 12:52:29 +02:00
Piotr Szymaniak	63897370cb	alternator: Fix tag name to request vnodes The tag was recently renamed from `experimental:initial_tablets` to `system:initial_tablets`. This commit fixes both the tests and the exceptions sent to the user that explain how to create a table with vnodes.	2025-11-09 12:52:29 +02:00
Piotr Szymaniak	376a2f2109	alternator: Support `tablets_mode_for_new_keyspaces` config flag Until now, tablets in Alternator were an experimental feature enabled only when a TAG "experimental:initial_tablets" was present when creating a table and associated with a numeric value. After this patch, Alternator honours the value of the `tablets_mode_for_new_keyspaces` config flag. Each table can be overridden to use tablets or not by supplying a new TAG "system:initial_tablets". The rules stay the same as with the earlier, experimental tag: when supplied with a numeric value, the table will use tablets (as long as they are supported). When supplied with something else (like a string "none"), the table will use vnodes, provided that tablets are not `enforced` by the config flag. Fixes #22463	2025-11-09 12:52:17 +02:00
Pavel Emelyanov	59019bc9a9	Merge 'Alternator: allow warning on auth errors before enabling enforcement' from Nadav Har'El An Alternator user was recently "bit" when switching `alternator_enforce_authorization` from "false" to "true": After the configuration change, all application requests suddenly failed because unbeknownst to the user, their application used incorrect secret keys. This series introduces a solution for users who want to safely switch `alternator_enforce_authorization` from "false" to "true": Before switching from "false" to "true", the user can temporarily switch a new option, `alternator_warn_authorization`, to true. In this "warn" mode, authentication and authorization errors are counted in metrics (`scylla_alternator_authentication_failures` and `scylla_alternator_authorization_failures`) and logged as WARNings, but the user's application continues to work. The user can use these metrics or log messages to learn of errors in their application's setup, fix them, and only do the switch of `alternator_enforce_authorization` when the metrics or log messages show there are no more errors. The first patch is the implementation of the feature - the new configuration option, the metrics and the log messages, the second patch is a test for the new feature, and the third patch is documentation recommending how to use the warn mode and the associated metrics or log messages to safely switch `alternator_enforce_authorization` from false to true. Fixes #25308 This is a feature that users need, so it should probably be backported to live branches. Closes scylladb/scylladb#25457 * github.com:scylladb/scylladb: docs/alternator: explain alternator_warn_authorization test/alternator: tests for new auth failure metrics and log messages alternator: add alternator_warn_authorization config	2025-11-05 10:45:17 +03:00
Piotr Wieczorek	8c2f60f111	alternator/streams, cdc: Differentiate item replace and item update in CDC This commit improves compatibility with DynamoDB streams by changing the emitted events when creating/updating an item. Replace/update operations of an existing item emit a MODIFY, whereas replacing/updating a missing item results in an INSERT. If the state of the item doesn't change after applying the operation, no event is emitted. This commit handles the following cases: - `PutItem/UpdateItem/BatchWriteItem.PutItem of an existing and not equal item: MODIFY` - `PutItem/UpdateItem/BatchWriteItem.PutItem of a nonexistent item: INSERT` Refs https://github.com/scylladb/scylladb/issues/6918	2025-10-30 07:40:31 +01:00
Piotr Wieczorek	4f6aeb7b6b	alternator: Change the return type of rmw_operation_return Change the type from future<executor::request_return_type> to executor::request_return_type, because the method isn't async and one out of two callers unwraps the future immediately. This simplifies the code a little and probably saves a few instructions, since we suspect that moving a future<X> is more expensive than just moving X.	2025-10-30 07:40:31 +01:00
Piotr Wieczorek	e3fde8087a	cdc: Don't split a row marker away from row cells CDC log table records a mutation as a sequence of log rows that record an atomic change (i.e. a row marker, tombstones, etc.), whereas a mutation in Alternator Streams always appears as a single log row. The type of operation is determined based on the type of the last log row in CDC. As a result, updates that create a row always appeared to Alternator Streams as an update (row marker + data), rather than an insert. This commit makes them a single log row. Its operation type is insert if it contains a row marker, and an update otherwise, which gives results consistent with DynamoDB Streams.	2025-10-30 07:40:31 +01:00
Tomasz Grabiec	f6dfea2fb1	alternator: Do not set RF for zero-token DCs That will fail with tablets because it won't be able to allocate replicas.	2025-10-29 23:32:58 +01:00
Tomasz Grabiec	21db21af7e	alternator: Switch keyspace creation to use ks_prop_defs So that we get the same validation and option post-processing as during regular keyspace creation. RF auto-expansion logic happens in ks_prop_defs, and we want that for tablets.	2025-10-29 23:32:58 +01:00
Nadav Har'El	51186b2f2c	alternator: add alternator_warn_authorization config Before this patch, the configuration alternator_enforce_authorization is a boolean: true means enforce authentication checks (i.e., each request is signed by a valid user) and authorization checks (the user who signed the request is allowed by RBAC to perform this request). This patch adds a second boolean configuration option, alternator_warn_authorization. When alternator_enforce_authorization is false but alternator_warn_authorization is true, authentication and authorization checks are performed as in enforce mode, but failures are ignored and counted in two new metrics: scylla_alternator_authentication_failures scylla_alternator_authorization_failures Additionally, each authentication or authorization error is also logged as a WARN-level log message. Some users prefer those log messages over metrics, as the log messages contain additional information about the failure that can be useful - such as the address of the misconfigured client, or the username attempted in the request. All combinations of the two configuration options are allowed: * If just "enforce" is true, auth failures cause a request failure. The failures are counted, but not logged. * If both "enforce" and "warn" are true, auth failures cause a request failure. The failures are both counted and logged. * If just "warn" is true, auth failures are ignored (the request is allowed to complete) but are counted and logged. * If neither "enforce" nor "warn" is true, no authentication or authorization checks are done at all. So we don't know about failures, so naturally we don't count them and don't log them. This patch is fairly straightforward, doing mainly the following things: 1. Add an alternator_warn_authorization config parameter. 2. Make sure alternator_enforce_authorization is live-updatable (we'll use this in a test in the next patch). It "almost" was, but a typo prevented the live update from working properly. 3. Add the two new metrics, and increment them in every type of authentication or authorization error. Some code that needs to increment these new metrics didn't have access to the "stats" object, so we had to pass it around more. 4. Add log messages when alternator_warn_authorization is true. 5. If alternator_enforce_authorization is false, allow the auth check to allow the request to proceed (after having counted and/or logged the auth error). A separate patch will follow and add documentation suggesting to users how to use the new "warn" options to safely switch between non-enforcing to enforcing mode. Another patch will add tests for the new configuration options, new metrics and new log messages. Fixes #25308. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-10-29 11:16:26 +02:00
Michał Jadwiszczak	8fbf122277	alternator/executor: instantly mark view as built when creating it with base table When a `CreateTable` request creates a GSI/LSI together with the base table, the base table is empty and we don't need to actually build the view. In tablet-based keyspaces we can simply skip creating view building tasks and mark the view build status as SUCCESS on all nodes. Then, the view building worker on each node will mark the view as built in `system.built_views` (`view_building_worker::update_built_views()`). Vnode-based keyspaces will use the "old" logic of view builder, which will process the view and mark it as built. Fixes scylladb/scylladb#26615	2025-10-22 00:05:40 +02:00
Piotr Dulikowski	a716fab125	Merge 'alternator/metrics: Log operation sizes to histograms' from Piotr Wieczorek This PR adds operation per-table histograms to Alternator with item sizes involved in an operation, for each of the operations: `GetItem`, `PutItem`, `DeleteItem`, `UpdateItem`, `BatchGetItem`, `BatchWriteItem`. If read-before-write wasn't performed (i.e. it was not needed by the operation and the flag `alternator_force_read_before_write` was disabled), then we log sizes of the items that are in the request. Also, `UpdateItem` logs the maximum of the update size and the existing item size. We'll change it in a next PR. Fixes: #25143 Closes scylladb/scylladb#25529 * github.com:scylladb/scylladb: alternator: Add UpdateItem and BatchWriteItem response size metrics alternator: Add PutItem and DeleteItem response size metrics alternator: Add BatchGetItem response size metrics alternator: Add GetItem response size metrics alternator/test: Add more context to test_metrics.py asserts	2025-10-20 10:03:31 +03:00
Piotr Wieczorek	a2b9d7eed5	alternator: Split `update_item_operation::apply` into smaller methods This is a minor refactoring aimed at reducing cognitive complexity of `update_item_operation::apply`. The logic remains unchanged. Closes scylladb/scylladb#25887	2025-10-17 09:51:05 +02:00
Tomasz Grabiec	c4a87453a2	Merge 'Add experimental feature flag for strongly consistent tables and extend keyspace creation syntax to allow specifying consistency mode.' from Gleb Natapov The series adds an experimental flag for strongly consistent tables and extends "CREATE KEYSPACE" ddl with `consistency` option that allows specifying the consistency mode for the keyspace. Closes scylladb/scylladb#26116 * github.com:scylladb/scylladb: schema: Allow configuring consistency setting for a keyspace db: experimental consistent-tablets option	2025-10-16 21:48:06 +02:00
Piotr Wieczorek	caa522a29d	alternator: Add UpdateItem and BatchWriteItem response size metrics This commit bundle introduces metrics on item sizes for Alternator operations. The new metrics are: - `operation_size_kib op=UpdateItem`: Tracks the size of an `UpdateItem` operation. This is calculated as the sum of the existing item's size plus the estimated size of the updated fields. - `operation_size_kib op=BatchWriteItem`: Tracks the total size of items within a `BatchWriteItem` request, aggregated on a per-table basis. If an item already exists, the logged size is the maximum of the old and the new item size. NOTE: Both metrics rely on read-before-write, so if the `alternator_force_read_before_write` option is disabled, these metrics may be incomplete and report inaccurate sizes.	2025-10-16 19:17:27 +02:00
Piotr Wieczorek	5ca42b3baf	alternator: Add PutItem and DeleteItem response size metrics This commit bundle introduces metrics on item sizes for Alternator operations. Specifically, this commit adds `operation_size_kb` histograms for sizes of items created or replaced by the `PutItem` operation, and sizes of items deleted by `DeleteItem` requests. The latter needs a read-before-write, so the metrics may be incomplete if `alternator_force_read_before_write` is disabled.	2025-10-16 19:17:26 +02:00
Piotr Wieczorek	5c72fd9ea3	alternator: Add BatchGetItem response size metrics This commit bundle introduces metrics on item sizes for Alternator operations. Specifically, this commit adds a `operation_size_kb` per-table histogram, which contains item sizes in BatchGetItem requests. A size of a BatchGetItem is the sum of the sizes of all items in the operation grouped by table. In other words, a single BatchGetItem, and BatchWriteItem for that matter, updates the histograms for each table that it has items in.	2025-10-16 19:16:57 +02:00
Piotr Wieczorek	1aa3819b57	alternator: Add GetItem response size metrics This commit bundle introduces metrics on item sizes for Alternator operations. Specifically, this commit adds a per-table `operation_size_kb` histogram, recording the sizes of the items contained in GetItem responses.	2025-10-16 19:04:55 +02:00
Gleb Natapov	c255740989	schema: Allow configuring consistency setting for a keyspace We want to add strongly consistent tables as an option. We will have two kinds of strongly consistent tables: globally consistent and locally consistent. The former means that requests from all DCs will be globally linearisable, while the latter means that only requests to the same DC will be linearisable. To allow configuring all the possibilities, the patch adds a new parameter to a keyspace definition, "consistency", that can be configured to be `eventual`, `global` or `local`. A non-eventual setting is supported for tablet-enabled keyspaces only. Since we want to start with implementing local consistency, configuring global consistency will result in an error for now.	2025-10-16 13:34:49 +03:00
Piotr Dulikowski	61662bc562	Merge 'alternator: Make CDC use preimages from LWT for Alternator' from Piotr Wieczorek This patch adds a struct `per_request_options` used to communicate between CDC and upper abstraction layers. We need this for better compatibility with DynamoDB Streams in Alternator (https://github.com/scylladb/scylladb/issues/6918) to change operation types of log rows. This patch also adds a way to conditionally forward the item read by LWT to CDC and use it as a preimage. For now, only Alternator uses this feature. The main changes are: - add a struct `cdc::per_request_options` to pass information between CDC and upper abstraction layers, - add the struct to `cas_request::apply`'s signature, - add a possibility to provide a preimage fetched by an upper abstraction layer (to propagate a row read by Alternator to CDC's preimage). This reduces the number of reads-before-write by 1 for some Alternator requests and it is always safe. It's possible to use this feature also in CQL. No backport, it's a feature. Refs https://github.com/scylladb/scylladb/issues/6918 Refs https://github.com/scylladb/scylladb/pull/26121 Closes scylladb/scylladb#26149 * github.com:scylladb/scylladb: alternator, cdc: Re-use the row read by LWT as a CDC preimage cdc: Support prefetched preimages storage: Add cdc options to cas_request::apply cdc, storage: Add a struct to pass per-mutation options to CDC cdc: Move operations enum to the top of the namespace	2025-10-15 12:30:29 +02:00
Piotr Wieczorek	28eda0203e	alternator: Small cleanup, removing unnecessary statements, etc. Tiny code cleanup to improve readability without changing behavior. Changes: - remove unused variables and imports, - remove redundant whitespaces, and a duplicated `public:` access specifier, - use `is_aws` function to check if running in AWS test/alternator/test_metrics.py, - other trivial changes. Closes scylladb/scylladb#26423	2025-10-15 12:05:20 +02:00
Piotr Wieczorek	5ff2d2d6ab	alternator, cdc: Re-use the row read by LWT as a CDC preimage Propagates the row read by CAS to CDC's preimage to save one read-before-write. As of now, a preimage in Alternator Streams always contains the entire item (see previous_item_read_command in executor.cc), so the resulting preimage should stay the same. In other words, this change should be transparent to users.	2025-10-14 07:52:40 +02:00
Piotr Wieczorek	a55c5e9ec7	alternator: Correct RCU undercount in BatchGetItem The `describe_multi_item` function treated the last reference-captured argument as the number of used RCU half units. The caller `batch_get_item`, however, expected this parameter to hold an item size. This RCU value was then passed to `rcu_consumed_capacity_counter::get_half_units`, treating the already-calculated RCU integer as if it were a size in bytes. This caused a second conversion that undercounted the true RCU. During conversion, the number of bytes is divided by `RCU_BLOCK_SIZE_LENGTH` (=4KB), so the double conversion divided the number of bytes by 16 MB. The fix removes the second conversion in `describe_multi_item` and changes the API of `describe_multi_item`. Fixes: https://github.com/scylladb/scylladb/pull/25847 Closes scylladb/scylladb#25842	2025-10-12 10:42:32 +03:00
Piotr Wieczorek	b54ad9e22f	storage: Add cdc options to cas_request::apply	2025-10-09 12:28:10 +02:00
Tomasz Grabiec	91e51a5dd1	cql3, locator: Use type aliases for option maps In preparation for changing their structure. 1) std::map<sstring, sstring> -> replication_strategy_config_options Parsed options. Values will become std::variant<sstring, rack_list> 2) std::map<sstring, sstring> -> property_definitions::map_type Flattened map of options, as stored system tables.	2025-10-01 16:06:51 +02:00
Benny Halevy	da6e2fdb1b	locator: Pass topology to replication strategy constructor	2025-10-01 16:06:28 +02:00
Piotr Wieczorek	4be0bdbc07	alternator: Don't emit a redundant REMOVE event in Alternator Streams for PutItem calls Until now, every PutItem operation appeared in the Alternator Streams as two events - a REMOVE and a MODIFY. DynamoDB Streams emits only INSERT or MODIFY, depending on whether a row was replaced, or created anew. A related issue scylladb#6918 concerns distinguishing the mutation type properly. This was because each call to PutItem emitted the two CDC rows, returned by GetRecords. Since this patch, we use a collection tombstone for the `:attrs` column, and a separate tombstone for each regular column in the table's schema. We don't expect that new tables would have any other regular column, except for the `:attrs` and keys, but we may encounter them in upgraded tables which had old GSIs or LSIs. Fixes: scylladb#6930. Closes scylladb/scylladb#24991	2025-09-30 13:12:16 +03:00
Szymon Malewski	6ce7843774	alternator: use expression caching Before this patch, every expression in Alternator's requests was parsed from a string into an appropriate structure. This patch enables caching - all calls to parse an expression (all types) are proxied through the cache. A new expression is added to the cache, and the least recently used entry (above the cache size) is removed. For existing entries a copy of the template is returned - individual instances still need to be resolved (placeholders substituted with names and values). The cache is per shard - shared for all operations, expression types, tables, users. The default cache size is 2000 entries per shard and it has the configuration option `alternator_max_expression_cache_entries_per_shard` (0 means cache disabled). The added Python tests are based on metrics.	2025-09-28 04:27:44 +02:00
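
The caching behaviour described in commit 6ce7843774 (parse on a miss, evict the least recently used entry once the size limit is exceeded, hand out a copy of the cached template, and treat a size of 0 as "cache disabled") can be illustrated with a small Python sketch. This is not the Alternator implementation, which is C++ and per shard; `ExpressionCache`, `toy_parse` and the `hits`/`misses` counters are hypothetical names chosen only for this illustration.

```python
import copy
from collections import OrderedDict


def toy_parse(text):
    # Hypothetical stand-in for a real expression parser: split into tokens.
    return text.split()


class ExpressionCache:
    """Illustrative LRU cache of parsed expressions (not ScyllaDB code)."""

    def __init__(self, max_entries, parse=toy_parse):
        self.max_entries = max_entries  # 0 means the cache is disabled
        self.parse = parse
        self.entries = OrderedDict()    # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, text):
        if self.max_entries == 0:
            # Cache disabled: always parse, never store.
            self.misses += 1
            return self.parse(text)
        if text in self.entries:
            self.hits += 1
            self.entries.move_to_end(text)  # mark as most recently used
        else:
            self.misses += 1
            self.entries[text] = self.parse(text)
            if len(self.entries) > self.max_entries:
                self.entries.popitem(last=False)  # evict least recently used
        # Return a copy so callers can resolve placeholders without
        # modifying the cached template.
        return copy.deepcopy(self.entries[text])
```

Returning a deep copy mirrors the commit's note that each caller still resolves its own instance, so substituting names and values in one request cannot leak into the template reused by the next.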


577 Commits