scylladb

Author	SHA1	Message	Date
Vlad Zolotarov	28cbaef110	service/client_state and alternator/server: use cached values for driver_name and driver_version fields Optimize memory usage changing types of driver_name and driver_version be a reference to a cached value instead of an sstring. These fields very often have the same value among different connections hence it makes sense to cache these values and use references to them instead of duplicating such strings in each connection state. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2025-12-20 12:26:22 -05:00
Vlad Zolotarov	3a54bab193	controller: update get_client_data to use foreign_ptr for client_data get_client_data() is used to assemble `client_data` objects from each connection on each CPU in the context of generation of the `system.clients` virtual table data. After collected, `client_data` objects were std::moved and arranged into a different structure to match the table's sorting requirements. This didn't allow having not-cross-shard-movable objects as fields in the `client_data`, e.g. lw_shared_ptr objects. Since we are planning to add such fields to `client_data` in following patches this patch is solving the limitation above by making get_client_data() return `foreign_ptr<std::unique_ptr<client_data>>` objects instead of naked `client_data` ones. Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>	2025-12-19 11:01:41 -05:00
Michael Litvak	b9ec1180f5	alternator: require rf_rack_valid_keyspaces when creating index When creating an alternator table with tablets, if it has an index, LSI or GSI, require the config option rf_rack_valid_keyspaces to be enabled. The option is required for materialized views in tablets keyspaces to function properly and avoid consistency issues that could happen due to cross-rack migrations and pairing switches when RF-rack validity is not enforced. Currently the option is validated when creating a materialized view via the CQL interface, but it's missing from the alternator interface. Since alternator indexes are based on materialized views, the same check should be added there as well. Fixes scylladb/scylladb#27612 Closes scylladb/scylladb#27622	2025-12-15 10:36:57 +02:00
Nadav Har'El	0c64e3be9a	Merge 'Unify and fix rjson string and string_view conversions' from Marcin Maliszkiewicz This patch-set consolidates and corrects rjson string conversion handling. It removes unnecessary string copies, ensures proper length usage and replaces ad-hoc conversions with consistent helper functions. Overall, the changes make rjson string handling safer, faster, and more uniform across the codebase. Backport: no, it's a refactor Closes scylladb/scylladb#27394 * github.com:scylladb/scylladb: fix rjson::value to bytes conversion with missing GetStringLength call alternator: change type from string to string_view in should_add_capacity fix rjson::value to string_view conversion with missing GetStringLength call use rjson::to_string_view when rjson::value gets converted using GetStringLength use rjson::to_sstring and rjson::to_string for various string conversions utils: use rjson document wrapper in instance_profile_credentials_provider::parse_creds utils: move rjson::to_string_view func to string related place utils: add to_sstring and to_string rjson helper	2025-12-11 12:05:41 +02:00
Marcin Maliszkiewicz	be9992cfb3	fix rjson::value to bytes conversion with missing GetStringLength call	2025-12-09 19:27:22 +01:00
Marcin Maliszkiewicz	daf00a7f24	alternator: change type from string to string_view in should_add_capacity It avoids allocation.	2025-12-09 19:27:21 +01:00
Marcin Maliszkiewicz	62962f33bb	fix rjson::value to string_view conversion with missing GetStringLength call In some cases we unnecessarily convert to string, which causes a copy. In others we convert without calling GetStringLength, which causes an iteration to determine a length that is already known. In some cases we even do both. This commit fixes that.	2025-12-09 19:27:21 +01:00
Marcin Maliszkiewicz	060c2f7c0d	use rjson::to_string_view when rjson::value gets converted using GetStringLength This commit is only cosmetics, changes calls to GetStringLength into rjson::to_string_view with the same underlying implementation.	2025-12-09 19:27:21 +01:00
Marcin Maliszkiewicz	64149b57c3	use rjson::to_sstring and rjson::to_string for various string conversions In some cases we omit size checking, which is wrong because, according to the RapidJSON documentation, strings may contain a \0 byte in the middle.	2025-12-09 19:27:21 +01:00
Petr Gusev	608eee0357	alternator/executor.cc: eliminate redundant dk copy A small refactoring/optimization.	2025-12-09 10:21:06 +01:00
Petr Gusev	0bcc2977bb	alternator/executor.cc: release cas_shard on the original shard Before this series, we kept the cas_shard on the original shard to guard against tablet movements running in parallel with storage_proxy::cas. The bug addressed by this PR shows that this approach is flawed: keeping the cas_shard on the original shard does not guarantee that a new cas_shard acquired on the target shard won’t require another jump. We fixed this in the previous commit by checking cas_shard.this_shard() on the target shard and continuing to jump to another shard if necessary. Once cas_shard.this_shard() on the target shard returns true, the storage_proxy::cas invariants are satisfied, and no other cas_shard instances need to remain alive except the one passed into storage_proxy::cas.	2025-12-09 10:21:06 +01:00
Petr Gusev	3a865fe991	alternator/executor.cc: move shard check into cas_write This change ensures that if cas_shard points to a different shard, the executor will continue issuing shard jumps until cas_shard.this_shard() returns true. The commit simply moves the this_shard() check from the parallel_for_each lambda into cas_write, with minimal functional changes. We enable test_alternator_invalid_shard_for_lwt since now it should pass. Fixes scylladb/scylladb#27353	2025-12-09 10:21:01 +01:00
Petr Gusev	c6eec4eeef	alternator/executor.cc: make cas_write a private method We will need to access the executor::_stats field from cas_write. We could pass it as a parameter, but it seems simpler to just make cas_write an instance method too.	2025-12-08 10:29:54 +01:00
Petr Gusev	9bef142328	alternator/executor.cc: make do_batch_write a private method We will need to access executor::_stats field on other shards.	2025-12-08 10:29:54 +01:00
Petr Gusev	74bf24a4a7	alternator/executor.cc: fix indent	2025-12-08 10:29:28 +01:00
Petr Gusev	e60bcd0011	test_alternator: add test_alternator_invalid_shard_for_lwt This test reproduces scylladb/scylladb#27353 using two injection points. First, the test triggers an intra-node tablet migration and suspends it at the streaming stage using the intranode_migration_streaming_wait injection. Next, it enables the alternator_executor_batch_write_wait injection, which suspends a batch write after its cas_shard has already been created. The test then issues several batch writes and waits until one of them hits this injection on the destination shard. At this point, the cas_shard.erm for that write is still in the streaming state, meaning the executor would need to jump back to the source shard. The test then resumes the suspended tablet migration, allowing it to update the ERM on the source shard to write_both_read_new. After that, the test releases the suspended batch write and expects it to perform two shard jumps: first from the destination to the source shard, and then back to the destination shard. This commit adds the alternator_executor_batch_write_wait injection to alternator/executor.cc. Coroutines are intentionally avoided in the parallel_for_each lambda to prevent unnecessary coroutine-frame allocations.	2025-12-08 10:29:28 +01:00
Petr Gusev	f00f7976c1	alternator/executor.cc: avoid cross-shard free This commit is an optimization: avoiding destruction of foreign objects on the wrong shard. Releasing objects allocated on a different shard causes their ::free calls to be executed remotely, which adds unnecessary load to the SMP subsystem. Before this patch, a std::vector could be moved to another shard. When the vector was eventually destroyed, its ::free had to be marshalled back to the shard where the memory had originally been allocated. This change avoids that overhead by passing the vector by const reference instead. The referenced objects' lifetime correctness reasoning: * the put_or_delete_item refs usages in put_or_delete_item_cas_request are bound to its lifetime * cas_request lifetime is bound to storage_proxy::cas future * we don't release put_or_delete_item-s until all storage_proxy::cas calls are done.	2025-12-07 16:14:56 +01:00
Petr Gusev	c428645d16	storage_proxy: cas: take cas_request by raw reference In the next commit we want to add an optimization that relies on precise control over the lifetime of cas_request. In particular, we want the implementation of this interface in Alternator to operate on raw references that are guaranteed to remain valid only until the cas() future is resolved. We already depend on the same lifetime assumptions in cas_request when used by modification_statement. However, these assumptions are not clearly expressed in the current interface: cas_request is taken by shared_ptr, and nothing prevents cas() from storing that pointer inside paxos_response_handler, which may outlive the cas() future. This commit fixes that by taking cas_request by raw reference. This makes it explicit that cas() does not assume ownership of the object. Callers must ensure that the referenced object remains valid until the returned future is resolved.	2025-12-07 16:14:56 +01:00
Nadav Har'El	350cbd1d66	alternator: fix typo of BatchWriteItem in comments The DynamoDB API's "BatchWriteItem" operation is spelled like this, in singular. Some comments incorrectly referred to as BatchWriteItems - in plural. This patch fixes those mistakes. There are no functional changes here or changes to user-facing documents - these mistakes were only in code comments. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27446	2025-12-05 15:08:58 +02:00
Avi Kivity	ce2a403f18	Merge 'alternator: implement gzip-compressed requests' from Nadav Har'El In this series we implement Alternator's support for gzip-compressed requests, i.e., requests with the "Content-Encoding: gzip" header, other uncompressed headers, and a gzip-compressed body. The server needs to verify the signature of the compressed content, and then uncompress the body before running the request. We only support gzip compression because this is what DynamoDB supports. But in the future we can easily add support for other compression algorithms like lz4 or zstd. This series Refs #5041 but doesn't "Fixes" it because it only implements compressed requests (Content-Encoding), not compressed responses (Accept-Encoding). In addition to the code changes, the series also contains tests for this feature that make sure it behaves like DynamoDB. Note that while we will now have support in our server for compressed requests, just like DynamoDB does, the clients (AWS SDKs) will probably NOT make use of it because they do not enable request compression by default. For example, see the tests for some hoops one needs to jump through in boto3 (the Python SDK) to send compressed requests. However, we are hoping that in the future Alternator's modified clients will use compressed requests and enjoy this feature. Closes scylladb/scylladb#27080 * github.com:scylladb/scylladb: test/alternator: enable, and add, tests for gzip'ed requests alternator: implement gzip-compressed requests	2025-11-30 13:27:46 +02:00
Piotr Dulikowski	44c605e59c	Merge 'Fix the types of change events in Alternator Streams' from Piotr Wieczorek This patch increases the compatibility with DynamoDB Streams by integrating the DynamoDB's event type rules (described in https://github.com/scylladb/scylladb/issues/6918) into Alternator. The main changes are: - introduce a new flag `alternator_streams_strict_compatibility`, meant as a guard of performance-intensive operations that increase the compatibility with DynamoDB Streams. If enabled, Alternator always performs a RBW before a data-modifying operation, and propagates its result to CDC. Then, the old item is compared to the new one, to determine the mutation type (INSERT vs MODIFY). This option is a no-op for tables with disabled Alternator Streams, - reduce splitting of simple Alternator mutations, - correctly distinguish event types described in #6918, except for item deletes. Deleting a missing item with DeleteItem, BatchWriteItem, or a missing field with UpdateItem still emit REMOVEs. To summarize, the emitted events of the data manipulation operations should be as follows: - DeleteItem/BatchWriteItem.DeleteItem of existing item: REMOVE (OK) - DeleteItem of nonexistent item: nothing (OK) - BatchWriteItem.DeleteItem of nonexistent item: nothing (OK) - PutItem/UpdateItem/BatchWriteItem.PutItem of existing and not equal item: MODIFY (OK) - PutItem/UpdateItem/BatchWriteItem.PutItem of existing and equal item: nothing (OK) - PutItem/UpdateItem/BatchWriteItem.PutItem of nonexistent item: INSERT (OK) No backport is necessary. Refs https://github.com/scylladb/scylladb/pull/26149 Refs https://github.com/scylladb/scylladb/pull/26396 Refs https://github.com/scylladb/scylladb/issues/26382 Fixes https://github.com/scylladb/scylladb/issues/6918 Closes scylladb/scylladb#26121 * github.com:scylladb/scylladb: test/alternator: Enable the tests failing because of #6918 alternator, cdc: Don't emit events for no-op removes alternator, cdc: Don't emit an event for equal items alternator/streams, cdc: Differentiate item replace and item update in CDC alternator: Change the return type of rmw_operation_return config: Add alternator_streams_strict_compatibility flag cdc: Don't split a row marker away from row cells	2025-11-30 07:20:22 +01:00
Radosław Cybulski	b54a9f4613	Fix use-after-free in encode_paging_state in Alternator Fix unlikely use-after-free in `encode_paging_state`. The function incorrectly assumes that the current position to encode will always have data for all clustering columns the schema defines. It's possible to encounter a current position with fewer than all columns specified, for example in the case of a range tombstone. Those don't happen in Alternator tables, as DynamoDB doesn't allow range deletions and the clustering key might be of size at most 1. However, the Alternator API can be used to read Scylla system tables, and those do have range tombstones with more than a single clustering column. The fix is to stop trying to encode columns that don't have a value - they are not needed anyway, as there's no possible position with those values (the range tombstone made sure of that). Fixes #27001 Fixes #27125 Closes scylladb/scylladb#26960	2025-11-28 16:51:15 +03:00
Wojciech Mitros	3c376d1b64	alternator: use storage_proxy from the correct shard in executor::delete_table When we delete a table in alternator, the schema change is performed on shard 0. However, we actually use the storage_proxy from the shard that is handling the delete_table command. This can lead to problems because some information is stored only on shard 0 and using storage_proxy from another shard may make us miss it. In this patch we fix this by using the storage_proxy from shard 0 instead. Fixes https://github.com/scylladb/scylladb/issues/27223 Closes scylladb/scylladb#27224	2025-11-25 18:56:31 +01:00
Nadav Har'El	4c7c5f4af7	alternator: implement gzip-compressed requests In this patch we implement Alternator's support for gzip-compressed requests, i.e., requests with the "Content-Encoding: gzip" header, other uncompressed headers, and a gzip-compressed body. The server needs to verify the signature of the compressed content, and then uncompress the body before running the request. We only support gzip compression because this is what DynamoDB supports. But in the future we can easily add support for other compression algorithms like lz4 or zstd. This patch Refs #5041 but doesn't "Fixes" it because it only implements compressed requests (Content-Encoding), not compressed responses (Accept-Encoding). The next patch will enable several tests for this feature and make sure it behaves like DynamoDB. Note that while we will now have support in our server for compressed requests, just like DynamoDB does, the clients (AWS SDKs) will probably NOT make use of it because they do not enable request compression by default. For example, see the tests for some hoops one needs to jump through in boto3 (the Python SDK) to send compressed requests. However, we are hoping that in the future Alternator's modified clients will use compressed requests and enjoy this feature. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-11-25 17:46:44 +02:00
Radosław Cybulski	d589e68642	Add precompiled headers to CMakeLists.txt Add precompiled header support to CMakeLists.txt and configure.py - it improves compilation time by approximately 10%. New header `stdafx.hh` is added, don't include it manually - the compiler will include it for you. The header contains includes from external libraries used by Scylla - seastar, standard library, linux headers and zlib. The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER` or configure.py --disable-precompiled-header to disable. The feature should be disabled, when trying to check headers - otherwise you might get false negatives on missing includes from seastar / abseil and so on. Note: following configuration needs to be added to ccache.conf: sloppiness = pch_defines,time_macros,include_file_mtime,include_file_ctime Closes scylladb/scylladb#26617	2025-11-21 12:27:41 +02:00
Nadav Har'El	64a075533b	alternator: fix update of stats from wrong shard In commit `51186b2` (PR #25457) we introduced new statistics for authentication errors, and among other places we modified executor::create_table() to update them when necessary. This function runs its real work (create_table_on_shard0()) on shard 0, but incorrectly updates "_stats" from the original shard. It doesn't really matter which shard's stats we update - but it does matter that code running on shard 0 shouldn't touch some other shard's objects. Since all we do on these stats is to increment an integer, the risk of updating it on the wrong shard is minimal to nonexistent, but it's still wrong and can cause bigger trouble in the future as the code continues to evolve. The fix is simple - we should pass to create_table_on_shard0() the _stats object from the actual shard running it (shard 0). Fixes #26942 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#26944	2025-11-21 11:53:06 +02:00
Radosław Cybulski	ce8db6e19e	Add table name to tracing in alternator Add a table name to Alternator's tracing output, as some clients would like to consistently receive this information. - add missing `tracing::add_table_name` in `executor::scan` - add emitting of tables' names in `trace_state::build_parameters_map` - update tests so that the tracing lookup is filtered by the table's name, which confirms that the table name is being output. - change `struct one_session_records` declaration to `class one_session_records`, as `one_session_records` is later defined as class. Refs #26618 Fixes #24031 Closes scylladb/scylladb#26634	2025-11-21 09:33:40 +02:00
Pavel Emelyanov	02513ac2b8	alternator: Get feature service from proxy directly The executor::add_stream_options() obtains local database reference from proxy just to get feature service from it. Similar chain is used in executor::update_time_to_live(). It's shorter to get features from proxy itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#26973	2025-11-18 08:17:16 +02:00
Pavel Emelyanov	f47f2db710	Merge 'Support local primary-replica-only for native restore' from Robert Bindar This PR extends the restore API so that it accepts primary_replica_only as parameter and it combines the concepts of primary-replica-only with scoped streaming so that with: - `scope=all primary_replica_only=true` The restoring node will stream to the global primary replica only - `scope=dc primary_replica_only=true` The restoring node will stream to the local primary replica only. - `scope=rack primary_replica_only=true` The restoring node will stream only to the primary replica from within its own rack (with rf=#racks, the restoring node will stream only to itself) - `scope=node primary_replica_only=true` is not allowed, the restoring node will always stream only to itself so the primary_replica_only parameter wouldn't make sense. The PR also adjusts the `nodetool refresh` restriction on running restore with both primary_replica_only and scope, it adds primary_replica_only to `nodetool restore` and it adds cluster tests for primary replica within scope. Fixes #26584 Closes scylladb/scylladb#26609 * github.com:scylladb/scylladb: Add cluster tests for checking scoped primary_replica_only streaming Improve choice distribution for primary replica Refactor cluster/object_store/test_backup nodetool restore: add primary-replica-only option nodetool refresh: Enable scope={all,dc,rack} with primary_replica_only Enable scoped primary replica only streaming Support primary_replica_only for native restore API	2025-11-13 12:11:18 +03:00
Robert Bindar	817fdadd49	Improve choice distribution for primary replica I noticed during tests that `maybe_get_primary_replica` would not uniformly distribute the choice of primary replica because `info.replicas` on some shards would have one order whilst on others it'd be ordered differently, thus making the function choose a node as primary replica multiple times when it clearly could've chosen a different node. This patch sorts the replica set before passing it through the scope filter. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2025-11-11 09:18:01 +02:00
Nadav Har'El	c03081eb12	alternator: improve error in tablets_mode_for_new_keyspaces=enforced When in tablets_mode_for_new_keyspaces=enforced mode, Alternator is supposed to fail when CreateTable asks explicitly for vnodes. Before this patch, this error was an ugly "Internal Server Error" (an exception thrown from deep inside the implementation). This patch checks for this case in the right place, to generate a proper ValidationException with a proper error message. We also enable the test test_tablets_tag_vs_config which should have caught this error, but didn't because it was marked xfail because tablets_mode_for_new_keyspaces had not been live-updatable. Now that it is, we can enable the test. I also improved the test to be slightly faster (no need to change the configuration so many times) and also check the ordinary case - where the schema chooses neither vnodes nor tablets explicitly and we should just use the default. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-11-09 12:52:29 +02:00
Nadav Har'El	b34f28dae2	alternator: improve comment about non-hidden system tags The previous patches added a somewhat misleading comment in front of system:initial_tablets, which this patch improves. That tag is NOT where Alternator "stores" table properties like the existing comment claimed. In fact, the whole point is that it's the opposite - Alternator never writes to this tag - it's a user-writable tag which Alternator reads, to configure the new table. And this is why it obviously can't be hidden from the user. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-11-09 12:52:29 +02:00
Piotr Szymaniak	63897370cb	alternator: Fix tag name to request vnodes The tag was recently renamed from `experimental:initial_tablets` to `system:initial_tablets`. This commit fixes both the tests and the exceptions sent to the user instructing how to create a table with vnodes.	2025-11-09 12:52:29 +02:00
Piotr Szymaniak	376a2f2109	alternator: Support `tablets_mode_for_new_keyspaces` config flag Until now, tablets in Alternator were an experimental feature enabled only when a TAG "experimental:initial_tablets" was present when creating a table and associated with a numeric value. After this patch, Alternator honours the value of the `tablets_mode_for_new_keyspaces` config flag. Each table can be overridden to use tablets or not by supplying a new TAG "system:initial_tablets". The rules stay the same as with the earlier, experimental tag: when supplied with a numeric value, the table will use tablets (as long as they are supported). When supplied with something else (like a string "none"), the table will use vnodes, provided that tablets are not `enforced` by the config flag. Fixes #22463	2025-11-09 12:52:17 +02:00
Pavel Emelyanov	59019bc9a9	Merge 'Alternator: allow warning on auth errors before enabling enforcement' from Nadav Har'El An Alternator user was recently "bit" when switching `alternator_enforce_authorization` from "false" to "true": After the configuration change, all application requests suddenly failed because, unbeknownst to the user, their application used incorrect secret keys. This series introduces a solution for users who want to safely switch `alternator_enforce_authorization` from "false" to "true": Before switching from "false" to "true", the user can temporarily switch a new option, `alternator_warn_authorization`, to true. In this "warn" mode, authentication and authorization errors are counted in metrics (`scylla_alternator_authentication_failures` and `scylla_alternator_authorization_failures`) and logged as WARNings, but the user's application continues to work. The user can use these metrics or log messages to learn of errors in their application's setup, fix them, and only do the switch of `alternator_enforce_authorization` when the metrics or log messages show there are no more errors. The first patch is the implementation of the feature - the new configuration option, the metrics and the log messages, the second patch is a test for the new feature, and the third patch is documentation recommending how to use the warn mode and the associated metrics or log messages to safely switch `alternator_enforce_authorization` from false to true. Fixes #25308 This is a feature that users need, so it should probably be backported to live branches. Closes scylladb/scylladb#25457 * github.com:scylladb/scylladb: docs/alternator: explain alternator_warn_authorization test/alternator: tests for new auth failure metrics and log messages alternator: add alternator_warn_authorization config	2025-11-05 10:45:17 +03:00
Avi Kivity	04a289cae6	Merge 'Auto expand to rack list' from Tomasz Grabiec We want to move towards rack-list based replication factor for tablets being the default mode, and in the future the only supported mode. This PR is a step towards that. We auto-expand numeric RF to rack list on keyspace creation and ALTER when rf_rack_valid_keyspaces option is enabled. The PR is mostly about adjusting tests. The main logic change is in the last patch, which modifies option post-processing in ks_prop_defs. Fixes #26397 Closes scylladb/scylladb#26692 * github.com:scylladb/scylladb: cql3: ks_prop_defs: Expand numeric RF to rack list locator: Move rack_list to topology.hh alternator: Do not set RF for zero-token DCs alternator: Switch keyspace creation to use ks_prop_defs test: alternator: Adjust for rack lists cql3: Move validation of invalid ALTER KEYSPACE earlier, to ks_prop_defs test: cqlpy: Mark tests using rack lists as scylla-only test: Switch to rack-list based RF test: Generalize tests to work with both numeric RF and rack lists test: cluster: test_zero_token_nodes_multidc: Adjust to rack list RF test: Prepare for handling errors specific to rack list path test: cluster: dtest: alternator: Force RF=1 in test_putitem_contention test: Create cluster with multiple racks in multi-dc setups test: boost: network_topology_strategy_test: Adjust to rack-list RF test: tablets: Adjust to rack list test: cluster: test_group0_schema_versioning: Use smaller RF to respect rf-rack-validness test: tablets_test: Convert test_per_shard_goal_mixed_dc_rf to be rack-valid test: object_store: test_backup: Adjust for rack lists test: cluster: tablets: Do not move tablet across racks in test_tablet_transition_sanity test: cluster: mv: Do not move tablets across racks test: cluster: util: Fix docstring for parse_replication_options() tablets, topology_coordinator: Skip tablet draining on replace	2025-10-30 21:54:08 +02:00
Piotr Wieczorek	8c2f60f111	alternator/streams, cdc: Differentiate item replace and item update in CDC This commit improves compatibility with DynamoDB streams by changing the emitted events when creating/updating an item. Replace/update operations of an existing item emit a MODIFY, whereas replacing/updating a missing item results in an INSERT. If the state of the item doesn't change after applying the operation, no event is emitted. This commit handles the following cases: - `PutItem/UpdateItem/BatchWriteItem.PutItem of an existing and not equal item: MODIFY` - `PutItem/UpdateItem/BatchWriteItem.PutItem of a nonexistent item: INSERT` Refs https://github.com/scylladb/scylladb/issues/6918	2025-10-30 07:40:31 +01:00
Piotr Wieczorek	4f6aeb7b6b	alternator: Change the return type of rmw_operation_return Change the type from future<executor::request_return_type> to executor::request_return_type, because the method isn't async and one out of two callers unwraps the future immediately. This simplifies the code a little and probably saves a few instructions, since we suspect that moving a future<X> is more expensive than just moving X.	2025-10-30 07:40:31 +01:00
Piotr Wieczorek	e3fde8087a	cdc: Don't split a row marker away from row cells CDC log table records a mutation as a sequence of log rows that record an atomic change (i.e. a row marker, tombstones, etc.), whereas a mutation in Alternator Streams always appears as a single log row. The type of operation is determined based on the type of the last log row in CDC. As a result, updates that create a row always appeared to Alternator Streams as an update (row marker + data), rather than an insert. This commit makes them a single log row. Its operation type is insert if it contains a row marker, and an update otherwise, which gives results consistent with DynamoDB Streams.	2025-10-30 07:40:31 +01:00
Tomasz Grabiec	f6dfea2fb1	alternator: Do not set RF for zero-token DCs That will fail with tablets because it won't be able to allocate replicas.	2025-10-29 23:32:58 +01:00
Tomasz Grabiec	21db21af7e	alternator: Switch keyspace creation to use ks_prop_defs So that we get the same validation and option post-processing as during regular keyspace creation. RF auto-expansion logic happens in ks_prop_defs, and we want that for tablets.	2025-10-29 23:32:58 +01:00
Nadav Har'El	aa34f0b875	alternator: fix CDC events for TTL expiration In commit `a3ec6c7d1d` we supposedly implemented the feature of telling TTL expiration events apart from regular user-sent deletions. However, that implementation did not actually work at all... It had two bugs: 1. It created a null rjson::value() instead of an empty dictionary rjson::empty_object(), so GetRecords failed every time such a TTL expiration event was generated. 2. In the output, it used lowercase field names "type" and "principalId" instead of the uppercase "Type" and "PrincipalId". This is not the correct capitalization, and when boto3 receives such incorrect fields it silently deletes them and never passes them to the user's get_records() call. This patch fixes those two bugs, and importantly - enables a test for this feature. We did already have such a test but it was marked as "veryslow" so it doesn't run in CI and apparently was not even run once to check the new feature. This test is not actually very long on Alternator when the TTL period is set very low (as we do in our tests), so I replaced the "veryslow" marker by "waits_for_expiration". The latter marker means that the test is still very slow - as much as half an hour - on DynamoDB - but runs quickly on Scylla in our test setup, and is enabled in CI by default. The enabled test failed badly before this patch (a server error during GetRecords), and passes with this patch. Also, the aforementioned commit forgot to remove the paragraph in Alternator's compatibility.md that claims we don't have that feature yet. So we do it now. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#26633	2025-10-29 17:08:20 +01:00
Nadav Har'El	51186b2f2c	alternator: add alternator_warn_authorization config Before this patch, the configuration alternator_enforce_authorization is a boolean: true means enforce authentication checks (i.e., each request is signed by a valid user) and authorization checks (the user who signed the request is allowed by RBAC to perform this request). This patch adds a second boolean configuration option, alternator_warn_authorization. When alternator_enforce_authorization is false but alternator_warn_authorization is true, authentication and authorization checks are performed as in enforce mode, but failures are ignored and counted in two new metrics: scylla_alternator_authentication_failures scylla_alternator_authorization_failures Additionally, each authentication or authorization error is logged as a WARN-level log message. Some users prefer those log messages over metrics, as the log messages contain additional information about the failure that can be useful - such as the address of the misconfigured client, or the username attempted in the request. All combinations of the two configuration options are allowed: * If just "enforce" is true, auth failures cause a request failure. The failures are counted, but not logged. * If both "enforce" and "warn" are true, auth failures cause a request failure. The failures are both counted and logged. * If just "warn" is true, auth failures are ignored (the request is allowed to complete) but are counted and logged. * If neither "enforce" nor "warn" is true, no authentication or authorization checks are done at all. So we don't know about failures, so naturally we don't count them and don't log them. This patch is fairly straightforward, doing mainly the following things: 1. Add an alternator_warn_authorization config parameter. 2. Make sure alternator_enforce_authorization is live-updatable (we'll use this in a test in the next patch). It "almost" was, but a typo prevented the live update from working properly. 3. Add the two new metrics, and increment them in every type of authentication or authorization error. Some code that needs to increment these new metrics didn't have access to the "stats" object, so we had to pass it around more. 4. Add log messages when alternator_warn_authorization is true. 5. If alternator_enforce_authorization is false, let the request proceed despite a failed auth check (after having counted and/or logged the auth error). A separate patch will follow and add documentation suggesting to users how to use the new "warn" options to safely switch from non-enforcing to enforcing mode. Another patch will add tests for the new configuration options, new metrics and new log messages. Fixes #25308. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-10-29 11:16:26 +02:00
Nadav Har'El	c3593462a4	alternator: improve protection against oversized requests Following DynamoDB, Alternator also places a 16 MB limit on the size of a request. Such a limit is necessary to avoid running out of memory - because the AWS message authentication protocol requires reading the entire request into memory before its signature can be verified. Our implementation for this limit used Seastar's HTTP server's content_length_limit feature. However, this Seastar feature is incomplete - it only works when the request uses the Content-Length header, and doesn't do anything if the request doesn't have a Content-Length (it may use chunked encoding, or have no length at all). So malicious users can cause Scylla to OOM by sending a huge request without a Content-Length. So in this patch we stop using the incomplete Seastar feature, and implement the length limit in Scylla in a way that works correctly with or without Content-Length: We read from the input stream and if we go over 16 MB, we generate an error. Because we dropped Seastar's protection against a long Content-Length, we also need to fix a piece of code which used Content-Length to reserve some semaphore units to prevent reading many large requests in parallel. We fix two problems in the code: 1. If Content-Length is over the limit, we shouldn't attempt to reserve semaphore units - this should just be a Payload Too Large error. 2. If Content-Length is missing, the existing code did nothing and had a TODO that we should. In this patch we implement what was suggested in that TODO: We temporarily reserve the whole 16 MB limit, and after reading the actual request, we return part of the reservation according to the real request size. That last fix is important, because typically the largest requests will be BatchWriteItem where a well-written client would want to use chunked encoding, not Content-Length, to avoid materializing the entire request up-front. For such clients, the memory use semaphore did nothing, and now it does the right thing. Note that this patch does not solve the problem #12166 that existed with Seastar's length-limiting implementation but still exists in the new in-Scylla length-limiting implementation: The fact we send an error response in the middle of the request and then close the connection, while the client continues to send the request, can lead to an RST being sent by the server kernel. Usually this will be fine - well-written client libraries will be able to read the response before the RST. But even with a well-written library in some rare timings the client may get the RST before the response, and will miss the response, and get an empty or partial response or "connection reset by peer". This issue existed before this patch, and still exists, but is probably of minor impact. Fixes #8196 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23434	2025-10-28 15:24:46 +03:00
Radosław Cybulski	ea6b22f461	Add max trace size output configuration variable In #24031 users complained that the trace message is truncated: it is no longer JSON-parsable and the table name might not be part of the output. This patch enables users to configure the maximum size of the trace message. In case a user wanted the `table` name, but didn't care about message size, #26634 will help. - add configuration variable `alternator_max_users_query_size_in_trace_output` with default value of 4096 (4 times old default value). - modify `truncated_content_view` function to use new configuration variable for truncation limit - update `truncated_content_view` to consistently truncate at given size, previously truncation would also happen when data arrived in more than one chunk - update `truncated_content_view` to better handle truncated value (limit number of copies) - fix `scylla_config_read` call - call to `query` for a configuration name that does not exist will return an empty (but present) `Items` array - this would raise an array access exception a few lines below. - add test Refs #26634 Refs #24031 Closes scylladb/scylladb#26618	2025-10-28 13:29:15 +03:00
Nadav Har'El	7c9f5ef59e	Merge 'alternator/executor: instantly mark view as built when creating it with base table' from Michał Jadwiszczak When a `CreateTable` request creates a GSI/LSI together with the base table, the base table is empty and we don't need to actually build the view. In tablet-based keyspaces we can simply skip creating view building tasks and mark the view build status as SUCCESS on all nodes. Then, the view building worker on each node will mark the view as built in `system.built_views` (`view_building_worker::update_built_views()`). Vnode-based keyspaces will use the "old" logic of view builder, which will process the view and mark it as built. Fixes scylladb/scylladb#26615 This fix should be backported to 2025.4. Closes scylladb/scylladb#26657 * github.com:scylladb/scylladb: test/alternator/test_tablets: add test for GSI backfill with tablets test/alternator/test_tablets: add reproducer for GSI with tablets alternator/executor: instantly mark view as built when creating it with base table	2025-10-22 10:44:28 +03:00
Michał Jadwiszczak	8fbf122277	alternator/executor: instantly mark view as built when creating it with base table When a `CreateTable` request creates a GSI/LSI together with the base table, the base table is empty and we don't need to actually build the view. In tablet-based keyspaces we can simply skip creating view building tasks and mark the view build status as SUCCESS on all nodes. Then, the view building worker on each node will mark the view as built in `system.built_views` (`view_building_worker::update_built_views()`). Vnode-based keyspaces will use the "old" logic of view builder, which will process the view and mark it as built. Fixes scylladb/scylladb#26615	2025-10-22 00:05:40 +02:00
Michał Hudobski	5c957e83cb	vector_search: remove dependence on cql3 This patch removes the dependence of the vector search module on the cql3 module by moving the contents of cql3/type_json.hh to types/json_utils.hh and removing the usage of the cql3 primary_key object in vector_store_client. We also make the needed adjustments to files that were previously using the aforementioned type_json.hh file. This fixes the circular dependency cql3 <-> vector_search. Closes scylladb/scylladb#26482	2025-10-21 17:41:55 +03:00
Piotr Wieczorek	a3ec6c7d1d	alternator/streams: Support userIdentity field for TTL deletions UserIdentity is a map of two fields in GetRecords responses, which always has the same value. It may be missing, or contain a constant object with value `{"type": "Service", "principalId": "dynamodb.amazonaws.com"}`. Currently, the latter is set only for `REMOVE`s triggered by TTL. This commit introduces two new CDC operation types: `service_row_delete` and `service_partition_delete`, emitted in place of `row_delete` and `partition_delete`. Alternator Streams treats them as regular `REMOVE`s, but in addition adds the `userIdentity` field to the record. This change may break existing Scylla libraries for reading raw CDC tables, but we doubt that anybody has this use case. Refs https://github.com/scylladb/scylladb/pull/26149 Refs https://github.com/scylladb/scylladb/pull/26121 Fixes https://github.com/scylladb/scylladb/issues/11523 Closes scylladb/scylladb#26460	2025-10-20 17:15:59 +02:00
Piotr Dulikowski	a716fab125	Merge 'alternator/metrics: Log operation sizes to histograms' from Piotr Wieczorek This PR adds per-table histograms to Alternator that record the sizes of the items involved in an operation, for each of the operations: `GetItem`, `PutItem`, `DeleteItem`, `UpdateItem`, `BatchGetItem`, `BatchWriteItem`. If read-before-write wasn't performed (i.e. it was not needed by the operation and the flag `alternator_force_read_before_write` was disabled), then we log sizes of the items that are in the request. Also, `UpdateItem` logs the maximum of the update size and the existing item size. We'll change that in a follow-up PR. Fixes: #25143 Closes scylladb/scylladb#25529 * github.com:scylladb/scylladb: alternator: Add UpdateItem and BatchWriteItem response size metrics alternator: Add PutItem and DeleteItem response size metrics alternator: Add BatchGetItem response size metrics alternator: Add GetItem response size metrics alternator/test: Add more context to test_metrics.py asserts	2025-10-20 10:03:31 +03:00

1012 Commits