Commit Graph

441 Commits

Author SHA1 Message Date
Botond Dénes
3158e9b017 doc: reorganize properties in config.cc and config.hh
This commit moves the "Ungrouped properties" category to the end of the
properties list. The properties are now published in the documentation,
and it doesn't look good if the list starts with ungrouped properties.

This patch was taken over from Anna Stuchlik <anna.stuchlik@scylladb.com>.

Closes scylladb/scylladb#28343
2026-01-29 11:27:42 +03:00
Łukasz Paszkowski
21348050e8 config: Add parameters to control reads' preemptive_abort_factor 2026-01-28 14:20:01 +01:00
Botond Dénes
7d2e6c0170 Merge 'config: add enforce_rack_list option' from Aleksandra Martyniuk
Add enforce_rack_list option. When the option is set to true,
all tablet keyspaces have rack list replication factor.

When the option is on:
- CREATE STATEMENT always auto-extends rf to rack lists;
- ALTER STATEMENT fails when there is numeric rf in any DC.

The flag is set to false by default and a node needs to be restarted
in order to change its value. Starting a node with enforce_rack_list
option will fail, if there are any tablet keyspaces with numeric rf
in any DC.

enforce_rack_list is a per-node option and a user needs to ensure
that no tablet keyspace is altered or created while nodes in
the cluster don't have the consistent value.

Mark rf_rack_valid_keyspaces as deprecated.

Fixes: https://github.com/scylladb/scylladb/issues/26399.

New feature; no backport needed

Closes scylladb/scylladb#28084

* github.com:scylladb/scylladb:
  test: add test for enforce_rack_list option
  db: mark rf_rack_valid_keyspaces as deprecated
  config: add enforce_rack_list option
  Revert "alternator: require rf_rack_valid_keyspaces when creating index"
2026-01-22 10:27:35 +02:00
Aleksandra Martyniuk
761ace4f05 config: add enforce_rack_list option
Add enforce_rack_list option. When the option is set to true,
all tablet keyspaces have rack list replication factor.

When the option is on:
- CREATE STATEMENT always auto-extends rf to rack lists;
- ALTER STATEMENT fails when there is numeric rf in any DC.

The flag is set to false by default and a node needs to be restarted
in order to change its value. Starting a node with enforce_rack_list
option will fail, if there are any tablet keyspaces with numeric rf
in any DC.

enforce_rack_list is a per-node option and a user needs to ensure
that no tablet keyspace is altered or created while nodes in
the cluster don't have the consistent value.
2026-01-20 09:58:51 +01:00
Nikos Dragazis
76b2d0f961 db: config: Add accessor for sstable_compression_user_table_options
The `sstable_compression_user_table_options` config option determines
the default compression settings for user tables.

In patch 2fc812a1b9, the default value of this option was changed from
LZ4 to LZ4WithDicts and a fallback logic was introduced during startup
to temporarily revert the option to LZ4 until the dictionary compression
feature is enabled.

Replace this fallback logic with an accessor that returns the correct
settings depending on the feature flag. This is cleaner and more
consistent with the way we handle the `sstable_format` option, where the
same problem appears (see `get_preferred_sstable_version()`).

As a consequence, the configuration option must always be accessed
through this accessor. Add a comment to point this out.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2026-01-13 18:30:38 +02:00
Avi Kivity
66aee0fb5e alternator: add optional listeners for proxy protocol v2
Following 954f2cbd2f, which added proxy protocol v2 listeners
for CQL, we do the same for alternator. We add two optional ports
for plain and TLS-wrapped HTTP.

We test each new port, that the old ports still work, and that
mixing up a port with no proxy protocol and a connection with proxy
protocol (or the opposite) fails. The latter serves to show
that the testing strategy is valid and doesn't just pass whatever
happens. We also verify that the correct addresses (and TLS mode)
show up in system.clients.

Closes scylladb/scylladb#27889
2026-01-13 09:59:24 +02:00
Asias He
7ba7b25bdd repair: Implement auto repair for tablet repair
This patch implements the basic auto repair support for tablet repair.

It was decided to add no per table configuration for the initial
implementation, so two scylla yaml config options are introduced to set
the default auto repair configs for all the tablet tables.

- auto_repair_enabled_default

Set true to enable auto repair for tablet tables by default. The value
will be overridden by the per keyspace or per table configuration which
is not implemented yet.

- auto_repair_threshold_default_in_seconds

Set the default time in seconds for the auto repair threshold for tablet
tables. If the time since last repair is bigger than the configured
time, the tablet is eligible for auto repair. The value will be
overridden by the per keyspace or per table configuration which is not
implemented yet.

The following metrcis are added:

- auto_repair_needs_repair_nr

The number of tablets with auto repair enabled that needs repair

- auto_repair_enabled_nr

The number of tablets with auto repair enabled

The metrics are useful to tell if auto repair is falling behind.

In the future, more auto repair scheduling will be added, e.g.,
scheduling based on the repaired and unrepaired sstable set size,
tombstone ratio and so on, in addition to the time based scheduling.

Fixes SCYLLADB-99
2026-01-09 16:11:39 +08:00
Szymon Malewski
ec329f85b0 alternator/http_compression: Add handling of Accept-Encoding header
This is an initial patch to add support of Alternator's compressed responses.
The actual compression (gzip,deflate) will be added in the following commits.
The main functionality added in this commmit is parsing of Accept-Encoding header,
that indicates compression algorithms supported by the client.
In this commit we add also configuration parameters of response gzip/deflate compression.
They allow to enable/disable compression, set level and a size threshold below which a response is not compressed.
With current implementation it is possible to decide a compression for each response, but it is not used yet.
2026-01-05 10:14:40 +01:00
Tomasz Grabiec
bbf9ce18ef Merge 'load_balancer: compute node load based on tablet sizes' from Ferenc Szili
Currently, the tablet load balancer performs capacity based balancing by collecting the gross disk capacity of the nodes, and computes balance assuming that all tablet sizes are the same.

This change introduces size-based load balancing. The load balancer does not assume identical tablet sizes any more, and computes load based on actual tablet sizes.

The size-based load balancer computes the difference between the most and least loaded nodes in the balancing set (nodes in DC, or nodes in a rack in case of `rf-rack-valid-keyspaces`) and stops further balancing if this difference is bellow the config option `size_based_balance_threshold_percentage`.

This config option does not apply to the absolute load, but instead to the percentage of how much the most loaded node is more loaded than the least loaded node:

`delta = (most_loaded - least_loaded) / most_loaded`

If this delta is smaller then the config threshold, the balancer will consider the nodes balanced.

This PR is a part of a series of PRs which are based on top of each other.

- First part for tablet size collection via load_stats: #26035
- Second part reconcile load_stats: #26152
- The third part for load_sketch changes: #26153
- The fourth part which performs tablet load balancing based on tablet size: #26254
- The fifth part changes the load balancing simulator: #26438

This is a new feature, backport is not needed.

Fixes #26254

Closes scylladb/scylladb#26254

* github.com:scylladb/scylladb:
  test, load balancing: add test for table balance
  load_balancer: add cluster feature for size based balancing
  load_balancer: implement size-based load balancing
  config: add size based load balancing config params
  load_stats: use trinfo to decide how to reconcile tablet size
  load_sketch: use tablet sizes in load computation
  load_stats: add get_tablet_size_in_transition()
2025-12-29 15:01:38 +01:00
Radosław Cybulski
a532fc73bc Add alternator_describe_table_info_cache_validity_in_seconds config option
Add a `alternator_describe_table_info_cache_validity_in_seconds`
configuration option with default value of 6 hours.
2025-12-29 08:33:05 +01:00
Ferenc Szili
cc9e125f12 config: add size based load balancing config params
This change adds:

- The config paremeter force_capacity_based_balancing which, when
  enabled performs capacity based balancing instead of size based.
- The config parameter size_based_balance_threshold_percentage which
  sets the balance threshold for the size based load balancer.
- The config parameter minimal_tablet_size_for_balancing which sets the
  minimal tablet size for the load balancer.
2025-12-27 10:37:38 +01:00
Avi Kivity
1382b47d45 config, transport: support proxy protocol v2 enhanced connections
We have four native transport ports: two for plain/TLS, and two
more for shard-aware (plain/TLS as well). Add four more that expect
the proxy protocol v2 header. This allows nodes behind a reverse
proxy to record the correct source address and port in system.clients,
and the shard-aware port to see the correct source port selection
made my the client.
2025-12-17 14:18:04 +02:00
Szymon Wasik
4f803aad22 Improve documentation of vector search configuration parameters.
This patch adds separate group for vector search parameters in the
documentation and fixes small typos and formatting.

Fixes: SCYLLADB-77.

Closes scylladb/scylladb#27385
2025-12-03 21:02:59 +02:00
Piotr Dulikowski
44c605e59c Merge 'Fix the types of change events in Alternator Streams' from Piotr Wieczorek
This patch increases the compatibility with DynamoDB Streams by integrating the DynamoDB's event type rules (described in https://github.com/scylladb/scylladb/issues/6918) into Alternator. The main changes are:
- introduce a new flag `alternator_streams_strict_compatibility`, meant as a guard of performance-intensive operations that increase the compatibility with DynamoDB Streams. If enabled, Alternator always performs a RBW before a data-modifying operation, and propagates its result to CDC. Then, the old item is compared to the new one, to determine the mutation type (INSERT vs MODIFY). This option is a no-op for tables with disabled Alternator Streams,
- reduce splitting of simple Alternator mutations,
- correctly distinguish event types described in #6918, except for item deletes. Deleting a missing item with DeleteItem, BatchWriteItem, or a missing field with UpdateItem still emit REMOVEs.

To summarize, the emitted events of the data manipulation operations should be as follows:
- DeleteItem/BatchWriteItem.DeleteItem of existing item: REMOVE (OK)
- DeleteItem of nonexistent item: nothing (OK)
- BatchWriteItem.DeleteItem of nonexistent item: nothing (OK)
- PutItem/UpdateItem/BatchWriteItem.PutItem of existing and not equal item: MODIFY (OK)
- PutItem/UpdateItem/BatchWriteItem.PutItem of existing and equal item: nothing (OK)
- PutItem/UpdateItem/BatchWriteItem.PutItem of nonexistent item: INSERT (OK)

No backport is necessary.

Refs https://github.com/scylladb/scylladb/pull/26149
Refs https://github.com/scylladb/scylladb/pull/26396
Refs https://github.com/scylladb/scylladb/issues/26382
Fixes https://github.com/scylladb/scylladb/issues/6918

Closes scylladb/scylladb#26121

* github.com:scylladb/scylladb:
  test/alternator: Enable the tests failing because of #6918
  alternator, cdc: Don't emit events for no-op removes
  alternator, cdc: Don't emit an event for equal items
  alternator/streams, cdc: Differentiate item replace and item update in CDC
  alternator: Change the return type of rmw_operation_return
  config: Add alternator_streams_strict_compatibility flag
  cdc: Don't split a row marker away from row cells
2025-11-30 07:20:22 +01:00
Botond Dénes
384bffb8da Merge 'compaction: limit the maximum shares allocated to a compaction scheduling class' from Raphael Raph Carvalho
This PR adds support for limiting the maximum shares allocated to a
compaction scheduling class by the compaction controller. It introduces
a new configuration parameter, compaction_max_shares, which, when set
to a non zero value, will cap the shares allocated to compaction jobs.
This PR also exposes the shares computed by the compaction controller
via metrics, for observability purposes.

Fixes https://github.com/scylladb/scylladb/issues/9431

Enhancement. No need to backport.

NOTE: Replaces PR https://github.com/scylladb/scylladb/pull/26696

Ran a test in which the backlog raised the need for max shares (normalized backlog above normalization_factor), and played with different values for new option compaction_max_shares to see it works (500, 1000, 2000, 250, 50)

Closes scylladb/scylladb#27024

* github.com:scylladb/scylladb:
  db/config: introduce new config parameter `compaction_max_shares`
  compaction_manager:config: introduce max_shares
  compaction_controller: add configurable maximum shares
  compaction_controller: introduce `set_max_shares()`
2025-11-26 06:51:30 +02:00
Lakshmi Narayanan Sreethar
9cb766f929 db/config: introduce new config parameter compaction_max_shares
Add support for the new configuration parameter `compaction_max_shares`,
and update the compaction manager to pass it down to the compaction
controller when it changes. The shares allocated to compaction jobs will
be limited by this new parameter.

Fixes #9431

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
2025-11-24 12:52:29 -03:00
Karol Nowacki
c40b3ba4b3 vector_search: Add HTTPS support for vector store connections
This commit introduces TLS encryption support for vector store connections.
A new configuration option is added:
- vector_store_encryption_options.truststore: path to the trust store file

To enable secure connections, use the https:// scheme in the
vector_store_primary_uri/vector_store_secondary_uri configuration options.

Fixes: VECTOR-327
2025-11-22 08:18:45 +01:00
Karol Nowacki
104de44a8d vector_search: Add support for secondary vector store clients
This change adds support for secondary vector store clients, typically
located in different availability zones. Secondary clients serve as
fallback targets when all primary clients are unavailable.
New configuration option allows specifying secondary client addresses
and ports.

Fixes: VECTOR-187

Closes scylladb/scylladb#26484
2025-11-20 08:37:18 +01:00
Piotr Wieczorek
ffdc8d49c7 config: Add alternator_streams_strict_compatibility flag
With this flag enabled, Alternator Streams produces more accurate event
types:
- nop operations (i.e. replacing an item with an identical one, deleting
  a nonexistent item) don't produce an event,
- updates of an existing item produce a MODIFY event, instead of INSERT,
- etc.

This flag affects the internal behaviour of some operations, i.e.
Alternator may select a preimage and propagate it to CDC (in contrary to
CDC making the request), or do extra item comparisons (i.e. compare the
existing item with the new one). These operations may be costly, and
users that don't use Streams won't need them.

This flag is live-updatable. An operation reads this flag once, and uses
its value for the entire operation.
2025-10-30 07:40:31 +01:00
Nadav Har'El
51186b2f2c alternator: add alternator_warn_authorization config
Before this patch, the configuration alternator_enforce_authorization
is a boolean: true means enforce authentication checks (i.e., each
request is signed by a valid user) and authorization checks (the user
who signed the request is allowed by RBAC to perform this request).

This patch adds a second boolean configuration option,
alternator_warn_authorization. When alternator_enforce_authorization
is false but alternator_warn_authorization is true, authentication and
authorization checks are performed as in enforce mode, but failures
are ignored and counted in two new metrics:

    scylla_alternator_authentication_failures
    scylla_alternator_authorization_failures

additionally,also each authentication or authorization error is logged as
a WARN-level log message. Some users prefer those log messages over
metrics, as the log messages contain additional information about the
failure that can be useful - such as the address of the misconfigured
client, or the username attempted in the request.

All combinations of the two configuration options are allowed:
 * If just "enforce" is true, auth failures cause a request failure.
   The failures are counted, but not logged.
 * If both "enforce" and "warn" are true, auth failures cause a request
   failure. The failures are both counted and logged.
 * If just "warn" is true, auth failures are ignored (the request
   is allowed to compelete) but are counted and logged.
 * If neither "enforce" nor "warn" are true, no authentication or
   authorization check are done at all. So we don't know about failures,
   so naturally we don't count them and don't log them.

This patch is fairly straightforward, doing mainly the following
things:

1. Add an alternator_warn_authorization config parameter.

2. Make sure alternator_enforce_authorization is live-updatable (we'll
   use this in a test in the next patch). It "almost" was, but a typo
   prevented the live update from working properly.

3. Add the two new metrics, and increment them in every type of
   authentication or authorization error.
   Some code that needs to increment these new metrics didn't have
   access to the "stats" object, so we had to pass it around more.

4. Add log messages when alternator_warn_authorization is true.

5. If alternator_enforce_authorization is false, allow the auth check
   to allow the request to proceed (after having counted and/or logged
   the auth error).

A separate patch will follow and add documentation suggesting to users
how to use the new "warn" options to safely switch between non-enforcing
to enforcing mode. Another patch will add tests for the new configuration
options, new metrics and new log messages.

Fixes #25308.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-10-29 11:16:26 +02:00
Radosław Cybulski
ea6b22f461 Add max trace size output configuration variable
In #24031 users complained, that trace message is truncated, namely it's
no longer json parsable and table name might not be part of the output.
This path enables users to configure maximum size of trace message.
In case user wanted `table` name, but didn't care about message size,
 #26634 will help.

- add configuration varable `alternator_max_users_query_size_in_trace_output`
   with default value of 4096 (4 times old default value).
- modify `truncated_content_view` function to use new configuration
  variable for truncation limit
- update `truncated_content_view` to consistently truncate at given
  size, previously trunctation would also happen when data arrived in
  more than one chunk
- update `truncated_content_view` to better handle truncated value
  (limit number of copies)
- fix `scylla_config_read` call - call to `query` for a configuration
  name that is not existing will return `Items` array empty
  (but present) - this would raise array access exception few lines
  below.
- add test

Refs #26634
Refs #24031

Closes scylladb/scylladb#26618
2025-10-28 13:29:15 +03:00
Pavel Emelyanov
44ed3bbb7c Merge 'RFC: Initial GCP storage backend for scylla (sstables + backup)' from Calle Wilund
Integrates GCP object storage as a working storage backend for scylla sstables as well as backup storage.

Adds an abstraction layer (atm very heavily designed around the s3 client interface and usage) to allow the "storage" etc layers of sstable management to pick transparently between "s3" and "gs" providers.

This modifies the scylla config such that endpoints can optionally (through a "type" param) ref a GS backend.
Similarly with storage_options.

Also adds some IO wrapping primitives to make it more feasible to place some logic at a mid level of the implementation stack (such as making networked storage files, ranged reading etc).

Test s3 fixture is replaced (where appropriate) with an `object_storage` fixture that multiplexes the test across both backends.
Unit tests are duplicated and for the GS versions use a boost test fixture for GCS, default local fake.

Fixes #25359
Fixes #26453

Closes scylladb/scylladb#26186

* github.com:scylladb/scylladb:
  docs::dev::object_storage: Add some initial info on GS storage
  docs/dev: Add mention of (nested) docker usage in testing.md
  sstables::object_storage_client: Forward memory limit semaphore to GS instance
  utils::gcp::object_storage: Add optional memory limits to up/download
  sstables::object_storage_client: Add multi-upload support for GS
  utils::gcp::storage: Add merge objects operation
  test_backup/test_basic: Make tests multiplex both s3 and gs backends
  test::cluster::conftest: Add support for multiple object storage backends
  boost::gcs_storage_test: reindent
  boost::gcs_storage_test: Convert to use fixture
  tests::boost: Add GS object storage cases to mirror S3 ones
  tests::lib::gcs_fixture: Add a reusable test fixture for real/fake GS/GCS
  tests::lib::test_utils: Add overloads/helpers for reading and (temp) writing env
  sstables::object_storage_client: Add google storage implementation
  test_services: Allow testing with GS object storage parameters
  utils::gcp::gcp_credentials: Add option to create uninitialized credentials
  utils::gcp::object_storage: Make create_download_source return seekable_data_source
  utils::gcp::object_storage: Add defensive copies of string_view params
  utils::gcp::object_storage: Add missing retry backoff increate
  utils::gcp::object_storage: Add timestamp to object listing
  utils::gcp::object_storage: Add paging support to list_objects
  object_storage_client: Add object_name wrapper type
  utils::gcp::object_storage: Add optional abort_source
  utils::rest::client: Add abort_source support
  sstables: Use object_storage_client for remote storage
  sstables::object_storage_client: Add abstraction layer for OS cliens (s3 initial)
  s3::upload_progress: Promote to general util type
  storage_options: Abstract s3 to "object_storage" and add gs as option
  sstables::file_io_extension: Change "creator" callback to just data_source
  utils::io-wrappers: Add ranged data_source
  utils::io-wrappers: Add file wrapper type for seekable_source
  utils::seekable_source: Add a seekable IO source type
  object_storage_endpoint_param: Add gs storage as option
  config: break out object_storage_endpoint_param preparing for multi storage
2025-10-20 13:14:53 +03:00
Tomasz Grabiec
c4a87453a2 Merge 'Add experimental feature flag for strongly consistent tables and extend kesypace creation syntax to allow specifying consistency mode.' from Gleb Natapov
The series adds an experimental flag for strongly consistent tables  and extends "CREATE KEYSPACE" ddl with `consistency` option that allows specifying the consistency mode for the keyspace.

Closes scylladb/scylladb#26116

* github.com:scylladb/scylladb:
  schema: Allow configuring consistency setting for a keyspace
  db: experimental consistent-tablets option
2025-10-16 21:48:06 +02:00
Nadav Har'El
921d07a26b cql: make SELECT's "internal page size" configurable
In some uses of SELECT, such as aggregation (sum() et al.), GROUP BY or
secondary index, it needs to perform internal scans. It uses an "internal
page size" which before this patch was always DEFAULT_COUNT_PAGE_SIZE = 10000.

There was an ad-hoc and undocumented way to override this default in C++
tests, using functions in test/lib/select_statement_utils.hh, but it
was so non-obvious that the test that most needed to override this
default - the very slow test test_indexing_paging_and_aggregation which
would have been must faster with a lower setting - never used it.

So in this patch we replace the ad-hoc configuration functions by a
bona-fide Scylla configuration option named "select_internal_page_size".

The few C++ tests that used the old configuration functions were
modified to use the new configuration parameters. The slow test
test_indexing_paging_and_aggregation still doesn't use the new
configuration to become faster - we'll do this in the next patch.

Another benefit of having this "internal page size" as a configuration
option is that one day a user might realize that the default choice
10,000 is bad for some reason (which I can't envision right now), so
having it configurable might come it handy.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-10-15 18:42:09 +03:00
Gleb Natapov
eb9112a4a2 db: experimental consistent-tablets option
The option will be used to hid consistent tablets feature until it is
ready.
2025-10-15 11:27:10 +03:00
Calle Wilund
78d9dda060 config: break out object_storage_endpoint_param preparing for multi storage
Moves the config wrapper to own file (to reduce recompilation for modifying)
and refactors to handle extending this parameter to non-s3 endpoint configs.
2025-10-13 08:53:24 +00:00
Piotr Dulikowski
e7907b173a Merge 'db/view: Require rf_rack_valid_keyspaces when creating materialized view' from Dawid Mędrek
Materialized views are currently in the experimental phase and using them
in tablet-based keyspaces requires starting Scylla with an experimental feature,
`views-with-tablets`. Any attempts to create a materialized view or secondary
index when it's not enabled will fail with an appropriate error.

After considerable effort, we're drawing close to bringing views out of the
experimental phase, and the experimental feature will no longer be needed.
However, materialized views in tablet-based keyspaces will still be restricted,
and creating them will only be possible after enabling the configuration option
`rf_rack_valid_keyspaces`. That's what we do in this PR.

In this patch, we adjust existing tests in the tree to work with the new
restriction. That shouldn't have been necessary because we've already seemingly
adjusted all of them to work with the configuration option, but some tests hid
well. We fix that mistake now.

After that, we introduce the new restriction. What's more, when starting Scylla,
we verify that there is no materialized view that would violate the contract.
If there are some that do, we list them, notify the user, and refuse to start.

High-level implementation strategy:

1. Name the restrictions in form of a function.
2. Adjust existing tests.
3. Restrict materialized views by both the experimental feature
   and the configuration option. Add validation test.
4. Drop the requirement for the experimental feature. Adjust the added test
   and add a new one.
5. Update the user documentation.

Fixes scylladb/scylladb#23030

Backport: 2025.4, as we are aiming to support materialized views for tablets from that version.

Closes scylladb/scylladb#25802

* github.com:scylladb/scylladb:
  view: Stop requiring experimental feature
  db/view: Verify valid configuration for tablet-based views
  db/view: Require rf_rack_valid_keyspaces when creating view
  test/cluster/random_failures: Skip creating secondary indexes
  test/cluster/mv: Mark test_mv_rf_change as skipped
  test/cluster: Adjust MV tests to RF-rack-validity
  test/boost/schema_loader_test.cc: Explicitly enable rf_rack_valid_keyspaces
  db/view: Name requirement for views with tablets
2025-10-06 12:46:46 +02:00
Dawid Mędrek
b409e85c20 view: Stop requiring experimental feature
We modify the requirements for using materialized views in tablet-based
keyspaces. Before, it was necessary to enable the configuration option
`rf_rack_valid_keyspaces`, having the cluster feature `VIEWS_WITH_TABLETS`
enabled, and using the experimental feature `views-with-tablets`.
We drop the last requirement.

We adjust code to that change and provide a new validation test.
We also update the user documentation to reflect the changes.

Fixes scylladb/scylladb#23030
2025-10-01 09:01:53 +02:00
Nadav Har'El
926089746b message: move RPC compression from utils/ to message/
The directory utils/ is supposed to contain general-purpose utility
classes and functions, which are either already used across the project,
or are designed to be used across the project.

This patch moves 8 files out of utils/:

    utils/advanced_rpc_compressor.hh
    utils/advanced_rpc_compressor.cc
    utils/advanced_rpc_compressor_protocol.hh
    utils/stream_compressor.hh
    utils/stream_compressor.cc
    utils/dict_trainer.cc
    utils/dict_trainer.hh
    utils/shared_dict.hh

These 8 files together implement the compression feature of RPC.
None of them are used by any other Scylla component (e.g., sstables have
a different compression), or are ready to be used by another component,
so this patch moves all of them into message/, where RPC is implemented.

Theoretically, we may want in the future to use this cluster of classes
for some other component, but even then, we shouldn't just have these
files individually in utils/ - these are not useful stand-alone
utilities. One cannot use "shared_dict.hh" assuming it is some sort of
general-purpose shared hash table or something - it is completely
specific to compression and zstd, and specifically to its use in those
other classes.

Beyond moving these 8 files, this patch also contains changes to:
1. Fix includes to the 5 moved header files (.hh).
2. Fix configure.py, utils/CMakeLists.txt and message/CMakeLists.txt
   for the three moved source files (.cc).
3. In the moved files, change from the "utils::" namespace, to the
   "netw::" namespace used by RPC. Also needed to change a bunch
   of callers for the new namespace. Also, had to add "utils::"
   explicitly in several places which previously assumed the
   current namespace is "utils::".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#25149
2025-09-30 17:03:09 +03:00
Nadav Har'El
1aef733d48 Merge 'Alternator/cache expressions' from Szymon Malewski
Before this patch, every expression in Alternator's requests was parsed from string to adequate structure.
This patch enables caching, where input expression strings are mapped to parsed template structures.
Every new valid (parsable) expression is added to the cache. The cache has limited (configurable) size - when it is reached, the least recently used entry is removed.
When requested expression is in the cache, the copy of the template is returned - individual instances still need to be resolved (placeholders substituted with names and values).
Caching is implemented for all expression types. The cache is per shard - shared for all operations, expression types, tables, users.
Default cache size is 2000 entries per shard and it has configuration option `alternator_max_expression_cache_entries_per_shard` (0 means cache disabled).
Basic metrics (total count of hits and misses for each expression type and number of evicted enries) are implemented.
Cache features are tested in boost unit tests and overall expression caching is tested with Python tests - both mostly rely on metrics.

refs #5023
`perf-alternator` test shows improvement (median):
| test | throughput | instructions_per_op | cpu_cycles_per_op | allocs_per_op |
| ------ | ---------------- | ----------------------------- | --------------------------- | -------------------  |
| read | +6.0% | -8.5% | -7.0% | -4.9% |
| write | +13.4% | -17.6% | -14.7% | -7.4% |
| write(lwt) | +12.7% | -7.9% | -6.9% | -2.8% |
| write_rwm | +5.4% | -10.5% | -7.3% | -4.1% |

"read" had a ProjectionExpression with 10 column names, "write" had a UpdateExpression with 10 column names and "write_rmw" had both ConditionExpression and UpdateExpression.

This patch also includes minor refactoring of other expressions related tests (https://github.com/scylladb/scylladb/issues/22494) - use `test_table_ss` instead of `test_table`.
Fixes #25855.

This is new feature - no backporting.

Closes scylladb/scylladb#25176

* github.com:scylladb/scylladb:
  alternator: use expression caching
  alternator: adds expression cache implementation
  utils: extend lru_string_map
  utils: add lru_string_map
  alternator/expressions: error on parsing empty update expression
  alternator/expressions: fix single value condition expression parsing
  test/alternator: use `test_table_ss` instead of `test_table` in expressions related tests.
2025-09-29 11:36:31 +03:00
Szymon Malewski
6ce7843774 alternator: use expression caching
Before this patch, every expression in Alternator's requests was parsed from string to adequate structure.
This patch enables caching - all calls to parse an expression (all types) are proxied through the cache.
New expression is added to the cache, the least recently used entry (above cache size) is removed.
For existing entries the copy of the template is returned - individual instances still need to be resolved (placeholders substituted with names and values).
The cache is per shard - shared for all operations, expression types, tables, users.
Default cache size is 2000 entries per shard and it has configuration option `alternator_max_expression_cache_entries_per_shard` (0 means cache disabled).
Added Python tests are based on metrics.
2025-09-28 04:27:44 +02:00
Nikos Dragazis
e1d9c83406 db/config: Add SSTable compression options for user tables
ScyllaDB offers the `compression` DDL property for configuring
compression per user table (compression algorithm and chunk size). If
not specified, the default compression algorithm is the LZ4Compressor
with a 4KiB chunk size (refer to the default constructor for
`compression_parameters`). The same default applies to system tables as
well.

Add a new configuration option to allow customizing the default for user
tables. Use the previously hardcoded default as the new option's default
value.

Note that the option has no effect on ALTER TABLE statements. An altered
table either inherits explicit compression options from the CQL
statement, or maintains its existing options.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
2025-09-26 12:02:00 +03:00
Karol Nowacki
eedf506be5 vector_store_client: Rename vector_store_uri to vector_store_primary_uri
The configuration setting vector_store_uri is renamed to
vector_store_primary_uri according to the final design.
In the future, the vector_store_secondary_uri setting will
be introduced.

This setting now also accepts a comma-separated list of URIs to prepare
for future support for redundancy and load balancing. Currently, only the
first URI in the list is used.

This change must be included before the next release.
Otherwise, users will be affected by a breaking change.

References: VECTOR-187

Closes scylladb/scylladb#26033
2025-09-21 16:33:10 +03:00
Łukasz Paszkowski
535c901e50 config: Add critical_disk_utilization_level option
The option defines the threshold at which the defensive mechanisms
preventing nodes from running out of space, e.g. rejecting user
writes shall be activated.

Its default value is 98% of the disk capacity.
2025-08-28 18:06:37 +02:00
Nadav Har'El
0a990d2a48 config: split tri_mode_restriction to a separate header
Today, any source file or header file that wants to use the
tri_mode_restriction type needs to include db/config.hh, which is a
large and frequently-changing header file. In this patch we split this
type into a separate header file, db/tri_mode_restriction.hh, and avoid
a few unnecessary inclusions of db/config.hh. However, a few source
files now need to explicitly include db/config.hh, after its
transitive inclusion is gone.

Note that the overwhelmingly common inclusion of db/config.hh continues
to be a problem after this patch - 128 source files include it directly.
So this patch is just the first step in long journey.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#25692
2025-08-27 13:47:04 +03:00
Nadav Har'El
87dd96f9a2 Merge ' Alternator: DynamoDB compatible WCU Calculation via Read-Before-Write Support' from Amnon Heiman
This series adds support for a DynamoDB-compatible Write Capacity Unit (WCU) calculation in Alternator by introducing an optional forced read-before-write mechanism.

Alternator's model differs from DynamoDB, and as a result, some write operations may report lower WCU usage compared to what DynamoDB would report. While this is acceptable in many cases, there are scenarios where users may require accurate WCU reporting that aligns more closely with DynamoDB's behavior.

To address this, a new configuration option, alternator_force_read_before_write, is introduced. When enabled, Alternator will perform a read before executing PutItem, UpdateItem, and DeleteItem operations. This allows it to take the existing item size into account when computing the WCU. BatchWriteItem support is also extended to use this mechanism. Because BatchWriteItem does not support returning old items directly, several internal changes were made to support reading previous item sizes with minimal overhead. Reads are performed at consistency level LOCAL_ONE for efficiency, and the WCU calculation is now done in multiple stages to accurately account for item size differences.

In addition to the implementation changes, test coverage was added to validate the new behavior. These tests confirm that WCU is calculated based on the larger of the old and new items when read-before-write is active, including for BatchWriteItem.

This feature comes with performance overhead and is therefore disabled by default. It can be enabled at runtime via the system.config table and should be used only when precise WCU tracking is necessary.
**New feature, no need to backport**

Closes scylladb/scylladb#24436

* github.com:scylladb/scylladb:
  alternator/test_returnconsumedcapacity.py: Test forced read before write
  alternator/executor.cc: DynamoDB WCU calculation in BatchWriteItem using read-before-write
  executor.cc: get_previous_item with consistency level
  executor: Extend API of put_or_delete_item
  alternator/executor.cc: Accurate WCU for put, update, delete
  config: add alternator_force_read_before_write
2025-08-24 11:38:24 +03:00
Ran Regev
ebf1db5c5e remove ./redis and dependencies
Remove ./redis and all its usages.
This is the second commit that removes
./redis from Scylla

Signed-off-by: Ran Regev <ran.regev@scylladb.com>
2025-08-20 17:53:23 +03:00
Nadav Har'El
a896e2dbb9 alternator: add optional support for writing to system table
In commit 44a1daf we added the ability to read system tables through
the DynamoDB API (actually, the Scan and Query requests only).
This ability is useful for tests, and can also be useful to users who
want to read information that is only available through system tables.

This patch adds support also for *writing* into system tables. This will
be useful for Alternator tests, were we want to temporarily change
some live-updatable configuration option - and so far haven't been
able to do that like we did do in some cql-pytest tests.

For reasons explained in issue #23218, only superuser roles are allowed to
write to system tables - it is not enough for the role to be granted
MODIFY permissions on the system table or on ALL KEYSPACES. Moreover,
the ability to modify system tables carries special risks, so this
patch only allows writes to the system tables if a new configuration
option "alternator_allow_system_table_write" turned on. This option is
turned off by default.

This patch also includes a test for this new configuration-writing
capability. The test scripts test/alternator/run and test.py now
run Scylla with alternator_allow_system_table_write turned on, but
the new test can also run without this option, and will be skipped
in that case (to allow running the test suite against some manually-
run instance of Scylla).

Fixes: #12348

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-08-06 10:00:04 +03:00
Amnon Heiman
94a3556be5 config: add alternator_force_read_before_write
This patch introduces a new configuration parameter,
`alternator_force_read_before_write`, which forces Alternator to perform
a read-before-write on all write operations (`PutItem`, `UpdateItem`, and
`DeleteItem`), even when not strictly required.

Enabling this option ensures abetter DynamoDB compatibility in WCU
calculation. by accounting for the size of the existing item, as done
in DynamoDB.

This option introduces performance overhead and should be used with care.

The parameter is runtime-configurable and can be toggled via CQL:

UPDATE system.config SET value = 'true' WHERE name = 'alternator_force_read_before_write';
2025-08-05 08:31:00 +03:00
Piotr Dulikowski
ec7832cc84 Merge 'Raft-based recovery procedure: simplify rolling restart with recovery_leader' from Patryk Jędrzejczak
The following steps are performed in sequence as part of the
Raft-based recovery procedure:
- set `recovery_leader` to the host ID of the recovery leader in
  `scylla.yaml` on all live nodes,
- send the `SIGHUP` signal to all Scylla processes to reload the config,
- perform a rolling restart (with the recovery leader being restarted
  first).

These steps are not intuitive and more complicated than they could be.

In this PR, we simplify these steps. From now on, we will be able to
simply set `recovery_leader` on each node just before restarting it.

Apart from making necessary changes in the code, we also update all
tests of the Raft-based recovery procedure and the user-facing
documentation.

Fixes scylladb/scylladb#25015

The Raft-based procedure was added in 2025.2. This PR makes the
procedure simpler and less error-prone, so it should be backported
to 2025.2 and 2025.3.

Closes scylladb/scylladb#25032

* github.com:scylladb/scylladb:
  docs: document the option to set recovery_leader later
  test: delay setting recovery_leader in the recovery procedure tests
  gossip: add recovery_leader to gossip_digest_syn
  db: system_keyspace: peers_table_read_fixup: remove rows with null host_id
  db/config, gms/gossiper: change recovery_leader to UUID
  db/config, utils: allow using UUID as a config option
2025-08-04 08:29:32 +02:00
Taras Veretilnyk
1d6808aec4 topology_coordinator: Make tablet_load_stats_refresh_interval configurable
This commits introduces an config option 'tablet_load_stats_refresh_interval_in_seconds'
that allows overriding the default value without using error injection.

Fixes scylladb/scylladb#24641

Closes scylladb/scylladb#24746
2025-07-31 14:31:55 +03:00
Sergey Zolotukhin
ea311be12b generic_server: Two-step connection shutdown.
When shutting down in `generic_server`, connections are now closed in two steps.
First, only the RX (receive) side is shut down. Then, after all ongoing requests
are completed, or a timeout happened the connections are fully closed.

Fixes scylladb/scylladb#24481
2025-07-28 10:08:06 +02:00
Patryk Jędrzejczak
445a15ff45 db/config, gms/gossiper: change recovery_leader to UUID
We change the type of the `recovery_leader` config parameter and
`gossip_config::recovery_leader` from sstring to UUID. `recovery_leader`
is supposed to store host ID, so UUID is a natural choice.

After changing the type to UUID, if the user provides an incorrect UUID,
parsing `recovery_leader` will fail early, but the start-up will
continue. Outside the recovery procedure, `recovery_leader` will then be
ignored. In the recovery procedure, the start-up will fail on:

```
throw std::runtime_error(
        "Cannot start - Raft-based topology has been enabled but persistent group 0 ID is not present. "
        "If you are trying to run the Raft-based recovery procedure, you must set recovery_leader.");
```
2025-07-23 15:36:56 +02:00
Patryk Jędrzejczak
ec69028907 db/config, utils: allow using UUID as a config option
We change the `recovery_leader` option to UUID in the following commit.
2025-07-23 15:36:45 +02:00
Pawel Pery
7bf53fc908 vector_store_client: implement initial vector_store_client service
This patch is a part of vector_store_client sharded service
implementation for a communication with vector-store service.

It adds a `services/vector_store_client.{cc|hh}` sharded service and a
configuration parameter `vector_store_uri` with a
`http://vector-store.dns.name:port` format. If there will be an error
during parsing that parameter there will be an exception during
construction.

For the future unit testing purposes the patch adds
`vector_store_client_tester` as a way to inject mockup functionality.

This service will be used by the select statements for the Vector search
indexes (see VS-46). For this reason I've added vector_store_client
service in the query processor.

Reference: VS-47 VS-45
2025-07-08 16:29:55 +02:00
Michał Chojnowski
7d26d3c7cb db/config: add an option that disables dict-aware sstable compressors in DDL statements
For reasons, we want to be able to disallow dictionary-aware compressors
in chosen deployments.

This patch adds a knob for that. When the knob is disabled,
dictionary-aware compressors will be rejected in the validation
stage of CREATE and ALTER statements.

Closes scylladb/scylladb#24355
2025-06-09 13:30:40 +03:00
Kefu Chai
a33651b03e db, service: do not include unused header
these unused headers were flagged by clang-include-cleaner.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23735
2025-04-17 11:49:59 +03:00
Nadav Har'El
fbcf77d134 raft: make group0 Raft operation timeout configurable
A recent commit 370707b111 (re)introduced
a timeout for every group0 Raft operation. This timeout was set to 60
seconds, which, paraphrasing Bill Gates, "ought to be enough for anybody".

However, one of the things we do as a group0 operation is schema
changes, and we already noticed a few years ago, see commit
0b2cf21932, that in some extremely
overloaded test machines where tests run hundreds of times (!) slower
than usual, a single big schema operation - such as Alternator's
DeleteTable deleting a table and multiple of its CDC or view tables -
sometimes takes more than 60 seconds. The above fix changed the
client's timeout to wait for 300 seconds instead of 60 seconds,
but now we also need to increase our Raft timeout, or the server can
time out. We've seen this happening recently making some tests flaky
in CI (issue #23543).

So let's make this timeout configurable, as a new configuration option
group0_raft_op_timeout_in_ms. This option defaults to 60000 (i.e,
60 seconds), the same as the existing default. The test framework
overrides this default with a a higher 300 second timeout, matching
the client-side timeout.

Before this patch, this timeout was already configurable in a strange
way, using injections. But this was a misstep: We already have more
than a dozen timeouts configurable through the normal configration,
and this one should have been configured in the same way. There is
nothing "holy" about the default of 60 seconds we chose, and who
knows maybe in the future we might need to tweek it in the field,
just like we made the other timeouts tweakable. Injections cannot
be used in release mode, but configuration options can.

Fixes #23543

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23717
2025-04-15 10:57:39 +03:00
Avi Kivity
ed3e4f33fd Merge 'generic_server: throttle and shed incoming connections according to semaphore limit' from Marcin Maliszkiewicz
Adds new live updatable config: uninitialized_connections_semaphore_cpu_concurrency.

It should help to reduce cpu usage by limiting cpu concurrency for new connections.  As a last resort when those connections are waiting for initial processing too long (over 1m) they are shed.

New connections_shed and connections_blocked metrics are added for tracking.

Testing:
 - manually via simple program creating high number of connection and constantly re-connecting
 - added benchmark

Following are benchmark results:

Before:
```
> build/release/test/perf/perf_generic_server --smp=1
170101.41 tps ( 13.1 allocs/op,   0.0 logallocs/op,   7.0 tasks/op,    4695 insns/op,    3178 cycles/op,        0 errors)
[...]
throughput: mean=173850.06 standard-deviation=1844.48 median=174509.66 median-absolute-deviation=874.23 maximum=175087.49 minimum=170588.54
instructions_per_op: mean=4725.59 standard-deviation=13.35 median=4729.38 median-absolute-deviation=12.49 maximum=4738.61 minimum=4709.96
  cpu_cycles_per_op: mean=3135.08 standard-deviation=32.13 median=3122.68 median-absolute-deviation=22.29 maximum=3179.38 minimum=3103.15
```

After:
```
> build/release/test/perf/perf_generic_server --smp=1
167373.19 tps ( 13.1 allocs/op,   0.0 logallocs/op,   7.0 tasks/op,    4821 insns/op,    3371 cycles/op,        0 errors)
[...]
throughput:
  mean=   171199.55 standard-deviation=2484.58
  median= 171667.06 median-absolute-deviation=2087.63
  maximum=173689.11 minimum=167904.76
instructions_per_op:
  mean=   4801.90 standard-deviation=16.54
  median= 4796.78 median-absolute-deviation=9.32
  maximum=4830.71 minimum=4789.81
cpu_cycles_per_op:
  mean=   3245.26 standard-deviation=32.28
  median= 3230.44 median-absolute-deviation=16.52
  maximum=3297.39 minimum=3215.62
```

The patch adds around 67 insns/op so it's effect on performance should be negligible.

Fixes: https://github.com/scylladb/scylladb/issues/22844

Closes scylladb/scylladb#22828

* github.com:scylladb/scylladb:
  transport: move on_connection_close into connection destructor
  test: perf: make aggregated_perf_results formatting more human readable
  transport: add blocked and shed connection metrics
  generic_server: throttle and shed incoming connections according to semaphore limit
  generic_server: add data source and sink wrappers bookkeeping network IO
  generic_server: coroutinize part of server::do_accepts
  test: add benchmark for generic_server
  test: perf: add option to count multiple ops per time_parallel iteration
  generic_server: add semaphore for limiting new connections concurrency
  generic_server: add config to the constructor
  generic_server: add on_connection_ready handler
2025-04-09 21:41:38 +03:00
Marcin Maliszkiewicz
ed82bede39 generic_server: add semaphore for limiting new connections concurrency
It will be used in following commits.
2025-04-09 10:30:58 +02:00