The central idea of incremental repair is to allow repair participants
to select and repair only a portion of the dataset to speed up the
repair process. All repair participants must utilize an identical
selection method to repair and synchronize the same selected dataset.
There are two primary selection methods: time-based and file-based. The
time-based method selects data within a specified time frame. It is
versatile but it is less efficient because it requires reading all of
the dataset and omitting data beyond the time frame. The file-based
method selects data from unrepaired SSTables and is more efficient
because it allows the entire SSTable to be omitted. This document patch
implements the file-based selection method.
Incremental repair will only be supported for tablet tables; it will not
be supported for vnode tables. On one hand, the legacy vnode is less
important to support. On the other hand, the incremental repair for
vnode is much harder to implement. With vnodes, a SSTalbe could contain
data for multiple vnode ranges. When a given vnode range is repaired,
only a portion of the SSTable is repaired. This complicates the
manipulation of SSTables significantly during both repair and
compaction. With tablets, an entire tablet is repaired so that a
sstable is either fully repaired or not repaired which is a huge
simplification.
This patch uses the repaired_at from sstables::statistics component to
mark a sstable as repaired. It uses a virtual clock as the repair
timestamp, i.e., using a monotonically increasing number for the
repaired_at field of a SSTable and sstables_repaired_at column in
system.tablets table. Notice that when a sstable is not repaired, the
repaired_at field will be set to the default value 0 by default. The
being_repaired in memory field of a SSTable is used to explicitly mark
that a SSTable is being selected. The following variables are used for
incremental repair:
The repaired_at on disk field of a SSTable is used.
- A 64-bit number increases sequentially
The sstables_repaired_at is added to the system.tablets table.
- repaired_at <= sstables_repaired_at means the sstable is repaired
The being_repaired in memory field of a SSTable is added.
- A repair UUID tells which sstable has participated in the repair
Initial test results:
1) Medium dataset results
Node amount: 3
Instance type: i4i.2xlarge
Disk usage per node: ~500GB
Cluster pre-populated with ~500GB of data before starting repairs job.
Results for Repair Timings:
The regular repair run took 210 mins.
Incremental repair 1st run took 183 mins, 2nd and 3rd runs took around 48s
The speedup is: 183 mins / 48s = 228X
2) Small dataset results
Node amount: 3
Instance type: i4i.2xlarge
Disk usage per node: ~167GB
Cluster pre-populated with ~167GB of data before starting the repairs job.
Regular repair 1st run took 110s, 2nd and 3rd runs took 110s.
Incremental repair 1st run took 110 seconds, 2nd and 3rd run took 1.5 seconds.
The speedup is: 110s / 1.5s = 73X
3) Large dataset results
Node amount: 6
Instance type: i4i.2xlarge, 3 racks
50% of base load, 50% read/write
Dataset == Sum of data on each node
Dataset Non-incremental repair (minutes)
1.3 TiB 31:07
3.5 TiB 25:10
5.0 TiB 19:03
6.3 TiB 31:42
Dataset Incremental repair (minutes)
1.3 TiB 24:32
3.0 TiB 13:06
4.0 TiB 5:23
4.8 TiB 7:14
5.6 TiB 3:58
6.3 TiB 7:33
7.0 TiB 6:55
Fixes#22472Closesscylladb/scylladb#24291
* github.com:scylladb/scylladb:
replica: Introduce get_compaction_reenablers_and_lock_holders_for_repair
compaction: Move compaction_reenabler to compaction_reenabler.hh
topology_coordinator: Make rpc::remote_verb_error to warning level
repair: Add metrics for sstable bytes read and skipped from sstables
test.py: Disable incremental for test_tombstone_gc_for_streaming_and_repair
test.py: Add tests for tablet incremental repair
repair: Add tablet incremental repair support
compaction: Add tablet incremental repair support
feature_service: Add TABLET_INCREMENTAL_REPAIR feature
tablet_allocator: Add tablet_force_tablet_count_increase and decrease
repair: Add incremental helpers
sstable: Add being_repaired to sstable
sstables: Add set_repaired_at to metadata_collector
mutation_compactor: Introduce add operator to compaction_stats
tablet: Add sstables_repaired_at to system.tablets table
test: Fix drain api in task_manager_client.py
When creating a new keyspace, both replication strategy and replication
factor must be stated. For example:
`CREATE KEYSPACE ks WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'replication_factor' : 3 };`
This syntax is verbose, and in all but some testing scenarios
`NetworkTopologyStrategy` is used.
This patch allows skipping replication strategy name, filling it with
`NetworkTopologyStrategy` when that happens. The following syntax is now
valid:
`CREATE KEYSPACE ks WITH REPLICATION = { 'replication_factor' : 3 };`
and will give the same result as the previous, more explicit one.
Fixes https://github.com/scylladb/scylladb/issues/16029
Backport is not needed. This is an enhancement for future releases.
Closesscylladb/scylladb#25236
* github.com:scylladb/scylladb:
docs/cql: update documentation for default replication strategy
test/cqlpy: add keyspace creation default strategy test
cql3: add default replication strategy to `create_keyspace_statement`
Update create-keyspace-statement section of ddl.rst since `class` is no longer mandatory.
Add an example for keyspace creation without specifying `class`.
Refs: #16029
https://github.com/scylladb/scylladb/issues/24962 introduced memtable overlap checks to cache tombstone GC. This was observed to be very strict and greatly reduce the effectiveness of tombstone GC in the cache, especially for MV workloads, which regularly recycle old timestamp into new writes, so the memtable often has smaller min live timestamp than the timestamp of the tombstones in the cache.
When creating a new memtable, save a snapshot of the tombstone gc state. This snapshot is used later to exclude this memtable from overlap checks for tombstones, whose token have an expiry time larger than that of the tombstone, meaning: all writes in this memtable were produced at a point in time when the current tombstone has already expired. This has the following implications:
* The partition the tombstone is part of was already repaired at the time the memtable was created.
* All writes in the memtable were produced *after* this tombstone's expiry time, these writes cannot be possibly relevant for this tombstone.
Based on this, such memtables are excluded from the overlap checks. With adequately frequent memtable flushes -- so that the tombstone gc state snapshot is refreshed -- most memtables should be excluded from overlap checks, greatly helping the cache's tombstone GC efficiency.
Fixes: https://github.com/scylladb/scylladb/issues/24962
Fixes a regression introduced by https://github.com/scylladb/scylladb/pull/23255 which was backported to all releases, needs backport to all releases as well
Closesscylladb/scylladb#25033
* github.com:scylladb/scylladb:
docs/dev/tombstone.md: document the memtable overlap check elision optimization
test/boost/row_cache_test: add test for memtable overlap check elision
db/cache_mutation_reader: obtain gc-before and min-live-ts lazily
mutation/mutation_compactor: use max_purgeable::can_purge and max_purgeable::purge_result
db/cache_mutation_reader: use max_purgeable::can_purge()
replica/table: get_max_purgeable_fn_for_cache_underlying_reader(): use max_purgable::combine()
replica/database: memtable_list::get_max_purgeable(): set expiry-treshold
compaction/compaction_garbage_collector: max_purgeable: add expiry_treshold
replica/table: propagate gc_state to memtable_list
replica/memtable_list: add tombstone_gc_state* member
replica/memtable: add tombstone_gc_state_snapshot
tombstone_gc: introduce tombstone_gc_state_snapshot
tombstone_gc: extract shared state into shared_tombstone_gc_state
tombstone_gc: per_table_history_maps::_group0_gc_time: make it a value
tombstone_gc: fold get_group0_gc_time() into its caller
tombstone_gc: fold get_or_create_group0_gc_time() into update_group0_refresh_time()
tombstone_gc: fold get_or_create_repair_history_for_table() into update_repair_time()
tombstone_gc: refactor get_or_greate_repair_history_for_table()
replica/memtable_list: s/min_live_timestamp()/get_max_purgeable()/
db/read_context: return max_purgeable from get_max_purgeable()
compaction/compaction_garbage_collector: add formatter for max_purgeable
mutation: move definition of gc symbols to compaction.cc
compaction/compaction_garbage_collector: refactor max_purgeable into a class
test/boost/row_cache_test: refactor test_populating_reader_tombstone_gc_with_data_in_memtable
test: rewrite test_compacting_reader_tombstone_gc_with_data_in_memtable in C++
test/boost/row_cache_test: refactor cache tombstone GC with memtable overlap tests
This instruction adds additional safety. The faster we notice that
a node didn't restart properly, the better.
The old gossip-based recovery procedure had a similar recommendation
to verify that each restarting node entered `RECOVERY` mode.
Fixes#25375
This is a documentation improvement. We should backport it to all
branches with the new recovery procedure, so 2025.2 and 2025.3.
Closesscylladb/scylladb#25376
In commit 44a1daf we added the ability to read Scylla system tables with Alternator. This feature is useful, among other things, in tests that want to read Scylla's configuration through the system table system.config. But tests often want to modify system.config, e.g., to temporarily reduce some threshold to make tests shorter. Until now, this was not possible
This series add supports for writing to system tables through Alternator, and examples of tests using this capability (and utility functions to make it easy).
Because the ability to write to system tables may have non-obvious security consequences, it is turned off by default and needs to be enabled with a new configuration option "alternator_allow_system_table_write"
No backports are necessary - this feature is only intended for tests. We may later decide to backport if we want to backport new tests, but I think the probability we'll want to do this is low.
Fixes#12348Closesscylladb/scylladb#19147
* github.com:scylladb/scylladb:
test/alternator: utility functions for changing configuration
alternator: add optional support for writing to system table
test/alternator: reduce duplicated code
Added a new POST endpoint `/storage_service/drop_quarantined_sstables` to the REST API.
This endpoint allows dropping all quarantined SSTables either globally or
for a specific keyspace and tables.
Optional query parameters `keyspace` and `tables` (comma-separated table names) can be
provided to limit the scope of the operation.
Fixesscylladb/scylladb#19061
Backport is not required, it is new functionality
Closesscylladb/scylladb#25063
* github.com:scylladb/scylladb:
docs: Add documentation for the nodetool dropquarantinedsstables command
nodetool: add command for dropping quarantine sstables
rest_api: add endpoint which drops all quarantined sstables
In commit 44a1daf we added the ability to read system tables through
the DynamoDB API (actually, the Scan and Query requests only).
This ability is useful for tests, and can also be useful to users who
want to read information that is only available through system tables.
This patch adds support also for *writing* into system tables. This will
be useful for Alternator tests, were we want to temporarily change
some live-updatable configuration option - and so far haven't been
able to do that like we did do in some cql-pytest tests.
For reasons explained in issue #23218, only superuser roles are allowed to
write to system tables - it is not enough for the role to be granted
MODIFY permissions on the system table or on ALL KEYSPACES. Moreover,
the ability to modify system tables carries special risks, so this
patch only allows writes to the system tables if a new configuration
option "alternator_allow_system_table_write" turned on. This option is
turned off by default.
This patch also includes a test for this new configuration-writing
capability. The test scripts test/alternator/run and test.py now
run Scylla with alternator_allow_system_table_write turned on, but
the new test can also run without this option, and will be skipped
in that case (to allow running the test suite against some manually-
run instance of Scylla).
Fixes: #12348
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Before this series, the "system.clients" virtual table lists active connections (and their various properties, like client address, logged in username and client version) only for CQL requests. This series adds also Alternator clients to system.clients. One of the interesting use cases of this new feature is understanding exactly which SDK a user is using -without inspecting their application code. Different SDKs pass different "User-Agent" headers in requests, and that User-Agent will be visible in the system.clients entries for Alternator requests as the "driver_name" field.
Unlike CQL where logged in username, driver name, etc. applies to a complete connection, in the Alternator API, different requests can theoretically be signed by different users and carry different headers but still arrive over the same HTTP connection. So instead of listing the currently open Alternator *connections*, we will list the currently active *requests*.
The first three patches introduce utilities that will be useful in the implementation. The fourth patch is the implementation itself (which is quite simple with the utility introduced in the second patch), and the fifth patch a regression test for the new feature. The sixth patch adds documentation, the seventh patch refactors generic_server to use the newly introduced utility class and reduce code duplication, and the eighth patch adds a small check to an existing check of CQL's system.clients.
Fixes#24993
This patch adds a new feature, so doesn't require a backport. Nevertheless, if we want it to get to existing customers more quickly to allow us to better understand their use case by reading the system.clients table, we may want to consider backporting this patch to existing branches. There is some risk involved in this patch, because it adds code that gets run on every Alternator request, so a bug on it can cause problems for every Alternator request.
Closesscylladb/scylladb#25178
* github.com:scylladb/scylladb:
test/cqlpy: slightly strengthen test for system.clients
generic_server: use utils::scoped_item_list
docs/alternator: document the system.clients system table in Alternator
alternator: add test for Alternator clients in system.clients
alternator: list active Alternator requests in system.clients
utils: unit test for utils::scoped_item_list
utils: add a scoped_item_list utility class
utils: add "fatal" version of utils::on_internal_error()
The following steps are performed in sequence as part of the
Raft-based recovery procedure:
- set `recovery_leader` to the host ID of the recovery leader in
`scylla.yaml` on all live nodes,
- send the `SIGHUP` signal to all Scylla processes to reload the config,
- perform a rolling restart (with the recovery leader being restarted
first).
These steps are not intuitive and more complicated than they could be.
In this PR, we simplify these steps. From now on, we will be able to
simply set `recovery_leader` on each node just before restarting it.
Apart from making necessary changes in the code, we also update all
tests of the Raft-based recovery procedure and the user-facing
documentation.
Fixesscylladb/scylladb#25015
The Raft-based procedure was added in 2025.2. This PR makes the
procedure simpler and less error-prone, so it should be backported
to 2025.2 and 2025.3.
Closesscylladb/scylladb#25032
* github.com:scylladb/scylladb:
docs: document the option to set recovery_leader later
test: delay setting recovery_leader in the recovery procedure tests
gossip: add recovery_leader to gossip_digest_syn
db: system_keyspace: peers_table_read_fixup: remove rows with null host_id
db/config, gms/gossiper: change recovery_leader to UUID
db/config, utils: allow using UUID as a config option
Add to docs/alternator/new-apis.md a full description of the
`system.clients` support in Alternator that was added in the previous
patches.
Although arguably *all* Scylla system tables should work on Alternator
and do not need to be individually documented, I believe that this
specific table, is interesting to document. This is because some of
the attributes in this table have non-obvious and Alternator-specific
meanings. Moreover, there's even a diffence in what each individual
item in the table represents (it represents active requests, not entire
connections as in CQL).
While editing the system tables section of new-apis.md, this patch also slightly
improves its formatting.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Add in docs/alternator/compatibility.md a mention of the ShardFilter
option which we don't support in Alternator Streams. This option was
only introduced to DynamoDB a week ago, so it's not surprising we
don't yet support it :-)
Refs #25160Closesscylladb/scylladb#25161
Documentation had outdated information how to run C++ test.
Additionally, some information added about gathered test metrics.
Closesscylladb/scylladb#25180
This PR adds the upgrade guide from version 2025.2 to 2025.3.
Also, it removes the upgrade guide existing for the previous version
that is irrelevant in 2025.2 (upgrade from 2025.1 to 2025.2).
Note that the new guide does not include the "Enable Consistent Topology Updates" page and note,
as users upgrading to 2025.3 have consistent topology updates already enabled.
Fixes https://github.com/scylladb/scylladb/issues/24696Closesscylladb/scylladb#25219
This commit:
- Extends the Drivers support table with information on which driver supports tablets
and since which version.
- Adds the driver support policy to the Drivers page.
- Reorganizes the Drivers page to accommodate the updates.
In addition:
- The CPP-over-Rust driver is added to the table.
- The information about Serverless (which we don't support) is removed
and replaced with tablets to correctly describe the contents of the table.
Fixes https://github.com/scylladb/scylladb/issues/19471
Refs https://github.com/scylladb/scylladb-docs-homepage/issues/69Closesscylladb/scylladb#24635
This PR enables **LWT (Lightweight Transactions)** support for tablet-based tables by leveraging **colocated tables**.
Currently, storing Paxos state in system tables causes two major issues:
* **Loss of Paxos state during tablet migration or base table rebuilds**
* When a tablet is migrated or the base table is rebuilt, system tables don't retain Paxos state.
* This breaks LWT correctness in certain scenarios.
* Failing test cases demonstrating this:
* test_lwt_state_is_preserved_on_tablet_migration
* test_lwt_state_is_preserved_on_rebuild
* **Shard misalignment and performance overhead**
* Tablets may be placed on arbitrary shards by the tablet balancer.
* Accessing Paxos state in system tables could require a shard jump, degrading performance.
We move Paxos state into a dedicated Paxos table, colocated with the base table:
* Each base table gets its own Paxos state table.
* This table is lazily created on the first LWT operation.
* Its tablets are colocated with those of the base table, ensuring:
* Co-migration during tablet movement
* Co-rebuilding with the base table
* Shard alignment for local access to Paxos state
Some reasoning for why this is sufficient to preserve LWT correctness is discussed in [2].
This PR addresses two issues from the "Why doesn't it work for tablets" section in [1]:
* Tablet migration vs LWT correctness
* Paxos table sharding
Other issues ("bounce to shard" and "locking for intranode_migration") have already been resolved in previous PRs.
References
[1] - [LWT over tablets design](https://docs.google.com/document/d/1CPm0N9XFUcZ8zILpTkfP5O4EtlwGsXg_TU4-1m7dTuM/edit?tab=t.0#heading=h.goufx7gx24yu)
[2] - [LWT: Paxos state and tablet balancer](https://docs.google.com/document/d/1-xubDo612GGgguc0khCj5ukmMGgLGCLWLIeG6GtHTY4/edit?tab=t.0)
[3] - [Colocated tables PR](https://github.com/scylladb/scylladb/pull/22906#issuecomment-3027123886)
[4] - [Possible LWT consistency violations after a topology change](https://github.com/scylladb/scylladb/issues/5251)
Backport: not needed because this is a new feature.
Closesscylladb/scylladb#24819
* github.com:scylladb/scylladb:
create_keyspace: fix warning for tablets
docs: fix lwt.rst
docs: fix tablets.rst
alternator: enable LWT
random_failures: enable execute_lwt_transaction
test_tablets_lwt: add test_paxos_state_table_permissions
test_tablets_lwt: add test_lwt_for_tablets_is_not_supported_without_raft
test_tablets_lwt: test timeout creating paxos state table
test_tablets_lwt: add test_lwt_concurrent_base_table_recreation
test_tablets_lwt: add test_lwt_state_is_preserved_on_rebuild
test_tablets_lwt: migrate test_lwt_support_with_tablets
test_tablets_lwt: add test_lwt_state_is_preserved_on_tablet_migration
test_tablets_lwt: add simple test for LWT
check_internal_table_permissions: handle Paxos state tables
client_state: extract check_internal_table_permissions
paxos_store: handle base table removal
database: get_base_table_for_tablet_colocation: handle paxos state table
paxos_state: use node_local_only mode to access paxos state
query_options: add node_local_only mode
storage_proxy: handle node_local_only in query
storage_proxy: handle node_local_only in mutate
storage_proxy: introduce node_local_only flag
abstract_replication_strategy: remove unused using
storage_proxy: add coordinator_mutate_options
storage_proxy: rename create_write_response_handler -> make_write_response_handler
storage_proxy: simplify mutate_prepare
paxos_state: lazily create paxos state table
migration_manager: add timeout to start_group0_operation and announce
paxos_store: use non-internal queries
qp: make make_internal_options public
paxos_store: conditional cf_id filter
paxos_store: coroutinize
feature_service: add LWT_WITH_TABLETS feature
paxos_state: inline system_keyspace functions into paxos_store
paxos_state: extract state access functions into paxos_store
This PR introduces a new Key Provider to support Azure Key Vault as a Key Management System (KMS) for Encryption at Rest. The core design principle is the same as in the AWS and GCP key providers - an externally provided Vault key that is used to protect local data encryption keys (a process known as "key wrapping").
In more detail, this patch series consists of:
* Multiple Azure credential sources, offering a variety of authentication options (Service Principals, Managed Identities, environment variables, Azure CLI).
* The Azure host - the Key Vault endpoint bridge.
* The Azure Key Provider - the interface for the Azure host.
* Unit tests using real Azure resources (credentials and Vault keys).
* Log filtering logic to not expose sensitive data in the logs (plaintext keys, credentials, access tokens).
This is part of the overall effort to support Azure deployments.
Testing done:
* Unit tests.
* Manual test on an Azure VM with a Managed Identity.
* Manual test with credentials from Azure CLI.
* Manual test of `--azure-hosts` cmdline option.
* Manual test of log filtering.
Remaining items:
- [x] Create necessary Azure resources for CI.
- [x] Merge pipeline changes (https://github.com/scylladb/scylla-pkg/pull/5201).
Closes https://github.com/scylladb/scylla-enterprise/issues/1077.
New feature. No backport is needed.
Closesscylladb/scylladb#23920
* github.com:scylladb/scylladb:
docs: Document the Azure Key Provider
test: Add tests for Azure Key Provider
pylib: Add mock server for Azure Key Vault
encryption: Define and enable Azure Key Provider
encryption: azure: Delegate hosts to shard 0
encryption: Add Azure host cache
encryption: Add config options for Azure hosts
encryption: azure: Add override options
encryption: azure: Add retries for transient errors
encryption: azure: Implement init()
encryption: azure: Implement get_key_by_id()
encryption: azure: Add id-based key cache
encryption: azure: Implement get_or_create_key()
encryption: azure: Add credentials in Azure host
encryption: azure: Add attribute-based key cache
encryption: azure: Add skeleton for Azure host
encryption: Templatize get_{kmip,kms,gcp}_host()
encryption: gcp: Fix typo in docstring
utils: azure: Get access token with default credentials
utils: azure: Get access token from Azure CLI
utils: azure: Get access token from IMDS
utils: azure: Get access token with SP certificate
utils: azure: Get access token with SP secret
utils: rest: Add interface for request/response redaction logic
utils: azure: Declare all Azure credential types
utils: azure: Define interface for Azure credentials
utils: Introduce base64url_{encode,decode}
In one of the previous commits, we made it possible to set
`recovery_leader` on each node just before restarting it. Here, we
update the corresponding documentation.
Extend the EaR ops guide to incorporate the new Azure Key Provider.
Document its options and provide instructions on how to configure it.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Before this series, it is possible to crash Scylla (due to an I/O error) by creating an Alternator table close to the maximum name length of 222, and then enabling Alternator Streams. This series fixes this bug in two ways:
1. On a pre-existing table whose name might be up to 222 characters, enabling Streams will check if the resulting name is too long, and if it is, fail with a clear error instead of crashing. This case will effect pre-existing tables whose name has between 207 and 222 characters (207 is `222 - strlen("_scylla_cdc_log")`) - for such tables enabling Streams will fail, but no longer crash.
2. For new tables, the table name length limit is lowered from 222 to 192. The new limit is still high enough, but ensures it will be possible to enable streams any new table. It will also always be possible to add a GSI for such a table with name up to 29 characters (if the table name is shorter, the GSI name can be longer - the sum can be up to 221 characters).
No need to backport, Alternator Streams is still an experimental feature and this patch just improves the unlikely situation of extremely long table names.
Fixes#24598Closesscylladb/scylladb#24717
* github.com:scylladb/scylladb:
alternator: lower maximum table name length to 192
alternator: don't crash when adding Streams to long table name
alternator: split length limit for regular and auxiliary tables
alternator: avoid needlessly validating table name
This PR extends the KMS host to support temporary AWS security credentials provided externally via the Scylla configuration file, environment variables, or the AWS credentials file.
The KMS host already supports:
* Temporary credentials obtained automatically from the EC2 instance metadata service or via IAM role assumption.
* Long-term credentials provided externally via configuration, environment, or the AWS credentials file.
This PR is about temporary credentials that are external, i.e., not generated by Scylla. Such credentials may be issued, for example, through identity federation (e.g., Okta + gimme-aws-creds).
External temporary credentials are useful for short-lived tasks like local development, debugging corrupted SSTables with `scylla-sstable`, or other local testing scenarios. These credentials are temporary and cannot be refreshed automatically, so this method is not intended for production use.
Documentation has been updated to mention these additional credential sources.
Fixes#22470.
New feature, no backport is needed.
Closesscylladb/scylladb#22465
* github.com:scylladb/scylladb:
doc: Expose new `aws_session_token` option for KMS hosts
kms_host: Support authn with temporary security credentials
encryption_config: Mention environment in credential sources for KMS
This patch is a part of vector_store_client sharded service
implementation for a communication with vector-store service.
It implements a functionality for ANN search request to a vector-store
service. It sends request, receive response and after parsing it returns
the list of primary keys.
It adds json parsing functionality specific for the HTTP ANN API.
It adds a hardcoded http request timeout for retrieving response from
the Vector Store service.
It also adds an automatic boost test of the ANN search interface, which
uses a mockup http server in a background to simulate vector-store
service.
It adds a documentation for HTTP API protocol used used for ANN
functionality.
Fixes: VS-47
Several audit test issues caused test failures, and in the result, almost all of audit syslog tests were marked with xfail.
This patch series enables the syslog audit tests, that should finally pass after the following fixes are introduced:
- bring back commas to audit syslog (scylladb#24410 fix)
- synchronize audit syslog server
- fix parsing of syslog messages
- generate unique uuid for each line in syslog audit
- allow audit logging from multiple nodes
Fixes: scylladb/scylladb#24410
Test improvements, no backport required.
Closesscylladb/scylladb#24553
* github.com:scylladb/scylladb:
test: audit: use automatic comparators in AuditEntry
test: audit: enable syslog audit tests
test: audit: sort new audit entries before comparing with expected ones
test: audit: check audit logging from multiple nodes
test: audit: generate unique uuid for each line in syslog audit
test: audit: fix parsing of syslog messages
test: audit: synchronize audit syslog server
docs: audit: update syslog audit format to the current one
audit: bring back commas to audit syslog
Currently, Alternator allows creating a table with a name up to 222
(max_table_name_length) characters in length. But if you do create
a table with such a long name, you can have some difficulties later:
You you will not be able to add Streams or GSI or LSI to that table,
because 222 is also the absolute maximum length Scylla tables can have
and the auxilliary tables we want to create (CDC log, materialized views)
will go over this absolute limit (max_auxiliary_table_name_length).
This is not nice. DynamoDB users assume that after successfully
creating a table, they can later - perhaps much later - decide to
add Streams or GSI to it, and today if they chose extremely long
names, they won't be able to do this.
So in this patch, we lower max_table_name_length from 222 to 192.
A user will not be able to create tables with longer names, but
the good news is that once successfully creating a table, it will
always be possible to enable Streams on it (the CDC log table has an
extra 15 bytes in its name, and 192 + 15 is less than 222), and it
will be possible to add GSIs with short enough names (if the GSI
name is 29 or less, 192 + 29 + 1 = 222).
This patch is a trivial one-line code change, but also includes the
corrected documentation of the limits, and a fix for one test that
previously checked that a table name with length 222 was allowed -
and now needs to check 192 because 222 is no longer allowed.
Note that if a user has existing tables and upgrades Scylla, it
is possible that some pre-existing Alternator tables might have
lengths over 192 (up to 222). This is fine - in the previous patches
we made sure that even in this case, all operations will still work
correctly on these old tables (by not not validating the name!), and
we also made sure that attempting to enable Streams may fail when
the name is too long (we do not remove those old checks in this patch,
and don't plan to remove them in the forseeable future).
Note that the limit we chose - 192 characters - is identical to the
table name limit we recently chose in CQL. It's nicer that we don't
need to memorize two different limits for Alternator and CQL.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Replacing "from" is incorrect. The typo comes from recently
merged #24583.
Fixes#24732
Requires backport to 2025.2 since #24583 has been backported to 2025.2.
Closesscylladb/scylladb#24733
This PR introduces a new `comparable_bytes` class to add byte-comparable format support for all the [native cql3 data types](https://opensource.docs.scylladb.com/stable/cql/types.html#native-types) except `counter` type as that is not comparable. The byte-comparable format is a pre-requisite for implementing the trie based index format for our sstables(https://github.com/scylladb/scylladb/issues/19191). This implementation adheres to the byte-comparable format specification in https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/utils/bytecomparable/ByteComparable.md
Note that support for composite data types like lists, maps, and sets has not been implemented yet and will be made available in a separate PR.
Refs https://github.com/scylladb/scylladb/issues/19407
New feature - backport not required.
Closesscylladb/scylladb#23541
* github.com:scylladb/scylladb:
types/comparable_bytes: add testcase to verify compatibility with cassandra
types/comparable_bytes: support variable-length natively byte-ordered data types
types/comparable_bytes: support decimal cql3 types
types/comparable_bytes: introduce count_digits() method
types/comparable_bytes: support uuid and timeuuid cql3 types
types/comparable_bytes: support varint cql3 type
types/comparable_bytes: support skipping sign byte write in decode_signed_long_type
types/comparable_bytes: introduce encode/decode_varint_length
types/comparable_bytes: support float and double cql3 types
types/comparable_bytes: support date, time and timestamp cql3 types
types/comparable_bytes: support bigint cql3 type
types/comparable_bytes: support fixed length signed integers
types/comparable_bytes: support boolean cql3 type
types: introduce comparable_bytes class
bytes_ostream: overload write() to support writing from FragmentedView
docs: fix minor typo in docs/dev/cql3-type-mapping.md
Add the option to co-locate tablets of different tables. For example, a base table and its CDC table, or a local index.
main changes and ideas:
* "table group" - a set of one or more tables that should be co-located. (Example: base table and CDC table). A group consists of one base table and zero or more children tables.
* new column `base_table` in `system.tablets`: when creating a new table, it can be set to point to a base table, which the new table's tablets will be co-located with. when it's set, the tablet map information should be retrieved from the base table map. the child map doesn't contain per-tablet information.
* co-located tables always have the same tablet count and the same tablet replicas. each tablet operation - migration, resize, repair - is applied on all tablets in a synchronized manner by the topology coordinator.
* resize decision for a group is made by combining the per-table hints and comparing the average tablet size (over all tablets in the group) with the target tablet size.
* the tablets load balancer works with the base table as a representative of the group. it represents a single migration unit with some `group_size` that is taken into account.
* view tablets are co-located with base tablets when the partition keys match.
Fixes https://github.com/scylladb/scylladb/issues/17043
backport is not needed. this is preliminary work for support of MVs and CDC with tablets.
Closesscylladb/scylladb#22906
* github.com:scylladb/scylladb:
tablets: validate no clustering row mutations on co-located tables
raft_group0_client: extend validate_change to mixed_change type
docs: topology-over-raft: document co-located tables
tablet-mon.py: visual indication for co-located tablets
tablet-mon.py: handle co-located tablets
test/boost/view_schema_test.cc: fix race in wait_until_built
boost/tablets_test: test load balancing and resize of co-located tablets
test/tablets: test tablets colocation
tablets: co-locate view tablets with base when the partition keys match
test/pylib/tablets: common get_tablet_count api
test_mv_tablets: use get_tablet_replicas from common tablets api
test/pylib/tablets: fix test api to read tablet replicas from base table
tablets: allocator: create co-located tables in a single operation
alternator: prepare all new tables in a single announcement
migration_manager: add notification for creating multiple tables
tablets: read_tablet_transition_stage: read from base table
storage service: allow repair request only on base tables
tablets: keyspace_rf_change: apply on base table
storage service: generate tablet migration updates on base tables
tablets: replace all_tables method
tablets: split when all co-located tablets are ready
tablets: load balancer: sizing plan for table groups
tablets: load balancer: handle co-located tablets
tablets: allocate co-located tablets
tablets: handle migration of co-located tablets
storage service: add repair colocated tablets rpc
tablets: save and read tablet metadata of co-located tables
tablets: represent co-located tables in tablet metadata
tablets: add base_table column to system.tablets
docs: update system.tablets schema
ScyllaDB supports non-frozen UDTs since 3.2, no need to keep referencing
this limitation in the current docs. Replace the description of the
limitation with general description of frozen semantics for UDTs.
Fixes: #22929Closesscylladb/scylladb#24763
This commit migrates the Software Bill Of Materials (SBOM) page
added to the Enterprise docs with https://github.com/scylladb/scylla-enterprise/pull/5067.
The only difference is the link to the SBOM files - it was Enterprise SBOM in the Enterprise docs,
while here is a link to the ScyllaDB SBOM.
It's a follow-up of migration to Source Avalable and should be backported
to all Source Available versions - 2025.1 and later.
Fixes https://github.com/scylladb/scylladb/issues/24730Closesscylladb/scylladb#24735