In this series we introduce new system tables and use them to store the raft metadata for strongly consistent tables. In contrast to the previously used raft group0 tables, the new tables can store data on any shard. They also allow specifying the shard where each partition should reside, which lets the tablets of strongly consistent tables have their raft group metadata co-located on the same shard as the tablet replica.

The new tables have almost the same schemas as the raft group0 tables, with one additional column in their partition keys: the shard that specifies where the data should be located. While a tablet and its corresponding raft group server reside on some shard, all reads and writes to the metadata tables now use that shard in addition to the group_id.

The extra partition key column is used by the new partitioner and sharder, which implement this special shard routing: the partitioner encodes the shard in the token, and the sharder decodes the shard from the token. This approach avoids any additional lookups (for the tablet mapping) during operations on the new tables and doesn't require keeping any state. It also doesn't interact negatively with resharding: as long as tablets (and their corresponding raft metadata) occupy some shard, we do not allow starting the node with a shard count lower than the id of that shard. When increasing the shard count, the routing does not change, similarly to how tablet allocation doesn't change.

To use the new tables, a new implementation of `raft::persistence` is added. Currently, it's almost an exact copy of `raft_sys_table_storage` that just uses the new tables, but in the future we can modify it with changes specific to metadata (or mutation) storage for strongly consistent tables.
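The shard-in-token routing can be sketched as follows. This is a minimal conceptual model, not ScyllaDB's actual implementation: the bit widths and layout are hypothetical, and the real `fixed_shard_partitioner`/`fixed_shard_sharder` operate on dht tokens and partition keys.

```python
# Conceptual sketch of shard-in-token routing (hypothetical bit layout,
# not ScyllaDB's actual one): the partitioner packs the shard id into the
# high bits of the token, so the sharder can recover the shard without
# any extra lookup or mutable state.

SHARD_BITS = 16   # hypothetical: bits reserved for the shard id
HASH_BITS = 48    # hypothetical: bits holding a hash of the rest of the key

def make_token(shard: int, key_hash: int) -> int:
    """Partitioner side: encode the shard in the token's high bits."""
    assert 0 <= shard < (1 << SHARD_BITS)
    return (shard << HASH_BITS) | (key_hash & ((1 << HASH_BITS) - 1))

def shard_of(token: int) -> int:
    """Sharder side: decode the shard back out of the token."""
    return token >> HASH_BITS

# A tablet replica on shard 3 writes its raft metadata with shard=3 in
# the partition key; any later read of that token routes to shard 3.
token = make_token(3, 0xDEADBEEF)
assert shard_of(token) == 3
```

Because the shard is recovered purely from the token, the routing is static: increasing the node's shard count doesn't move any existing metadata, matching how tablet allocation behaves.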
The new storage is used in the `groups_manager`, which, combined with the removal of some `this_shard_id() == 0` checks, allows strongly consistent tables to be used on all shards.

This approach for making sure that reads/writes to the new tables end up on the correct shards won in the balance of complexity/usability/performance against a few other approaches we considered:

1. Making the Raft server read/write directly to the database on its shard, skipping the sharder, while using the default partitioner/sharder. This approach could let us avoid changing the schema, and there should be no problems for reads and writes performed by the Raft server. However, we would be inserting data into tables in conflict with the placement determined by the sharder. As a result, any read going through the sharder could miss the rows it was supposed to read. Even when reading all shards to find a specific value, there is a risk of polluting the cache: rows loaded on incorrect shards may persist in the cache for an unknown amount of time, and the cache may also mistakenly remember that a row is missing even though it's actually present, just on an incorrect shard. Some of these issues could be worked around with another sharder that always returns `this_shard_id()` when asked about a shard, but it's not clear how such a sharder would implement a method like `token_for_next_shard`, or how much simpler it would be compared to the current "identity" sharder.

2. Using a sharder that depends on the current allocation of tablets on the node. This approach relies on knowing the group_id -> shard mapping at any point in time in the cluster. We'd also need to either add a custom partitioner which encodes the group_id in the token, or track the token(group_id) -> shard mapping. This approach has the benefit over the one used in the series of keeping the partition key as just the group_id. However, it requires more logic and access to the live state of the node in the sharder, and it's not static: the same token may be sharded differently depending on the state of the node. This shouldn't occur in practice, but if we changed the state of the node before adjusting the table data, we would be unable to access/fix the stale data without artificially changing the state of the node as well.

3. Using metadata tables co-located with the strongly consistent tables. This approach could simplify metadata migrations in the future, but it would require additional schema management of all co-located metadata tables, and it's not even obvious what could be used as the partition key in these tables: some metadata is per-raft-group, so we couldn't reuse the partition key of the strongly consistent table for it, and finding and remembering a partition key that is routed to a specific shard is not a simple task. Finally, splits and merges will most likely need special handling for metadata anyway, so we wouldn't even make use of a co-located table's splits and merges.

Fixes [SCYLLADB-361](https://scylladb.atlassian.net/browse/SCYLLADB-361)

Closes scylladb/scylladb#28509

* github.com:scylladb/scylladb:
  docs: add strong consistency doc
  test/cluster: add tests for strongly-consistent tables' metadata persistence
  raft: enable multi-shard raft groups for strongly consistent tablets
  test/raft: add unit tests for raft_groups_storage
  raft: add raft_groups_storage persistence class
  db: add system tables for strongly consistent tables' raft groups
  dht: add fixed_shard_partitioner and fixed_shard_sharder
  raft: add group_id -> shard mapping to raft_group_registry
  schema: add with_sharder overload accepting static_sharder reference
ScyllaDB Documentation
This repository contains the source files for ScyllaDB documentation.
- The `dev` folder contains developer-oriented documentation related to the ScyllaDB code base. It is not published and is only available via GitHub.
- All other folders and files contain user-oriented documentation related to ScyllaDB and are sources for docs.scylladb.com/manual.
To report a documentation bug or suggest an improvement, open an issue in GitHub issues for this project.
To contribute to the documentation, open a GitHub pull request.
Key Guidelines for Contributors
- The user documentation is written in reStructuredText (RST) - a plaintext markup language similar to Markdown. If you're not familiar with RST, see ScyllaDB RST Examples.
- The developer documentation is written in Markdown. See Basic Markdown Syntax for reference.
- Follow the ScyllaDB Style Guide.
To prevent the build from failing:
- If you add a new file, ensure it's added to an appropriate toctree, for example:

  ```
  .. toctree::
     :maxdepth: 2
     :hidden:

     Page X </folder1/article1>
     Page Y </folder1/article2>
     Your New Page </folder1/your-new-article>
  ```

- Make sure the link syntax is correct. See the guidelines on creating links.

- Make sure the section headings are correct. See the guidelines on creating headings. Note that the markup must be at least as long as the text in the heading. For example:

  ```
  -------------
  Prerequisites
  -------------
  ```
Building User Documentation
Prerequisites
- Python
- poetry
- make
See the ScyllaDB Sphinx Theme prerequisites to check which versions of the above are currently required.
Mac OS X
You must have a working Homebrew in order to install the needed tools.
You also need the standard utility make.
Check if you have these two items with the following commands:
brew help
make -h
Linux Distributions
Building the user docs should work out of the box on most Linux distributions.
Windows
Use "Bash on Ubuntu on Windows" for the same tools and capabilities as on Linux distributions.
Building the Docs
- Run `make preview` in the `docs/` directory to build the documentation.
- Preview the built documentation locally at http://127.0.0.1:5500/.
Cleanup
You can clean up all the build products and auto-installed Python stuff with:
make pristine
Information for Contributors
If you are interested in contributing to Scylla docs, please read the Scylla open source page at http://www.scylladb.com/opensource/ and complete a Scylla contributor agreement if needed. We can only accept documentation pull requests if we have a contributor agreement on file for you.
Third-party Documentation
- Do any copying as a separate commit. Always commit an unmodified version first and then do any editing in a separate commit.
- We already have a copy of the Apache license in our tree, so you do not need to commit a copy of the license.
- Include the copyright header from the source file in the edited version. If you are copying an Apache Cassandra document with no copyright header, use:
This document includes material from Apache Cassandra.
Apache Cassandra is Copyright 2009-2014 The Apache Software Foundation.