In this series we introduce new system tables and use them to store the raft metadata for strongly consistent tables. In contrast to the previously used raft group0 tables, the new tables can store data on any shard. They also allow specifying the shard where each partition should reside, which lets the tablets of strongly consistent tables have their raft group metadata co-located on the same shard as the tablet replica.

The new tables have almost the same schemas as the raft group0 tables, with one additional column in their partition keys: the shard that specifies where the data should be located. While a tablet and its corresponding raft group server reside on some shard, all reads and writes to the metadata tables now use that shard in addition to the group_id.

The extra partition key column is used by the new partitioner and sharder, which implement this special shard routing: the partitioner encodes the shard in the token, and the sharder decodes the shard from the token. This approach avoids any additional lookups (for the tablet mapping) during operations on the new tables and doesn't require keeping any state. It also doesn't interact negatively with resharding: as long as tablets (and their corresponding raft metadata) occupy some shard, we do not allow starting the node with a shard count lower than the id of that shard. When increasing the shard count, the routing does not change, similarly to how tablet allocation doesn't change.

To use the new tables, a new implementation of `raft::persistence` is added. Currently, it's almost an exact copy of `raft_sys_table_storage` that just uses the new tables, but in the future we can modify it with changes specific to metadata (or mutation) storage for strongly consistent tables.
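The shard-in-token routing can be sketched as follows. This is a minimal conceptual model, not ScyllaDB's actual implementation: the bit widths and layout are hypothetical, and the real `fixed_shard_partitioner`/`fixed_shard_sharder` operate on dht tokens and partition keys.

```python
# Conceptual sketch of shard-in-token routing (hypothetical bit layout,
# not ScyllaDB's actual one): the partitioner packs the shard id into the
# high bits of the token, so the sharder can recover the shard without
# any extra lookup or mutable state.

SHARD_BITS = 16   # hypothetical: bits reserved for the shard id
HASH_BITS = 48    # hypothetical: bits holding a hash of the rest of the key

def make_token(shard: int, key_hash: int) -> int:
    """Partitioner side: encode the shard in the token's high bits."""
    assert 0 <= shard < (1 << SHARD_BITS)
    return (shard << HASH_BITS) | (key_hash & ((1 << HASH_BITS) - 1))

def shard_of(token: int) -> int:
    """Sharder side: decode the shard back out of the token."""
    return token >> HASH_BITS

# A tablet replica on shard 3 writes its raft metadata with shard=3 in
# the partition key; any later read of that token routes to shard 3.
token = make_token(3, 0xDEADBEEF)
assert shard_of(token) == 3
```

Because the shard is recovered purely from the token, the routing is static: increasing the node's shard count doesn't move any existing metadata, matching how tablet allocation behaves.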
The new storage is used in the `groups_manager`, which, combined with the removal of some `this_shard_id() == 0` checks, allows strongly consistent tables to be used on all shards.

This approach for making sure that reads/writes to the new tables end up on the correct shards won in the balance of complexity/usability/performance against a few other approaches we considered:

1. Making the Raft server read/write directly to the database on its shard, skipping the sharder, while using the default partitioner/sharder. This approach could let us avoid changing the schema, and there should be no problems for reads and writes performed by the Raft server. However, we would be inserting data into tables in conflict with the placement determined by the sharder. As a result, any read going through the sharder could miss the rows it was supposed to read. Even when reading all shards to find a specific value, there is a risk of polluting the cache: rows loaded on incorrect shards may persist in the cache for an unknown amount of time, and the cache may also mistakenly remember that a row is missing even though it's actually present, just on an incorrect shard. Some of these issues could be worked around with another sharder that always returns `this_shard_id()` when asked about a shard, but it's not clear how such a sharder would implement a method like `token_for_next_shard`, or how much simpler it would be compared to the current "identity" sharder.

2. Using a sharder that depends on the current allocation of tablets on the node. This approach relies on knowing the group_id -> shard mapping at any point in time in the cluster. We'd also need to either add a custom partitioner which encodes the group_id in the token, or track the token(group_id) -> shard mapping. This approach has the benefit over the one used in the series of keeping the partition key as just the group_id. However, it requires more logic and access to the live state of the node in the sharder, and it's not static: the same token may be sharded differently depending on the state of the node. This shouldn't occur in practice, but if we changed the state of the node before adjusting the table data, we would be unable to access/fix the stale data without artificially changing the state of the node as well.

3. Using metadata tables co-located with the strongly consistent tables. This approach could simplify metadata migrations in the future, but it would require additional schema management of all co-located metadata tables, and it's not even obvious what could be used as the partition key in these tables: some metadata is per-raft-group, so we couldn't reuse the partition key of the strongly consistent table for it, and finding and remembering a partition key that is routed to a specific shard is not a simple task. Finally, splits and merges will most likely need special handling for metadata anyway, so we wouldn't even make use of a co-located table's splits and merges.

Fixes [SCYLLADB-361](https://scylladb.atlassian.net/browse/SCYLLADB-361)

Closes scylladb/scylladb#28509

* github.com:scylladb/scylladb:
  docs: add strong consistency doc
  test/cluster: add tests for strongly-consistent tables' metadata persistence
  raft: enable multi-shard raft groups for strongly consistent tablets
  test/raft: add unit tests for raft_groups_storage
  raft: add raft_groups_storage persistence class
  db: add system tables for strongly consistent tables' raft groups
  dht: add fixed_shard_partitioner and fixed_shard_sharder
  raft: add group_id -> shard mapping to raft_group_registry
  schema: add with_sharder overload accepting static_sharder reference
ScyllaDB Documentation
This repository contains the source files for ScyllaDB documentation.
- The `dev` folder contains developer-oriented documentation related to the ScyllaDB code base. It is not published and is only available via GitHub.
- All other folders and files contain user-oriented documentation related to ScyllaDB and are sources for docs.scylladb.com/manual.
To report a documentation bug or suggest an improvement, open an issue in GitHub issues for this project.
To contribute to the documentation, open a GitHub pull request.
Key Guidelines for Contributors
- The user documentation is written in reStructuredText (RST) - a plaintext markup language similar to Markdown. If you're not familiar with RST, see ScyllaDB RST Examples.
- The developer documentation is written in Markdown. See Basic Markdown Syntax for reference.
- Follow the ScyllaDB Style Guide.
To prevent the build from failing:
- If you add a new file, ensure it's added to an appropriate toctree, for example:

  ```
  .. toctree::
     :maxdepth: 2
     :hidden:

     Page X </folder1/article1>
     Page Y </folder1/article2>
     Your New Page </folder1/your-new-article>
  ```

- Make sure the link syntax is correct. See the guidelines on creating links.

- Make sure the section headings are correct. See the guidelines on creating headings. Note that the markup must be at least as long as the text in the heading. For example:

  ```
  -------------
  Prerequisites
  -------------
  ```
Building User Documentation
Prerequisites
- Python
- poetry
- make
See the ScyllaDB Sphinx Theme prerequisites to check which versions of the above are currently required.
Mac OS X
You must have a working Homebrew in order to install the needed tools.
You also need the standard utility make.
Check if you have these two items with the following commands:
brew help
make -h
Linux Distributions
Building the user docs should work out of the box on most Linux distributions.
Windows
Use "Bash on Ubuntu on Windows" for the same tools and capabilities as on Linux distributions.
Building the Docs
- Run `make preview` in the `docs/` directory to build the documentation.
- Preview the built documentation locally at http://127.0.0.1:5500/.
Cleanup
You can clean up all the build products and auto-installed Python stuff with:
make pristine
Information for Contributors
If you are interested in contributing to Scylla docs, please read the Scylla open source page at http://www.scylladb.com/opensource/ and complete a Scylla contributor agreement if needed. We can only accept documentation pull requests if we have a contributor agreement on file for you.
Third-party Documentation
- Do any copying as a separate commit. Always commit an unmodified version first and then do any editing in a separate commit.
- We already have a copy of the Apache license in our tree, so you do not need to commit a copy of the license.
- Include the copyright header from the source file in the edited version. If you are copying an Apache Cassandra document with no copyright header, use:
This document includes material from Apache Cassandra.
Apache Cassandra is Copyright 2009-2014 The Apache Software Foundation.