`system.large_partitions`, `system.large_rows`, and `system.large_cells` store records keyed by SSTable name. When SSTables are migrated between shards or nodes (resharding, streaming, decommission), the records are lost because the destination never writes entries for the migrated SSTables. This patch series moves the source of truth for large data records into the SSTable's scylla metadata component (new `LargeDataRecords` tag 13) and reimplements the three `system.large_*` tables as virtual tables that query live SSTables on demand. A cluster feature flag (`LARGE_DATA_VIRTUAL_TABLES`) gates the transition for safe rolling upgrades. When the cluster feature is enabled, each node drops the old system large_* tables and starts serving the corresponding tables using virtual tables that represent the large data records now stored on the sstables. Note that the virtual tables will be empty after upgrade until the sstables that contained large data are rewritten, therefore it is recommended to run upgrade sstables compaction or major compaction to repopulate the sstables scylla-metadata with large data records. 1. **keys: move key_to_str() to keys/keys.hh** — make the helper reusable across large_data_handler, virtual tables, and scylla-sstable 2. **sstables: add LargeDataRecords metadata type (tag 13)** — new struct with binary-serialized key fields, scylla-sstable JSON support, format documentation 3. **large_data_handler: rename partition_above_threshold to above_threshold_result** — generalize the struct for reuse 4. **large_data_handler: return above_threshold_result from maybe_record_large_cells** — separate booleans for cell size vs collection elements thresholds 5. **sstables: populate LargeDataRecords from writer** — bounded min-heaps (one per large_data_type), configurable top-N via `compaction_large_data_records_per_sstable` 6. **test: add LargeDataRecords round-trip unit tests** — verify write/read, top-N bounding, below-threshold behavior 7. **db: call initialize_virtual_tables from shard 0 only** — preparatory refactoring to enable cross-shard coordination 8. **db: implement large_data virtual tables with feature flag gating** — three virtual table classes, feature flag activation, legacy SSTable fallback, dual-threshold dedup, cross-shard collection Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1276 * Although this fixes a bug where large data entries are effectively lost when sstables are renamed or migrated, the changes are intrusive and do not warrant a backport Closes scylladb/scylladb#29257 * github.com:scylladb/scylladb: db: implement large_data virtual tables with feature flag gating db: call initialize_virtual_tables from shard 0 only test: add LargeDataRecords round-trip unit tests sstables: populate LargeDataRecords from writer large_data_handler: return above_threshold_result from maybe_record_large_cells large_data_handler: rename partition_above_threshold to above_threshold_result sstables: add LargeDataRecords metadata type (tag 13) sstables: add fmt::formatter for large_data_type keys: move key_to_str() to keys/keys.hh
ScyllaDB Documentation
This repository contains the source files for ScyllaDB documentation.
- The
devfolder contains developer-oriented documentation related to the ScyllaDB code base. It is not published and is only available via GitHub. - All other folders and files contain user-oriented documentation related to ScyllaDB and are sources for docs.scylladb.com/manual.
To report a documentation bug or suggest an improvement, open an issue in GitHub issues for this project.
To contribute to the documentation, open a GitHub pull request.
Key Guidelines for Contributors
- The user documentation is written in reStructuredText (RST) - a plaintext markup language similar to Markdown. If you're not familiar with RST, see ScyllaDB RST Examples.
- The developer documentation is written in Markdown. See Basic Markdown Syntax for reference.
- Follow the ScyllaDB Style Guide.
To prevent the build from failing:
-
If you add a new file, ensure it's added to an appropriate toctree, for example:
.. toctree:: :maxdepth: 2 :hidden: Page X </folder1/article1> Page Y </folder1/article2> Your New Page </folder1/your-new-article> -
Make sure the link syntax is correct. See the guidelines on creating links
-
Make sure the section headings are correct. See the guidelines on creating headings Note that the markup must be at least as long as the text in the heading. For example:
---------------------- Prerequisites ----------------------
Building User Documentation
Prerequisites
- Python
- poetry
- make
See the ScyllaDB Sphinx Theme prerequisites to check which versions of the above are currently required.
Mac OS X
You must have a working Homebrew in order to install the needed tools.
You also need the standard utility make.
Check if you have these two items with the following commands:
brew help
make -h
Linux Distributions
Building the user docs should work out of the box on most Linux distributions.
Windows
Use "Bash on Ubuntu on Windows" for the same tools and capabilities as on Linux distributions.
Building the Docs
- Run
make previewin thedocs/directory to build the documentation. - Preview the built documentation locally at http://127.0.0.1:5500/.
Cleanup
You can clean up all the build products and auto-installed Python stuff with:
make pristine
Information for Contributors
If you are interested in contributing to Scylla docs, please read the Scylla open source page at http://www.scylladb.com/opensource/ and complete a Scylla contributor agreement if needed. We can only accept documentation pull requests if we have a contributor agreement on file for you.
Third-party Documentation
-
Do any copying as a separate commit. Always commit an unmodified version first and then do any editing in a separate commit.
-
We already have a copy of the Apache license in our tree, so you do not need to commit a copy of the license.
-
Include the copyright header from the source file in the edited version. If you are copying an Apache Cassandra document with no copyright header, use:
This document includes material from Apache Cassandra.
Apache Cassandra is Copyright 2009-2014 The Apache Software Foundation.