With commitsed7d352e7dandbb1867c7c7, we now have input streams for both compressed and uncompressed SSTables that provide seamless checksum and digest checking. The code for these was based on `validate_checksums()`, which implements its own validation logic over raw streams. This has led to some duplicate code. This PR deduplicates the uncompressed case by modifying `validate_checksums()` to use a checksummed input stream instead of a raw stream. The same cannot be done for compressed SSTables though. The reason is that `validate_checksums()` needs to examine the whole data file, even if an invalid chunk is encountered. In the checksummed case we support that by offloading the error handling logic from the data source via a function parameter. In the compressed data source we cannot do that because it needs to return decompressed data and decompression may fail if the data are invalid. This PR also enables `validate_checksums()` to partially verify SSTables with just the per-chunk checksums if the digest is missing. In more detail, this PR consists of: * Port of some integrity checks from `do_validate_uncompressed()` to the checksummed data source. It should now be able to detect corruption due to truncated or appended chunks (expected number of chunks is retrieved from the CRC component). * Introduction of `error_handler` parameter in checksummed data source and `data_stream()`. * Refactoring of `validate_checksums()`. The JSON response of `sstable validate-checksums` was also modified to report a missing digest. * Tests for `validate_checksums()` against SSTables with truncated data, appended data, invalid digests, or no digest. Refs #19058. This PR is a hybrid of cleanup and feature. No backport is needed. Closes scylladb/scylladb#20933 * github.com:scylladb/scylladb: tools/scylla-sstable: Rename valid_checksums -> valid test: Check validate_checksums() with missing digest sstables: Allow validate_checksums() to report missing digests sstables: Refactor validate_checksums() to use checksummed data stream sstables: Add error_handler parameter to data_stream() sstables: Add error handler in checksummed data source sstables: Check for excessive chunks in checksummed data source sstables: Check for premature EOF in checksummed data source test: test_validate_checksums: Check SSTable with invalid digest test: test_validate_checksums: Check SSTable with appended data test: test_validate_checksums: Complement test for truncated SSTable
ScyllaDB Documentation
This repository contains the source files for ScyllaDB Open Source documentation.
- The
devfolder contains developer-oriented documentation related to the ScyllaDB code base. It is not published and is only available via GitHub. - All other folders and files contain user-oriented documentation related to ScyllaDB Open Source and are sources for opensource.docs.scylladb.com.
To report a documentation bug or suggest an improvement, open an issue in GitHub issues for this project.
To contribute to the documentation, open a GitHub pull request.
Key Guidelines for Contributors
- The user documentation is written in reStructuredText (RST) - a plaintext markup language similar to Markdown. If you're not familiar with RST, see ScyllaDB RST Examples.
- The developer documentation is written in Markdown. See Basic Markdown Syntax for reference.
- Follow the ScyllaDB Style Guide.
To prevent the build from failing:
-
If you add a new file, ensure it's added to an appropriate toctree, for example:
.. toctree:: :maxdepth: 2 :hidden: Page X </folder1/article1> Page Y </folder1/article2> Your New Page </folder1/your-new-article> -
Make sure the link syntax is correct. See the guidelines on creating links
-
Make sure the section headings are correct. See the guidelines on creating headings Note that the markup must be at least as long as the text in the heading. For example:
---------------------- Prerequisites ----------------------
Building User Documentation
Prerequisites
- Python
- poetry
- make
See the ScyllaDB Sphinx Theme prerequisites to check which versions of the above are currently required.
Mac OS X
You must have a working Homebrew in order to install the needed tools.
You also need the standard utility make.
Check if you have these two items with the following commands:
brew help
make -h
Linux Distributions
Building the user docs should work out of the box on most Linux distributions.
Windows
Use "Bash on Ubuntu on Windows" for the same tools and capabilities as on Linux distributions.
Building the Docs
- Run
make previewto build the documentation. - Preview the built documentation locally at http://127.0.0.1:5500/.
Cleanup
You can clean up all the build products and auto-installed Python stuff with:
make pristine
Information for Contributors
If you are interested in contributing to Scylla docs, please read the Scylla open source page at http://www.scylladb.com/opensource/ and complete a Scylla contributor agreement if needed. We can only accept documentation pull requests if we have a contributor agreement on file for you.
Third-party Documentation
-
Do any copying as a separate commit. Always commit an unmodified version first and then do any editing in a separate commit.
-
We already have a copy of the Apache license in our tree, so you do not need to commit a copy of the license.
-
Include the copyright header from the source file in the edited version. If you are copying an Apache Cassandra document with no copyright header, use:
This document includes material from Apache Cassandra.
Apache Cassandra is Copyright 2009-2014 The Apache Software Foundation.