mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-01 21:55:50 +00:00
The checksummed file data source uses the chunk size to ensure that reads from the underlying file input stream are aligned to chunk boundaries. This is necessary so that the checksum of each chunk can be validated. However, a mismatch in integer types caused a bug where the underlying file input stream read a smaller portion of the data file than expected.

The bug is located in the following lines:

```
auto start = _beg_pos & ~(chunk_size - 1);
auto end = (_end_pos & ~(chunk_size - 1)) + chunk_size;
```

`_beg_pos` and `_end_pos` are `uint64_t`, whereas `chunk_size` is `uint32_t`, so the mask `~(chunk_size - 1)` is computed in 32 bits. When evaluating the AND operation, the compiler converts the mask from `uint32_t` to `uint64_t`. Since the mask is unsigned, its four most-significant bytes are filled with zeros, so the AND erroneously clears the corresponding bytes of the position. Any position at or beyond 4 GiB is therefore truncated.

Fix the bug by explicitly converting the chunk size to `uint64_t` before any arithmetic operations. Also, replace the handwritten alignment code with the `align_up()` and `align_down()` helpers.

Finally, restrict the file end position so that it does not exceed the file length. Since the last chunk can be smaller than the chunk size, the end position could exceed the file length after rounding up. This is not a bug on its own, since `make_file_input_stream()` accepts lengths that extend beyond end-of-file, but it makes the code more error-prone and should be avoided.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#21665
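The type mismatch and the fix can be reproduced outside ScyllaDB with a small standalone sketch. Here `align_down()` and `align_up()` are simplified local stand-ins for Seastar's helpers of the same names (power-of-two alignment assumed), and `range`, `buggy_range()`, `fixed_range()` and the `file_len` parameter are hypothetical names used only for illustration. The sketch also assumes the end position is exclusive; note that the handwritten form rounds down and then adds a full chunk, which differs from `align_up()` only when the end position is already chunk-aligned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for Seastar's align_down()/align_up().
// Both assume `align` is a power of two and that both arguments share a type.
template <typename T>
constexpr T align_down(T v, T align) { return v & ~(align - T(1)); }

template <typename T>
constexpr T align_up(T v, T align) { return align_down(T(v + align - T(1)), align); }

// Hypothetical [start, end) byte range passed to the file input stream.
struct range {
    uint64_t start;
    uint64_t end;
};

// Reproduces the original expressions: ~(chunk_size - 1) is a 32-bit mask,
// zero-extended to 64 bits, so the AND clears the upper 32 bits of the position.
constexpr range buggy_range(uint64_t beg_pos, uint64_t end_pos, uint32_t chunk_size) {
    auto start = beg_pos & ~(chunk_size - 1);
    auto end = (end_pos & ~(chunk_size - 1)) + chunk_size;
    return {start, end};
}

// Widens the chunk size before any arithmetic, uses the alignment helpers,
// and clamps the end position to the (hypothetical) file length.
constexpr range fixed_range(uint64_t beg_pos, uint64_t end_pos, uint32_t chunk_size,
                            uint64_t file_len) {
    const uint64_t cs = chunk_size;  // explicit uint64_t conversion
    uint64_t start = align_down(beg_pos, cs);
    uint64_t end = std::min(align_up(end_pos, cs), file_len);
    return {start, end};
}
```

With a 64 KiB chunk and a read starting at 5 GiB + 100 bytes, `buggy_range()` yields a start of 1 GiB because the 5 GiB component lives in the upper 32 bits, whereas `fixed_range()` yields 5 GiB. If the file ends at 5 GiB + 100000 bytes, the clamped end stops at the file length instead of the next chunk boundary.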