Mirror of https://github.com/scylladb/scylladb.git, synced 2026-04-28 04:06:59 +00:00
Before this change, `checksummed_file_data_sink_impl` just inherited `data_sink_impl::flush()` from its parent class. For a wrapper around the underlying `_out` data_sink, this is not only an unusual design decision in a layered I/O system, it can also be problematic.

To be more specific, the typical user of `data_sink_impl` is a `data_sink`, whose `flush()` member function is called when the user of `data_sink` wants to ensure that the data sent to the sink is pushed to the underlying storage / channel. In general this works, as the typical user of `data_sink` is in turn `output_stream`, which calls `data_sink.flush()` before closing the `data_sink` with `data_sink.close()`, and the operating system eventually flushes the data after the application closes the corresponding fd. Moreover, almost none of the popular local filesystems implement the file_operations.op, so it is safe even if the `output_stream` does not flush the underlying data_sink after writing to it. This is the use case when we write sstables stored on a local filesystem.

But, as explained above, if the data_sink is backed by a network filesystem, a layered filesystem, or storage connected via a buffered network device, then it is crucial to flush in a timely manner; otherwise we risk data loss if the application / machine / network fails while the data is considered persisted but is _not_!

The `data_sink` returned by `client::make_upload_jumbo_sink` is a little different. Multipart upload is used under the hood, and we have to finalize the upload once all the parts are uploaded, by calling `close()`. But if the caller fails, or chooses to close the sink before flushing it, the upload is aborted and the partially uploaded parts are deleted. The default-implemented `checksummed_file_data_sink_impl::flush()` therefore breaks `upload_jumbo_sink`, which is the `_out` data_sink wrapped by `checksummed_file_data_sink_impl`.
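The flush-before-close contract described above can be sketched with a minimal, Seastar-free model. Everything here is an illustrative stand-in (the real `upload_jumbo_sink` is asynchronous and talks to S3), but it captures the key behavior: close() without a prior flush() aborts the upload.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a multipart-upload sink: close() only
// finalizes the upload if flush() was called first; otherwise it
// aborts and drops the partially uploaded parts.
struct upload_sink {
    std::vector<std::string> parts;
    bool finalized = false;
    bool aborted = false;

    void put(std::string buf) {
        parts.push_back(std::move(buf));
    }
    void flush() {
        // Models finalizing the multipart upload.
        finalized = true;
    }
    void close() {
        if (!finalized) {
            // The flush never reached us: abort and delete the parts.
            aborted = true;
            parts.clear();
        }
    }
};
```

Calling `put(); flush(); close()` leaves the sink finalized with its parts intact, while `put(); close()` aborts and empties it, which is exactly the failure mode the buggy wrapper triggered.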
Since the `flush()` calls are short-circuited by the wrapper, the `close()` call always aborts the upload. That is why the data and index components fail to upload with the S3 backend. In this change, we simply delegate the `flush()` call to the wrapped sink.

Fixes #15079

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15134
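In the same simplified, Seastar-free model, the fix amounts to a one-line override that forwards flush() to the wrapped sink instead of inheriting the no-op default. Only the class names `checksummed_file_data_sink_impl` / `upload_jumbo_sink` come from the real code; these synchronous stand-ins are assumptions for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Minimal stand-in for the data_sink_impl interface.
struct data_sink_impl {
    virtual ~data_sink_impl() = default;
    virtual void put(std::string buf) = 0;
    virtual void flush() {}  // default no-op: the origin of the bug
    virtual void close() = 0;
};

// Stand-in for upload_jumbo_sink: must see flush() before close(),
// or the (modeled) multipart upload is aborted.
struct inner_upload_sink : data_sink_impl {
    bool finalized = false;
    bool aborted = false;
    void put(std::string) override {}
    void flush() override { finalized = true; }
    void close() override { if (!finalized) { aborted = true; } }
};

// Wrapper modeling checksummed_file_data_sink_impl.
struct checksumming_sink : data_sink_impl {
    std::unique_ptr<data_sink_impl> _out;
    explicit checksumming_sink(std::unique_ptr<data_sink_impl> out)
        : _out(std::move(out)) {}

    void put(std::string buf) override {
        // (checksum update elided) then forward the data downstream.
        _out->put(std::move(buf));
    }
    // The fix: delegate flush() to the wrapped sink rather than
    // inheriting the no-op data_sink_impl::flush().
    void flush() override { _out->flush(); }
    void close() override { _out->close(); }
};
```

With the override in place, `output_stream`'s flush-then-close sequence reaches the inner sink, so the upload is finalized instead of aborted; removing the `flush()` override reproduces the short-circuit described above.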