mirror of
https://github.com/scylladb/scylladb.git
synced 2026-04-20 00:20:47 +00:00
Although valid for compact tables, non-full (or empty) clustering key prefixes are not handled for row keys when writing sstables. Only the present components are written, consequently if the key is empty, it is omitted entirely. When parsing sstables, the parsing code unconditionally parses a full prefix. This mis-match results in parsing failures, as the parser parses part of the row content as a key resulting in a garbage key and subsequent mis-parsing of the row content and maybe even subsequent partitions. Introduce a new system table: `system.corrupt_data` and infrastructure similar to `large_data_handler`: `corrupt_data_handler` which abstracts how corrupt data is handled. The sstable writer now passes rows such corrupt keys to the corrupt data handler. This way, we avoid corrupting the sstables beyond parsing and the rows are also kept around in system.corrupt_data for later inspection and possible recovery. Add a full-stack test which checks that rows with bad keys are correctly handled. Fixes: https://github.com/scylladb/scylladb/issues/24489 The bug is present in all versions, has to be backported to all supported versions. - (cherry picked from commit92b5fe8983) - (cherry picked from commit0753643606) - (cherry picked from commitb0d5462440) - (cherry picked from commit093d4f8d69) - (cherry picked from commit678deece88) - (cherry picked from commit64f8500367) - (cherry picked from commitb931145a26) - (cherry picked from commit3e1c50e9a7) - (cherry picked from commit46ff7f9c12) - (cherry picked from commitebd9420687) - (cherry picked from commitaae212a87c) - (cherry picked from commit592ca789e2) - (cherry picked from commitedc2906892) Parent PR: #24492 Closes scylladb/scylladb#24740 * github.com:scylladb/scylladb: test/boost/sstable_datafile_test: add test for corrupt data sstables/mx/writer: handler rows with empty keys test/lib/cql_assertions: introduce columns_assertions sstables: add corrupt_data_handler to sstables::sstables tools/scylla-sstable: make large_data_handler a local db: introduce corrupt_data_handler mutation: introduce frozen_mutation_fragment_v2 mutation/mutation_partition_view: read_{clustering,static}_row(): return row type mutation/mutation_partition_view: extract de-ser of {clustering,static} row idl-compiler.py: generate skip() definition for enums serializers idl: extract full_position.idl from position_in_partition.idl db/system_keyspace: add apply_mutation() db/system_keyspace: introduce the corrupt_data table