Files
scylladb/docs/dev/scylla_assert_conversion.md
2025-12-10 18:25:47 +00:00

7.2 KiB

SCYLLA_ASSERT to scylla_assert() Conversion Guide

Overview

This document tracks the conversion of SCYLLA_ASSERT to the new scylla_assert() macro based on on_internal_error(). The new macro throws exceptions instead of crashing the process, preventing cluster-wide crashes and loss of availability.

Status Summary

  • Total SCYLLA_ASSERT usages: ~1307 (including tests)
  • Non-test usages: ~886
  • Unsafe conversions (noexcept): ~187
  • Unsafe conversions (destructors): ~36
  • Safe conversions possible: ~668
  • Converted so far: 112

Safe vs Unsafe Contexts

Safe to Convert ✓

  • Regular functions (non-noexcept)
  • Coroutine functions (returning future<T>)
  • Member functions without noexcept specifier
  • Functions where exception propagation is acceptable

Unsafe to Convert ✗

  1. noexcept functions - throwing exceptions from noexcept causes std::terminate()
  2. Destructors - destructors are implicitly noexcept
  3. noexcept lambdas and callbacks
  4. Code with explicit exception-safety requirements that cannot handle exceptions

Files with Unsafe Conversions

Files with SCYLLA_ASSERT in noexcept contexts (examples)

  1. reader_concurrency_semaphore.cc

    • Lines with noexcept functions containing SCYLLA_ASSERT
    • Must remain as SCYLLA_ASSERT
  2. db/large_data_handler.cc

    • Line 86: maybe_delete_large_data_entries() - marked noexcept but contains SCYLLA_ASSERT
    • Analysis shows this is actually safe (not truly noexcept)
  3. db/row_cache.cc

    • Multiple SCYLLA_ASSERT usages in noexcept member functions
  4. db/schema_tables.cc

    • SCYLLA_ASSERT in noexcept contexts
  5. raft/server.cc

    • Multiple noexcept functions with SCYLLA_ASSERT

Files with SCYLLA_ASSERT in destructors

  1. reader_concurrency_semaphore.cc

    • Line 1116: SCYLLA_ASSERT in destructor
  2. api/column_family.cc

    • Line 102: SCYLLA_ASSERT in destructor
  3. utils/logalloc.cc

    • Line 1991: SCYLLA_ASSERT in destructor
  4. utils/file_lock.cc

    • Lines 34, 36: SCYLLA_ASSERT in destructor
  5. utils/disk_space_monitor.cc

    • Line 66: SCYLLA_ASSERT in destructor

Conversion Strategy

Phase 1: Infrastructure (Completed)

  • Created scylla_assert() macro in utils/assert.hh
  • Uses on_internal_error() for exception-based error handling
  • Supports optional message parameters

Phase 2: Safe Conversions

Convert SCYLLA_ASSERT to scylla_assert in contexts where:

  • Function is not noexcept
  • Not in a destructor
  • Exception propagation is safe

Phase 3: Document Remaining Uses

For contexts that cannot be converted:

  • Add comments explaining why SCYLLA_ASSERT must remain
  • Consider alternative approaches (e.g., using on_fatal_internal_error() in noexcept)

Converted Files

Completed Conversions

  1. db/large_data_handler.cc (3 conversions)

    • Line 42: maybe_record_large_partitions()
    • Line 86: maybe_delete_large_data_entries()
    • Line 250: delete_large_data_entries()
  2. db/large_data_handler.hh (2 conversions)

    • Line 83: maybe_record_large_rows()
    • Line 103: maybe_record_large_cells()
  3. db/schema_applier.cc (1 conversion)

    • Line 1124: commit() coroutine
  4. db/system_distributed_keyspace.cc (1 conversion)

    • Line 234: get_updated_service_levels()
  5. db/commitlog/commitlog_replayer.cc (1 conversion)

    • Line 168: recover() coroutine
  6. db/view/row_locking.cc (2 conversions)

    • Line 156: unlock() - partition lock check
    • Line 163: unlock() - row lock check
  7. db/size_estimates_virtual_reader.cc (1 conversion)

    • Line 190: Lambda in get_local_ranges()
  8. db/corrupt_data_handler.cc (2 conversions)

    • Line 78: set_cell_raw lambda
    • Line 85: set_cell lambda
  9. raft/tracker.cc (2 conversions)

    • Line 49: Switch default case with descriptive error
    • Line 90: Switch default case with descriptive error
  10. service/topology_coordinator.cc (11 conversions)

    • Line 363: Node lookup assertion in retake_node()
    • Line 2313: Bootstrapping state ring check
    • Line 2362: Replacing state ring check
    • Line 2365: Normal nodes lookup assertion
    • Line 2366: Node ring and state validation
    • Line 3025: Join request ring check
    • Line 3036: Leave request ring check
    • Line 3049: Remove request ring check
    • Line 3061: Replace request ring check
    • Line 3166: Transition nodes empty check
    • Line 4016: Barrier validation in stop()
  11. service/storage_service.cc (28 conversions, 3 unsafe kept as SCYLLA_ASSERT)

    • Lines 603, 691, 857, 901, 969: Core service operations
    • Lines 1523, 1575, 1844, 2086, 2170, 2195: Bootstrap and join operations
    • Lines 2319, 2352, 2354: Replacement operations
    • Lines 3003, 3028, 3228: Cluster join and drain operations
    • Lines 3995, 4047, 4353: Decommission and removenode operations
    • Lines 4473, 5787, 5834, 5958: CDC and topology change operations
    • Lines 6490, 6491: Tablet streaming operations
    • Line 7512: Join node response handler
    • Unsafe (kept as SCYLLA_ASSERT): Lines 3398, 5760, 5775 (noexcept functions)
  12. sstables/ (58 conversions across 22 files)

    • sstables/trie/bti_node_reader.cc (6): Node reading operations
    • sstables/mx/writer.cc (6): MX format writing
    • sstables/sstable_set.cc (5): SSTable set management
    • sstables/compressor.cc (5): Compression/decompression
    • sstables/trie/trie_writer.hh (4): Trie writing
    • sstables/downsampling.hh (4): Downsampling operations
    • sstables/storage.{cc,hh} (6): Storage operations
    • sstables/sstables_manager.{cc,hh} (6): SSTable lifecycle management
    • sstables/trie/writer_node.{hh,impl.hh} (4): Trie node writing
    • sstables/trie/bti_key_translation.cc (2): Key translation
    • sstables/sstable_directory.cc (2): Directory management
    • sstables/trie/trie_writer.cc (1): Trie writer implementation
    • sstables/trie/trie_traversal.hh (1): Trie traversal
    • sstables/sstables.cc (1): Core SSTable operations
    • sstables/partition_index_cache.hh (1): Index caching
    • sstables/generation_type.hh (1): Generation management
    • sstables/compress.{cc,hh} (2): Compression utilities
    • sstables/exceptions.hh (1): Comment update

Testing

Manual Testing

Created test/manual/test_scylla_assert.cc to verify:

  • Passing assertions succeed
  • Failing assertions throw exceptions
  • Custom messages are properly formatted

Integration Testing

  • Run existing test suite with converted assertions
  • Verify no regressions in error handling
  • Confirm exception propagation works correctly

Future Work

  1. Automated Analysis Tool

    • Create tool to identify safe vs unsafe conversion contexts
    • Generate reports of remaining conversions
  2. Gradual Conversion

    • Convert additional safe usages incrementally
    • Monitor for any unexpected issues
  3. noexcept Review

    • Review functions marked noexcept that contain SCYLLA_ASSERT
    • Consider if they should use on_fatal_internal_error() instead

References

  • utils/assert.hh - Implementation of both SCYLLA_ASSERT and scylla_assert
  • utils/on_internal_error.hh - Exception-based error handling infrastructure
  • GitHub Issue: [Link to original issue tracking this work]