7.2 KiB
SCYLLA_ASSERT to scylla_assert() Conversion Guide
Overview
This document tracks the conversion of SCYLLA_ASSERT to the new scylla_assert() macro based on on_internal_error(). The new macro throws exceptions instead of crashing the process, preventing cluster-wide crashes and loss of availability.
Status Summary
- Total SCYLLA_ASSERT usages: ~1307 (including tests)
- Non-test usages: ~886
- Unsafe conversions (noexcept): ~187
- Unsafe conversions (destructors): ~36
- Safe conversions possible: ~668
- Converted so far: 112
Safe vs Unsafe Contexts
Safe to Convert ✓
- Regular functions (non-noexcept)
- Coroutine functions (returning
future<T>) - Member functions without noexcept specifier
- Functions where exception propagation is acceptable
Unsafe to Convert ✗
- noexcept functions - throwing exceptions from noexcept causes
std::terminate() - Destructors - destructors are implicitly noexcept
- noexcept lambdas and callbacks
- Code with explicit exception-safety requirements that cannot handle exceptions
Files with Unsafe Conversions
Files with SCYLLA_ASSERT in noexcept contexts (examples)
-
reader_concurrency_semaphore.cc
- Lines with noexcept functions containing SCYLLA_ASSERT
- Must remain as SCYLLA_ASSERT
-
db/large_data_handler.cc
- Line 86:
maybe_delete_large_data_entries()- marked noexcept but contains SCYLLA_ASSERT - Analysis shows this is actually safe (not truly noexcept)
- Line 86:
-
db/row_cache.cc
- Multiple SCYLLA_ASSERT usages in noexcept member functions
-
db/schema_tables.cc
- SCYLLA_ASSERT in noexcept contexts
-
raft/server.cc
- Multiple noexcept functions with SCYLLA_ASSERT
Files with SCYLLA_ASSERT in destructors
-
reader_concurrency_semaphore.cc
- Line 1116: SCYLLA_ASSERT in destructor
-
api/column_family.cc
- Line 102: SCYLLA_ASSERT in destructor
-
utils/logalloc.cc
- Line 1991: SCYLLA_ASSERT in destructor
-
utils/file_lock.cc
- Lines 34, 36: SCYLLA_ASSERT in destructor
-
utils/disk_space_monitor.cc
- Line 66: SCYLLA_ASSERT in destructor
Conversion Strategy
Phase 1: Infrastructure (Completed)
- Created
scylla_assert()macro inutils/assert.hh - Uses
on_internal_error()for exception-based error handling - Supports optional message parameters
Phase 2: Safe Conversions
Convert SCYLLA_ASSERT to scylla_assert in contexts where:
- Function is not noexcept
- Not in a destructor
- Exception propagation is safe
Phase 3: Document Remaining Uses
For contexts that cannot be converted:
- Add comments explaining why SCYLLA_ASSERT must remain
- Consider alternative approaches (e.g., using
on_fatal_internal_error()in noexcept)
Converted Files
Completed Conversions
-
db/large_data_handler.cc (3 conversions)
- Line 42:
maybe_record_large_partitions() - Line 86:
maybe_delete_large_data_entries() - Line 250:
delete_large_data_entries()
- Line 42:
-
db/large_data_handler.hh (2 conversions)
- Line 83:
maybe_record_large_rows() - Line 103:
maybe_record_large_cells()
- Line 83:
-
db/schema_applier.cc (1 conversion)
- Line 1124:
commit()coroutine
- Line 1124:
-
db/system_distributed_keyspace.cc (1 conversion)
- Line 234:
get_updated_service_levels()
- Line 234:
-
db/commitlog/commitlog_replayer.cc (1 conversion)
- Line 168:
recover()coroutine
- Line 168:
-
db/view/row_locking.cc (2 conversions)
- Line 156:
unlock()- partition lock check - Line 163:
unlock()- row lock check
- Line 156:
-
db/size_estimates_virtual_reader.cc (1 conversion)
- Line 190: Lambda in
get_local_ranges()
- Line 190: Lambda in
-
db/corrupt_data_handler.cc (2 conversions)
- Line 78:
set_cell_rawlambda - Line 85:
set_celllambda
- Line 78:
-
raft/tracker.cc (2 conversions)
- Line 49: Switch default case with descriptive error
- Line 90: Switch default case with descriptive error
-
service/topology_coordinator.cc (11 conversions)
- Line 363: Node lookup assertion in
retake_node() - Line 2313: Bootstrapping state ring check
- Line 2362: Replacing state ring check
- Line 2365: Normal nodes lookup assertion
- Line 2366: Node ring and state validation
- Line 3025: Join request ring check
- Line 3036: Leave request ring check
- Line 3049: Remove request ring check
- Line 3061: Replace request ring check
- Line 3166: Transition nodes empty check
- Line 4016: Barrier validation in
stop()
- Line 363: Node lookup assertion in
-
service/storage_service.cc (28 conversions, 3 unsafe kept as SCYLLA_ASSERT)
- Lines 603, 691, 857, 901, 969: Core service operations
- Lines 1523, 1575, 1844, 2086, 2170, 2195: Bootstrap and join operations
- Lines 2319, 2352, 2354: Replacement operations
- Lines 3003, 3028, 3228: Cluster join and drain operations
- Lines 3995, 4047, 4353: Decommission and removenode operations
- Lines 4473, 5787, 5834, 5958: CDC and topology change operations
- Lines 6490, 6491: Tablet streaming operations
- Line 7512: Join node response handler
- Unsafe (kept as SCYLLA_ASSERT): Lines 3398, 5760, 5775 (noexcept functions)
-
sstables/ (58 conversions across 22 files)
- sstables/trie/bti_node_reader.cc (6): Node reading operations
- sstables/mx/writer.cc (6): MX format writing
- sstables/sstable_set.cc (5): SSTable set management
- sstables/compressor.cc (5): Compression/decompression
- sstables/trie/trie_writer.hh (4): Trie writing
- sstables/downsampling.hh (4): Downsampling operations
- sstables/storage.{cc,hh} (6): Storage operations
- sstables/sstables_manager.{cc,hh} (6): SSTable lifecycle management
- sstables/trie/writer_node.{hh,impl.hh} (4): Trie node writing
- sstables/trie/bti_key_translation.cc (2): Key translation
- sstables/sstable_directory.cc (2): Directory management
- sstables/trie/trie_writer.cc (1): Trie writer implementation
- sstables/trie/trie_traversal.hh (1): Trie traversal
- sstables/sstables.cc (1): Core SSTable operations
- sstables/partition_index_cache.hh (1): Index caching
- sstables/generation_type.hh (1): Generation management
- sstables/compress.{cc,hh} (2): Compression utilities
- sstables/exceptions.hh (1): Comment update
Testing
Manual Testing
Created test/manual/test_scylla_assert.cc to verify:
- Passing assertions succeed
- Failing assertions throw exceptions
- Custom messages are properly formatted
Integration Testing
- Run existing test suite with converted assertions
- Verify no regressions in error handling
- Confirm exception propagation works correctly
Future Work
-
Automated Analysis Tool
- Create tool to identify safe vs unsafe conversion contexts
- Generate reports of remaining conversions
-
Gradual Conversion
- Convert additional safe usages incrementally
- Monitor for any unexpected issues
-
noexcept Review
- Review functions marked noexcept that contain SCYLLA_ASSERT
- Consider if they should use
on_fatal_internal_error()instead
References
utils/assert.hh- Implementation of both SCYLLA_ASSERT and scylla_assertutils/on_internal_error.hh- Exception-based error handling infrastructure- GitHub Issue: [Link to original issue tracking this work]