scylladb/db/config.hh
Nadav Har'El 96dd3121e7 Merge 'cql: rewrite CassIO SAI metadata index to regular secondary index' from Szymon Wasik
CassIO (the library backing LangChain's `langchain_community.vectorstores.Cassandra` integration) issues the following DDL during schema setup to create a metadata index:

```sql
CREATE CUSTOM INDEX IF NOT EXISTS eidx_metadata_s_<table>
ON <keyspace>.<table> (ENTRIES(metadata_s))
USING 'org.apache.cassandra.index.sai.StorageAttachedIndex';
```

ScyllaDB does not support Cassandra's StorageAttachedIndex (SAI) for non-vector columns and previously rejected this statement with:

```
StorageAttachedIndex (SAI) is only supported on vector columns; use a secondary index for non-vector columns
```

This blocks seamless migration of existing LangChain/CassIO applications from Cassandra to ScyllaDB — applications fail during initialization before any application-level workaround can run, even when metadata filtering is not used (`metadata_indexing="none"`).

CassIO is no longer actively maintained but remains the only official LangChain integration path for Apache Cassandra over CQL, meaning existing applications will continue using this setup pattern.

Instead of rejecting the CassIO metadata-map SAI DDL, detect the pattern and rewrite it to a standard ScyllaDB secondary index on collection entries:

- **Detection**: SAI class name + single `ENTRIES` target on a non-frozen `map` column
- **Rewrite**: Clear the custom class so the index is created through the standard secondary index path (which already fully supports indexing map entries)
- **Warning**: Emit a CQL warning informing the user that SAI is not supported by ScyllaDB, a regular secondary index was created instead, and metadata filtering behavior may differ from Cassandra SAI

The rewrite is placed early in `validate_while_executing()`, before the rf-rack-validity check, so the standard secondary index code path handles all subsequent validation naturally — no code duplication.
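The detection and rewrite steps above can be sketched roughly as follows. This is a simplified, self-contained model, not the actual ScyllaDB implementation: the types `index_statement`, `index_target`, `target_kind`, `column_kind` and the functions `is_sai_class`, `is_cassio_metadata_pattern` and `maybe_rewrite_cassio_sai` are hypothetical names introduced for illustration, the exact-match comparison of the class name is an assumption, and the warning text is only a paraphrase of what the commit describes.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of a CREATE INDEX statement.
// None of these types exist in ScyllaDB under these names.
enum class target_kind { values, keys, entries, full };
enum class column_kind { scalar, vector, frozen_map, non_frozen_map };

struct index_target {
    std::string column;
    target_kind kind;
    column_kind column_type;
};

struct index_statement {
    std::optional<std::string> custom_class;  // empty => regular secondary index
    std::vector<index_target> targets;
    std::vector<std::string> warnings;        // CQL warnings returned to the client
};

// Assumption: the fully-qualified name and the 'sai' alias are matched exactly.
bool is_sai_class(const std::string& name) {
    return name == "org.apache.cassandra.index.sai.StorageAttachedIndex"
        || name == "sai";
}

// Detection: SAI class name + a single ENTRIES target on a non-frozen map column.
bool is_cassio_metadata_pattern(const index_statement& s) {
    return s.custom_class.has_value()
        && is_sai_class(*s.custom_class)
        && s.targets.size() == 1
        && s.targets[0].kind == target_kind::entries
        && s.targets[0].column_type == column_kind::non_frozen_map;
}

// Rewrite: clear the custom class so the statement continues down the regular
// secondary index path, and record a warning. Returns true if a rewrite happened.
bool maybe_rewrite_cassio_sai(index_statement& s) {
    if (!is_cassio_metadata_pattern(s)) {
        return false;  // left untouched; later validation may still reject it
    }
    s.custom_class.reset();
    s.warnings.push_back(
        "SAI is not supported by ScyllaDB; a regular secondary index was created "
        "instead, and metadata filtering behavior may differ from Cassandra SAI");
    return true;
}
```

Because the rewrite only clears the custom class, everything after it sees an ordinary `CREATE INDEX` on map entries, which is why running it before the other checks avoids duplicating validation logic.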

After this change, the CassIO schema setup succeeds on ScyllaDB:
- `CREATE CUSTOM INDEX ... USING 'sai'` on `ENTRIES(metadata_s)` creates a real secondary index
- The index is functional and can accelerate metadata filtering queries
- A CQL warning makes the rewrite transparent to operators
- SAI on non-vector, non-map-entries columns is still rejected as before
- Vector SAI indexes continue to be rewritten to `vector_index` as before

The change adds three tests:

- `test_sai_entries_on_map_creates_regular_index`: verifies the index is created and the warning is emitted (fully-qualified SAI class name)
- `test_sai_entries_on_map_short_name` — same with the `'sai'` short alias
- `test_sai_on_regular_column_rejected` — confirms SAI on regular scalar columns is still rejected

All 148 tests in `test_vector_index.py` and `test_secondary_index.py` pass with no regressions (125 passed, 22 xfailed, 1 skipped).

Fixes: SCYLLADB-2113
Backport: 2026.2, the version that added support for the SAI class name needed by LangChain.

Closes scylladb/scylladb#29981

* github.com:scylladb/scylladb:
  cql: rewrite CassIO SAI metadata index to regular secondary index
  db/config: add enable_cassio_compatibility flag
2026-05-26 00:19:03 +03:00


/*
* Copyright (C) 2015-present ScyllaDB
*
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
*/
#pragma once
#include <unordered_map>
#include <seastar/core/sstring.hh>
#include <seastar/core/rwlock.hh>
#include <seastar/util/program-options.hh>
#include <seastar/util/log.hh>
#include "locator/abstract_replication_strategy.hh"
#include "seastarx.hh"
#include "utils/config_file.hh"
#include "utils/enum_option.hh"
#include "gms/inet_address.hh"
#include "db/hints/host_filter.hh"
#include "utils/error_injection.hh"
#include "message/dict_trainer.hh"
#include "message/advanced_rpc_compressor.hh"
#include "db/consistency_level_type.hh"
#include "db/tri_mode_restriction.hh"
#include "sstables/compressor.hh"
namespace boost::program_options {
class options_description_easy_init;
}
namespace seastar {
class file;
struct logging_settings;
namespace tls {
class credentials_builder;
}
namespace log_cli {
class options;
}
}
namespace db {
namespace fs = std::filesystem;
class extensions;
/*
* This type is not used, and probably never will be.
* So it makes sense to jump through hoops just to ensure
* it is in fact handled properly...
*/
struct seed_provider_type {
seed_provider_type() = default;
seed_provider_type(sstring n,
std::initializer_list<program_options::string_map::value_type> opts =
{ })
: class_name(std::move(n)), parameters(std::move(opts)) {
}
sstring class_name;
std::unordered_map<sstring, sstring> parameters;
bool operator==(const seed_provider_type& other) const {
return class_name == other.class_name && parameters == other.parameters;
}
};
inline std::istream& operator>>(std::istream& is, seed_provider_type&);
// Describes a single error injection that should be enabled at startup.
struct error_injection_at_startup {
sstring name;
bool one_shot = false;
utils::error_injection_parameters parameters;
bool operator==(const error_injection_at_startup& other) const {
return name == other.name
&& one_shot == other.one_shot
&& parameters == other.parameters;
}
};
std::istream& operator>>(std::istream& is, error_injection_at_startup&);
struct object_storage_endpoint_param;
}
namespace audit {
struct audit_rule;
std::istream& operator>>(std::istream& is, audit_rule&);
}
template<>
struct fmt::formatter<db::error_injection_at_startup> {
constexpr auto parse(format_parse_context& ctx) { return ctx.begin(); }
auto format(const db::error_injection_at_startup&, fmt::format_context& ctx) const -> decltype(ctx.out());
};
namespace utils {
sstring config_value_as_json(const db::seed_provider_type& v);
sstring config_value_as_json(const log_level& v);
sstring config_value_as_json(const std::unordered_map<sstring, log_level>& v);
}
namespace db {
/// Enumeration of all valid values for the `experimental` config entry.
struct experimental_features_t {
enum class feature {
UNUSED,
UDF,
BROADCAST_TABLES,
KEYSPACE_STORAGE_OPTIONS,
STRONGLY_CONSISTENT_TABLES,
LOGSTOR,
};
static std::map<sstring, feature> map(); // See enum_option.
static std::vector<enum_option<experimental_features_t>> all();
};
struct replication_strategy_restriction_t {
static std::unordered_map<sstring, locator::replication_strategy_type> map(); // for enum_option<>
};
struct consistency_level_restriction_t {
static std::unordered_map<sstring, db::consistency_level> map(); // for enum_option<>
};
constexpr unsigned default_murmur3_partitioner_ignore_msb_bits = 12;
struct tablets_mode_t {
// The `unset` mode is used internally for backward compatibility
// with the legacy `enable_tablets` option.
// It is defined as -1 as existing test code associates the value
// 0 with `false` and 1 with `true` when read from system.config.
enum class mode : int8_t {
unset = -1,
disabled = 0,
enabled = 1,
enforced = 2
};
static std::unordered_map<sstring, mode> map(); // for enum_option<>
};
class config final : public utils::config_file {
public:
config();
config(std::shared_ptr<db::extensions>);
~config();
// For testing only
void add_cdc_extension();
void add_per_partition_rate_limit_extension();
void add_tags_extension();
void add_tombstone_gc_extension();
void add_paxos_grace_seconds_extension();
void add_all_default_extensions();
/// True iff the feature is enabled.
bool check_experimental(experimental_features_t::feature f) const;
void setup_directories();
/**
* Scans the environment variables for the configuration files directory
* definition. It is $SCYLLA_CONF, $SCYLLA_HOME/conf, or "conf" if neither
* SCYLLA_CONF nor SCYLLA_HOME is defined.
*
* @return path of the directory where configuration files are located
* according to the environment variable definitions.
*/
static fs::path get_conf_dir();
static fs::path get_conf_sub(fs::path);
future<rwlock::holder> lock_for_config_update() {
return _config_update_lock.hold_write_lock();
}
// Look up a config entry by name and return its JSON representation as a string.
// Runs on shard 0 under a read lock so the result is consistent with
// any in-progress SIGHUP reload + broadcast_to_all_shards() sequence.
// Returns std::nullopt if no config entry with the given name exists.
future<std::optional<sstring>> value_as_json_string_for_name(sstring name) const;
using string_map = std::unordered_map<sstring, sstring>;
//program_options::string_map;
using string_list = std::vector<sstring>;
using seed_provider_type = db::seed_provider_type;
using hinted_handoff_enabled_type = db::hints::host_filter;
using error_injection_at_startup = db::error_injection_at_startup;
using UUID = utils::UUID;
/*
* All values and documentation taken from
* http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
*/
named_value<sstring> cluster_name;
named_value<sstring> listen_address;
named_value<sstring> listen_interface;
named_value<bool> listen_interface_prefer_ipv6;
named_value<sstring> work_directory;
named_value<sstring> commitlog_directory;
named_value<sstring> schema_commitlog_directory;
named_value<string_list> data_file_directories;
named_value<uint64_t> data_file_capacity;
named_value<sstring> hints_directory;
named_value<sstring> view_hints_directory;
named_value<sstring> logstor_directory;
named_value<sstring> saved_caches_directory;
named_value<sstring> commit_failure_policy;
named_value<sstring> disk_failure_policy;
named_value<sstring> endpoint_snitch;
named_value<sstring> rpc_address;
named_value<sstring> rpc_interface;
named_value<bool> rpc_interface_prefer_ipv6;
named_value<seed_provider_type> seed_provider;
named_value<uint32_t> compaction_throughput_mb_per_sec;
named_value<uint32_t> compaction_large_partition_warning_threshold_mb;
named_value<uint32_t> compaction_large_row_warning_threshold_mb;
named_value<uint32_t> compaction_large_cell_warning_threshold_mb;
named_value<uint32_t> compaction_rows_count_warning_threshold;
named_value<uint32_t> compaction_collection_elements_count_warning_threshold;
named_value<uint32_t> compaction_large_data_records_per_sstable;
named_value<uint32_t> memtable_total_space_in_mb;
named_value<uint32_t> concurrent_reads;
named_value<uint32_t> concurrent_writes;
named_value<uint32_t> concurrent_counter_writes;
named_value<bool> incremental_backups;
named_value<bool> snapshot_before_compaction;
named_value<uint32_t> phi_convict_threshold;
named_value<uint32_t> failure_detector_timeout_in_ms;
named_value<uint32_t> direct_failure_detector_ping_timeout_in_ms;
named_value<sstring> commitlog_sync;
named_value<uint32_t> commitlog_segment_size_in_mb;
named_value<uint32_t> schema_commitlog_segment_size_in_mb;
named_value<uint32_t> commitlog_sync_period_in_ms;
named_value<uint32_t> commitlog_sync_batch_window_in_ms;
named_value<uint32_t> commitlog_max_data_lifetime_in_seconds;
named_value<int64_t> commitlog_total_space_in_mb;
named_value<bool> commitlog_reuse_segments; // unused. retained for upgrade compat
named_value<int64_t> commitlog_flush_threshold_in_mb;
named_value<bool> commitlog_use_o_dsync;
named_value<bool> commitlog_use_hard_size_limit;
named_value<bool> commitlog_use_fragmented_entries;
named_value<bool> compaction_preheat_key_cache;
named_value<uint32_t> concurrent_compactors;
named_value<uint32_t> in_memory_compaction_limit_in_mb;
named_value<bool> preheat_kernel_page_cache;
named_value<uint32_t> sstable_preemptive_open_interval_in_mb;
named_value<bool> defragment_memory_on_idle;
named_value<sstring> memtable_allocation_type;
named_value<double> memtable_cleanup_threshold;
named_value<uint32_t> logstor_disk_size_in_mb;
named_value<uint32_t> logstor_file_size_in_mb;
named_value<uint32_t> logstor_separator_delay_limit_ms;
named_value<uint32_t> logstor_separator_max_memory_in_mb;
named_value<uint32_t> file_cache_size_in_mb;
named_value<uint32_t> memtable_flush_queue_size;
named_value<uint32_t> memtable_flush_writers;
named_value<uint32_t> memtable_heap_space_in_mb;
named_value<uint32_t> memtable_offheap_space_in_mb;
named_value<uint32_t> column_index_size_in_kb;
named_value<uint32_t> column_index_auto_scale_threshold_in_kb;
named_value<uint32_t> index_summary_capacity_in_mb;
named_value<uint32_t> index_summary_resize_interval_in_minutes;
named_value<double> reduce_cache_capacity_to;
named_value<double> reduce_cache_sizes_at;
named_value<uint32_t> stream_throughput_outbound_megabits_per_sec;
named_value<uint32_t> inter_dc_stream_throughput_outbound_megabits_per_sec;
named_value<uint32_t> stream_io_throughput_mb_per_sec;
named_value<double> stream_plan_ranges_fraction;
named_value<bool> enable_file_stream;
named_value<bool> trickle_fsync;
named_value<uint32_t> trickle_fsync_interval_in_kb;
named_value<bool> auto_bootstrap;
named_value<uint32_t> batch_size_warn_threshold_in_kb;
named_value<uint32_t> batch_size_fail_threshold_in_kb;
named_value<sstring> broadcast_address;
named_value<bool> listen_on_broadcast_address;
named_value<sstring> initial_token;
named_value<uint32_t> num_tokens;
named_value<sstring> partitioner;
named_value<uint16_t> storage_port;
named_value<bool> auto_snapshot;
named_value<uint32_t> key_cache_keys_to_save;
named_value<uint32_t> key_cache_save_period;
named_value<uint32_t> key_cache_size_in_mb;
named_value<uint32_t> row_cache_keys_to_save;
named_value<uint32_t> row_cache_size_in_mb;
named_value<uint32_t> row_cache_save_period;
named_value<sstring> memory_allocator;
named_value<uint32_t> counter_cache_size_in_mb;
named_value<uint32_t> counter_cache_save_period;
named_value<uint32_t> counter_cache_keys_to_save;
named_value<uint32_t> tombstone_warn_threshold;
named_value<uint32_t> tombstone_failure_threshold;
named_value<uint64_t> query_tombstone_page_limit;
named_value<uint64_t> query_page_size_in_bytes;
named_value<uint32_t> group0_tombstone_gc_refresh_interval_in_ms;
named_value<uint32_t> range_request_timeout_in_ms;
named_value<uint32_t> read_request_timeout_in_ms;
named_value<uint32_t> counter_write_request_timeout_in_ms;
named_value<uint32_t> cas_contention_timeout_in_ms;
named_value<uint32_t> truncate_request_timeout_in_ms;
named_value<uint32_t> write_request_timeout_in_ms;
named_value<uint32_t> request_timeout_in_ms;
named_value<uint32_t> request_timeout_on_shutdown_in_seconds;
named_value<uint32_t> group0_raft_op_timeout_in_ms;
named_value<bool> cross_node_timeout;
named_value<uint32_t> internode_send_buff_size_in_bytes;
named_value<uint32_t> internode_recv_buff_size_in_bytes;
named_value<sstring> internode_compression;
named_value<float> internode_compression_zstd_max_cpu_fraction;
named_value<uint32_t> internode_compression_zstd_cpu_quota_refresh_period_ms;
named_value<float> internode_compression_zstd_max_longterm_cpu_fraction;
named_value<uint32_t> internode_compression_zstd_longterm_cpu_quota_refresh_period_ms;
named_value<uint32_t> internode_compression_zstd_min_message_size;
named_value<uint32_t> internode_compression_zstd_max_message_size;
named_value<bool> internode_compression_checksumming;
named_value<netw::advanced_rpc_compressor::tracker::algo_config> internode_compression_algorithms;
named_value<bool> internode_compression_enable_advanced;
named_value<enum_option<netw::dict_training_loop::when>> rpc_dict_training_when;
named_value<uint32_t> rpc_dict_training_min_time_seconds;
named_value<uint64_t> rpc_dict_training_min_bytes;
named_value<bool> inter_dc_tcp_nodelay;
named_value<uint32_t> streaming_socket_timeout_in_ms;
named_value<bool> start_native_transport;
named_value<uint16_t> native_transport_port;
named_value<sstring> maintenance_socket;
named_value<sstring> maintenance_socket_group;
named_value<bool> maintenance_mode;
named_value<uint16_t> native_transport_port_ssl;
named_value<uint16_t> native_shard_aware_transport_port;
named_value<uint16_t> native_shard_aware_transport_port_ssl;
named_value<uint16_t> native_transport_port_proxy_protocol;
named_value<uint16_t> native_transport_port_ssl_proxy_protocol;
named_value<uint16_t> native_shard_aware_transport_port_proxy_protocol;
named_value<uint16_t> native_shard_aware_transport_port_ssl_proxy_protocol;
named_value<uint32_t> native_transport_max_threads;
named_value<uint32_t> native_transport_max_frame_size_in_mb;
named_value<sstring> broadcast_rpc_address;
named_value<uint16_t> rpc_port;
named_value<bool> start_rpc;
named_value<bool> rpc_keepalive;
named_value<bool> cache_hit_rate_read_balancing;
named_value<double> dynamic_snitch_badness_threshold;
named_value<uint32_t> dynamic_snitch_reset_interval_in_ms;
named_value<uint32_t> dynamic_snitch_update_interval_in_ms;
named_value<hinted_handoff_enabled_type> hinted_handoff_enabled;
named_value<uint32_t> max_hinted_handoff_concurrency;
named_value<uint32_t> hinted_handoff_throttle_in_kb;
named_value<uint32_t> max_hint_window_in_ms;
named_value<uint32_t> max_hints_delivery_threads;
named_value<uint32_t> batchlog_replay_throttle_in_kb;
named_value<uint32_t> batchlog_replay_cleanup_after_replays;
named_value<sstring> request_scheduler;
named_value<sstring> request_scheduler_id;
named_value<string_map> request_scheduler_options;
named_value<sstring> vector_store_primary_uri;
named_value<sstring> vector_store_secondary_uri;
named_value<uint32_t> vector_store_unreachable_node_detection_time_in_ms;
named_value<string_map> vector_store_encryption_options;
named_value<bool> enable_cassio_compatibility;
named_value<sstring> authenticator;
named_value<sstring> internode_authenticator;
named_value<sstring> authorizer;
named_value<sstring> role_manager;
named_value<uint32_t> permissions_validity_in_ms;
named_value<uint32_t> permissions_update_interval_in_ms;
named_value<uint32_t> permissions_cache_max_entries;
named_value<string_map> server_encryption_options;
named_value<string_map> client_encryption_options;
named_value<string_map> alternator_encryption_options;
named_value<bool> alternator_force_read_before_write;
named_value<uint32_t> ssl_storage_port;
named_value<bool> enable_in_memory_data_store;
named_value<bool> enable_cache;
named_value<bool> enable_commitlog;
named_value<bool> volatile_system_keyspace_for_testing;
named_value<uint16_t> api_port;
named_value<sstring> api_address;
named_value<sstring> api_ui_dir;
named_value<sstring> api_doc_dir;
named_value<sstring> load_balance;
named_value<bool> consistent_rangemovement;
named_value<bool> join_ring;
named_value<bool> load_ring_state;
named_value<sstring> replace_node_first_boot;
named_value<sstring> replace_address;
named_value<sstring> replace_address_first_boot;
named_value<sstring> ignore_dead_nodes_for_replace;
named_value<bool> override_decommission;
named_value<bool> enable_repair_based_node_ops;
named_value<sstring> allowed_repair_based_node_ops;
named_value<bool> enable_compacting_data_for_streaming_and_repair;
named_value<bool> enable_tombstone_gc_for_streaming_and_repair;
named_value<double> repair_partition_count_estimation_ratio;
named_value<uint32_t> repair_hints_batchlog_flush_cache_time_in_ms;
named_value<uint64_t> repair_multishard_reader_buffer_hint_size;
named_value<uint64_t> repair_multishard_reader_enable_read_ahead;
named_value<bool> enable_small_table_optimization_for_rbno;
named_value<uint32_t> ring_delay_ms;
named_value<uint32_t> shadow_round_ms;
named_value<uint32_t> fd_max_interval_ms;
named_value<uint32_t> fd_initial_value_ms;
named_value<uint32_t> shutdown_announce_in_ms;
named_value<bool> developer_mode;
named_value<int32_t> skip_wait_for_gossip_to_settle;
named_value<int32_t> force_gossip_generation;
named_value<std::vector<enum_option<experimental_features_t>>> experimental_features;
named_value<size_t> lsa_reclamation_step;
named_value<uint16_t> prometheus_port;
named_value<sstring> prometheus_address;
named_value<sstring> prometheus_prefix;
named_value<bool> prometheus_allow_protobuf;
named_value<bool> abort_on_lsa_bad_alloc;
named_value<unsigned> murmur3_partitioner_ignore_msb_bits;
named_value<double> unspooled_dirty_soft_limit;
named_value<double> sstable_summary_ratio;
named_value<double> components_memory_reclaim_threshold;
named_value<size_t> large_memory_allocation_warning_threshold;
named_value<bool> enable_deprecated_partitioners;
named_value<bool> enable_keyspace_column_family_metrics;
named_value<bool> enable_node_aggregated_table_metrics;
named_value<bool> enable_sstable_data_integrity_check;
named_value<bool> enable_sstable_key_validation;
named_value<bool> ignore_component_digest_mismatch;
named_value<bool> cpu_scheduler;
named_value<bool> view_building;
named_value<bool> enable_sstables_mc_format;
named_value<bool> enable_sstables_md_format;
named_value<sstring> sstable_format;
// NOTE: Do not use this option directly.
// Use get_sstable_compression_user_table_options() instead.
named_value<compression_parameters> sstable_compression_user_table_options;
compression_parameters get_sstable_compression_user_table_options(bool dicts_feature_enabled) const;
named_value<bool> sstable_compression_dictionaries_allow_in_ddl;
named_value<bool> sstable_compression_dictionaries_enable_writing;
named_value<float> sstable_compression_dictionaries_memory_budget_fraction;
named_value<float> sstable_compression_dictionaries_retrain_period_in_seconds;
named_value<float> sstable_compression_dictionaries_autotrainer_tick_period_in_seconds;
named_value<uint64_t> sstable_compression_dictionaries_min_training_dataset_bytes;
named_value<float> sstable_compression_dictionaries_min_training_improvement_factor;
named_value<bool> uuid_sstable_identifiers_enabled;
named_value<bool> table_digest_insensitive_to_expiry;
named_value<bool> enable_dangerous_direct_import_of_cassandra_counters;
named_value<bool> enable_shard_aware_drivers;
named_value<bool> enable_ipv6_dns_lookup;
named_value<bool> abort_on_internal_error;
named_value<bool> abort_on_malformed_sstable_error;
named_value<uint32_t> max_partition_key_restrictions_per_query;
named_value<uint32_t> max_clustering_key_restrictions_per_query;
named_value<uint64_t> max_memory_for_unlimited_query_soft_limit;
named_value<uint64_t> max_memory_for_unlimited_query_hard_limit;
named_value<uint32_t> reader_concurrency_semaphore_serialize_limit_multiplier;
named_value<uint32_t> reader_concurrency_semaphore_kill_limit_multiplier;
named_value<uint32_t> reader_concurrency_semaphore_cpu_concurrency;
named_value<float> reader_concurrency_semaphore_preemptive_abort_factor;
named_value<uint32_t> view_update_reader_concurrency_semaphore_serialize_limit_multiplier;
named_value<uint32_t> view_update_reader_concurrency_semaphore_kill_limit_multiplier;
named_value<uint32_t> view_update_reader_concurrency_semaphore_cpu_concurrency;
named_value<int> maintenance_reader_concurrency_semaphore_count_limit;
named_value<uint32_t> twcs_max_window_count;
named_value<unsigned> initial_sstable_loading_concurrency;
named_value<bool> enable_3_1_0_compatibility_mode;
named_value<bool> enable_user_defined_functions;
named_value<unsigned> user_defined_function_time_limit_ms;
named_value<unsigned> user_defined_function_allocation_limit_bytes;
named_value<unsigned> user_defined_function_contiguous_allocation_limit_bytes;
named_value<uint32_t> schema_registry_grace_period;
named_value<uint32_t> max_concurrent_requests_per_shard;
named_value<uint32_t> uninitialized_connections_semaphore_cpu_concurrency;
named_value<bool> cdc_dont_rewrite_streams;
named_value<tri_mode_restriction> strict_allow_filtering;
named_value<tri_mode_restriction> strict_is_not_null_in_views;
named_value<bool> enable_cql_config_updates;
named_value<bool> enable_parallelized_aggregation;
named_value<bool> cql_duplicate_bind_variable_names_refer_to_same_variable;
named_value<uint32_t> select_internal_page_size;
named_value<uint16_t> alternator_port;
named_value<uint16_t> alternator_https_port;
named_value<uint16_t> alternator_port_proxy_protocol;
named_value<uint16_t> alternator_https_port_proxy_protocol;
named_value<sstring> alternator_address;
named_value<bool> alternator_enforce_authorization;
named_value<bool> alternator_warn_authorization;
named_value<sstring> alternator_write_isolation;
named_value<uint32_t> alternator_streams_time_window_s;
named_value<bool> alternator_streams_increased_compatibility;
named_value<uint32_t> alternator_timeout_in_ms;
named_value<double> alternator_ttl_period_in_seconds;
named_value<sstring> alternator_describe_endpoints;
named_value<uint32_t> alternator_max_items_in_batch_write;
named_value<bool> alternator_allow_system_table_write;
named_value<uint32_t> alternator_max_expression_cache_entries_per_shard;
named_value<uint64_t> alternator_max_users_query_size_in_trace_output;
named_value<uint32_t> alternator_describe_table_info_cache_validity_in_seconds;
named_value<int> alternator_response_gzip_compression_level;
named_value<uint32_t> alternator_response_compression_threshold_in_bytes;
named_value<bool> alternator_http_response_disable_content_type_header;
named_value<bool> alternator_http_response_disable_date_header;
named_value<sstring> alternator_http_response_server_header;
named_value<bool> abort_on_ebadf;
named_value<bool> sanitizer_report_backtrace;
named_value<bool> flush_schema_tables_after_modification;
// Options to restrict (forbid, warn or somehow limit) certain operations
// or options which non-expert users are more likely to regret than to
// enjoy:
named_value<tri_mode_restriction> restrict_replication_simplestrategy;
named_value<tri_mode_restriction> restrict_dtcs;
named_value<tri_mode_restriction> restrict_twcs_without_default_ttl;
named_value<bool> restrict_future_timestamp;
named_value<bool> ignore_truncation_record;
named_value<bool> force_schema_commit_log;
named_value<uint32_t> task_ttl_seconds;
named_value<uint32_t> user_task_ttl_seconds;
named_value<uint32_t> nodeops_watchdog_timeout_seconds;
named_value<uint32_t> nodeops_heartbeat_interval_seconds;
named_value<bool> cache_index_pages;
named_value<double> index_cache_fraction;
named_value<bool> consistent_cluster_management;
named_value<bool> force_gossip_topology_changes;
named_value<UUID> recovery_leader;
named_value<double> wasm_cache_memory_fraction;
named_value<uint32_t> wasm_cache_timeout_in_ms;
named_value<size_t> wasm_cache_instance_size_limit;
named_value<uint64_t> wasm_udf_yield_fuel;
named_value<uint64_t> wasm_udf_total_fuel;
named_value<size_t> wasm_udf_memory_limit;
named_value<sstring> relabel_config_file;
// wasm_udf_reserved_memory is static because the options in db::config
// are parsed using seastar::app_template, while this option is used for
// configuring the Seastar memory subsystem.
static constexpr size_t wasm_udf_reserved_memory = 50 * 1024 * 1024;
named_value<bool> live_updatable_config_params_changeable_via_cql;
bool are_live_updatable_config_params_changeable_via_cql() const override {
return live_updatable_config_params_changeable_via_cql();
}
// authenticator options
named_value<std::string> auth_superuser_name;
named_value<std::string> auth_superuser_salted_password;
named_value<std::vector<std::unordered_map<sstring, sstring>>> auth_certificate_role_queries;
// guardrails options
named_value<bool> enable_create_table_with_compact_storage;
named_value<int> minimum_replication_factor_fail_threshold;
named_value<int> minimum_replication_factor_warn_threshold;
named_value<int> maximum_replication_factor_fail_threshold;
named_value<int> maximum_replication_factor_warn_threshold;
named_value<std::vector<enum_option<replication_strategy_restriction_t>>> replication_strategy_fail_list;
named_value<std::vector<enum_option<replication_strategy_restriction_t>>> replication_strategy_warn_list;
named_value<std::vector<enum_option<consistency_level_restriction_t>>> write_consistency_levels_disallowed;
named_value<std::vector<enum_option<consistency_level_restriction_t>>> write_consistency_levels_warned;
named_value<double> tablets_initial_scale_factor;
named_value<unsigned> tablets_per_shard_goal;
named_value<uint64_t> target_tablet_size_in_bytes;
named_value<unsigned> tablet_streaming_read_concurrency_per_shard;
named_value<unsigned> tablet_streaming_write_concurrency_per_shard;
named_value<uint32_t> service_levels_interval;
named_value<sstring> audit;
named_value<sstring> audit_categories;
named_value<sstring> audit_tables;
named_value<sstring> audit_keyspaces;
named_value<sstring> audit_unix_socket_path;
named_value<size_t> audit_syslog_write_buffer_size;
named_value<std::vector<audit::audit_rule>> audit_rules;
named_value<sstring> ldap_url_template;
named_value<sstring> ldap_attr_role;
named_value<sstring> ldap_bind_dn;
named_value<sstring> ldap_bind_passwd;
named_value<sstring> saslauthd_socket_path;
seastar::logging_settings logging_settings(const log_cli::options&) const;
const db::extensions& extensions() const;
named_value<std::vector<object_storage_endpoint_param>> object_storage_endpoints;
named_value<unsigned> object_storage_connections_per_shard;
named_value<std::vector<error_injection_at_startup>> error_injections_at_startup;
named_value<double> topology_barrier_stall_detector_threshold_seconds;
named_value<bool> enable_tablets;
named_value<enum_option<tablets_mode_t>> tablets_mode_for_new_keyspaces;
named_value<bool> auto_repair_enabled_default;
named_value<int32_t> auto_repair_threshold_default_in_seconds;
bool enable_tablets_by_default() const noexcept {
switch (tablets_mode_for_new_keyspaces()) {
case tablets_mode_t::mode::unset:
return enable_tablets();
case tablets_mode_t::mode::disabled:
return false;
case tablets_mode_t::mode::enabled:
case tablets_mode_t::mode::enforced:
return true;
}
}
bool enforce_tablets() const noexcept {
return tablets_mode_for_new_keyspaces() == tablets_mode_t::mode::enforced;
}
named_value<uint32_t> view_flow_control_delay_limit_in_ms;
named_value<int> disk_space_monitor_normal_polling_interval_in_seconds;
named_value<int> disk_space_monitor_high_polling_interval_in_seconds;
named_value<float> disk_space_monitor_polling_interval_threshold;
named_value<float> critical_disk_utilization_level;
named_value<bool> rf_rack_valid_keyspaces;
named_value<bool> enforce_rack_list;
named_value<uint32_t> tablet_load_stats_refresh_interval_in_seconds;
named_value<bool> force_capacity_based_balancing;
named_value<float> size_based_balance_threshold_percentage;
named_value<uint64_t> minimal_tablet_size_for_balancing;
named_value<double> background_writer_scheduling_quota;
named_value<bool> auto_adjust_flush_quota;
named_value<float> memtable_flush_static_shares;
named_value<float> compaction_static_shares;
named_value<float> compaction_max_shares;
named_value<bool> compaction_enforce_min_threshold;
named_value<uint32_t> compaction_flush_all_tables_before_major_seconds;
named_value<uint32_t> maintenance_io_throughput_mb_per_sec;
named_value<uint32_t> backup_io_throughput_mb_per_sec;
static const sstring default_tls_priority;
private:
template<typename T>
struct log_legacy_value : public named_value<T> {
using MyBase = named_value<T>;
using MyBase::MyBase;
T value_or(T&& t) const {
return this->is_set() ? (*this)() : t;
}
// do not add to boost::options. We only care about yaml config
void add_command_line_option(boost::program_options::options_description_easy_init&) override {}
};
log_legacy_value<seastar::log_level> default_log_level;
log_legacy_value<std::unordered_map<sstring, seastar::log_level>> logger_log_level;
log_legacy_value<bool> log_to_stdout;
log_legacy_value<bool> log_to_syslog;
void maybe_in_workdir(named_value<sstring>&, const char*);
void maybe_in_workdir(named_value<string_list>&, const char*);
std::shared_ptr<db::extensions> _extensions;
// Read-write lock used to synchronize config updates (SIGHUP reload +
// broadcast to all shards) with config value readers.
// The SIGHUP handler holds the write lock across read_config() +
// broadcast_to_all_shards(). Readers acquire the read lock on shard 0
// via value_as_json_string_for_name() so they always see a consistent snapshot.
mutable rwlock _config_update_lock;
};
}
namespace utils {
template<typename K, typename V, typename... Args, typename K2, typename V2 = V>
V get_or_default(const std::unordered_map<K, V, Args...>& ss, const K2& key, const V2& def = V()) {
    const auto iter = ss.find(key);
    if (iter != ss.end()) {
        return iter->second;
    }
    return def;
}
inline bool is_true(sstring val) {
    std::transform(val.begin(), val.end(), val.begin(), ::tolower);
    return val == "true" || val == "1";
}
future<> configure_tls_creds_builder(seastar::tls::credentials_builder& creds, db::config::string_map options);
future<gms::inet_address> resolve(const config_file::named_value<sstring>&, gms::inet_address::opt_family family = {}, gms::inet_address::opt_family preferred = {});
/*!
* \brief Read the relabel config from a file
*
* Will throw an exception if there is a conflict with the metrics names
*/
future<> update_relabel_config_from_file(const std::string& name);
std::vector<sstring> split_comma_separated_list(std::string_view comma_separated_list);
} // namespace utils
namespace utils {
// Declarations of the explicit specializations of add_command_line_option
// (for seed_provider_type and std::vector<audit::audit_rule>) must appear
// before the extern template declarations below, to satisfy [temp.expl.spec]:
// a member specialization must be declared before any explicit instantiation
// (including extern template) of the enclosing class template.
template <>
void config_file::named_value<db::config::seed_provider_type>::add_command_line_option(
boost::program_options::options_description_easy_init&);
template <>
void config_file::named_value<std::vector<audit::audit_rule>>::add_command_line_option(
boost::program_options::options_description_easy_init&);
} // namespace utils
// Explicit instantiation declarations for named_value<T> specializations
// that use db-specific types. The definitions live in db/config.cc.
// Together with the declarations in utils/config_file.hh (for primitive
// types), this ensures the heavy template bodies from config_file_impl.hh
// (boost::program_options, boost::lexical_cast, boost::regex, yaml-cpp)
// are compiled only once.
extern template struct utils::config_file::named_value<db::tri_mode_restriction>;
extern template struct utils::config_file::named_value<db::seed_provider_type>;
extern template struct utils::config_file::named_value<db::hints::host_filter>;
extern template struct utils::config_file::named_value<utils::UUID>;
extern template struct utils::config_file::named_value<db::error_injection_at_startup>;
extern template struct utils::config_file::named_value<compression_parameters>;
extern template struct utils::config_file::named_value<enum_option<db::experimental_features_t>>;
extern template struct utils::config_file::named_value<enum_option<db::replication_strategy_restriction_t>>;
extern template struct utils::config_file::named_value<enum_option<db::consistency_level_restriction_t>>;
extern template struct utils::config_file::named_value<enum_option<db::tablets_mode_t>>;
extern template struct utils::config_file::named_value<enum_option<netw::dict_training_loop::when>>;
extern template struct utils::config_file::named_value<netw::advanced_rpc_compressor::tracker::algo_config>;
extern template struct utils::config_file::named_value<std::vector<enum_option<db::experimental_features_t>>>;
extern template struct utils::config_file::named_value<std::vector<enum_option<db::replication_strategy_restriction_t>>>;
extern template struct utils::config_file::named_value<std::vector<enum_option<db::consistency_level_restriction_t>>>;
extern template struct utils::config_file::named_value<std::vector<db::error_injection_at_startup>>;
extern template struct utils::config_file::named_value<std::vector<std::unordered_map<sstring, sstring>>>;
extern template struct utils::config_file::named_value<std::unordered_map<sstring, seastar::log_level>>;
extern template struct utils::config_file::named_value<std::vector<db::object_storage_endpoint_param>>;
extern template struct utils::config_file::named_value<std::vector<audit::audit_rule>>;