Compare commits

..

5 Commits

Author SHA1 Message Date
copilot-swe-agent[bot]
8f6296b905 Simplify ungzip implementation per review feedback
- Remove manual gzip header parsing - libdeflate handles all format details
- Rename linearize_chunked_content to build_input_buffer and free chunks as we copy
- Add output chunking to split large decompressed data into 1MB chunks
- Add comment explaining libdeflate's whole-buffer requirement
- Use better initial size heuristic based on compression ratio

Co-authored-by: nyh <584227+nyh@users.noreply.github.com>
2025-11-19 12:47:02 +00:00
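The commit above mentions splitting large decompressed data into 1MB output chunks. A minimal std-only sketch of that idea (the function name and use of `std::string` in place of `temporary_buffer` are illustrative, not from the patch):

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration of the "output chunking" described above:
// split one large decompressed buffer into fixed-size 1MB pieces.
constexpr std::size_t output_chunk_size = 1024 * 1024;

std::vector<std::string> split_into_chunks(const std::string& data) {
    std::vector<std::string> chunks;
    for (std::size_t off = 0; off < data.size(); off += output_chunk_size) {
        // Last chunk may be shorter than output_chunk_size.
        std::size_t n = std::min(output_chunk_size, data.size() - off);
        chunks.emplace_back(data, off, n);
    }
    return chunks;
}
```

The real code would presumably move each piece into a `temporary_buffer` of the `chunked_content` rather than copying into strings.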
copilot-swe-agent[bot]
4f44a61b3a Add edge case check for length limit in ungzip
- Check if total_decompressed >= length_limit before allocating output buffer
- Prevents allocating a zero-sized buffer when limit is already reached
- Ensures clear error message when limit is exceeded

Co-authored-by: nyh <584227+nyh@users.noreply.github.com>
2025-11-19 11:50:31 +00:00
copilot-swe-agent[bot]
362491a650 Fix ungzip implementation to properly handle concatenated gzip files
- Removed unused get_gzip_member_size function
- Rely on libdeflate_gzip_decompress to tell us how many input bytes were consumed
- Added check for zero bytes consumed to detect invalid state
- Simplified the logic by removing unnecessary header size tracking

Co-authored-by: nyh <584227+nyh@users.noreply.github.com>
2025-11-19 11:48:35 +00:00
copilot-swe-agent[bot]
b818331420 Add ungzip function implementation with libdeflate
- Created utils/gzip.hh header with ungzip function declaration
- Created utils/gzip.cc implementation using libdeflate
- Updated utils/CMakeLists.txt to include gzip.cc and link libdeflate
- Created comprehensive test suite in test/boost/gzip_test.cc
- Added gzip_test to test/boost/CMakeLists.txt

The implementation:
- Uses libdeflate for high-performance gzip decompression
- Handles chunked_content input/output (vector of temporary_buffer)
- Supports concatenated gzip files
- Validates gzip headers and detects invalid/truncated/corrupted data
- Enforces size limits to prevent memory exhaustion
- Runs in async context to avoid blocking the reactor

Co-authored-by: nyh <584227+nyh@users.noreply.github.com>
2025-11-19 11:46:29 +00:00
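The loop structure these commits describe (iterate over concatenated gzip members, let the decoder report how many input bytes it consumed, fail on zero progress, and enforce a size limit) can be sketched without linking libdeflate by using a stand-in decoder with the same contract. Everything here is illustrative: `decode_one_member` models one `libdeflate_gzip_decompress` call but uses a toy length-prefixed framing instead of real gzip, and all names are made up:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// What one decoder call reports back: the member's output plus how many
// input bytes it consumed (libdeflate reports the same information).
struct member_result {
    std::string output;
    std::size_t consumed;
};

// Stand-in for a single libdeflate call. Toy framing: one length byte,
// then that many payload bytes. Returns consumed == 0 on no progress.
member_result decode_one_member(std::string_view in) {
    if (in.empty()) {
        return {"", 0};
    }
    std::size_t n = static_cast<unsigned char>(in[0]);
    if (in.size() < 1 + n) {
        return {"", 0}; // truncated member
    }
    return {std::string(in.substr(1, n)), 1 + n};
}

std::string decompress_all(std::string_view in, std::size_t length_limit) {
    std::string out;
    while (!in.empty()) {
        member_result r = decode_one_member(in);
        if (r.consumed == 0) {
            // Matches the commit's check: zero bytes consumed means the
            // decoder made no progress, so the input is invalid/truncated.
            throw std::runtime_error("invalid or truncated input");
        }
        out += r.output;
        if (out.size() > length_limit) {
            throw std::runtime_error("decompressed data exceeds length limit");
        }
        // Advance past the consumed member and try the next one.
        in.remove_prefix(r.consumed);
    }
    return out;
}
```

The actual implementation additionally runs this work off the reactor and handles chunked input/output, which this sketch omits.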
copilot-swe-agent[bot]
c714159d5c Initial plan 2025-11-19 11:32:38 +00:00
428 changed files with 5185 additions and 11069 deletions

.github/CODEOWNERS

@@ -57,6 +57,7 @@ repair/* @tgrabiec @asias
 # SCHEMA MANAGEMENT
 db/schema_tables* @tgrabiec
+db/legacy_schema_migrator* @tgrabiec
 service/migration* @tgrabiec
 schema* @tgrabiec


@@ -1,86 +0,0 @@
# ScyllaDB Development Instructions
## Project Context
High-performance distributed NoSQL database. Core values: performance, correctness, readability.
## Build System
### Modern Build (configure.py + ninja)
```bash
# Configure (run once per mode, or when switching modes)
./configure.py --mode=<mode> # mode: dev, debug, release, sanitize
# Build everything
ninja <mode>-build # e.g., ninja dev-build
# Build Scylla binary only (sufficient for Python integration tests)
ninja build/<mode>/scylla
# Build specific test
ninja build/<mode>/test/boost/<test_name>
```
## Running Tests
### C++ Unit Tests
```bash
# Run all tests in a file
./test.py --mode=<mode> test/<suite>/<test_name>.cc
# Run a single test case from a file
./test.py --mode=<mode> test/<suite>/<test_name>.cc::<test_case_name>
# Examples
./test.py --mode=dev test/boost/memtable_test.cc
./test.py --mode=dev test/raft/raft_server_test.cc::test_check_abort_on_client_api
```
**Important:**
- Use full path with `.cc` extension (e.g., `test/boost/test_name.cc`, not `boost/test_name`)
- To run a single test case, append `::<test_case_name>` to the file path
- If you encounter permission issues with cgroup metric gathering, add `--no-gather-metrics` flag
**Rebuilding Tests:**
- test.py does NOT automatically rebuild when test source files are modified
- Many tests are part of composite binaries (e.g., `combined_tests` in test/boost contains multiple test files)
- To find which binary contains a test, check `configure.py` in the repository root (primary source) or `test/<suite>/CMakeLists.txt`
- To rebuild a specific test binary: `ninja build/<mode>/test/<suite>/<binary_name>`
- Examples:
- `ninja build/dev/test/boost/combined_tests` (contains group0_voter_calculator_test.cc and others)
- `ninja build/dev/test/raft/replication_test` (standalone Raft test)
### Python Integration Tests
```bash
# Only requires Scylla binary (full build usually not needed)
ninja build/<mode>/scylla
# Run all tests in a file
./test.py --mode=<mode> <test_path>
# Run a single test case from a file
./test.py --mode=<mode> <test_path>::<test_function_name>
# Examples
./test.py --mode=dev alternator/
./test.py --mode=dev cluster/test_raft_voters::test_raft_limited_voters_retain_coordinator
# Optional flags
./test.py --mode=dev cluster/test_raft_no_quorum -v # Verbose output
./test.py --mode=dev cluster/test_raft_no_quorum --repeat 5 # Repeat test 5 times
```
**Important:**
- Use path without `.py` extension (e.g., `cluster/test_raft_no_quorum`, not `cluster/test_raft_no_quorum.py`)
- To run a single test case, append `::<test_function_name>` to the file path
- Add `-v` for verbose output
- Add `--repeat <num>` to repeat a test multiple times
- After modifying C++ source files, only rebuild the Scylla binary for Python tests - building the entire repository is unnecessary
## Code Philosophy
- Performance matters in hot paths (data read/write, inner loops)
- Self-documenting code through clear naming
- Comments explain "why", not "what"
- Prefer standard library over custom implementations
- Strive for simplicity and clarity, add complexity only when clearly justified
- Question requests: don't blindly implement requests - evaluate trade-offs, identify issues, and suggest better alternatives when appropriate
- Consider different approaches, weigh pros and cons, and recommend the best fit for the specific context


@@ -1,115 +0,0 @@
---
applyTo: "**/*.{cc,hh}"
---
# C++ Guidelines
**Important:** Always match the style and conventions of existing code in the file and directory.
## Memory Management
- Prefer stack allocation whenever possible
- Use `std::unique_ptr` by default for dynamic allocations
- `new`/`delete` are forbidden (use RAII)
- Use `seastar::lw_shared_ptr` or `seastar::shared_ptr` for shared ownership within same shard
- Use `seastar::foreign_ptr` for cross-shard sharing
- Avoid `std::shared_ptr` except when interfacing with external C++ APIs
- Avoid raw pointers except for non-owning references or C API interop
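The ownership rules above can be illustrated with a short std-only sketch (the Seastar pointer types in the list are not shown; `widget` is a made-up example type):

```cpp
#include <cassert>
#include <memory>

// Made-up example type for illustrating the ownership rules.
struct widget {
    int value = 0;
};

std::unique_ptr<widget> make_widget(int v) {
    auto w = std::make_unique<widget>(); // RAII instead of raw new/delete
    w->value = v;
    return w; // ownership transfers to the caller by move
}

// Non-owning access takes a reference (or raw pointer), never ownership.
int read_value(const widget& w) {
    return w.value;
}
```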
## Seastar Asynchronous Programming
- Use `seastar::future<T>` for all async operations
- Prefer coroutines (`co_await`, `co_return`) over `.then()` chains for readability
- Coroutines are preferred over `seastar::do_with()` for managing temporary state
- In hot paths where futures are ready, continuations may be more efficient than coroutines
- Chain futures with `.then()`, don't block with `.get()` (unless in `seastar::thread` context)
- All I/O must be asynchronous (no blocking calls)
- Use `seastar::gate` for shutdown coordination
- Use `seastar::semaphore` for resource limiting (not `std::mutex`)
- Break long loops with `maybe_yield()` to avoid reactor stalls
## Coroutines
```cpp
seastar::future<T> func() {
auto result = co_await async_operation();
co_return result;
}
```
## Error Handling
- Throw exceptions for errors (futures propagate them automatically)
- In data path: avoid exceptions, use `std::expected` (or `boost::outcome`) instead
- Use standard exceptions (`std::runtime_error`, `std::invalid_argument`)
- Database-specific: throw appropriate schema/query exceptions
## Performance
- Pass large objects by `const&` or `&&` (move semantics)
- Use `std::string_view` for non-owning string references
- Avoid copies: prefer move semantics
- Use `utils::chunked_vector` instead of `std::vector` for large allocations (>128KB)
- Minimize dynamic allocations in hot paths
## Database-Specific Types
- Use `schema_ptr` for schema references
- Use `mutation` and `mutation_partition` for data modifications
- Use `partition_key` and `clustering_key` for keys
- Use `api::timestamp_type` for database timestamps
- Use `gc_clock` for garbage collection timing
## Style
- C++23 standard (prefer modern features, especially coroutines)
- Use `auto` when type is obvious from RHS
- Avoid `auto` when it obscures the type
- Use range-based for loops: `for (const auto& item : container)`
- Use standard algorithms when they clearly simplify code (e.g., replacing 10-line loops)
- Avoid chaining multiple algorithms if a straightforward loop is clearer
- Mark functions and variables `const` whenever possible
- Use scoped enums: `enum class` (not unscoped `enum`)
## Headers
- Use `#pragma once`
- Include order: own header, C++ std, Seastar, Boost, project headers
- Forward declare when possible
- Never `using namespace` in headers (exception: `using namespace seastar` is globally available via `seastarx.hh`)
## Documentation
- Public APIs require clear documentation
- Implementation details should be self-evident from code
- Use `///` or Doxygen `/** */` for public documentation, `//` for implementation notes - follow the existing style
## Naming
- `snake_case` for most identifiers (classes, functions, variables, namespaces)
- Template parameters: `CamelCase` (e.g., `template<typename ValueType>`)
- Member variables: prefix with `_` (e.g., `int _count;`)
- Structs (value-only): no `_` prefix on members
- Constants and `constexpr`: `snake_case` (e.g., `static constexpr int max_size = 100;`)
- Files: `.hh` for headers, `.cc` for source
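A small sketch tying the naming rules above together (the class itself is invented for illustration): `snake_case` class and function names, a `CamelCase` template parameter, an underscore-prefixed member, and a `snake_case` constant.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Constants and constexpr: snake_case.
static constexpr std::size_t max_size = 100;

// Template parameters: CamelCase.
template <typename ValueType>
class bounded_list {
    std::vector<ValueType> _items; // member variable: underscore prefix
public:
    bool try_add(ValueType v) {
        if (_items.size() >= max_size) {
            return false;
        }
        _items.push_back(std::move(v));
        return true;
    }
    std::size_t size() const {
        return _items.size();
    }
};
```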
## Formatting
- 4 spaces indentation, never tabs
- Opening braces on same line as control structure (except namespaces)
- Space after keywords: `if (`, `while (`, `return `
- Whitespace around operators matches precedence: `*a + *b` not `* a+* b`
- Line length: keep reasonable (<160 chars), use continuation lines with double indent if needed
- Brace all nested scopes, even single statements
- Minimal patches: only format code you modify, never reformat entire files
## Logging
- Use structured logging with appropriate levels: DEBUG, INFO, WARN, ERROR
- Include context in log messages (e.g., request IDs)
- Never log sensitive data (credentials, PII)
## Forbidden
- `malloc`/`free`
- `printf` family (use logging or fmt)
- Raw pointers for ownership
- `using namespace` in headers
- Blocking operations: `std::sleep`, `std::read`, `std::mutex` (use Seastar equivalents)
- `std::atomic` (reserved for very special circumstances only)
- Macros (use `inline`, `constexpr`, or templates instead)
## Testing
When modifying existing code, follow TDD: create/update test first, then implement.
- Examine existing tests for style and structure
- Use Boost.Test framework
- Use `SEASTAR_THREAD_TEST_CASE` for Seastar asynchronous tests
- Aim for high code coverage, especially for new features and bug fixes
- Maintain bisectability: all tests must pass in every commit. Mark failing tests with `BOOST_FAIL()` or similar, then fix in subsequent commit


@@ -1,51 +0,0 @@
---
applyTo: "**/*.py"
---
# Python Guidelines
**Important:** Match existing code style. Some directories (like `test/cqlpy` and `test/alternator`) prefer simplicity over type hints and docstrings.
## Style
- Follow PEP 8
- Use type hints for function signatures (unless directory style omits them)
- Use f-strings for formatting
- Line length: 160 characters max
- 4 spaces for indentation
## Imports
Order: standard library, third-party, local imports
```python
import os
import sys
import pytest
from cassandra.cluster import Cluster
from test.utils import setup_keyspace
```
Never use `from module import *`
## Documentation
All public functions/classes need docstrings (unless the current directory conventions omit them):
```python
def my_function(arg1: str, arg2: int) -> bool:
"""
Brief summary of function purpose.
Args:
arg1: Description of first argument.
arg2: Description of second argument.
Returns:
Description of return value.
"""
pass
```
## Testing Best Practices
- Maintain bisectability: all tests must pass in every commit
- Mark currently-failing tests with `@pytest.mark.xfail`, unmark when fixed
- Use descriptive names that convey intent
- Docstrings/comments should explain what the test verifies and why, and if it reproduces a specific issue or how it fits into the larger test suite


@@ -1,34 +0,0 @@
name: Docs / Validate metrics
on:
pull_request:
branches:
- master
- enterprise
paths:
- '**/*.cc'
- 'scripts/metrics-config.yml'
- 'scripts/get_description.py'
- 'docs/_ext/scylladb_metrics.py'
jobs:
validate-metrics:
runs-on: ubuntu-latest
name: Check metrics documentation coverage
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
submodules: true
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: '3.10'
- name: Install dependencies
run: pip install PyYAML
- name: Validate metrics
run: python3 scripts/get_description.py --validate -c scripts/metrics-config.yml


@@ -116,7 +116,6 @@ list(APPEND absl_cxx_flags
 if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
   list(APPEND ABSL_GCC_FLAGS ${absl_cxx_flags})
 elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
-  list(APPEND absl_cxx_flags "-Wno-deprecated-builtins")
   list(APPEND ABSL_LLVM_FLAGS ${absl_cxx_flags})
 endif()
 set(ABSL_DEFAULT_LINKOPTS
@@ -164,45 +163,7 @@ file(MAKE_DIRECTORY "${scylla_gen_build_dir}")
 include(add_version_library)
 generate_scylla_version()
-option(Scylla_USE_PRECOMPILED_HEADER "Use precompiled header for Scylla" ON)
-add_library(scylla-precompiled-header STATIC exported_templates.cc)
-target_link_libraries(scylla-precompiled-header PRIVATE
-  absl::headers
-  absl::btree
-  absl::hash
-  absl::raw_hash_set
-  Seastar::seastar
-  Snappy::snappy
-  systemd
-  ZLIB::ZLIB
-  lz4::lz4_static
-  zstd::zstd_static)
-if (Scylla_USE_PRECOMPILED_HEADER)
-  set(Scylla_USE_PRECOMPILED_HEADER_USE ON)
-  find_program(DISTCC_EXEC NAMES distcc OPTIONAL)
-  if (DISTCC_EXEC)
-    if(DEFINED ENV{DISTCC_HOSTS})
-      set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
-      message(STATUS "Disabling precompiled header usage because distcc exists and DISTCC_HOSTS is set, assuming you're using distributed compilation.")
-    else()
-      file(REAL_PATH "~/.distcc/hosts" DIST_CC_HOSTS_PATH EXPAND_TILDE)
-      if (EXISTS ${DIST_CC_HOSTS_PATH})
-        set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
-        message(STATUS "Disabling precompiled header usage because distcc and ~/.distcc/hosts exists, assuming you're using distributed compilation.")
-      endif()
-    endif()
-  endif()
-  if (Scylla_USE_PRECOMPILED_HEADER_USE)
-    message(STATUS "Using precompiled header for Scylla - remember to add `sloppiness = pch_defines,time_macros` to ccache.conf, if you're using ccache.")
-    target_precompile_headers(scylla-precompiled-header PRIVATE "stdafx.hh")
-    target_compile_definitions(scylla-precompiled-header PRIVATE SCYLLA_USE_PRECOMPILED_HEADER)
-  endif()
-else()
-  set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
-endif()
 add_library(scylla-main STATIC)
 target_sources(scylla-main
   PRIVATE
     absl-flat_hash_map.cc
@@ -247,7 +208,6 @@ target_link_libraries(scylla-main
     ZLIB::ZLIB
     lz4::lz4_static
     zstd::zstd_static
-    scylla-precompiled-header
   )
 option(Scylla_CHECK_HEADERS


@@ -34,8 +34,5 @@ target_link_libraries(alternator
     idl
     absl::headers)
-if (Scylla_USE_PRECOMPILED_HEADER_USE)
-  target_precompile_headers(alternator REUSE_FROM scylla-precompiled-header)
-endif()
 check_headers(check-headers alternator
   GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)


@@ -888,7 +888,7 @@ future<executor::request_return_type> executor::describe_table(client_state& cli
     schema_ptr schema = get_table(_proxy, request);
     get_stats_from_schema(_proxy, *schema)->api_operations.describe_table++;
-    tracing::add_alternator_table_name(trace_state, schema->cf_name());
+    tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
     rjson::value table_description = co_await fill_table_description(schema, table_status::active, _proxy, client_state, trace_state, permit);
     rjson::value response = rjson::empty_object();
@@ -989,7 +989,7 @@ future<executor::request_return_type> executor::delete_table(client_state& clien
     std::string table_name = get_table_name(request);
     std::string keyspace_name = executor::KEYSPACE_NAME_PREFIX + table_name;
-    tracing::add_alternator_table_name(trace_state, table_name);
+    tracing::add_table_name(trace_state, keyspace_name, table_name);
     auto& p = _proxy.container();
     schema_ptr schema = get_table(_proxy, request);
@@ -1008,8 +1008,8 @@ future<executor::request_return_type> executor::delete_table(client_state& clien
         throw api_error::resource_not_found(fmt::format("Requested resource not found: Table: {} not found", table_name));
     }
-    auto m = co_await service::prepare_column_family_drop_announcement(p.local(), keyspace_name, table_name, group0_guard.write_timestamp(), service::drop_views::yes);
-    auto m2 = co_await service::prepare_keyspace_drop_announcement(p.local(), keyspace_name, group0_guard.write_timestamp());
+    auto m = co_await service::prepare_column_family_drop_announcement(_proxy, keyspace_name, table_name, group0_guard.write_timestamp(), service::drop_views::yes);
+    auto m2 = co_await service::prepare_keyspace_drop_announcement(_proxy, keyspace_name, group0_guard.write_timestamp());
     std::move(m2.begin(), m2.end(), std::back_inserter(m));
@@ -1583,7 +1583,7 @@ static future<executor::request_return_type> create_table_on_shard0(service::cli
     std::unordered_set<std::string> unused_attribute_definitions =
         validate_attribute_definitions("", *attribute_definitions);
-    tracing::add_alternator_table_name(trace_state, table_name);
+    tracing::add_table_name(trace_state, keyspace_name, table_name);
     schema_builder builder(keyspace_name, table_name);
     auto [hash_key, range_key] = parse_key_schema(request, "");
@@ -1865,10 +1865,10 @@ future<executor::request_return_type> executor::create_table(client_state& clien
     _stats.api_operations.create_table++;
     elogger.trace("Creating table {}", request);
-    co_return co_await _mm.container().invoke_on(0, [&, tr = tracing::global_trace_state_ptr(trace_state), request = std::move(request), &sp = _proxy.container(), &g = _gossiper.container(), &e = this->container(), client_state_other_shard = client_state.move_to_other_shard(), enforce_authorization = bool(_enforce_authorization), warn_authorization = bool(_warn_authorization)]
+    co_return co_await _mm.container().invoke_on(0, [&, tr = tracing::global_trace_state_ptr(trace_state), request = std::move(request), &sp = _proxy.container(), &g = _gossiper.container(), client_state_other_shard = client_state.move_to_other_shard(), enforce_authorization = bool(_enforce_authorization), warn_authorization = bool(_warn_authorization)]
             (service::migration_manager& mm) mutable -> future<executor::request_return_type> {
         const db::tablets_mode_t::mode tablets_mode = _proxy.data_dictionary().get_config().tablets_mode_for_new_keyspaces(); // type cast
-        co_return co_await create_table_on_shard0(client_state_other_shard.get(), tr, std::move(request), sp.local(), mm, g.local(), enforce_authorization, warn_authorization, e.local()._stats, std::move(tablets_mode));
+        co_return co_await create_table_on_shard0(client_state_other_shard.get(), tr, std::move(request), sp.local(), mm, g.local(), enforce_authorization, warn_authorization, _stats, std::move(tablets_mode));
     });
 }
@@ -1930,7 +1930,7 @@ future<executor::request_return_type> executor::update_table(client_state& clien
     schema_ptr tab = get_table(p.local(), request);
-    tracing::add_alternator_table_name(gt, tab->cf_name());
+    tracing::add_table_name(gt, tab->ks_name(), tab->cf_name());
     // the ugly but harmless conversion to string_view here is because
     // Seastar's sstring is missing a find(std::string_view) :-()
@@ -2223,12 +2223,12 @@ void validate_value(const rjson::value& v, const char* caller) {
 // The put_or_delete_item class builds the mutations needed by the PutItem and
 // DeleteItem operations - either as stand-alone commands or part of a list
-// of commands in BatchWriteItem.
+// of commands in BatchWriteItems.
 // put_or_delete_item splits each operation into two stages: Constructing the
 // object parses and validates the user input (throwing exceptions if there
 // are input errors). Later, build() generates the actual mutation, with a
 // specified timestamp. This split is needed because of the peculiar needs of
-// BatchWriteItem and LWT. BatchWriteItem needs all parsing to happen before
+// BatchWriteItems and LWT. BatchWriteItems needs all parsing to happen before
 // any writing happens (if one of the commands has an error, none of the
 // writes should be done). LWT makes it impossible for the parse step to
 // generate "mutation" objects, because the timestamp still isn't known.
@@ -2624,14 +2624,14 @@ std::optional<service::cas_shard> rmw_operation::shard_for_execute(bool needs_re
 // Build the return value from the different RMW operations (UpdateItem,
 // PutItem, DeleteItem). All these return nothing by default, but can
 // optionally return Attributes if requested via the ReturnValues option.
-static executor::request_return_type rmw_operation_return(rjson::value&& attributes, const consumed_capacity_counter& consumed_capacity, uint64_t& metric) {
+static future<executor::request_return_type> rmw_operation_return(rjson::value&& attributes, const consumed_capacity_counter& consumed_capacity, uint64_t& metric) {
     rjson::value ret = rjson::empty_object();
     consumed_capacity.add_consumed_capacity_to_response_if_needed(ret);
     metric += consumed_capacity.get_consumed_capacity_units();
     if (!attributes.IsNull()) {
         rjson::add(ret, "Attributes", std::move(attributes));
     }
-    return rjson::print(std::move(ret));
+    return make_ready_future<executor::request_return_type>(rjson::print(std::move(ret)));
 }
 static future<std::unique_ptr<rjson::value>> get_previous_item(
@@ -2697,10 +2697,7 @@ future<executor::request_return_type> rmw_operation::execute(service::storage_pr
         stats& global_stats,
         stats& per_table_stats,
         uint64_t& wcu_total) {
-    auto cdc_opts = cdc::per_request_options{
-        .alternator = true,
-        .alternator_streams_increased_compatibility = schema()->cdc_options().enabled() && proxy.data_dictionary().get_config().alternator_streams_increased_compatibility(),
-    };
+    auto cdc_opts = cdc::per_request_options{};
     if (needs_read_before_write) {
         if (_write_isolation == write_isolation::FORBID_RMW) {
             throw api_error::validation("Read-modify-write operations are disabled by 'forbid_rmw' write isolation policy. Refer to https://github.com/scylladb/scylla/blob/master/docs/alternator/alternator.md#write-isolation-policies for more information.");
@@ -2739,13 +2736,13 @@ future<executor::request_return_type> rmw_operation::execute(service::storage_pr
     auto read_command = needs_read_before_write ?
         previous_item_read_command(proxy, schema(), _ck, selection) :
         nullptr;
-    return proxy.cas(schema(), std::move(*cas_shard), *this, read_command, to_partition_ranges(*schema(), _pk),
+    return proxy.cas(schema(), std::move(*cas_shard), shared_from_this(), read_command, to_partition_ranges(*schema(), _pk),
         {timeout, std::move(permit), client_state, trace_state},
         db::consistency_level::LOCAL_SERIAL, db::consistency_level::LOCAL_QUORUM, timeout, timeout, true, std::move(cdc_opts)).then([this, read_command, &wcu_total] (bool is_applied) mutable {
         if (!is_applied) {
             return make_ready_future<executor::request_return_type>(api_error::conditional_check_failed("The conditional request failed", std::move(_return_attributes)));
         }
-        return make_ready_future<executor::request_return_type>(rmw_operation_return(std::move(_return_attributes), _consumed_capacity, wcu_total));
+        return rmw_operation_return(std::move(_return_attributes), _consumed_capacity, wcu_total);
     });
 }
@@ -2859,7 +2856,7 @@ future<executor::request_return_type> executor::put_item(client_state& client_st
     elogger.trace("put_item {}", request);
     auto op = make_shared<put_item_operation>(*_parsed_expression_cache, _proxy, std::move(request));
-    tracing::add_alternator_table_name(trace_state, op->schema()->cf_name());
+    tracing::add_table_name(trace_state, op->schema()->ks_name(), op->schema()->cf_name());
     const bool needs_read_before_write = op->needs_read_before_write();
     co_await verify_permission(_enforce_authorization, _warn_authorization, client_state, op->schema(), auth::permission::MODIFY, _stats);
@@ -2963,7 +2960,7 @@ future<executor::request_return_type> executor::delete_item(client_state& client
     auto op = make_shared<delete_item_operation>(*_parsed_expression_cache, _proxy, std::move(request));
     lw_shared_ptr<stats> per_table_stats = get_stats_from_schema(_proxy, *(op->schema()));
-    tracing::add_alternator_table_name(trace_state, op->schema()->cf_name());
+    tracing::add_table_name(trace_state, op->schema()->ks_name(), op->schema()->cf_name());
     const bool needs_read_before_write = _proxy.data_dictionary().get_config().alternator_force_read_before_write() || op->needs_read_before_write();
     co_await verify_permission(_enforce_authorization, _warn_authorization, client_state, op->schema(), auth::permission::MODIFY, _stats);
@@ -3026,20 +3023,17 @@ struct primary_key_equal {
 };
 // This is a cas_request subclass for applying given put_or_delete_items to
-// one partition using LWT as part as BatchWriteItem. This is a write-only
+// one partition using LWT as part as BatchWriteItems. This is a write-only
 // operation, not needing the previous value of the item (the mutation to be
 // done is known prior to starting the operation). Nevertheless, we want to
 // do this mutation via LWT to ensure that it is serialized with other LWT
 // mutations to the same partition.
-//
-// The std::vector<put_or_delete_item> must remain alive until the
-// storage_proxy::cas() future is resolved.
 class put_or_delete_item_cas_request : public service::cas_request {
     schema_ptr schema;
-    const std::vector<put_or_delete_item>& _mutation_builders;
+    std::vector<put_or_delete_item> _mutation_builders;
 public:
-    put_or_delete_item_cas_request(schema_ptr s, const std::vector<put_or_delete_item>& b) :
-        schema(std::move(s)), _mutation_builders(b) { }
+    put_or_delete_item_cas_request(schema_ptr s, std::vector<put_or_delete_item>&& b) :
+        schema(std::move(s)), _mutation_builders(std::move(b)) { }
     virtual ~put_or_delete_item_cas_request() = default;
     virtual std::optional<mutation> apply(foreign_ptr<lw_shared_ptr<query::result>> qr, const query::partition_slice& slice, api::timestamp_type ts, cdc::per_request_options& cdc_opts) override {
         std::optional<mutation> ret;
@@ -3055,21 +3049,17 @@ public:
} }
}; };
static future<> cas_write(service::storage_proxy& proxy, schema_ptr schema, service::cas_shard cas_shard, const dht::decorated_key& dk, const std::vector<put_or_delete_item>& mutation_builders, static future<> cas_write(service::storage_proxy& proxy, schema_ptr schema, service::cas_shard cas_shard, dht::decorated_key dk, std::vector<put_or_delete_item>&& mutation_builders,
service::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit) { service::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit) {
auto timeout = executor::default_timeout(); auto timeout = executor::default_timeout();
auto op = std::make_unique<put_or_delete_item_cas_request>(schema, mutation_builders); auto op = seastar::make_shared<put_or_delete_item_cas_request>(schema, std::move(mutation_builders));
auto* op_ptr = op.get();
auto cdc_opts = cdc::per_request_options{ auto cdc_opts = cdc::per_request_options{
.alternator = true,
.alternator_streams_increased_compatibility =
schema->cdc_options().enabled() && proxy.data_dictionary().get_config().alternator_streams_increased_compatibility(),
}; };
return proxy.cas(schema, std::move(cas_shard), *op_ptr, nullptr, to_partition_ranges(dk), return proxy.cas(schema, std::move(cas_shard), op, nullptr, to_partition_ranges(dk),
{timeout, std::move(permit), client_state, trace_state}, {timeout, std::move(permit), client_state, trace_state},
db::consistency_level::LOCAL_SERIAL, db::consistency_level::LOCAL_QUORUM, db::consistency_level::LOCAL_SERIAL, db::consistency_level::LOCAL_QUORUM,
timeout, timeout, true, std::move(cdc_opts)).finally([op = std::move(op)]{}).discard_result(); timeout, timeout, true, std::move(cdc_opts)).discard_result();
// We discarded cas()'s future value ("is_applied") because BatchWriteItem // We discarded cas()'s future value ("is_applied") because BatchWriteItems
// does not need to support conditional updates. // does not need to support conditional updates.
} }
@@ -3114,10 +3104,8 @@ static future<> do_batch_write(service::storage_proxy& proxy,
        utils::chunked_vector<mutation> mutations;
        mutations.reserve(mutation_builders.size());
        api::timestamp_type now = api::new_timestamp();
-        bool any_cdc_enabled = false;
        for (auto& b : mutation_builders) {
            mutations.push_back(b.second.build(b.first, now));
-            any_cdc_enabled |= b.first->cdc_options().enabled();
        }
        return proxy.mutate(std::move(mutations),
            db::consistency_level::LOCAL_QUORUM,
@@ -3126,43 +3114,36 @@ static future<> do_batch_write(service::storage_proxy& proxy,
            std::move(permit),
            db::allow_per_partition_rate_limit::yes,
            false,
-            cdc::per_request_options{
-                .alternator = true,
-                .alternator_streams_increased_compatibility = any_cdc_enabled && proxy.data_dictionary().get_config().alternator_streams_increased_compatibility(),
-            });
+            cdc::per_request_options{});
    } else {
        // Do the write via LWT:
        // Multiple mutations may be destined for the same partition, adding
        // or deleting different items of one partition. Join them together
        // because we can do them in one cas() call.
-        using map_type = std::unordered_map<schema_decorated_key,
-            std::vector<put_or_delete_item>,
-            schema_decorated_key_hash,
-            schema_decorated_key_equal>;
-        auto key_builders = std::make_unique<map_type>(1, schema_decorated_key_hash{}, schema_decorated_key_equal{});
+        std::unordered_map<schema_decorated_key, std::vector<put_or_delete_item>, schema_decorated_key_hash, schema_decorated_key_equal>
+            key_builders(1, schema_decorated_key_hash{}, schema_decorated_key_equal{});
        for (auto& b : mutation_builders) {
            auto dk = dht::decorate_key(*b.first, b.second.pk());
-            auto [it, added] = key_builders->try_emplace(schema_decorated_key{b.first, dk});
+            auto [it, added] = key_builders.try_emplace(schema_decorated_key{b.first, dk});
            it->second.push_back(std::move(b.second));
        }
-        auto* key_builders_ptr = key_builders.get();
-        return parallel_for_each(*key_builders_ptr, [&proxy, &client_state, &stats, trace_state, ssg, permit = std::move(permit)] (const auto& e) {
+        return parallel_for_each(std::move(key_builders), [&proxy, &client_state, &stats, trace_state, ssg, permit = std::move(permit)] (auto& e) {
            stats.write_using_lwt++;
            auto desired_shard = service::cas_shard(*e.first.schema, e.first.dk.token());
            if (desired_shard.this_shard()) {
-                return cas_write(proxy, e.first.schema, std::move(desired_shard), e.first.dk, e.second, client_state, trace_state, permit);
+                return cas_write(proxy, e.first.schema, std::move(desired_shard), e.first.dk, std::move(e.second), client_state, trace_state, permit);
            } else {
                stats.shard_bounce_for_lwt++;
                return proxy.container().invoke_on(desired_shard.shard(), ssg,
                        [cs = client_state.move_to_other_shard(),
-                        &mb = e.second,
-                        &dk = e.first.dk,
+                        mb = e.second,
+                        dk = e.first.dk,
                        ks = e.first.schema->ks_name(),
                        cf = e.first.schema->cf_name(),
                        gt = tracing::global_trace_state_ptr(trace_state),
                        permit = std::move(permit)]
                        (service::storage_proxy& proxy) mutable {
-                    return do_with(cs.get(), [&proxy, &mb, &dk, ks = std::move(ks), cf = std::move(cf),
+                    return do_with(cs.get(), [&proxy, mb = std::move(mb), dk = std::move(dk), ks = std::move(ks), cf = std::move(cf),
                            trace_state = tracing::trace_state_ptr(gt)]
                            (service::client_state& client_state) mutable {
                        auto schema = proxy.data_dictionary().find_schema(ks, cf);
@@ -3176,11 +3157,11 @@ static future<> do_batch_write(service::storage_proxy& proxy,
                        //FIXME: Instead of passing empty_service_permit() to the background operation,
                        // the current permit's lifetime should be prolonged, so that it's destructed
                        // only after all background operations are finished as well.
-                        return cas_write(proxy, schema, std::move(cas_shard), dk, mb, client_state, std::move(trace_state), empty_service_permit());
+                        return cas_write(proxy, schema, std::move(cas_shard), dk, std::move(mb), client_state, std::move(trace_state), empty_service_permit());
                    });
                }).finally([desired_shard = std::move(desired_shard)]{});
            }
-        }).finally([key_builders = std::move(key_builders)]{});
+        });
    }
}
@@ -3223,7 +3204,7 @@ future<executor::request_return_type> executor::batch_write_item(client_state& c
        per_table_stats->api_operations.batch_write_item++;
        per_table_stats->api_operations.batch_write_item_batch_total += it->value.Size();
        per_table_stats->api_operations.batch_write_item_histogram.add(it->value.Size());
-        tracing::add_alternator_table_name(trace_state, schema->cf_name());
+        tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
        std::unordered_set<primary_key, primary_key_hash, primary_key_equal> used_keys(
            1, primary_key_hash{schema}, primary_key_equal{schema});
@@ -4483,7 +4464,7 @@ future<executor::request_return_type> executor::update_item(client_state& client
    elogger.trace("update_item {}", request);
    auto op = make_shared<update_item_operation>(*_parsed_expression_cache, _proxy, std::move(request));
-    tracing::add_alternator_table_name(trace_state, op->schema()->cf_name());
+    tracing::add_table_name(trace_state, op->schema()->ks_name(), op->schema()->cf_name());
    const bool needs_read_before_write = _proxy.data_dictionary().get_config().alternator_force_read_before_write() || op->needs_read_before_write();
    co_await verify_permission(_enforce_authorization, _warn_authorization, client_state, op->schema(), auth::permission::MODIFY, _stats);
@@ -4564,7 +4545,7 @@ future<executor::request_return_type> executor::get_item(client_state& client_st
    schema_ptr schema = get_table(_proxy, request);
    lw_shared_ptr<stats> per_table_stats = get_stats_from_schema(_proxy, *schema);
    per_table_stats->api_operations.get_item++;
-    tracing::add_alternator_table_name(trace_state, schema->cf_name());
+    tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
    rjson::value& query_key = request["Key"];
    db::consistency_level cl = get_read_consistency(request);
@@ -4713,7 +4694,7 @@ future<executor::request_return_type> executor::batch_get_item(client_state& cli
    uint batch_size = 0;
    for (auto it = request_items.MemberBegin(); it != request_items.MemberEnd(); ++it) {
        table_requests rs(get_table_from_batch_request(_proxy, it));
-        tracing::add_alternator_table_name(trace_state, rs.schema->cf_name());
+        tracing::add_table_name(trace_state, sstring(executor::KEYSPACE_NAME_PREFIX) + rs.schema->cf_name(), rs.schema->cf_name());
        rs.cl = get_read_consistency(it->value);
        std::unordered_set<std::string> used_attribute_names;
        rs.attrs_to_get = ::make_shared<const std::optional<attrs_to_get>>(calculate_attrs_to_get(it->value, *_parsed_expression_cache, used_attribute_names));
@@ -5149,15 +5130,13 @@ static rjson::value encode_paging_state(const schema& schema, const service::pag
    }
    auto pos = paging_state.get_position_in_partition();
    if (pos.has_key()) {
-        // Alternator itself allows at most one column in clustering key, but
-        // user can use Alternator api to access system tables which might have
-        // multiple clustering key columns. So we need to handle that case here.
-        auto cdef_it = schema.clustering_key_columns().begin();
-        for(const auto &exploded_ck : pos.key().explode()) {
-            rjson::add_with_string_name(last_evaluated_key, std::string_view(cdef_it->name_as_text()), rjson::empty_object());
-            rjson::value& key_entry = last_evaluated_key[cdef_it->name_as_text()];
-            rjson::add_with_string_name(key_entry, type_to_string(cdef_it->type), json_key_column_value(exploded_ck, *cdef_it));
-            ++cdef_it;
+        auto exploded_ck = pos.key().explode();
+        auto exploded_ck_it = exploded_ck.begin();
+        for (const column_definition& cdef : schema.clustering_key_columns()) {
+            rjson::add_with_string_name(last_evaluated_key, std::string_view(cdef.name_as_text()), rjson::empty_object());
+            rjson::value& key_entry = last_evaluated_key[cdef.name_as_text()];
+            rjson::add_with_string_name(key_entry, type_to_string(cdef.type), json_key_column_value(*exploded_ck_it, cdef));
+            ++exploded_ck_it;
        }
    }
    // To avoid possible conflicts (and thus having to reserve these names) we
@@ -5317,7 +5296,6 @@ future<executor::request_return_type> executor::scan(client_state& client_state,
    elogger.trace("Scanning {}", request);
    auto [schema, table_type] = get_table_or_view(_proxy, request);
-    tracing::add_alternator_table_name(trace_state, schema->cf_name());
    get_stats_from_schema(_proxy, *schema)->api_operations.scan++;
    auto segment = get_int_attribute(request, "Segment");
    auto total_segments = get_int_attribute(request, "TotalSegments");
@@ -5797,7 +5775,7 @@ future<executor::request_return_type> executor::query(client_state& client_state
    auto [schema, table_type] = get_table_or_view(_proxy, request);
    get_stats_from_schema(_proxy, *schema)->api_operations.query++;
-    tracing::add_alternator_table_name(trace_state, schema->cf_name());
+    tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
    rjson::value* exclusive_start_key = rjson::find(request, "ExclusiveStartKey");
    db::consistency_level cl = get_read_consistency(request);

View File

@@ -282,23 +282,15 @@ std::string type_to_string(data_type type) {
    return it->second;
}
-std::optional<bytes> try_get_key_column_value(const rjson::value& item, const column_definition& column) {
+bytes get_key_column_value(const rjson::value& item, const column_definition& column) {
    std::string column_name = column.name_as_text();
    const rjson::value* key_typed_value = rjson::find(item, column_name);
    if (!key_typed_value) {
-        return std::nullopt;
+        throw api_error::validation(fmt::format("Key column {} not found", column_name));
    }
    return get_key_from_typed_value(*key_typed_value, column);
}
-bytes get_key_column_value(const rjson::value& item, const column_definition& column) {
-    auto value = try_get_key_column_value(item, column);
-    if (!value) {
-        throw api_error::validation(fmt::format("Key column {} not found", column.name_as_text()));
-    }
-    return std::move(*value);
-}
// Parses the JSON encoding for a key value, which is a map with a single
// entry whose key is the type and the value is the encoded value.
// If this type does not match the desired "type_str", an api_error::validation
@@ -388,38 +380,20 @@ clustering_key ck_from_json(const rjson::value& item, schema_ptr schema) {
        return clustering_key::make_empty();
    }
    std::vector<bytes> raw_ck;
-    // Note: it's possible to get more than one clustering column here, as
-    // Alternator can be used to read scylla internal tables.
+    // FIXME: this is a loop, but we really allow only one clustering key column.
    for (const column_definition& cdef : schema->clustering_key_columns()) {
-        auto raw_value = get_key_column_value(item, cdef);
+        bytes raw_value = get_key_column_value(item, cdef);
        raw_ck.push_back(std::move(raw_value));
    }
    return clustering_key::from_exploded(raw_ck);
}
-clustering_key_prefix ck_prefix_from_json(const rjson::value& item, schema_ptr schema) {
-    if (schema->clustering_key_size() == 0) {
-        return clustering_key_prefix::make_empty();
-    }
-    std::vector<bytes> raw_ck;
-    for (const column_definition& cdef : schema->clustering_key_columns()) {
-        auto raw_value = try_get_key_column_value(item, cdef);
-        if (!raw_value) {
-            break;
-        }
-        raw_ck.push_back(std::move(*raw_value));
-    }
-    return clustering_key_prefix::from_exploded(raw_ck);
-}
position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema) {
-    const bool is_alternator_ks = is_alternator_keyspace(schema->ks_name());
-    if (is_alternator_ks) {
-        return position_in_partition::for_key(ck_from_json(item, schema));
+    auto ck = ck_from_json(item, schema);
+    if (is_alternator_keyspace(schema->ks_name())) {
+        return position_in_partition::for_key(std::move(ck));
    }
    const auto region_item = rjson::find(item, scylla_paging_region);
    const auto weight_item = rjson::find(item, scylla_paging_weight);
    if (bool(region_item) != bool(weight_item)) {
@@ -439,9 +413,8 @@ position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema)
        } else {
            throw std::runtime_error(fmt::format("Invalid value for weight: {}", weight_view));
        }
-        return position_in_partition(region, weight, region == partition_region::clustered ? std::optional(ck_prefix_from_json(item, schema)) : std::nullopt);
+        return position_in_partition(region, weight, region == partition_region::clustered ? std::optional(std::move(ck)) : std::nullopt);
    }
-    auto ck = ck_from_json(item, schema);
    if (ck.is_empty()) {
        return position_in_partition::for_partition_start();
    }

View File

@@ -13,7 +13,6 @@
#include <seastar/http/function_handlers.hh>
#include <seastar/http/short_streams.hh>
#include <seastar/core/coroutine.hh>
-#include <seastar/coroutine/maybe_yield.hh>
#include <seastar/util/defer.hh>
#include <seastar/util/short_streams.hh>
#include "seastarx.hh"
@@ -33,7 +32,6 @@
#include "utils/aws_sigv4.hh"
#include "client_data.hh"
#include "utils/updateable_value.hh"
-#include <zlib.h>
static logging::logger slogger("alternator-server");
@@ -553,106 +551,6 @@ read_entire_stream(input_stream<char>& inp, size_t length_limit) {
    co_return ret;
}
-// safe_gzip_zstream is an exception-safe wrapper for zlib's z_stream.
-// The "z_stream" struct is used by zlib to hold state while decompressing a
-// stream of data. It allocates memory which must be freed with inflateEnd(),
-// which the destructor of this class does.
-class safe_gzip_zstream {
-    z_stream _zs;
-public:
-    safe_gzip_zstream() {
-        memset(&_zs, 0, sizeof(_zs));
-        // The strange 16 + MAX_WBITS tells zlib to expect and decode
-        // a gzip header, not a zlib header.
-        if (inflateInit2(&_zs, 16 + MAX_WBITS) != Z_OK) {
-            // Should only happen if memory allocation fails
-            throw std::bad_alloc();
-        }
-    }
-    ~safe_gzip_zstream() {
-        inflateEnd(&_zs);
-    }
-    z_stream* operator->() {
-        return &_zs;
-    }
-    z_stream* get() {
-        return &_zs;
-    }
-    void reset() {
-        inflateReset(&_zs);
-    }
-};
-// ungzip() takes a chunked_content with a gzip-compressed request body,
-// uncompresses it, and returns the uncompressed content as a chunked_content.
-// If the uncompressed content exceeds length_limit, an error is thrown.
-static future<chunked_content>
-ungzip(chunked_content&& compressed_body, size_t length_limit) {
-    chunked_content ret;
-    // output_buf can be any size - when uncompressing input_buf, it doesn't
-    // need to fit in a single output_buf, we'll use multiple output_buf for
-    // a single input_buf if needed.
-    constexpr size_t OUTPUT_BUF_SIZE = 4096;
-    temporary_buffer<char> output_buf;
-    safe_gzip_zstream strm;
-    bool complete_stream = false; // empty input is not a valid gzip
-    size_t total_out_bytes = 0;
-    for (const temporary_buffer<char>& input_buf : compressed_body) {
-        if (input_buf.empty()) {
-            continue;
-        }
-        complete_stream = false;
-        strm->next_in = (Bytef*) input_buf.get();
-        strm->avail_in = (uInt) input_buf.size();
-        do {
-            co_await coroutine::maybe_yield();
-            if (output_buf.empty()) {
-                output_buf = temporary_buffer<char>(OUTPUT_BUF_SIZE);
-            }
-            strm->next_out = (Bytef*) output_buf.get();
-            strm->avail_out = OUTPUT_BUF_SIZE;
-            int e = inflate(strm.get(), Z_NO_FLUSH);
-            size_t out_bytes = OUTPUT_BUF_SIZE - strm->avail_out;
-            if (out_bytes > 0) {
-                // If output_buf is nearly full, we save it as-is in ret. But
-                // if it only has little data, better copy to a small buffer.
-                if (out_bytes > OUTPUT_BUF_SIZE/2) {
-                    ret.push_back(std::move(output_buf).prefix(out_bytes));
-                    // output_buf is now empty. If this loop finds more input,
-                    // we'll allocate a new output buffer.
-                } else {
-                    ret.push_back(temporary_buffer<char>(output_buf.get(), out_bytes));
-                }
-                total_out_bytes += out_bytes;
-                if (total_out_bytes > length_limit) {
-                    throw api_error::payload_too_large(fmt::format("Request content length limit of {} bytes exceeded", length_limit));
-                }
-            }
-            if (e == Z_STREAM_END) {
-                // There may be more input after the first gzip stream - in
-                // either this input_buf or the next one. The additional input
-                // should be a second concatenated gzip. We need to allow that
-                // by resetting the gzip stream and continuing the input loop
-                // until there's no more input.
-                strm.reset();
-                if (strm->avail_in == 0) {
-                    complete_stream = true;
-                    break;
-                }
-            } else if (e != Z_OK && e != Z_BUF_ERROR) {
-                // DynamoDB returns an InternalServerError when given a bad
-                // gzip request body. See test test_broken_gzip_content
-                throw api_error::internal("Error during gzip decompression of request body");
-            }
-        } while (strm->avail_in > 0 || strm->avail_out == 0);
-    }
-    if (!complete_stream) {
-        // The gzip stream was not properly finished with Z_STREAM_END
-        throw api_error::internal("Truncated gzip in request body");
-    }
-    co_return ret;
-}
future<executor::request_return_type> server::handle_api_request(std::unique_ptr<request> req) {
    _executor._stats.total_operations++;
    sstring target = req->get_header("X-Amz-Target");
@@ -690,21 +588,6 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr
        units.return_units(mem_estimate - new_mem_estimate);
    }
    auto username = co_await verify_signature(*req, content);
-    // If the request is compressed, uncompress it now, after we checked
-    // the signature (the signature is computed on the compressed content).
-    // We apply the request_content_length_limit again to the uncompressed
-    // content - we don't want to allow a tiny compressed request to
-    // expand to a huge uncompressed request.
-    sstring content_encoding = req->get_header("Content-Encoding");
-    if (content_encoding == "gzip") {
-        content = co_await ungzip(std::move(content), request_content_length_limit);
-    } else if (!content_encoding.empty()) {
-        // DynamoDB returns a 500 error for unsupported Content-Encoding.
-        // I'm not sure if this is the best error code, but let's do it too.
-        // See the test test_garbage_content_encoding confirming this case.
-        co_return api_error::internal("Unsupported Content-Encoding");
-    }
    // As long as the system_clients_entry object is alive, this request will
    // be visible in the "system.clients" virtual table. When requested, this
    // entry will be formatted by server::ongoing_request::make_client_data().

View File

@@ -106,8 +106,5 @@ target_link_libraries(api
    wasmtime_bindings
    absl::headers)
-if (Scylla_USE_PRECOMPILED_HEADER_USE)
-    target_precompile_headers(api REUSE_FROM scylla-precompiled-header)
-endif()
check_headers(check-headers api
    GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)

View File

@@ -349,13 +349,9 @@
            "type":"long",
            "description":"The shard the task is running on"
        },
-        "creation_time":{
-            "type":"datetime",
-            "description":"The creation time of the task (when it was queued); extracted from the task_id UUID"
-        },
        "start_time":{
            "type":"datetime",
-            "description":"The start time of the task (when execution began); unspecified (equal to epoch) when state == created"
+            "description":"The start time of the task; unspecified (equal to epoch) when state == created"
        },
        "end_time":{
            "type":"datetime",
@@ -402,17 +398,13 @@
            "type":"boolean",
            "description":"Boolean flag indicating whether the task can be aborted"
        },
-        "creation_time":{
-            "type":"datetime",
-            "description":"The creation time of the task (when it was queued); extracted from the task_id UUID"
-        },
        "start_time":{
            "type":"datetime",
-            "description":"The start time of the task (when execution began); unspecified (equal to epoch) when state == created"
+            "description":"The start time of the task"
        },
        "end_time":{
            "type":"datetime",
-            "description":"The end time of the task (when execution completed); unspecified (equal to epoch) when the task is not completed"
+            "description":"The end time of the task (unspecified when the task is not completed)"
        },
        "error":{
            "type":"string",

View File

@@ -66,13 +66,6 @@ static future<json::json_return_type> get_cf_stats(sharded<replica::database>&
    }, std::plus<int64_t>());
}
-static future<json::json_return_type> get_cf_stats(sharded<replica::database>& db,
-        std::function<int64_t(const replica::column_family_stats&)> f) {
-    return map_reduce_cf(db, int64_t(0), [f](const replica::column_family& cf) {
-        return f(cf.get_stats());
-    }, std::plus<int64_t>());
-}
static future<json::json_return_type> for_tables_on_all_shards(sharded<replica::database>& db, std::vector<table_info> tables, std::function<future<>(replica::table&)> set) {
    return do_with(std::move(tables), [&db, set] (const std::vector<table_info>& tables) {
        return db.invoke_on_all([&tables, set] (replica::database& db) {
@@ -1073,14 +1066,10 @@ void set_column_family(http_context& ctx, routes& r, sharded<replica::database>&
    });
    ss::get_load.set(r, [&db] (std::unique_ptr<http::request> req) {
-        return get_cf_stats(db, [](const replica::column_family_stats& stats) {
-            return stats.live_disk_space_used.on_disk;
-        });
+        return get_cf_stats(db, &replica::column_family_stats::live_disk_space_used);
    });
    ss::get_metrics_load.set(r, [&db] (std::unique_ptr<http::request> req) {
-        return get_cf_stats(db, [](const replica::column_family_stats& stats) {
-            return stats.live_disk_space_used.on_disk;
-        });
+        return get_cf_stats(db, &replica::column_family_stats::live_disk_space_used);
    });
    ss::get_keyspaces.set(r, [&db] (const_req req) {

View File

@@ -55,7 +55,6 @@ tm::task_status make_status(tasks::task_status status, sharded<gms::gossiper>& g
    res.scope = status.scope;
    res.state = status.state;
    res.is_abortable = bool(status.is_abortable);
-    res.creation_time = get_time(status.creation_time);
    res.start_time = get_time(status.start_time);
    res.end_time = get_time(status.end_time);
    res.error = status.error;
@@ -84,7 +83,6 @@ tm::task_stats make_stats(tasks::task_stats stats) {
    res.table = stats.table;
    res.entity = stats.entity;
    res.shard = stats.shard;
-    res.creation_time = get_time(stats.creation_time);
    res.start_time = get_time(stats.start_time);
    res.end_time = get_time(stats.end_time);
    return res;

View File

@@ -17,7 +17,4 @@ target_link_libraries(scylla_audit
    PRIVATE
    cql3)
-if (Scylla_USE_PRECOMPILED_HEADER_USE)
-    target_precompile_headers(scylla_audit REUSE_FROM scylla-precompiled-header)
-endif()
add_whole_archive(audit scylla_audit)

View File

@@ -9,7 +9,6 @@ target_sources(scylla_auth
    allow_all_authorizer.cc
    authenticated_user.cc
    authenticator.cc
-    cache.cc
    certificate_authenticator.cc
    common.cc
    default_authorizer.cc
@@ -45,8 +44,5 @@ target_link_libraries(scylla_auth
add_whole_archive(auth scylla_auth)
-if (Scylla_USE_PRECOMPILED_HEADER_USE)
-    target_precompile_headers(scylla_auth REUSE_FROM scylla-precompiled-header)
-endif()
check_headers(check-headers scylla_auth
    GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)

View File

@@ -23,7 +23,6 @@ static const class_registrator<
    cql3::query_processor&,
    ::service::raft_group0_client&,
    ::service::migration_manager&,
-    cache&,
    utils::alien_worker&> registration("org.apache.cassandra.auth.AllowAllAuthenticator");
}


@@ -12,7 +12,6 @@
#include "auth/authenticated_user.hh"
#include "auth/authenticator.hh"
-#include "auth/cache.hh"
#include "auth/common.hh"
#include "utils/alien_worker.hh"
@@ -30,7 +29,7 @@ extern const std::string_view allow_all_authenticator_name;
class allow_all_authenticator final : public authenticator {
public:
-allow_all_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&, utils::alien_worker&) {
+allow_all_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&) {
}
virtual future<> start() override {


@@ -1,180 +0,0 @@
/*
* Copyright (C) 2017-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "auth/cache.hh"
#include "auth/common.hh"
#include "auth/roles-metadata.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
#include "db/consistency_level_type.hh"
#include "db/system_keyspace.hh"
#include "schema/schema.hh"
#include <iterator>
#include <seastar/coroutine/maybe_yield.hh>
#include <seastar/core/format.hh>
namespace auth {
logging::logger logger("auth-cache");
cache::cache(cql3::query_processor& qp) noexcept
: _current_version(0)
, _qp(qp) {
}
lw_shared_ptr<const cache::role_record> cache::get(const role_name_t& role) const noexcept {
auto it = _roles.find(role);
if (it == _roles.end()) {
return {};
}
return it->second;
}
future<lw_shared_ptr<cache::role_record>> cache::fetch_role(const role_name_t& role) const {
auto rec = make_lw_shared<role_record>();
rec->version = _current_version;
auto fetch = [this, &role](const sstring& q) {
return _qp.execute_internal(q, db::consistency_level::LOCAL_ONE,
internal_distributed_query_state(), {role},
cql3::query_processor::cache_internal::yes);
};
// roles
{
static const sstring q = format("SELECT * FROM {}.{} WHERE role = ?", db::system_keyspace::NAME, meta::roles_table::name);
auto rs = co_await fetch(q);
if (!rs->empty()) {
auto& r = rs->one();
rec->is_superuser = r.get_or<bool>("is_superuser", false);
rec->can_login = r.get_or<bool>("can_login", false);
rec->salted_hash = r.get_or<sstring>("salted_hash", "");
if (r.has("member_of")) {
auto mo = r.get_set<sstring>("member_of");
rec->member_of.insert(
std::make_move_iterator(mo.begin()),
std::make_move_iterator(mo.end()));
}
} else {
// role got deleted
co_return nullptr;
}
}
// members
{
static const sstring q = format("SELECT role, member FROM {}.{} WHERE role = ?", db::system_keyspace::NAME, ROLE_MEMBERS_CF);
auto rs = co_await fetch(q);
for (const auto& r : *rs) {
rec->members.insert(r.get_as<sstring>("member"));
co_await coroutine::maybe_yield();
}
}
// attributes
{
static const sstring q = format("SELECT role, name, value FROM {}.{} WHERE role = ?", db::system_keyspace::NAME, ROLE_ATTRIBUTES_CF);
auto rs = co_await fetch(q);
for (const auto& r : *rs) {
rec->attributes[r.get_as<sstring>("name")] =
r.get_as<sstring>("value");
co_await coroutine::maybe_yield();
}
}
// permissions
{
static const sstring q = format("SELECT role, resource, permissions FROM {}.{} WHERE role = ?", db::system_keyspace::NAME, PERMISSIONS_CF);
auto rs = co_await fetch(q);
for (const auto& r : *rs) {
auto resource = r.get_as<sstring>("resource");
auto perms_strings = r.get_set<sstring>("permissions");
std::unordered_set<sstring> perms_set(perms_strings.begin(), perms_strings.end());
auto pset = permissions::from_strings(perms_set);
rec->permissions[std::move(resource)] = std::move(pset);
co_await coroutine::maybe_yield();
}
}
co_return rec;
}
future<> cache::prune_all() noexcept {
for (auto it = _roles.begin(); it != _roles.end(); ) {
if (it->second->version != _current_version) {
_roles.erase(it++);
co_await coroutine::maybe_yield();
} else {
++it;
}
}
co_return;
}
future<> cache::load_all() {
if (legacy_mode(_qp)) {
co_return;
}
SCYLLA_ASSERT(this_shard_id() == 0);
++_current_version;
logger.info("Loading all roles");
const uint32_t page_size = 128;
auto loader = [this](const cql3::untyped_result_set::row& r) -> future<stop_iteration> {
const auto name = r.get_as<sstring>("role");
auto role = co_await fetch_role(name);
if (role) {
_roles[name] = role;
}
co_return stop_iteration::no;
};
co_await _qp.query_internal(format("SELECT * FROM {}.{}",
db::system_keyspace::NAME, meta::roles_table::name),
db::consistency_level::LOCAL_ONE, {}, page_size, loader);
co_await prune_all();
for (const auto& [name, role] : _roles) {
co_await distribute_role(name, role);
}
co_await container().invoke_on_others([this](cache& c) -> future<> {
c._current_version = _current_version;
co_await c.prune_all();
});
}
future<> cache::load_roles(std::unordered_set<role_name_t> roles) {
if (legacy_mode(_qp)) {
co_return;
}
for (const auto& name : roles) {
logger.info("Loading role {}", name);
auto role = co_await fetch_role(name);
if (role) {
_roles[name] = role;
} else {
_roles.erase(name);
}
co_await distribute_role(name, role);
}
}
future<> cache::distribute_role(const role_name_t& name, lw_shared_ptr<role_record> role) {
auto role_ptr = role.get();
co_await container().invoke_on_others([&name, role_ptr](cache& c) {
if (!role_ptr) {
c._roles.erase(name);
return;
}
auto role_copy = make_lw_shared<role_record>(*role_ptr);
c._roles[name] = std::move(role_copy);
});
}
bool cache::includes_table(const table_id& id) noexcept {
return id == db::system_keyspace::roles()->id()
|| id == db::system_keyspace::role_members()->id()
|| id == db::system_keyspace::role_attributes()->id()
|| id == db::system_keyspace::role_permissions()->id();
}
} // namespace auth


@@ -1,61 +0,0 @@
/*
* Copyright (C) 2025-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
#include <unordered_set>
#include <unordered_map>
#include <seastar/core/sstring.hh>
#include <seastar/core/future.hh>
#include <seastar/core/sharded.hh>
#include <seastar/core/shared_ptr.hh>
#include <absl/container/flat_hash_map.h>
#include "auth/permission.hh"
#include "auth/common.hh"
namespace cql3 { class query_processor; }
namespace auth {
class cache : public peering_sharded_service<cache> {
public:
using role_name_t = sstring;
using version_tag_t = char;
struct role_record {
bool can_login = false;
bool is_superuser = false;
std::unordered_set<role_name_t> member_of;
std::unordered_set<role_name_t> members;
sstring salted_hash;
std::unordered_map<sstring, sstring> attributes;
std::unordered_map<sstring, permission_set> permissions;
version_tag_t version; // used for seamless cache reloads
};
explicit cache(cql3::query_processor& qp) noexcept;
lw_shared_ptr<const role_record> get(const role_name_t& role) const noexcept;
future<> load_all();
future<> load_roles(std::unordered_set<role_name_t> roles);
static bool includes_table(const table_id&) noexcept;
private:
using roles_map = absl::flat_hash_map<role_name_t, lw_shared_ptr<role_record>>;
roles_map _roles;
version_tag_t _current_version;
cql3::query_processor& _qp;
future<lw_shared_ptr<role_record>> fetch_role(const role_name_t& role) const;
future<> prune_all() noexcept;
future<> distribute_role(const role_name_t& name, const lw_shared_ptr<role_record> role);
};
} // namespace auth


@@ -8,7 +8,6 @@
*/
#include "auth/certificate_authenticator.hh"
-#include "auth/cache.hh"
#include <boost/regex.hpp>
#include <fmt/ranges.h>
@@ -35,14 +34,13 @@ static const class_registrator<auth::authenticator
, cql3::query_processor&
, ::service::raft_group0_client&
, ::service::migration_manager&
-, auth::cache&
, utils::alien_worker&> cert_auth_reg(CERT_AUTH_NAME);
enum class auth::certificate_authenticator::query_source {
subject, altname
};
-auth::certificate_authenticator::certificate_authenticator(cql3::query_processor& qp, ::service::raft_group0_client&, ::service::migration_manager&, auth::cache&, utils::alien_worker&)
+auth::certificate_authenticator::certificate_authenticator(cql3::query_processor& qp, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&)
: _queries([&] {
auto& conf = qp.db().get_config();
auto queries = conf.auth_certificate_role_queries();


@@ -26,15 +26,13 @@ class raft_group0_client;
namespace auth {
-class cache;
extern const std::string_view certificate_authenticator_name;
class certificate_authenticator : public authenticator {
enum class query_source;
std::vector<std::pair<query_source, boost::regex>> _queries;
public:
-certificate_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&, utils::alien_worker&);
+certificate_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&);
~certificate_authenticator();
future<> start() override;


@@ -48,10 +48,6 @@ extern constinit const std::string_view AUTH_PACKAGE_NAME;
} // namespace meta
-constexpr std::string_view PERMISSIONS_CF = "role_permissions";
-constexpr std::string_view ROLE_MEMBERS_CF = "role_members";
-constexpr std::string_view ROLE_ATTRIBUTES_CF = "role_attributes";
// This is a helper to check whether auth-v2 is on.
bool legacy_mode(cql3::query_processor& qp);


@@ -37,6 +37,7 @@ std::string_view default_authorizer::qualified_java_name() const {
static constexpr std::string_view ROLE_NAME = "role";
static constexpr std::string_view RESOURCE_NAME = "resource";
static constexpr std::string_view PERMISSIONS_NAME = "permissions";
+static constexpr std::string_view PERMISSIONS_CF = "role_permissions";
static logging::logger alogger("default_authorizer");


@@ -83,18 +83,17 @@ static const class_registrator<
ldap_role_manager,
cql3::query_processor&,
::service::raft_group0_client&,
-::service::migration_manager&,
-cache&> registration(ldap_role_manager_full_name);
+::service::migration_manager&> registration(ldap_role_manager_full_name);
ldap_role_manager::ldap_role_manager(
std::string_view query_template, std::string_view target_attr, std::string_view bind_name, std::string_view bind_password,
-cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm, cache& cache)
-: _std_mgr(qp, rg0c, mm, cache), _group0_client(rg0c), _query_template(query_template), _target_attr(target_attr), _bind_name(bind_name)
+cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm)
+: _std_mgr(qp, rg0c, mm), _group0_client(rg0c), _query_template(query_template), _target_attr(target_attr), _bind_name(bind_name)
, _bind_password(bind_password)
, _connection_factory(bind(std::mem_fn(&ldap_role_manager::reconnect), std::ref(*this))) {
}
-ldap_role_manager::ldap_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm, cache& cache)
+ldap_role_manager::ldap_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm)
: ldap_role_manager(
qp.db().get_config().ldap_url_template(),
qp.db().get_config().ldap_attr_role(),
@@ -102,8 +101,7 @@ ldap_role_manager::ldap_role_manager(cql3::query_processor& qp, ::service::raft_
qp.db().get_config().ldap_bind_passwd(),
qp,
rg0c,
-mm,
-cache) {
+mm) {
}
std::string_view ldap_role_manager::qualified_java_name() const noexcept {


@@ -14,7 +14,6 @@
#include "ent/ldap/ldap_connection.hh"
#include "standard_role_manager.hh"
-#include "auth/cache.hh"
namespace auth {
@@ -44,13 +43,12 @@ class ldap_role_manager : public role_manager {
std::string_view bind_password, ///< LDAP bind credentials.
cql3::query_processor& qp, ///< Passed to standard_role_manager.
::service::raft_group0_client& rg0c, ///< Passed to standard_role_manager.
-::service::migration_manager& mm, ///< Passed to standard_role_manager.
-cache& cache ///< Passed to standard_role_manager.
+::service::migration_manager& mm ///< Passed to standard_role_manager.
);
/// Retrieves LDAP configuration entries from qp and invokes the other constructor. Required by
/// class_registrator<role_manager>.
-ldap_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm, cache& cache);
+ldap_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm);
/// Thrown when query-template parsing fails.
struct url_error : public std::runtime_error {


@@ -11,7 +11,6 @@
#include <seastar/core/future.hh>
#include <stdexcept>
#include <string_view>
-#include "auth/cache.hh"
#include "cql3/description.hh"
#include "utils/class_registrator.hh"
@@ -24,8 +23,7 @@ static const class_registrator<
maintenance_socket_role_manager,
cql3::query_processor&,
::service::raft_group0_client&,
-::service::migration_manager&,
-cache&> registration(sstring{maintenance_socket_role_manager_name});
+::service::migration_manager&> registration(sstring{maintenance_socket_role_manager_name});
std::string_view maintenance_socket_role_manager::qualified_java_name() const noexcept {


@@ -8,7 +8,6 @@
#pragma once
-#include "auth/cache.hh"
#include "auth/resource.hh"
#include "auth/role_manager.hh"
#include <seastar/core/future.hh>
@@ -30,7 +29,7 @@ extern const std::string_view maintenance_socket_role_manager_name;
// system_auth keyspace, which may be not yet created when the maintenance socket starts listening.
class maintenance_socket_role_manager final : public role_manager {
public:
-maintenance_socket_role_manager(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&) {}
+maintenance_socket_role_manager(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&) {}
virtual std::string_view qualified_java_name() const noexcept override;


@@ -49,7 +49,6 @@ static const class_registrator<
cql3::query_processor&,
::service::raft_group0_client&,
::service::migration_manager&,
-cache&,
utils::alien_worker&> password_auth_reg("org.apache.cassandra.auth.PasswordAuthenticator");
static thread_local auto rng_for_salt = std::default_random_engine(std::random_device{}());
@@ -64,11 +63,10 @@ std::string password_authenticator::default_superuser(const db::config& cfg) {
password_authenticator::~password_authenticator() {
}
-password_authenticator::password_authenticator(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, cache& cache, utils::alien_worker& hashing_worker)
+password_authenticator::password_authenticator(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, utils::alien_worker& hashing_worker)
: _qp(qp)
, _group0_client(g0)
, _migration_manager(mm)
-, _cache(cache)
, _stopped(make_ready_future<>())
, _superuser(default_superuser(qp.db().get_config()))
, _hashing_worker(hashing_worker)
@@ -317,20 +315,11 @@ future<authenticated_user> password_authenticator::authenticate(
const sstring password = credentials.at(PASSWORD_KEY);
try {
-std::optional<sstring> salted_hash;
-if (legacy_mode(_qp)) {
-salted_hash = co_await get_password_hash(username);
-if (!salted_hash) {
-throw exceptions::authentication_exception("Username and/or password are incorrect");
-}
-} else {
-auto role = _cache.get(username);
-if (!role || role->salted_hash.empty()) {
-throw exceptions::authentication_exception("Username and/or password are incorrect");
-}
-salted_hash = role->salted_hash;
-}
+const std::optional<sstring> salted_hash = co_await get_password_hash(username);
+if (!salted_hash) {
+throw exceptions::authentication_exception("Username and/or password are incorrect");
+}
-const bool password_match = co_await _hashing_worker.submit<bool>([password = std::move(password), salted_hash] {
+const bool password_match = co_await _hashing_worker.submit<bool>([password = std::move(password), salted_hash = std::move(salted_hash)] {
return passwords::check(password, *salted_hash);
});
if (!password_match) {


@@ -16,7 +16,6 @@
#include "db/consistency_level_type.hh"
#include "auth/authenticator.hh"
#include "auth/passwords.hh"
-#include "auth/cache.hh"
#include "service/raft/raft_group0_client.hh"
#include "utils/alien_worker.hh"
@@ -42,7 +41,6 @@ class password_authenticator : public authenticator {
cql3::query_processor& _qp;
::service::raft_group0_client& _group0_client;
::service::migration_manager& _migration_manager;
-cache& _cache;
future<> _stopped;
abort_source _as;
std::string _superuser; // default superuser name from the config (may or may not be present in roles table)
@@ -55,7 +53,7 @@ public:
static db::consistency_level consistency_for_user(std::string_view role_name);
static std::string default_superuser(const db::config&);
-password_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&, utils::alien_worker&);
+password_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&);
~password_authenticator();


@@ -35,10 +35,9 @@ static const class_registrator<
cql3::query_processor&,
::service::raft_group0_client&,
::service::migration_manager&,
-cache&,
utils::alien_worker&> saslauthd_auth_reg("com.scylladb.auth.SaslauthdAuthenticator");
-saslauthd_authenticator::saslauthd_authenticator(cql3::query_processor& qp, ::service::raft_group0_client&, ::service::migration_manager&, cache&, utils::alien_worker&)
+saslauthd_authenticator::saslauthd_authenticator(cql3::query_processor& qp, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&)
: _socket_path(qp.db().get_config().saslauthd_socket_path())
{}


@@ -11,7 +11,6 @@
#pragma once
#include "auth/authenticator.hh"
-#include "auth/cache.hh"
#include "utils/alien_worker.hh"
namespace cql3 {
@@ -30,7 +29,7 @@ namespace auth {
class saslauthd_authenticator : public authenticator {
sstring _socket_path; ///< Path to the domain socket on which saslauthd is listening.
public:
-saslauthd_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&, utils::alien_worker&);
+saslauthd_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&);
future<> start() override;


@@ -17,7 +17,6 @@
#include <chrono>
#include <seastar/core/future-util.hh>
-#include <seastar/core/shard_id.hh>
#include <seastar/core/sharded.hh>
#include <seastar/core/shared_ptr.hh>
@@ -158,7 +157,6 @@ static future<> validate_role_exists(const service& ser, std::string_view role_n
service::service(
utils::loading_cache_config c,
-cache& cache,
cql3::query_processor& qp,
::service::raft_group0_client& g0,
::service::migration_notifier& mn,
@@ -168,7 +166,6 @@ service::service(
maintenance_socket_enabled used_by_maintenance_socket)
: _loading_cache_config(std::move(c))
, _permissions_cache(nullptr)
-, _cache(cache)
, _qp(qp)
, _group0_client(g0)
, _mnotifier(mn)
@@ -191,17 +188,15 @@ service::service(
::service::migration_manager& mm,
const service_config& sc,
maintenance_socket_enabled used_by_maintenance_socket,
-cache& cache,
utils::alien_worker& hashing_worker)
: service(
std::move(c),
-cache,
qp,
g0,
mn,
create_object<authorizer>(sc.authorizer_java_name, qp, g0, mm),
-create_object<authenticator>(sc.authenticator_java_name, qp, g0, mm, cache, hashing_worker),
-create_object<role_manager>(sc.role_manager_java_name, qp, g0, mm, cache),
+create_object<authenticator>(sc.authenticator_java_name, qp, g0, mm, hashing_worker),
+create_object<role_manager>(sc.role_manager_java_name, qp, g0, mm),
used_by_maintenance_socket) {
}
@@ -237,9 +232,6 @@ future<> service::start(::service::migration_manager& mm, db::system_keyspace& s
auto auth_version = co_await sys_ks.get_auth_version();
// version is set in query processor to be easily available in various places we call auth::legacy_mode check.
_qp.auth_version = auth_version;
-if (this_shard_id() == 0) {
-co_await _cache.load_all();
-}
if (!_used_by_maintenance_socket) {
// this legacy keyspace is only used by cqlsh
// it's needed when executing `list roles` or `list users`


@@ -21,7 +21,6 @@
#include "auth/authorizer.hh"
#include "auth/permission.hh"
#include "auth/permissions_cache.hh"
-#include "auth/cache.hh"
#include "auth/role_manager.hh"
#include "auth/common.hh"
#include "cql3/description.hh"
@@ -78,7 +77,6 @@ public:
class service final : public seastar::peering_sharded_service<service> {
utils::loading_cache_config _loading_cache_config;
std::unique_ptr<permissions_cache> _permissions_cache;
-cache& _cache;
cql3::query_processor& _qp;
@@ -109,7 +107,6 @@ class service final : public seastar::peering_sharded_service<service> {
public:
service(
utils::loading_cache_config,
-cache& cache,
cql3::query_processor&,
::service::raft_group0_client&,
::service::migration_notifier&,
@@ -131,7 +128,6 @@ public:
::service::migration_manager&,
const service_config&,
maintenance_socket_enabled,
-cache&,
utils::alien_worker&);
future<> start(::service::migration_manager&, db::system_keyspace&);


@@ -41,6 +41,21 @@
namespace auth { namespace auth {
namespace meta {
namespace role_members_table {
constexpr std::string_view name{"role_members" , 12};
}
namespace role_attributes_table {
constexpr std::string_view name{"role_attributes", 15};
}
}
static logging::logger log("standard_role_manager"); static logging::logger log("standard_role_manager");
@@ -49,8 +64,7 @@ static const class_registrator<
standard_role_manager, standard_role_manager,
cql3::query_processor&, cql3::query_processor&,
::service::raft_group0_client&, ::service::raft_group0_client&,
::service::migration_manager&, ::service::migration_manager&> registration("org.apache.cassandra.auth.CassandraRoleManager");
cache&> registration("org.apache.cassandra.auth.CassandraRoleManager");
struct record final { struct record final {
sstring name; sstring name;
@@ -107,11 +121,10 @@ static bool has_can_login(const cql3::untyped_result_set_row& row) {
return row.has("can_login") && !(boolean_type->deserialize(row.get_blob_unfragmented("can_login")).is_null()); return row.has("can_login") && !(boolean_type->deserialize(row.get_blob_unfragmented("can_login")).is_null());
} }
standard_role_manager::standard_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, cache& cache) standard_role_manager::standard_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm)
: _qp(qp) : _qp(qp)
, _group0_client(g0) , _group0_client(g0)
, _migration_manager(mm) , _migration_manager(mm)
, _cache(cache)
, _stopped(make_ready_future<>()) , _stopped(make_ready_future<>())
, _superuser(password_authenticator::default_superuser(qp.db().get_config())) , _superuser(password_authenticator::default_superuser(qp.db().get_config()))
{} {}
@@ -123,7 +136,7 @@ std::string_view standard_role_manager::qualified_java_name() const noexcept {
const resource_set& standard_role_manager::protected_resources() const { const resource_set& standard_role_manager::protected_resources() const {
static const resource_set resources({ static const resource_set resources({
make_data_resource(meta::legacy::AUTH_KS, meta::roles_table::name), make_data_resource(meta::legacy::AUTH_KS, meta::roles_table::name),
make_data_resource(meta::legacy::AUTH_KS, ROLE_MEMBERS_CF)}); make_data_resource(meta::legacy::AUTH_KS, meta::role_members_table::name)});
return resources; return resources;
} }
@@ -147,7 +160,7 @@ future<> standard_role_manager::create_legacy_metadata_tables_if_missing() const
" PRIMARY KEY (role, member)" " PRIMARY KEY (role, member)"
")", ")",
meta::legacy::AUTH_KS, meta::legacy::AUTH_KS,
ROLE_MEMBERS_CF); meta::role_members_table::name);
static const sstring create_role_attributes_query = seastar::format( static const sstring create_role_attributes_query = seastar::format(
"CREATE TABLE {}.{} (" "CREATE TABLE {}.{} ("
" role text," " role text,"
@@ -156,7 +169,7 @@ future<> standard_role_manager::create_legacy_metadata_tables_if_missing() const
" PRIMARY KEY(role, name)" " PRIMARY KEY(role, name)"
")", ")",
meta::legacy::AUTH_KS, meta::legacy::AUTH_KS,
ROLE_ATTRIBUTES_CF); meta::role_attributes_table::name);
return when_all_succeed( return when_all_succeed(
create_legacy_metadata_table_if_missing( create_legacy_metadata_table_if_missing(
meta::roles_table::name, meta::roles_table::name,
@@ -164,12 +177,12 @@ future<> standard_role_manager::create_legacy_metadata_tables_if_missing() const
create_roles_query, create_roles_query,
_migration_manager), _migration_manager),
create_legacy_metadata_table_if_missing( create_legacy_metadata_table_if_missing(
ROLE_MEMBERS_CF, meta::role_members_table::name,
_qp, _qp,
create_role_members_query, create_role_members_query,
_migration_manager), _migration_manager),
create_legacy_metadata_table_if_missing( create_legacy_metadata_table_if_missing(
ROLE_ATTRIBUTES_CF, meta::role_attributes_table::name,
_qp, _qp,
create_role_attributes_query, create_role_attributes_query,
_migration_manager)).discard_result(); _migration_manager)).discard_result();
@@ -416,7 +429,7 @@ future<> standard_role_manager::drop(std::string_view role_name, ::service::grou
     const auto revoke_from_members = [this, role_name, &mc] () -> future<> {
         const sstring query = seastar::format("SELECT member FROM {}.{} WHERE role = ?",
                 get_auth_ks_name(_qp),
-                ROLE_MEMBERS_CF);
+                meta::role_members_table::name);
         const auto members = co_await _qp.execute_internal(
                 query,
                 consistency_for_role(role_name),
@@ -448,7 +461,7 @@ future<> standard_role_manager::drop(std::string_view role_name, ::service::grou
     const auto remove_attributes_of = [this, role_name, &mc] () -> future<> {
         const sstring query = seastar::format("DELETE FROM {}.{} WHERE role = ?",
                 get_auth_ks_name(_qp),
-                ROLE_ATTRIBUTES_CF);
+                meta::role_attributes_table::name);
         if (legacy_mode(_qp)) {
             co_await _qp.execute_internal(query, {sstring(role_name)},
                     cql3::query_processor::cache_internal::yes).discard_result();
@@ -504,7 +517,7 @@ standard_role_manager::legacy_modify_membership(
     case membership_change::add: {
         const sstring insert_query = seastar::format("INSERT INTO {}.{} (role, member) VALUES (?, ?)",
                 get_auth_ks_name(_qp),
-                ROLE_MEMBERS_CF);
+                meta::role_members_table::name);
         co_return co_await _qp.execute_internal(
                 insert_query,
                 consistency_for_role(role_name),
@@ -516,7 +529,7 @@ standard_role_manager::legacy_modify_membership(
     case membership_change::remove: {
         const sstring delete_query = seastar::format("DELETE FROM {}.{} WHERE role = ? AND member = ?",
                 get_auth_ks_name(_qp),
-                ROLE_MEMBERS_CF);
+                meta::role_members_table::name);
         co_return co_await _qp.execute_internal(
                 delete_query,
                 consistency_for_role(role_name),
@@ -554,12 +567,12 @@ standard_role_manager::modify_membership(
     case membership_change::add:
         modify_role_members = seastar::format("INSERT INTO {}.{} (role, member) VALUES (?, ?)",
                 get_auth_ks_name(_qp),
-                ROLE_MEMBERS_CF);
+                meta::role_members_table::name);
         break;
     case membership_change::remove:
         modify_role_members = seastar::format("DELETE FROM {}.{} WHERE role = ? AND member = ?",
                 get_auth_ks_name(_qp),
-                ROLE_MEMBERS_CF);
+                meta::role_members_table::name);
         break;
     default:
         on_internal_error(log, format("unknown membership_change value: {}", int(ch)));
@@ -653,7 +666,7 @@ future<role_set> standard_role_manager::query_granted(std::string_view grantee_n
 future<role_to_directly_granted_map> standard_role_manager::query_all_directly_granted(::service::query_state& qs) {
     const sstring query = seastar::format("SELECT * FROM {}.{}",
             get_auth_ks_name(_qp),
-            ROLE_MEMBERS_CF);
+            meta::role_members_table::name);
     const auto results = co_await _qp.execute_internal(
             query,
@@ -718,21 +731,15 @@ future<bool> standard_role_manager::is_superuser(std::string_view role_name) {
 }
 
 future<bool> standard_role_manager::can_login(std::string_view role_name) {
-    if (legacy_mode(_qp)) {
-        const auto r = co_await require_record(_qp, role_name);
-        co_return r.can_login;
-    }
-    auto role = _cache.get(sstring(role_name));
-    if (!role) {
-        throw nonexistant_role(role_name);
-    }
-    co_return role->can_login;
+    return require_record(_qp, role_name).then([](record r) {
+        return r.can_login;
+    });
 }
 
 future<std::optional<sstring>> standard_role_manager::get_attribute(std::string_view role_name, std::string_view attribute_name, ::service::query_state& qs) {
     const sstring query = seastar::format("SELECT name, value FROM {}.{} WHERE role = ? AND name = ?",
             get_auth_ks_name(_qp),
-            ROLE_ATTRIBUTES_CF);
+            meta::role_attributes_table::name);
     const auto result_set = co_await _qp.execute_internal(query, db::consistency_level::ONE, qs, {sstring(role_name), sstring(attribute_name)}, cql3::query_processor::cache_internal::yes);
     if (!result_set->empty()) {
         const cql3::untyped_result_set_row &row = result_set->one();
@@ -763,7 +770,7 @@ future<> standard_role_manager::set_attribute(std::string_view role_name, std::s
     }
     const sstring query = seastar::format("INSERT INTO {}.{} (role, name, value) VALUES (?, ?, ?)",
             get_auth_ks_name(_qp),
-            ROLE_ATTRIBUTES_CF);
+            meta::role_attributes_table::name);
     if (legacy_mode(_qp)) {
         co_await _qp.execute_internal(query, {sstring(role_name), sstring(attribute_name), sstring(attribute_value)}, cql3::query_processor::cache_internal::yes).discard_result();
     } else {
@@ -778,7 +785,7 @@ future<> standard_role_manager::remove_attribute(std::string_view role_name, std
     }
     const sstring query = seastar::format("DELETE FROM {}.{} WHERE role = ? AND name = ?",
             get_auth_ks_name(_qp),
-            ROLE_ATTRIBUTES_CF);
+            meta::role_attributes_table::name);
     if (legacy_mode(_qp)) {
         co_await _qp.execute_internal(query, {sstring(role_name), sstring(attribute_name)}, cql3::query_processor::cache_internal::yes).discard_result();
     } else {

View File

@@ -10,7 +10,6 @@
 #include "auth/common.hh"
 #include "auth/role_manager.hh"
-#include "auth/cache.hh"
 
 #include <string_view>
@@ -37,14 +36,13 @@ class standard_role_manager final : public role_manager {
     cql3::query_processor& _qp;
     ::service::raft_group0_client& _group0_client;
     ::service::migration_manager& _migration_manager;
-    cache& _cache;
     future<> _stopped;
     abort_source _as;
     std::string _superuser;
     shared_promise<> _superuser_created_promise;
 
 public:
-    standard_role_manager(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&);
+    standard_role_manager(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&);
 
     virtual std::string_view qualified_java_name() const noexcept override;

View File

@@ -13,7 +13,6 @@
 #include "auth/authorizer.hh"
 #include "auth/default_authorizer.hh"
 #include "auth/password_authenticator.hh"
-#include "auth/cache.hh"
 #include "auth/permission.hh"
 #include "service/raft/raft_group0_client.hh"
 #include "utils/class_registrator.hh"
@@ -38,8 +37,8 @@ class transitional_authenticator : public authenticator {
 public:
     static const sstring PASSWORD_AUTHENTICATOR_NAME;
 
-    transitional_authenticator(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, cache& cache, utils::alien_worker& hashing_worker)
-        : transitional_authenticator(std::make_unique<password_authenticator>(qp, g0, mm, cache, hashing_worker)) {
+    transitional_authenticator(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, utils::alien_worker& hashing_worker)
+        : transitional_authenticator(std::make_unique<password_authenticator>(qp, g0, mm, hashing_worker)) {
     }
 
     transitional_authenticator(std::unique_ptr<authenticator> a)
         : _authenticator(std::move(a)) {
@@ -241,7 +240,6 @@ static const class_registrator<
     cql3::query_processor&,
     ::service::raft_group0_client&,
     ::service::migration_manager&,
-    auth::cache&,
     utils::alien_worker&> transitional_authenticator_reg(auth::PACKAGE_NAME + "TransitionalAuthenticator");
 
 static const class_registrator<

View File

@@ -15,7 +15,6 @@
 #include <cmath>
 
 #include "seastarx.hh"
-#include "backlog_controller_fwd.hh"
 
 // Simple proportional controller to adjust shares for processes for which a backlog can be clearly
 // defined.
@@ -129,21 +128,11 @@ public:
     static constexpr unsigned normalization_factor = 30;
     static constexpr float disable_backlog = std::numeric_limits<double>::infinity();
     static constexpr float backlog_disabled(float backlog) { return std::isinf(backlog); }
-    static inline const std::vector<backlog_controller::control_point> default_control_points = {
-        backlog_controller::control_point{0.0, 50}, {1.5, 100}, {normalization_factor, default_compaction_maximum_shares}};
 
-    compaction_controller(backlog_controller::scheduling_group sg, float static_shares, std::optional<float> max_shares,
-            std::chrono::milliseconds interval, std::function<float()> current_backlog)
+    compaction_controller(backlog_controller::scheduling_group sg, float static_shares, std::chrono::milliseconds interval, std::function<float()> current_backlog)
         : backlog_controller(std::move(sg), std::move(interval),
-            default_control_points,
+            std::vector<backlog_controller::control_point>({{0.0, 50}, {1.5, 100} , {normalization_factor, 1000}}),
             std::move(current_backlog),
             static_shares
         )
-    {
-        if (max_shares) {
-            set_max_shares(*max_shares);
-        }
-    }
-
-    // Updates the maximum output value for control points.
-    void set_max_shares(float max_shares);
+    {}
 };

View File

@@ -1,13 +0,0 @@
-/*
- * Copyright (C) 2025-present ScyllaDB
- */
-
-/*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
- */
-
-#pragma once
-
-#include <cstdint>
-
-static constexpr uint64_t default_compaction_maximum_shares = 1000;

View File

@@ -17,8 +17,5 @@ target_link_libraries(cdc
   PRIVATE
     replica)
 
-if (Scylla_USE_PRECOMPILED_HEADER_USE)
-  target_precompile_headers(cdc REUSE_FROM scylla-precompiled-header)
-endif()
-
 check_headers(check-headers cdc
   GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)

View File

@@ -25,7 +25,6 @@
 #include "locator/abstract_replication_strategy.hh"
 #include "locator/topology.hh"
 #include "replica/database.hh"
-#include "db/config.hh"
 #include "db/schema_tables.hh"
 #include "gms/feature_service.hh"
 #include "schema/schema.hh"
@@ -587,9 +586,11 @@ bytes log_data_column_deleted_elements_name_bytes(const bytes& column_name) {
     return to_bytes(cdc_deleted_elements_column_prefix) + column_name;
 }
 
-static void set_default_properties_log_table(schema_builder& b, const schema& s,
-        const replica::database& db, const keyspace_metadata& ksm)
+static schema_ptr create_log_schema(const schema& s, const replica::database& db,
+        const keyspace_metadata& ksm, api::timestamp_type timestamp, std::optional<table_id> uuid, schema_ptr old)
 {
+    schema_builder b(s.ks_name(), log_name(s.cf_name()));
+    b.with_partitioner(cdc::cdc_partitioner::classname);
     b.set_compaction_strategy(compaction::compaction_strategy_type::time_window);
     b.set_comment(fmt::format("CDC log for {}.{}", s.ks_name(), s.cf_name()));
     auto ttl_seconds = s.cdc_options().ttl();
@@ -615,22 +616,13 @@ static void set_default_properties_log_table(schema_builder& b, const schema& s,
             std::to_string(std::max(1, window_seconds / 2))},
         });
     }
-    b.set_caching_options(caching_options::get_disabled_caching_options());
-    auto rs = generate_replication_strategy(ksm, db.get_token_metadata().get_topology());
-    auto tombstone_gc_ext = seastar::make_shared<tombstone_gc_extension>(get_default_tombstone_gc_mode(*rs, db.get_token_metadata(), false));
-    b.add_extension(tombstone_gc_extension::NAME, std::move(tombstone_gc_ext));
-}
-
-static void add_columns_to_cdc_log(schema_builder& b, const schema& s,
-        const api::timestamp_type timestamp, const schema_ptr old)
-{
     b.with_column(log_meta_column_name_bytes("stream_id"), bytes_type, column_kind::partition_key);
     b.with_column(log_meta_column_name_bytes("time"), timeuuid_type, column_kind::clustering_key);
     b.with_column(log_meta_column_name_bytes("batch_seq_no"), int32_type, column_kind::clustering_key);
     b.with_column(log_meta_column_name_bytes("operation"), data_type_for<operation_native_type>());
     b.with_column(log_meta_column_name_bytes("ttl"), long_type);
     b.with_column(log_meta_column_name_bytes("end_of_batch"), boolean_type);
+    b.set_caching_options(caching_options::get_disabled_caching_options());
 
     auto validate_new_column = [&] (const sstring& name) {
         // When dropping a column from a CDC log table, we set the drop timestamp to be
@@ -700,28 +692,15 @@ static void add_columns_to_cdc_log(schema_builder& b, const schema& s,
     add_columns(s.clustering_key_columns());
     add_columns(s.static_columns(), true);
     add_columns(s.regular_columns(), true);
-}
-
-static schema_ptr create_log_schema(const schema& s, const replica::database& db,
-        const keyspace_metadata& ksm, api::timestamp_type timestamp, std::optional<table_id> uuid, schema_ptr old)
-{
-    schema_builder b(s.ks_name(), log_name(s.cf_name()));
-    b.with_partitioner(cdc::cdc_partitioner::classname);
-    if (old) {
-        // If the user reattaches the log table, do not change its properties.
-        b.set_properties(old->get_properties());
-    } else {
-        set_default_properties_log_table(b, s, db, ksm);
-    }
-    add_columns_to_cdc_log(b, s, timestamp, old);
     if (uuid) {
         b.set_uuid(*uuid);
     }
+    auto rs = generate_replication_strategy(ksm, db.get_token_metadata().get_topology());
+    auto tombstone_gc_ext = seastar::make_shared<tombstone_gc_extension>(get_default_tombstone_gc_mode(*rs, db.get_token_metadata()));
+    b.add_extension(tombstone_gc_extension::NAME, std::move(tombstone_gc_ext));
     /**
      * #10473 - if we are redefining the log table, we need to ensure any dropped
      * columns are registered in "dropped_columns" table, otherwise clients will not
@@ -952,6 +931,9 @@ static managed_bytes merge(const abstract_type& type, const managed_bytes_opt& p
     throw std::runtime_error(format("cdc merge: unknown type {}", type.name()));
 }
 
+using cell_map = std::unordered_map<const column_definition*, managed_bytes_opt>;
+using row_states_map = std::unordered_map<clustering_key, cell_map, clustering_key::hashing, clustering_key::equality>;
+
 static managed_bytes_opt get_col_from_row_state(const cell_map* state, const column_definition& cdef) {
     if (state) {
         if (auto it = state->find(&cdef); it != state->end()) {
@@ -961,12 +943,7 @@ static managed_bytes_opt get_col_from_row_state(const cell_map* state, const col
     return std::nullopt;
 }
 
-cell_map* get_row_state(row_states_map& row_states, const clustering_key& ck) {
-    auto it = row_states.find(ck);
-    return it == row_states.end() ? nullptr : &it->second;
-}
-
-const cell_map* get_row_state(const row_states_map& row_states, const clustering_key& ck) {
+static cell_map* get_row_state(row_states_map& row_states, const clustering_key& ck) {
     auto it = row_states.find(ck);
     return it == row_states.end() ? nullptr : &it->second;
 }
@@ -1436,8 +1413,6 @@ struct process_change_visitor {
     row_states_map& _clustering_row_states;
     cell_map& _static_row_state;
 
-    const bool _is_update = false;
-
     const bool _generate_delta_values = true;
 
     void static_row_cells(auto&& visit_row_cells) {
@@ -1461,13 +1436,12 @@
         struct clustering_row_cells_visitor : public process_row_visitor {
             operation _cdc_op = operation::update;
-            operation _marker_op = operation::insert;
 
             using process_row_visitor::process_row_visitor;
 
             void marker(const row_marker& rm) {
                 _ttl_column = get_ttl(rm);
-                _cdc_op = _marker_op;
+                _cdc_op = operation::insert;
             }
         };
@@ -1475,9 +1449,6 @@
                 log_ck, _touched_parts, _builder,
                 _enable_updating_state, &ckey, get_row_state(_clustering_row_states, ckey),
                 _clustering_row_states, _generate_delta_values);
-        if (_is_update && _request_options.alternator) {
-            v._marker_op = operation::update;
-        }
 
         visit_row_cells(v);
 
         if (_enable_updating_state) {
@@ -1631,11 +1602,6 @@ private:
     row_states_map _clustering_row_states;
     cell_map _static_row_state;
 
-    // True if the mutated row existed before applying the mutation. In other
-    // words, if the preimage is enabled and it isn't empty (otherwise, we
-    // assume that the row is non-existent). Used for Alternator Streams (see
-    // #6918).
-    bool _is_update = false;
 
     const bool _uses_tablets;
@@ -1762,7 +1728,6 @@ public:
             ._enable_updating_state = _enable_updating_state,
             ._clustering_row_states = _clustering_row_states,
             ._static_row_state = _static_row_state,
-            ._is_update = _is_update,
             ._generate_delta_values = generate_delta_values(_builder->base_schema())
         };
         cdc::inspect_mutation(m, v);
@@ -1773,10 +1738,6 @@ public:
         _builder->end_record();
     }
 
-    const row_states_map& clustering_row_states() const override {
-        return _clustering_row_states;
-    }
-
     // Takes and returns generated cdc log mutations and associated statistics about parts touched during transformer's lifetime.
     // The `transformer` object on which this method was called on should not be used anymore.
     std::tuple<utils::chunked_vector<mutation>, stats::part_type_set> finish() && {
@@ -1900,7 +1861,6 @@ public:
                     _static_row_state[&c] = std::move(*maybe_cell_view);
                 }
             }
-            _is_update = true;
         }
 
         if (static_only) {
@@ -1988,7 +1948,6 @@ cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout,
             return make_ready_future<>();
         }
 
-        const bool alternator_increased_compatibility = options.alternator && options.alternator_streams_increased_compatibility;
         transformer trans(_ctxt, s, m.decorated_key(), options);
         auto f = make_ready_future<lw_shared_ptr<cql3::untyped_result_set>>(nullptr);
@@ -1996,7 +1955,7 @@
             // Preimage has been fetched by upper layers.
             tracing::trace(tr_state, "CDC: Using a prefetched preimage");
             f = make_ready_future<lw_shared_ptr<cql3::untyped_result_set>>(options.preimage);
-        } else if (s->cdc_options().preimage() || s->cdc_options().postimage() || alternator_increased_compatibility) {
+        } else if (s->cdc_options().preimage() || s->cdc_options().postimage()) {
             // Note: further improvement here would be to coalesce the pre-image selects into one
             // if a batch contains several modifications to the same table. Otoh, batch is rare(?)
             // so this is premature.
@@ -2013,7 +1972,7 @@ cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout,
             tracing::trace(tr_state, "CDC: Preimage not enabled for the table, not querying current value of {}", m.decorated_key());
         }
 
-        return f.then([alternator_increased_compatibility, trans = std::move(trans), &mutations, idx, tr_state, &details, &options] (lw_shared_ptr<cql3::untyped_result_set> rs) mutable {
+        return f.then([trans = std::move(trans), &mutations, idx, tr_state, &details] (lw_shared_ptr<cql3::untyped_result_set> rs) mutable {
             auto& m = mutations[idx];
             auto& s = m.schema();
@@ -2028,13 +1987,13 @@ cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout,
             details.had_preimage |= preimage;
             details.had_postimage |= postimage;
             tracing::trace(tr_state, "CDC: Generating log mutations for {}", m.decorated_key());
-            if (should_split(m, options)) {
+            if (should_split(m)) {
                 tracing::trace(tr_state, "CDC: Splitting {}", m.decorated_key());
                 details.was_split = true;
-                process_changes_with_splitting(m, trans, preimage, postimage, alternator_increased_compatibility);
+                process_changes_with_splitting(m, trans, preimage, postimage);
             } else {
                 tracing::trace(tr_state, "CDC: No need to split {}", m.decorated_key());
-                process_changes_without_splitting(m, trans, preimage, postimage, alternator_increased_compatibility);
+                process_changes_without_splitting(m, trans, preimage, postimage);
             }
             auto [log_mut, touched_parts] = std::move(trans).finish();
             const int generated_count = log_mut.size();

View File

@@ -52,9 +52,6 @@ class database;
 
 namespace cdc {
 
-using cell_map = std::unordered_map<const column_definition*, managed_bytes_opt>;
-using row_states_map = std::unordered_map<clustering_key, cell_map, clustering_key::hashing, clustering_key::equality>;
-
 // cdc log table operation
 enum class operation : int8_t {
     // note: these values will eventually be read by a third party, probably not privvy to this
@@ -76,14 +73,6 @@ struct per_request_options {
     // Scylla. Currently, only TTL expiration implementation for Alternator
     // uses this.
     const bool is_system_originated = false;
-
-    // True if this mutation was emitted by Alternator.
-    const bool alternator = false;
-    // Sacrifice performance for the sake of better compatibility with DynamoDB
-    // Streams. It's important for correctness that
-    // alternator_streams_increased_compatibility config flag be read once per
-    // request, because it's live-updateable. As a result, the flag may change
-    // between reads.
-    const bool alternator_streams_increased_compatibility = false;
 };
 
 struct operation_result_tracker;
@@ -153,7 +142,4 @@ bool is_cdc_metacolumn_name(const sstring& name);
 
 utils::UUID generate_timeuuid(api::timestamp_type t);
 
-cell_map* get_row_state(row_states_map& row_states, const clustering_key& ck);
-const cell_map* get_row_state(const row_states_map& row_states, const clustering_key& ck);
-
 } // namespace cdc

View File

@@ -6,28 +6,15 @@
  * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
  */
 
-#include "bytes.hh"
-#include "bytes_fwd.hh"
-#include "mutation/atomic_cell.hh"
-#include "mutation/atomic_cell_or_collection.hh"
-#include "mutation/collection_mutation.hh"
 #include "mutation/mutation.hh"
-#include "mutation/tombstone.hh"
 #include "schema/schema.hh"
-#include "seastar/core/sstring.hh"
 #include "types/concrete_types.hh"
-#include "types/types.hh"
 #include "types/user.hh"
 
 #include "split.hh"
 #include "log.hh"
 #include "change_visitor.hh"
-#include "utils/managed_bytes.hh"
-
-#include <string_view>
-#include <unordered_map>
-
-extern logging::logger cdc_log;
 
 struct atomic_column_update {
     column_id id;
@@ -503,8 +490,6 @@ struct should_split_visitor {
     // Otherwise we store the change's ttl.
     std::optional<gc_clock::duration> _ttl = std::nullopt;
 
-    virtual ~should_split_visitor() = default;
-
     inline bool finished() const { return _result; }
     inline void stop() { _result = true; }
@@ -527,7 +512,7 @@ struct should_split_visitor {
     void collection_tombstone(const tombstone& t) { visit(t.timestamp + 1); }
-    virtual void live_collection_cell(bytes_view, const atomic_cell_view& cell) {
+    void live_collection_cell(bytes_view, const atomic_cell_view& cell) {
         if (_had_row_marker) {
             // nonatomic updates cannot be expressed with an INSERT.
             return stop();
@@ -537,7 +522,7 @@ struct should_split_visitor {
     void dead_collection_cell(bytes_view, const atomic_cell_view& cell) { visit(cell); }
     void collection_column(const column_definition&, auto&& visit_collection) { visit_collection(*this); }
 
-    virtual void marker(const row_marker& rm) {
+    void marker(const row_marker& rm) {
         _had_row_marker = true;
         visit(rm.timestamp(), get_ttl(rm));
     }
@@ -578,29 +563,7 @@ struct should_split_visitor {
     }
 };
 
-// This is the same as the above, but it doesn't split a row marker away from
-// an update. As a result, updates that create an item appear as a single log
-// row.
-class alternator_should_split_visitor : public should_split_visitor {
-public:
-    ~alternator_should_split_visitor() override = default;
-
-    void live_collection_cell(bytes_view, const atomic_cell_view& cell) override {
-        visit(cell.timestamp());
-    }
-
-    void marker(const row_marker& rm) override {
-        visit(rm.timestamp());
-    }
-};
-
-bool should_split(const mutation& m, const per_request_options& options) {
-    if (options.alternator) {
-        alternator_should_split_visitor v;
-        cdc::inspect_mutation(m, v);
-        return v._result || v._ts == api::missing_timestamp;
-    }
-
+bool should_split(const mutation& m) {
     should_split_visitor v;
     cdc::inspect_mutation(m, v);
@@ -610,109 +573,8 @@ bool should_split(const mutation& m, const per_request_options& options) {
|| v._ts == api::missing_timestamp; || v._ts == api::missing_timestamp;
} }
// Returns true if the row state and the atomic and nonatomic entries represent
// an equivalent item.
static bool entries_match_row_state(const schema_ptr& base_schema, const cell_map& row_state, const std::vector<atomic_column_update>& atomic_entries,
std::vector<nonatomic_column_update>& nonatomic_entries) {
for (const auto& update : atomic_entries) {
const column_definition& cdef = base_schema->column_at(column_kind::regular_column, update.id);
const auto it = row_state.find(&cdef);
if (it == row_state.end()) {
return false;
}
if (to_managed_bytes_opt(update.cell.value().linearize()) != it->second) {
return false;
}
}
if (nonatomic_entries.empty()) {
return true;
}
for (const auto& update : nonatomic_entries) {
const column_definition& cdef = base_schema->column_at(column_kind::regular_column, update.id);
const auto it = row_state.find(&cdef);
if (it == row_state.end()) {
return false;
}
// The only collection used by Alternator is a non-frozen map.
auto current_raw_map = cdef.type->deserialize(*it->second);
map_type_impl::native_type current_values = value_cast<map_type_impl::native_type>(current_raw_map);
if (current_values.size() != update.cells.size()) {
return false;
}
std::unordered_map<sstring_view, bytes> current_values_map;
for (const auto& entry : current_values) {
const auto attr_name = std::string_view(value_cast<sstring>(entry.first));
current_values_map[attr_name] = value_cast<bytes>(entry.second);
}
for (const auto& [key, value] : update.cells) {
const auto key_str = to_string_view(key);
if (!value.is_live()) {
if (current_values_map.contains(key_str)) {
return false;
}
} else if (current_values_map[key_str] != value.value().linearize()) {
return false;
}
}
}
return true;
}
bool should_skip(batch& changes, const mutation& base_mutation, change_processor& processor) {
const schema_ptr& base_schema = base_mutation.schema();
// Alternator doesn't use static updates and clustered range deletions.
if (!changes.static_updates.empty() || !changes.clustered_range_deletions.empty()) {
return false;
}
for (clustered_row_insert& u : changes.clustered_inserts) {
const cell_map* row_state = get_row_state(processor.clustering_row_states(), u.key);
if (!row_state) {
return false;
}
if (!entries_match_row_state(base_schema, *row_state, u.atomic_entries, u.nonatomic_entries)) {
return false;
}
}
for (clustered_row_update& u : changes.clustered_updates) {
const cell_map* row_state = get_row_state(processor.clustering_row_states(), u.key);
if (!row_state) {
return false;
}
if (!entries_match_row_state(base_schema, *row_state, u.atomic_entries, u.nonatomic_entries)) {
return false;
}
}
// Skip only if the row being deleted does not exist (i.e. the deletion is a no-op).
for (const auto& row_deletion : changes.clustered_row_deletions) {
if (processor.clustering_row_states().contains(row_deletion.key)) {
return false;
}
}
// Don't skip if the item exists.
//
// Increased DynamoDB Streams compatibility guarantees that single-item
// operations will read the item and store it in the clustering row states.
// If it is not found there, we may skip CDC. This is safe as long as the
// assumptions of this operation's write isolation are not violated.
if (changes.partition_deletions && processor.clustering_row_states().contains(clustering_key::make_empty())) {
return false;
}
cdc_log.trace("Skipping CDC log for mutation {}", base_mutation);
return true;
}
void process_changes_with_splitting(const mutation& base_mutation, change_processor& processor, void process_changes_with_splitting(const mutation& base_mutation, change_processor& processor,
bool enable_preimage, bool enable_postimage, bool alternator_strict_compatibility) { bool enable_preimage, bool enable_postimage) {
const auto base_schema = base_mutation.schema(); const auto base_schema = base_mutation.schema();
auto changes = extract_changes(base_mutation); auto changes = extract_changes(base_mutation);
auto pk = base_mutation.key(); auto pk = base_mutation.key();
@@ -724,6 +586,9 @@ void process_changes_with_splitting(const mutation& base_mutation, change_proces
const auto last_timestamp = changes.rbegin()->first; const auto last_timestamp = changes.rbegin()->first;
for (auto& [change_ts, btch] : changes) { for (auto& [change_ts, btch] : changes) {
const bool is_last = change_ts == last_timestamp;
processor.begin_timestamp(change_ts, is_last);
clustered_column_set affected_clustered_columns_per_row{clustering_key::less_compare(*base_schema)}; clustered_column_set affected_clustered_columns_per_row{clustering_key::less_compare(*base_schema)};
one_kind_column_set affected_static_columns{base_schema->static_columns_count()}; one_kind_column_set affected_static_columns{base_schema->static_columns_count()};
@@ -732,12 +597,6 @@ void process_changes_with_splitting(const mutation& base_mutation, change_proces
affected_clustered_columns_per_row = btch.get_affected_clustered_columns_per_row(*base_mutation.schema()); affected_clustered_columns_per_row = btch.get_affected_clustered_columns_per_row(*base_mutation.schema());
} }
if (alternator_strict_compatibility && should_skip(btch, base_mutation, processor)) {
continue;
}
const bool is_last = change_ts == last_timestamp;
processor.begin_timestamp(change_ts, is_last);
if (enable_preimage) { if (enable_preimage) {
if (affected_static_columns.count() > 0) { if (affected_static_columns.count() > 0) {
processor.produce_preimage(nullptr, affected_static_columns); processor.produce_preimage(nullptr, affected_static_columns);
@@ -825,13 +684,7 @@ void process_changes_with_splitting(const mutation& base_mutation, change_proces
} }
void process_changes_without_splitting(const mutation& base_mutation, change_processor& processor, void process_changes_without_splitting(const mutation& base_mutation, change_processor& processor,
bool enable_preimage, bool enable_postimage, bool alternator_strict_compatibility) { bool enable_preimage, bool enable_postimage) {
if (alternator_strict_compatibility) {
auto changes = extract_changes(base_mutation);
if (should_skip(changes.begin()->second, base_mutation, processor)) {
return;
}
}
auto ts = find_timestamp(base_mutation); auto ts = find_timestamp(base_mutation);
processor.begin_timestamp(ts, true); processor.begin_timestamp(ts, true);

View File

@@ -9,7 +9,6 @@
 #pragma once
 #include <boost/dynamic_bitset.hpp> // IWYU pragma: keep
-#include "cdc/log.hh"
 #include "replica/database_fwd.hh"
 #include "mutation/timestamp.hh"
@@ -66,14 +65,12 @@ public:
     // Tells processor we have reached end of record - last part
     // of a given timestamp batch
     virtual void end_record() = 0;
-    virtual const row_states_map& clustering_row_states() const = 0;
 };
-bool should_split(const mutation& base_mutation, const per_request_options& options);
+bool should_split(const mutation& base_mutation);
 void process_changes_with_splitting(const mutation& base_mutation, change_processor& processor,
-        bool enable_preimage, bool enable_postimage, bool alternator_strict_compatibility);
+        bool enable_preimage, bool enable_postimage);
 void process_changes_without_splitting(const mutation& base_mutation, change_processor& processor,
-        bool enable_preimage, bool enable_postimage, bool alternator_strict_compatibility);
+        bool enable_preimage, bool enable_postimage);
 }
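The diff above collapses `should_split` into a single-argument predicate over one visitor walk. A minimal standalone sketch of that predicate's logic (a hypothetical Python stand-in; the real code is C++ and walks a mutation's cells, markers, and tombstones via `cdc::inspect_mutation`):

```python
# Hypothetical stand-in for the visitor behind should_split: the change
# must be split into multiple CDC log batches once two distinct
# timestamps are observed, or when no timestamp was found at all.
MISSING_TIMESTAMP = None  # stands in for api::missing_timestamp

class ShouldSplitVisitor:
    def __init__(self):
        self.result = False          # becomes True once a split is required
        self.ts = MISSING_TIMESTAMP  # last timestamp visited

    def visit(self, ts):
        if self.ts is not MISSING_TIMESTAMP and self.ts != ts:
            self.result = True       # mixed timestamps -> must split
        self.ts = ts

def should_split(timestamps):
    v = ShouldSplitVisitor()
    for ts in timestamps:
        v.visit(ts)
    # Mirrors the simplified predicate: split on mixed timestamps,
    # or when the walk produced no timestamp at all.
    return v.result or v.ts is MISSING_TIMESTAMP

print(should_split([5, 5, 5]), should_split([5, 6]), should_split([]))
```

A mutation whose cells all share one timestamp stays in a single batch; any timestamp mixture forces the per-timestamp split path.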

View File

@@ -21,8 +21,5 @@ target_link_libraries(compaction
   mutation_writer
   replica)
-if (Scylla_USE_PRECOMPILED_HEADER_USE)
-  target_precompile_headers(compaction REUSE_FROM scylla-precompiled-header)
-endif()
 check_headers(check-headers compaction
   GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)

View File

@@ -867,8 +867,8 @@ auto fmt::formatter<compaction::compaction_task_executor>::format(const compacti
 namespace compaction {
-inline compaction_controller make_compaction_controller(const compaction_manager::scheduling_group& csg, uint64_t static_shares, std::optional<float> max_shares, std::function<double()> fn) {
-    return compaction_controller(csg, static_shares, max_shares, 250ms, std::move(fn));
+inline compaction_controller make_compaction_controller(const compaction_manager::scheduling_group& csg, uint64_t static_shares, std::function<double()> fn) {
+    return compaction_controller(csg, static_shares, 250ms, std::move(fn));
 }
 compaction::compaction_state::~compaction_state() {
@@ -1014,7 +1014,7 @@ compaction_manager::compaction_manager(config cfg, abort_source& as, tasks::task
     , _sys_ks("compaction_manager::system_keyspace")
     , _cfg(std::move(cfg))
     , _compaction_submission_timer(compaction_sg(), compaction_submission_callback())
-    , _compaction_controller(make_compaction_controller(compaction_sg(), static_shares(), _cfg.max_shares.get(), [this] () -> float {
+    , _compaction_controller(make_compaction_controller(compaction_sg(), static_shares(), [this] () -> float {
         _last_backlog = backlog();
         auto b = _last_backlog / available_memory();
         // This means we are using an unimplemented strategy
@@ -1033,10 +1033,6 @@ compaction_manager::compaction_manager(config cfg, abort_source& as, tasks::task
     , _throughput_updater(serialized_action([this] { return update_throughput(throughput_mbs()); }))
     , _update_compaction_static_shares_action([this] { return update_static_shares(static_shares()); })
     , _compaction_static_shares_observer(_cfg.static_shares.observe(_update_compaction_static_shares_action.make_observer()))
-    , _compaction_max_shares_observer(_cfg.max_shares.observe([this] (const float& max_shares) {
-        cmlog.info("Updating max shares to {}", max_shares);
-        _compaction_controller.set_max_shares(max_shares);
-    }))
     , _strategy_control(std::make_unique<strategy_control>(*this))
     , _tombstone_gc_state(_shared_tombstone_gc_state) {
     tm.register_module(_task_manager_module->get_name(), _task_manager_module);
@@ -1055,12 +1051,11 @@ compaction_manager::compaction_manager(tasks::task_manager& tm)
     , _sys_ks("compaction_manager::system_keyspace")
     , _cfg(config{ .available_memory = 1 })
     , _compaction_submission_timer(compaction_sg(), compaction_submission_callback())
-    , _compaction_controller(make_compaction_controller(compaction_sg(), 1, std::nullopt, [] () -> float { return 1.0; }))
+    , _compaction_controller(make_compaction_controller(compaction_sg(), 1, [] () -> float { return 1.0; }))
     , _backlog_manager(_compaction_controller)
     , _throughput_updater(serialized_action([this] { return update_throughput(throughput_mbs()); }))
     , _update_compaction_static_shares_action([] { return make_ready_future<>(); })
     , _compaction_static_shares_observer(_cfg.static_shares.observe(_update_compaction_static_shares_action.make_observer()))
-    , _compaction_max_shares_observer(_cfg.max_shares.observe([] (const float& max_shares) {}))
     , _strategy_control(std::make_unique<strategy_control>(*this))
     , _tombstone_gc_state(_shared_tombstone_gc_state) {
     tm.register_module(_task_manager_module->get_name(), _task_manager_module);

View File

@@ -80,7 +80,6 @@ public:
     scheduling_group maintenance_sched_group;
     size_t available_memory = 0;
     utils::updateable_value<float> static_shares = utils::updateable_value<float>(0);
-    utils::updateable_value<float> max_shares = utils::updateable_value<float>(0);
     utils::updateable_value<uint32_t> throughput_mb_per_sec = utils::updateable_value<uint32_t>(0);
     std::chrono::seconds flush_all_tables_before_major = std::chrono::duration_cast<std::chrono::seconds>(std::chrono::days(1));
 };
@@ -160,7 +159,6 @@ private:
     std::optional<utils::observer<uint32_t>> _throughput_option_observer;
     serialized_action _update_compaction_static_shares_action;
     utils::observer<float> _compaction_static_shares_observer;
-    utils::observer<float> _compaction_max_shares_observer;
     uint64_t _validation_errors = 0;
     class strategy_control;
@@ -293,10 +291,6 @@ public:
         return _cfg.static_shares.get();
     }
-    float max_shares() const noexcept {
-        return _cfg.max_shares.get();
-    }
     uint32_t throughput_mbs() const noexcept {
         return _cfg.throughput_mb_per_sec.get();
     }
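The removed `max_shares` plumbing is built on the `utils::updateable_value`/observer pattern visible throughout this file: a config value that notifies registered callbacks on every update. A rough Python stand-in of that pattern (hypothetical names; the real implementation is C++):

```python
# Hypothetical sketch of an updateable value with observers, mirroring how
# _cfg.max_shares.observe(...) forwarded live-config updates to the
# compaction controller before its removal.
class UpdateableValue:
    def __init__(self, value):
        self._value = value
        self._observers = []

    def get(self):
        return self._value

    def observe(self, callback):
        # Register a callback fired on every subsequent update; the caller
        # keeps the returned handle alive, like the _..._observer members.
        self._observers.append(callback)
        return callback

    def set(self, value):
        self._value = value
        for cb in self._observers:
            cb(value)

seen = []
max_shares = UpdateableValue(0.0)
max_shares.observe(seen.append)  # e.g. forward updates to a controller
max_shares.set(600.0)
print(max_shares.get(), seen)
```

Dropping the option therefore also drops both the config field and the observer handle that kept the subscription alive, which is why the header loses two members.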

View File

@@ -227,7 +227,7 @@ future<> run_table_tasks(replica::database& db, std::vector<table_tasks_info> ta
     // Tables will be kept in descending order.
     std::ranges::sort(table_tasks, std::greater<>(), [&] (const table_tasks_info& tti) {
         try {
-            return db.find_column_family(tti.ti.id).get_stats().live_disk_space_used.on_disk;
+            return db.find_column_family(tti.ti.id).get_stats().live_disk_space_used;
         } catch (const replica::no_such_column_family& e) {
             return int64_t(-1);
         }
@@ -281,7 +281,7 @@ future<> run_keyspace_tasks(replica::database& db, std::vector<keyspace_tasks_in
         try {
             return std::accumulate(kti.table_infos.begin(), kti.table_infos.end(), int64_t(0), [&] (int64_t sum, const table_info& t) {
                 try {
-                    sum += db.find_column_family(t.id).get_stats().live_disk_space_used.on_disk;
+                    sum += db.find_column_family(t.id).get_stats().live_disk_space_used;
                 } catch (const replica::no_such_column_family&) {
                     // ignore
                 }
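Both hunks keep the same sort-with-fallback shape: order tables by live disk usage, descending, and treat a table that disappears mid-iteration as size -1 so it sorts last. A hypothetical Python stand-in of that logic (illustrative data, not Scylla's API):

```python
# Hypothetical sketch: sort table ids by on-disk size, descending, with a
# -1 fallback for tables dropped concurrently (mirrors catching
# replica::no_such_column_family in the diff above).
def live_size(sizes, table_id):
    try:
        return sizes[table_id]
    except KeyError:  # table no longer exists
        return -1

def order_table_tasks(sizes, table_ids):
    return sorted(table_ids, key=lambda tid: live_size(sizes, tid), reverse=True)

print(order_table_tasks({"a": 10, "b": 300}, ["a", "b", "dropped"]))
```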

View File

@@ -888,18 +888,9 @@ rf_rack_valid_keyspaces: false
 #
 # Vector Store options
 #
-# HTTP and HTTPS schemes are supported. Port number is mandatory.
-# If both `vector_store_primary_uri` and `vector_store_secondary_uri` are unset or empty, vector search is disabled.
-#
-# A comma-separated list of primary vector store node URIs. These nodes are preferred for vector search operations.
+# A comma-separated list of URIs for the vector store using DNS name. Only HTTP schema is supported. Port number is mandatory.
+# Default is empty, which means that the vector store is not used.
 # vector_store_primary_uri: http://vector-store.dns.name:{port}
-#
-# A comma-separated list of secondary vector store node URIs. These nodes are used as a fallback when all primary nodes are unavailable, and are typically located in a different availability zone for high availability.
-# vector_store_secondary_uri: http://vector-store.dns.name:{port}
-#
-# Options for encrypted connections to the vector store. These options are used for HTTPS URIs in vector_store_primary_uri and vector_store_secondary_uri.
-# vector_store_encryption_options:
-#     truststore: <not set, use system trust>
 #
 # io-streaming rate limiting
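The resulting option text describes a comma-separated URI list where only the HTTP scheme is accepted and the port is mandatory. A hypothetical validator for that documented format (not Scylla's actual parser):

```python
from urllib.parse import urlsplit

def parse_vector_store_uris(value: str) -> list[str]:
    """Parse a comma-separated vector-store URI list.

    Hypothetical helper matching the documented constraints: only the
    http scheme is accepted and a port number is mandatory. An empty
    value yields an empty list, i.e. the vector store is not used.
    """
    uris = []
    for raw in filter(None, (part.strip() for part in value.split(","))):
        parts = urlsplit(raw)
        if parts.scheme != "http":
            raise ValueError(f"unsupported scheme in {raw!r}")
        if parts.port is None:
            raise ValueError(f"missing mandatory port in {raw!r}")
        uris.append(raw)
    return uris

print(parse_vector_store_uris("http://vs1.dns.name:6080, http://vs2.dns.name:6080"))
```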

View File

@@ -445,7 +445,6 @@ ldap_tests = set([
 scylla_tests = set([
     'test/boost/combined_tests',
     'test/boost/UUID_test',
-    'test/boost/url_parse_test',
     'test/boost/advanced_rpc_compressor_test',
     'test/boost/allocation_strategy_test',
     'test/boost/alternator_unit_test',
@@ -647,28 +646,6 @@ vector_search_tests = set([
     'test/vector_search/client_test'
 ])
-vector_search_validator_bin = 'vector-search-validator/bin/vector-search-validator'
-vector_search_validator_deps = set([
-    'test/vector_search_validator/build-validator',
-    'test/vector_search_validator/Cargo.toml',
-    'test/vector_search_validator/crates/validator/Cargo.toml',
-    'test/vector_search_validator/crates/validator/src/main.rs',
-    'test/vector_search_validator/crates/validator-scylla/Cargo.toml',
-    'test/vector_search_validator/crates/validator-scylla/src/lib.rs',
-    'test/vector_search_validator/crates/validator-scylla/src/cql.rs',
-])
-vector_store_bin = 'vector-search-validator/bin/vector-store'
-vector_store_deps = set([
-    'test/vector_search_validator/build-env',
-    'test/vector_search_validator/build-vector-store',
-])
-vector_search_validator_bins = set([
-    vector_search_validator_bin,
-    vector_store_bin,
-])
 wasms = set([
     'wasm/return_input.wat',
     'wasm/test_complex_null_values.wat',
@@ -702,7 +679,7 @@ other = set([
     'iotune',
 ])
-all_artifacts = apps | cpp_apps | tests | other | wasms | vector_search_validator_bins
+all_artifacts = apps | cpp_apps | tests | other | wasms
 arg_parser = argparse.ArgumentParser('Configure scylla', add_help=False, formatter_class=argparse.ArgumentDefaultsHelpFormatter)
 arg_parser.add_argument('--out', dest='buildfile', action='store', default='build.ninja',
@@ -786,7 +763,6 @@ arg_parser.add_argument('--use-cmake', action=argparse.BooleanOptionalAction, de
 arg_parser.add_argument('--coverage', action = 'store_true', help = 'Compile scylla with coverage instrumentation')
 arg_parser.add_argument('--build-dir', action='store', default='build',
                         help='Build directory path')
-arg_parser.add_argument('--disable-precompiled-header', action='store_true', default=False, help='Disable precompiled header for scylla binary')
 arg_parser.add_argument('-h', '--help', action='store_true', help='show this help message and exit')
 args = arg_parser.parse_args()
 if args.help:
@@ -1062,6 +1038,7 @@ scylla_core = (['message/messaging_service.cc',
     'db/hints/resource_manager.cc',
     'db/hints/sync_point.cc',
     'db/large_data_handler.cc',
+    'db/legacy_schema_migrator.cc',
     'db/marshal/type_parser.cc',
     'db/per_partition_rate_limit_options.cc',
     'db/rate_limiter.cc',
@@ -1195,7 +1172,6 @@ scylla_core = (['message/messaging_service.cc',
     'auth/allow_all_authorizer.cc',
     'auth/authenticated_user.cc',
     'auth/authenticator.cc',
-    'auth/cache.cc',
     'auth/common.cc',
     'auth/default_authorizer.cc',
     'auth/resource.cc',
@@ -1292,8 +1268,7 @@ scylla_core = (['message/messaging_service.cc',
     'vector_search/vector_store_client.cc',
     'vector_search/dns.cc',
     'vector_search/client.cc',
-    'vector_search/clients.cc',
-    'vector_search/truststore.cc'
+    'vector_search/clients.cc'
 ] + [Antlr3Grammar('cql3/Cql.g')] \
     + scylla_raft_core
 )
@@ -1604,7 +1579,6 @@ deps['test/boost/combined_tests'] += [
     'test/boost/query_processor_test.cc',
     'test/boost/reader_concurrency_semaphore_test.cc',
     'test/boost/repair_test.cc',
-    'test/boost/replicator_test.cc',
     'test/boost/restrictions_test.cc',
     'test/boost/role_manager_test.cc',
     'test/boost/row_cache_test.cc',
@@ -1647,7 +1621,6 @@ deps['test/boost/bytes_ostream_test'] = [
 ]
 deps['test/boost/input_stream_test'] = ['test/boost/input_stream_test.cc']
 deps['test/boost/UUID_test'] = ['clocks-impl.cc', 'utils/UUID_gen.cc', 'test/boost/UUID_test.cc', 'utils/uuid.cc', 'utils/dynamic_bitset.cc', 'utils/hashers.cc', 'utils/on_internal_error.cc']
-deps['test/boost/url_parse_test'] = ['utils/http.cc', 'test/boost/url_parse_test.cc', ]
 deps['test/boost/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'test/boost/murmur_hash_test.cc']
 deps['test/boost/allocation_strategy_test'] = ['test/boost/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc', 'utils/labels.cc']
 deps['test/boost/log_heap_test'] = ['test/boost/log_heap_test.cc']
@@ -2212,15 +2185,7 @@ if os.path.exists(kmipc_lib):
     user_cflags += f' -I{kmipc_dir}/include -DHAVE_KMIP'
 def get_extra_cxxflags(mode, mode_config, cxx, debuginfo):
-    cxxflags = [
-        # we need this flag for correct precompiled header handling in connection with ccache (or similar)
-        # `git` tools don't preserve timestamps, so when using ccache it might be possible to add pch to ccache
-        # and then later (after for example rebase) get `stdafx.hh` with different timestamp, but the same content.
-        # this will tell ccache to bring pch from its cache. Later on clang will check if timestamps match and complain.
-        # Adding `-fpch-validate-input-files-content` tells clang to check content of stdafx.hh if timestamps don't match.
-        # The flag seems to be present in gcc as well.
-        "" if args.disable_precompiled_header else '-fpch-validate-input-files-content'
-    ]
+    cxxflags = []
     optimization_level = mode_config['optimization-level']
     cxxflags.append(f'-O{optimization_level}')
@@ -2285,7 +2250,6 @@ def write_build_file(f,
                      scylla_version,
                      scylla_release,
                      args):
-    use_precompiled_header = not args.disable_precompiled_header
     warnings = get_warning_options(args.cxx)
     rustc_target = pick_rustc_target('wasm32-wasi', 'wasm32-wasip1')
     f.write(textwrap.dedent('''\
@@ -2392,10 +2356,7 @@ def write_build_file(f,
     for mode in build_modes:
         modeval = modes[mode]
-        seastar_lib_ext = 'so' if modeval['build_seastar_shared_libs'] else 'a'
-        seastar_dep = f'$builddir/{mode}/seastar/libseastar.{seastar_lib_ext}'
-        seastar_testing_dep = f'$builddir/{mode}/seastar/libseastar_testing.{seastar_lib_ext}'
-        abseil_dep = ' '.join(f'$builddir/{mode}/abseil/{lib}' for lib in abseil_libs)
         fmt_lib = 'fmt'
         f.write(textwrap.dedent('''\
             cxx_ld_flags_{mode} = {cxx_ld_flags}
@@ -2408,14 +2369,6 @@ def write_build_file(f,
               command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags_{mode} $cxxflags $obj_cxxflags -c -o $out $in
               description = CXX $out
               depfile = $out.d
-            rule cxx_build_precompiled_header.{mode}
-              command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags_{mode} $cxxflags $obj_cxxflags -c -o $out $in -Winvalid-pch -fpch-instantiate-templates -Xclang -emit-pch -DSCYLLA_USE_PRECOMPILED_HEADER
-              description = CXX-PRECOMPILED-HEADER $out
-              depfile = $out.d
-            rule cxx_with_pch.{mode}
-              command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags_{mode} $cxxflags $obj_cxxflags -c -o $out $in -Winvalid-pch -Xclang -include-pch -Xclang $builddir/{mode}/stdafx.hh.pch
-              description = CXX $out
-              depfile = $out.d
             rule link.{mode}
               command = $cxx $ld_flags_{mode} $ldflags -o $out $in $libs $libs_{mode}
               description = LINK $out
@@ -2449,7 +2402,7 @@ def write_build_file(f,
                 $builddir/{mode}/gen/${{stem}}Parser.cpp
               description = ANTLR3 $in
             rule checkhh.{mode}
-              command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags --include $in -c -o $out $builddir/{mode}/gen/empty.cc -USCYLLA_USE_PRECOMPILED_HEADER
+              command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags --include $in -c -o $out $builddir/{mode}/gen/empty.cc
               description = CHECKHH $in
               depfile = $out.d
             rule test.{mode}
@@ -2463,11 +2416,10 @@ def write_build_file(f,
               description = RUST_LIB $out
             ''').format(mode=mode, antlr3_exec=args.antlr3_exec, fmt_lib=fmt_lib, test_repeat=args.test_repeat, test_timeout=args.test_timeout, **modeval))
         f.write(
-            'build {mode}-build: phony {artifacts} {wasms} {vector_search_validator_bins}\n'.format(
+            'build {mode}-build: phony {artifacts} {wasms}\n'.format(
                 mode=mode,
-                artifacts=str.join(' ', ['$builddir/' + mode + '/' + x for x in sorted(build_artifacts - wasms - vector_search_validator_bins)]),
+                artifacts=str.join(' ', ['$builddir/' + mode + '/' + x for x in sorted(build_artifacts - wasms)]),
                 wasms = str.join(' ', ['$builddir/' + x for x in sorted(build_artifacts & wasms)]),
-                vector_search_validator_bins=str.join(' ', ['$builddir/' + x for x in sorted(build_artifacts & vector_search_validator_bins)]),
             )
         )
         if profile_recipe := modes[mode].get('profile_recipe'):
@@ -2476,7 +2428,6 @@ def write_build_file(f,
         include_dist_target = f'dist-{mode}' if args.enable_dist is None or args.enable_dist else ''
         f.write(f'build {mode}: phony {include_cxx_target} {include_dist_target}\n')
         compiles = {}
-        compiles_with_pch = set()
         swaggers = set()
         serializers = {}
         ragels = {}
@@ -2491,16 +2442,16 @@ def write_build_file(f,
# object code. And we enable LTO when linking the main Scylla executable, while disable # object code. And we enable LTO when linking the main Scylla executable, while disable
# it when linking anything else. # it when linking anything else.
seastar_lib_ext = 'so' if modeval['build_seastar_shared_libs'] else 'a'
for binary in sorted(build_artifacts): for binary in sorted(build_artifacts):
if modeval['is_profile'] and binary != "scylla": if modeval['is_profile'] and binary != "scylla":
# Just to avoid clutter in build.ninja # Just to avoid clutter in build.ninja
continue continue
profile_dep = modes[mode].get('profile_target', "") profile_dep = modes[mode].get('profile_target', "")
if binary in other or binary in wasms or binary in vector_search_validator_bins: if binary in other or binary in wasms:
continue continue
srcs = deps[binary] srcs = deps[binary]
# 'scylla'
objs = ['$builddir/' + mode + '/' + src.replace('.cc', '.o') objs = ['$builddir/' + mode + '/' + src.replace('.cc', '.o')
for src in srcs for src in srcs
if src.endswith('.cc')] if src.endswith('.cc')]
@@ -2536,6 +2487,9 @@ def write_build_file(f,
continue continue
do_lto = modes[mode]['has_lto'] and binary in lto_binaries do_lto = modes[mode]['has_lto'] and binary in lto_binaries
seastar_dep = f'$builddir/{mode}/seastar/libseastar.{seastar_lib_ext}'
seastar_testing_dep = f'$builddir/{mode}/seastar/libseastar_testing.{seastar_lib_ext}'
abseil_dep = ' '.join(f'$builddir/{mode}/abseil/{lib}' for lib in abseil_libs)
seastar_testing_libs = f'$seastar_testing_libs_{mode}' seastar_testing_libs = f'$seastar_testing_libs_{mode}'
local_libs = f'$seastar_libs_{mode} $libs' local_libs = f'$seastar_libs_{mode} $libs'
@@ -2545,7 +2499,6 @@ def write_build_file(f,
         local_libs += ' -flto=thin -ffat-lto-objects'
     else:
         local_libs += ' -fno-lto'
-    use_pch = use_precompiled_header and binary == 'scylla'
     if binary in tests:
         if binary in pure_boost_tests:
             local_libs += ' ' + maybe_static(args.staticboost, '-lboost_unit_test_framework')
@@ -2574,8 +2527,6 @@ def write_build_file(f,
     if src.endswith('.cc'):
         obj = '$builddir/' + mode + '/' + src.replace('.cc', '.o')
         compiles[obj] = src
-        if use_pch:
-            compiles_with_pch.add(obj)
     elif src.endswith('.idl.hh'):
         hh = '$builddir/' + mode + '/gen/' + src.replace('.idl.hh', '.dist.hh')
         serializers[hh] = src
@@ -2608,11 +2559,10 @@ def write_build_file(f,
     )
 f.write(
-    'build {mode}-test: test.{mode} {test_executables} $builddir/{mode}/scylla {wasms} {vector_search_validator_bins} \n'.format(
+    'build {mode}-test: test.{mode} {test_executables} $builddir/{mode}/scylla {wasms}\n'.format(
         mode=mode,
         test_executables=' '.join(['$builddir/{}/{}'.format(mode, binary) for binary in sorted(tests)]),
         wasms=' '.join([f'$builddir/{binary}' for binary in sorted(wasms)]),
-        vector_search_validator_bins=' '.join([f'$builddir/{binary}' for binary in sorted(vector_search_validator_bins)]),
     )
 )
 f.write(
@@ -2655,9 +2605,7 @@ def write_build_file(f,
     src = compiles[obj]
     seastar_dep = f'$builddir/{mode}/seastar/libseastar.{seastar_lib_ext}'
    abseil_dep = ' '.join(f'$builddir/{mode}/abseil/{lib}' for lib in abseil_libs)
-    pch_dep = f'$builddir/{mode}/stdafx.hh.pch' if obj in compiles_with_pch else ''
-    cxx_cmd = 'cxx_with_pch' if obj in compiles_with_pch else 'cxx'
-    f.write(f'build {obj}: {cxx_cmd}.{mode} {src} | {profile_dep} {seastar_dep} {abseil_dep} {gen_headers_dep} {pch_dep}\n')
+    f.write(f'build {obj}: cxx.{mode} {src} | {profile_dep} || {seastar_dep} {abseil_dep} {gen_headers_dep}\n')
     if src in modeval['per_src_extra_cxxflags']:
         f.write(' cxxflags = {seastar_cflags} $cxxflags $cxxflags_{mode} {extra_cxxflags}\n'.format(mode=mode, extra_cxxflags=modeval["per_src_extra_cxxflags"][src], **modeval))
 for swagger in swaggers:
@@ -2718,8 +2666,6 @@ def write_build_file(f,
     f.write(' target = {lib}\n'.format(**locals()))
     f.write(' profile_dep = {profile_dep}\n'.format(**locals()))
-f.write(f'build $builddir/{mode}/stdafx.hh.pch: cxx_build_precompiled_header.{mode} stdafx.hh | {profile_dep} {seastar_dep} {abseil_dep} {gen_headers_dep} {pch_dep}\n')
 f.write('build $builddir/{mode}/seastar/apps/iotune/iotune: ninja $builddir/{mode}/seastar/build.ninja | $builddir/{mode}/seastar/libseastar.{seastar_lib_ext}\n'
         .format(**locals()))
 f.write(' pool = submodule_pool\n')
@@ -2783,19 +2729,6 @@ def write_build_file(f,
     'build compiler-training: phony {}\n'.format(' '.join(['{mode}-compiler-training'.format(mode=mode) for mode in default_modes]))
 )
-f.write(textwrap.dedent(f'''\
-    rule build-vector-search-validator
-      command = test/vector_search_validator/build-validator $builddir
-    rule build-vector-store
-      command = test/vector_search_validator/build-vector-store $builddir
-    '''))
-f.write(
-    'build $builddir/{vector_search_validator_bin}: build-vector-search-validator {}\n'.format(' '.join([dep for dep in sorted(vector_search_validator_deps)]), vector_search_validator_bin=vector_search_validator_bin)
-)
-f.write(
-    'build $builddir/{vector_store_bin}: build-vector-store {}\n'.format(' '.join([dep for dep in sorted(vector_store_deps)]), vector_store_bin=vector_store_bin)
-)
 f.write(textwrap.dedent(f'''\
     build dist-unified-tar: phony {' '.join([f'$builddir/{mode}/dist/tar/{scylla_product}-unified-{scylla_version}-{scylla_release}.{arch}.tar.gz' for mode in default_modes])}
     build dist-unified: phony dist-unified-tar
@@ -3009,7 +2942,7 @@ def configure_using_cmake(args):
     'CMAKE_DEFAULT_CONFIGS': selected_configs,
     'CMAKE_C_COMPILER': args.cc,
     'CMAKE_CXX_COMPILER': args.cxx,
-    'CMAKE_CXX_FLAGS': args.user_cflags + ("" if args.disable_precompiled_header else " -fpch-validate-input-files-content"),
+    'CMAKE_CXX_FLAGS': args.user_cflags,
     'CMAKE_EXE_LINKER_FLAGS': args.user_ldflags,
     'CMAKE_EXPORT_COMPILE_COMMANDS': 'ON',
     'Scylla_CHECK_HEADERS': 'ON',
@@ -3018,7 +2951,6 @@ def configure_using_cmake(args):
     'Scylla_TEST_REPEAT': args.test_repeat,
     'Scylla_ENABLE_LTO': 'ON' if args.lto else 'OFF',
     'Scylla_WITH_DEBUG_INFO' : 'ON' if args.debuginfo else 'OFF',
-    'Scylla_USE_PRECOMPILED_HEADER': 'OFF' if args.disable_precompiled_header else 'ON',
 }
 if args.date_stamp:
     settings['Scylla_DATE_STAMP'] = args.date_stamp


@@ -138,8 +138,5 @@ target_link_libraries(cql3
     lang
     transport)
-if (Scylla_USE_PRECOMPILED_HEADER_USE)
-    target_precompile_headers(cql3 REUSE_FROM scylla-precompiled-header)
-endif()
 check_headers(check-headers cql3
     GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)


@@ -575,15 +575,6 @@ usingTimeoutServiceLevelClauseObjective[std::unique_ptr<cql3::attributes::raw>&
     | serviceLevel sl_name=serviceLevelOrRoleName { attrs->service_level = std::move(sl_name); }
     ;
-usingTimeoutConcurrencyClause[std::unique_ptr<cql3::attributes::raw>& attrs]
-    : K_USING usingTimeoutConcurrencyClauseObjective[attrs] ( K_AND usingTimeoutConcurrencyClauseObjective[attrs] )*
-    ;
-usingTimeoutConcurrencyClauseObjective[std::unique_ptr<cql3::attributes::raw>& attrs]
-    : K_TIMEOUT to=term { attrs->timeout = std::move(to); }
-    | K_CONCURRENCY c=term { attrs->concurrency = std::move(c); }
-    ;
 /**
  * UPDATE <CF>
  * USING TIMESTAMP <long>
@@ -675,7 +666,7 @@ pruneMaterializedViewStatement returns [std::unique_ptr<raw::select_statement> e
     auto attrs = std::make_unique<cql3::attributes::raw>();
     expression wclause = conjunction{};
 }
-    : K_PRUNE K_MATERIALIZED K_VIEW cf=columnFamilyName (K_WHERE w=whereClause { wclause = std::move(w); } )? ( usingTimeoutConcurrencyClause[attrs] )?
+    : K_PRUNE K_MATERIALIZED K_VIEW cf=columnFamilyName (K_WHERE w=whereClause { wclause = std::move(w); } )? ( usingClause[attrs] )?
 {
     auto params = make_lw_shared<raw::select_statement::parameters>(std::move(orderings), is_distinct, allow_filtering, statement_subtype, bypass_cache);
     return std::make_unique<raw::select_statement>(std::move(cf), std::move(params),
@@ -1569,10 +1560,6 @@ serviceLevelOrRoleName returns [sstring name]
     | t=QUOTED_NAME { $name = sstring($t.text); }
     | k=unreserved_keyword { $name = k;
                              std::transform($name.begin(), $name.end(), $name.begin(), ::tolower);}
-    // The literal `default` will not be parsed by any of the previous
-    // rules, so we need to cover it manually. Needed by CREATE SERVICE
-    // LEVEL and ATTACH SERVICE LEVEL.
-    | t=K_DEFAULT { $name = sstring("default"); }
     | QMARK {add_recognition_error("Bind variables cannot be used for service levels or role names");}
     ;
@@ -2379,7 +2366,6 @@ K_LIKE: L I K E;
 K_TIMEOUT: T I M E O U T;
 K_PRUNE: P R U N E;
-K_CONCURRENCY: C O N C U R R E N C Y;
 K_EXECUTE: E X E C U T E;


@@ -20,21 +20,19 @@
 namespace cql3 {
 std::unique_ptr<attributes> attributes::none() {
-    return std::unique_ptr<attributes>{new attributes{{}, {}, {}, {}, {}}};
+    return std::unique_ptr<attributes>{new attributes{{}, {}, {}, {}}};
 }
 attributes::attributes(std::optional<cql3::expr::expression>&& timestamp,
         std::optional<cql3::expr::expression>&& time_to_live,
         std::optional<cql3::expr::expression>&& timeout,
-        std::optional<sstring> service_level,
-        std::optional<cql3::expr::expression>&& concurrency)
+        std::optional<sstring> service_level)
     : _timestamp_unset_guard(timestamp)
     , _timestamp{std::move(timestamp)}
     , _time_to_live_unset_guard(time_to_live)
     , _time_to_live{std::move(time_to_live)}
     , _timeout{std::move(timeout)}
     , _service_level(std::move(service_level))
-    , _concurrency{std::move(concurrency)}
 { }
 bool attributes::is_timestamp_set() const {
@@ -53,10 +51,6 @@ bool attributes::is_service_level_set() const {
     return bool(_service_level);
 }
-bool attributes::is_concurrency_set() const {
-    return bool(_concurrency);
-}
 int64_t attributes::get_timestamp(int64_t now, const query_options& options) {
     if (!_timestamp.has_value() || _timestamp_unset_guard.is_unset(options)) {
         return now;
@@ -129,27 +123,6 @@ qos::service_level_options attributes::get_service_level(qos::service_level_cont
     return sl_controller.get_service_level(sl_name).slo;
 }
-std::optional<int32_t> attributes::get_concurrency(const query_options& options) const {
-    if (!_concurrency.has_value()) {
-        return std::nullopt;
-    }
-    cql3::raw_value concurrency_raw = expr::evaluate(*_concurrency, options);
-    if (concurrency_raw.is_null()) {
-        throw exceptions::invalid_request_exception("Invalid null value of concurrency");
-    }
-    int32_t concurrency;
-    try {
-        concurrency = concurrency_raw.view().validate_and_deserialize<int32_t>(*int32_type);
-    } catch (marshal_exception& e) {
-        throw exceptions::invalid_request_exception("Invalid concurrency value");
-    }
-    if (concurrency <= 0) {
-        throw exceptions::invalid_request_exception("Concurrency must be a positive integer");
-    }
-    return concurrency;
-}
 void attributes::fill_prepare_context(prepare_context& ctx) {
     if (_timestamp.has_value()) {
         expr::fill_prepare_context(*_timestamp, ctx);
@@ -160,13 +133,10 @@ void attributes::fill_prepare_context(prepare_context& ctx) {
     if (_timeout.has_value()) {
         expr::fill_prepare_context(*_timeout, ctx);
     }
-    if (_concurrency.has_value()) {
-        expr::fill_prepare_context(*_concurrency, ctx);
-    }
 }
 std::unique_ptr<attributes> attributes::raw::prepare(data_dictionary::database db, const sstring& ks_name, const sstring& cf_name) const {
-    std::optional<expr::expression> ts, ttl, to, conc;
+    std::optional<expr::expression> ts, ttl, to;
     if (timestamp.has_value()) {
         ts = prepare_expression(*timestamp, db, ks_name, nullptr, timestamp_receiver(ks_name, cf_name));
@@ -183,12 +153,7 @@ std::unique_ptr<attributes> attributes::raw::prepare(data_dictionary::database d
         verify_no_aggregate_functions(*timeout, "USING clause");
     }
-    if (concurrency.has_value()) {
-        conc = prepare_expression(*concurrency, db, ks_name, nullptr, concurrency_receiver(ks_name, cf_name));
-        verify_no_aggregate_functions(*concurrency, "USING clause");
-    }
-    return std::unique_ptr<attributes>{new attributes{std::move(ts), std::move(ttl), std::move(to), std::move(service_level), std::move(conc)}};
+    return std::unique_ptr<attributes>{new attributes{std::move(ts), std::move(ttl), std::move(to), std::move(service_level)}};
 }
 lw_shared_ptr<column_specification> attributes::raw::timestamp_receiver(const sstring& ks_name, const sstring& cf_name) const {
@@ -203,8 +168,4 @@ lw_shared_ptr<column_specification> attributes::raw::timeout_receiver(const sstr
     return make_lw_shared<column_specification>(ks_name, cf_name, ::make_shared<column_identifier>("[timeout]", true), duration_type);
 }
-lw_shared_ptr<column_specification> attributes::raw::concurrency_receiver(const sstring& ks_name, const sstring& cf_name) const {
-    return make_lw_shared<column_specification>(ks_name, cf_name, ::make_shared<column_identifier>("[concurrency]", true), data_type_for<int32_t>());
-}
 }


@@ -36,15 +36,13 @@ private:
     std::optional<cql3::expr::expression> _time_to_live;
     std::optional<cql3::expr::expression> _timeout;
     std::optional<sstring> _service_level;
-    std::optional<cql3::expr::expression> _concurrency;
 public:
     static std::unique_ptr<attributes> none();
 private:
     attributes(std::optional<cql3::expr::expression>&& timestamp,
             std::optional<cql3::expr::expression>&& time_to_live,
             std::optional<cql3::expr::expression>&& timeout,
-            std::optional<sstring> service_level,
-            std::optional<cql3::expr::expression>&& concurrency);
+            std::optional<sstring> service_level);
 public:
     bool is_timestamp_set() const;
@@ -54,8 +52,6 @@ public:
     bool is_service_level_set() const;
-    bool is_concurrency_set() const;
     int64_t get_timestamp(int64_t now, const query_options& options);
     std::optional<int32_t> get_time_to_live(const query_options& options);
@@ -64,8 +60,6 @@ public:
     qos::service_level_options get_service_level(qos::service_level_controller& sl_controller) const;
-    std::optional<int32_t> get_concurrency(const query_options& options) const;
     void fill_prepare_context(prepare_context& ctx);
     class raw final {
@@ -74,7 +68,6 @@ public:
         std::optional<cql3::expr::expression> time_to_live;
         std::optional<cql3::expr::expression> timeout;
         std::optional<sstring> service_level;
-        std::optional<cql3::expr::expression> concurrency;
         std::unique_ptr<attributes> prepare(data_dictionary::database db, const sstring& ks_name, const sstring& cf_name) const;
     private:
@@ -83,8 +76,6 @@ public:
         lw_shared_ptr<column_specification> time_to_live_receiver(const sstring& ks_name, const sstring& cf_name) const;
         lw_shared_ptr<column_specification> timeout_receiver(const sstring& ks_name, const sstring& cf_name) const;
-        lw_shared_ptr<column_specification> concurrency_receiver(const sstring& ks_name, const sstring& cf_name) const;
     };
 };


@@ -165,7 +165,8 @@ cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_proce
     service::topology_mutation_builder builder(ts);
     service::topology_request_tracking_mutation_builder rtbuilder{global_request_id, qp.proxy().features().topology_requests_type_column};
-    rtbuilder.set("done", false);
+    rtbuilder.set("done", false)
+        .set("start_time", db_clock::now());
     if (!qp.proxy().features().topology_global_request_queue) {
         builder.set_global_topology_request(service::global_topology_request::keyspace_rf_change);
         builder.set_global_topology_request_id(global_request_id);


@@ -37,12 +37,6 @@ future<::shared_ptr<cql_transport::messages::result_message>>
 alter_service_level_statement::execute(query_processor& qp,
         service::query_state &state,
         const query_options &, std::optional<service::group0_guard> guard) const {
-    if (_service_level == qos::service_level_controller::default_service_level_name) {
-        sstring reason = seastar::format("The default service level, {}, cannot be altered",
-                qos::service_level_controller::default_service_level_name);
-        throw exceptions::invalid_request_exception(std::move(reason));
-    }
     service::group0_batch mc{std::move(guard)};
     validate_shares_option(qp, _slo);
     qos::service_level& sl = state.get_service_level_controller().get_service_level(_service_level);


@@ -422,14 +422,7 @@ std::pair<schema_ptr, std::vector<view_ptr>> alter_table_statement::prepare_sche
         throw exceptions::invalid_request_exception(format("The synchronous_updates option is only applicable to materialized views, not to base tables"));
     }
-    if (is_cdc_log_table) {
-        auto gc_opts = _properties->get_tombstone_gc_options(schema_extensions);
-        if (gc_opts && gc_opts->mode() == tombstone_gc_mode::repair) {
-            throw exceptions::invalid_request_exception("The 'repair' mode for tombstone_gc is not allowed on CDC log tables.");
-        }
-    }
-    _properties->apply_to_builder(cfm, std::move(schema_extensions), db, keyspace(), !is_cdc_log_table);
+    _properties->apply_to_builder(cfm, std::move(schema_extensions), db, keyspace());
 }
 break;


@@ -55,29 +55,8 @@ view_ptr alter_view_statement::prepare_view(data_dictionary::database db) const
     auto schema_extensions = _properties->make_schema_extensions(db.extensions());
     _properties->validate(db, keyspace(), schema_extensions);
-    bool is_colocated = [&] {
-        if (!db.find_keyspace(keyspace()).get_replication_strategy().uses_tablets()) {
-            return false;
-        }
-        auto base_schema = db.find_schema(schema->view_info()->base_id());
-        if (!base_schema) {
-            return false;
-        }
-        return std::ranges::equal(
-            schema->partition_key_columns(),
-            base_schema->partition_key_columns(),
-            [](const column_definition& a, const column_definition& b) { return a.name() == b.name(); });
-    }();
-    if (is_colocated) {
-        auto gc_opts = _properties->get_tombstone_gc_options(schema_extensions);
-        if (gc_opts && gc_opts->mode() == tombstone_gc_mode::repair) {
-            throw exceptions::invalid_request_exception("The 'repair' mode for tombstone_gc is not allowed on co-located materialized view tables.");
-        }
-    }
     auto builder = schema_builder(schema);
-    _properties->apply_to_builder(builder, std::move(schema_extensions), db, keyspace(), !is_colocated);
+    _properties->apply_to_builder(builder, std::move(schema_extensions), db, keyspace());
     if (builder.get_gc_grace_seconds() == 0) {
         throw exceptions::invalid_request_exception(


@@ -43,14 +43,6 @@ attach_service_level_statement::execute(query_processor& qp,
         service::query_state &state,
         const query_options &,
         std::optional<service::group0_guard> guard) const {
-    if (_service_level == qos::service_level_controller::default_service_level_name) {
-        sstring reason = seastar::format("The default service level, {}, cannot be "
-                "attached to a role. If you want to detach an attached service level, "
-                "use the DETACH SERVICE LEVEL statement",
-                qos::service_level_controller::default_service_level_name);
-        throw exceptions::invalid_request_exception(std::move(reason));
-    }
     auto sli = co_await state.get_service_level_controller().get_distributed_service_level(_service_level);
     if (sli.empty()) {
         throw qos::nonexistant_service_level_exception(_service_level);


@@ -331,7 +331,7 @@ future<shared_ptr<cql_transport::messages::result_message>> batch_statement::exe
     if (!cl_for_paxos) [[unlikely]] {
         return make_exception_future<shared_ptr<cql_transport::messages::result_message>>(std::move(cl_for_paxos).assume_error());
     }
-    std::unique_ptr<cas_request> request;
+    seastar::shared_ptr<cas_request> request;
     schema_ptr schema;
     db::timeout_clock::time_point now = db::timeout_clock::now();
@@ -354,9 +354,9 @@ future<shared_ptr<cql_transport::messages::result_message>> batch_statement::exe
     if (keys.empty()) {
         continue;
     }
-    if (!request) {
+    if (request.get() == nullptr) {
         schema = statement.s;
-        request = std::make_unique<cas_request>(schema, std::move(keys));
+        request = seastar::make_shared<cas_request>(schema, std::move(keys));
     } else if (keys.size() != 1 || keys.front().equal(request->key().front(), dht::ring_position_comparator(*schema)) == false) {
         throw exceptions::invalid_request_exception("BATCH with conditions cannot span multiple partitions");
     }
@@ -366,7 +366,7 @@ future<shared_ptr<cql_transport::messages::result_message>> batch_statement::exe
     request->add_row_update(statement, std::move(ranges), std::move(json_cache), statement_options);
 }
-if (!request) {
+if (request.get() == nullptr) {
     throw exceptions::invalid_request_exception(format("Unrestricted partition key in a conditional BATCH"));
 }
@@ -377,10 +377,9 @@ future<shared_ptr<cql_transport::messages::result_message>> batch_statement::exe
     );
 }
-auto* request_ptr = request.get();
-return qp.proxy().cas(schema, std::move(cas_shard), *request_ptr, request->read_command(qp), request->key(),
+return qp.proxy().cas(schema, std::move(cas_shard), request, request->read_command(qp), request->key(),
     {read_timeout, qs.get_permit(), qs.get_client_state(), qs.get_trace_state()},
-    std::move(cl_for_paxos).assume_value(), cl_for_learn, batch_timeout, cas_timeout).then([this, request = std::move(request)] (bool is_applied) {
+    std::move(cl_for_paxos).assume_value(), cl_for_learn, batch_timeout, cas_timeout).then([this, request] (bool is_applied) {
     return request->build_cas_result_set(_metadata, _columns_of_cas_result_set, is_applied);
 });


@@ -293,7 +293,7 @@ std::optional<db::tablet_options::map_type> cf_prop_defs::get_tablet_options() c
     return std::nullopt;
 }
-void cf_prop_defs::apply_to_builder(schema_builder& builder, schema::extensions_map schema_extensions, const data_dictionary::database& db, sstring ks_name, bool supports_repair) const {
+void cf_prop_defs::apply_to_builder(schema_builder& builder, schema::extensions_map schema_extensions, const data_dictionary::database& db, sstring ks_name) const {
     if (has_property(KW_COMMENT)) {
         builder.set_comment(get_string(KW_COMMENT, ""));
     }
@@ -379,7 +379,7 @@ void cf_prop_defs::apply_to_builder(schema_builder& builder, schema::extensions_
 }
 // Set default tombstone_gc mode.
 if (!schema_extensions.contains(tombstone_gc_extension::NAME)) {
-    auto ext = seastar::make_shared<tombstone_gc_extension>(get_default_tombstone_gc_mode(db, ks_name, supports_repair));
+    auto ext = seastar::make_shared<tombstone_gc_extension>(get_default_tombstone_gc_mode(db, ks_name));
     schema_extensions.emplace(tombstone_gc_extension::NAME, std::move(ext));
 }
 builder.set_extensions(std::move(schema_extensions));


@@ -110,7 +110,7 @@ public:
     bool get_synchronous_updates_flag() const;
     std::optional<db::tablet_options::map_type> get_tablet_options() const;
-    void apply_to_builder(schema_builder& builder, schema::extensions_map schema_extensions, const data_dictionary::database& db, sstring ks_name, bool supports_repair) const;
+    void apply_to_builder(schema_builder& builder, schema::extensions_map schema_extensions, const data_dictionary::database& db, sstring ks_name) const;
     void validate_minimum_int(const sstring& field, int32_t minimum_value, int32_t default_value) const;
 };


@@ -201,14 +201,7 @@ view_ptr create_index_statement::create_view_for_index(const schema_ptr schema,
     "";
 builder.with_view_info(schema, false, where_clause);
-bool is_colocated = [&] {
-    if (!db.find_keyspace(keyspace()).get_replication_strategy().uses_tablets()) {
-        return false;
-    }
-    return im.local();
-}();
-auto tombstone_gc_ext = seastar::make_shared<tombstone_gc_extension>(get_default_tombstone_gc_mode(db, schema->ks_name(), !is_colocated));
+auto tombstone_gc_ext = seastar::make_shared<tombstone_gc_extension>(get_default_tombstone_gc_mode(db, schema->ks_name()));
 builder.add_extension(tombstone_gc_extension::NAME, std::move(tombstone_gc_ext));
 // A local secondary index should be backed by a *synchronous* view,
@@ -279,15 +272,11 @@ std::vector<::shared_ptr<index_target>> create_index_statement::validate_while_e
     throw exceptions::invalid_request_exception(format("index names shouldn't be more than {:d} characters long (got \"{}\")", schema::NAME_LENGTH, _index_name.c_str()));
 }
-// Regular secondary indexes require rf-rack-validity.
-// Custom indexes need to validate this property themselves, if they need it.
-if (!_properties || !_properties->custom_class) {
-    try {
-        db::view::validate_view_keyspace(db, keyspace());
-    } catch (const std::exception& e) {
-        // The type of the thrown exception is not specified, so we need to wrap it here.
-        throw exceptions::invalid_request_exception(e.what());
-    }
-}
+try {
+    db::view::validate_view_keyspace(db, keyspace());
+} catch (const std::exception& e) {
+    // The type of the thrown exception is not specified, so we need to wrap it here.
+    throw exceptions::invalid_request_exception(e.what());
+}
 validate_for_local_index(*schema);
@@ -303,7 +292,7 @@ std::vector<::shared_ptr<index_target>> create_index_statement::validate_while_e
     throw exceptions::invalid_request_exception(format("Non-supported custom class \'{}\' provided", *(_properties->custom_class)));
 }
 auto custom_index = (*custom_index_factory)();
-custom_index->validate(*schema, *_properties, targets, db.features(), db);
+custom_index->validate(*schema, *_properties, targets, db.features());
 _properties->index_version = custom_index->index_version(*schema);


@@ -45,12 +45,6 @@ create_service_level_statement::execute(query_processor& qp,
     throw exceptions::invalid_request_exception("Names starting with '$' are reserved for internal tenants. Use a different name.");
 }
-if (_service_level == qos::service_level_controller::default_service_level_name) {
-    sstring reason = seastar::format("The default service level, {}, already exists "
-            "and cannot be created", qos::service_level_controller::default_service_level_name);
-    throw exceptions::invalid_request_exception(std::move(reason));
-}
 service::group0_batch mc{std::move(guard)};
 validate_shares_option(qp, _slo);


@@ -128,7 +128,7 @@ void create_table_statement::apply_properties_to(schema_builder& builder, const
     builder.set_compressor_params(db.get_config().sstable_compression_user_table_options());
 }
-_properties->apply_to_builder(builder, _properties->make_schema_extensions(db.extensions()), db, keyspace(), true);
+_properties->apply_to_builder(builder, _properties->make_schema_extensions(db.extensions()), db, keyspace());
 }
 void create_table_statement::add_column_metadata_from_aliases(schema_builder& builder, std::vector<bytes> aliases, const std::vector<data_type>& types, column_kind kind) const


@@ -373,30 +373,7 @@ std::pair<view_ptr, cql3::cql_warnings_vec> create_view_statement::prepare_view(
         db::view::create_virtual_column(builder, def->name(), def->type);
     }
 }
-bool is_colocated = [&] {
-    if (!db.find_keyspace(keyspace()).get_replication_strategy().uses_tablets()) {
-        return false;
-    }
-    if (target_partition_keys.size() != schema->partition_key_columns().size()) {
-        return false;
-    }
-    for (size_t i = 0; i < target_partition_keys.size(); ++i) {
-        if (target_partition_keys[i] != &schema->partition_key_columns()[i]) {
-            return false;
-        }
-    }
-    return true;
-}();
-if (is_colocated) {
-    auto gc_opts = _properties.properties()->get_tombstone_gc_options(schema_extensions);
-    if (gc_opts && gc_opts->mode() == tombstone_gc_mode::repair) {
-        throw exceptions::invalid_request_exception("The 'repair' mode for tombstone_gc is not allowed on co-located materialized view tables.");
-    }
-}
-_properties.properties()->apply_to_builder(builder, std::move(schema_extensions), db, keyspace(), !is_colocated);
+_properties.properties()->apply_to_builder(builder, std::move(schema_extensions), db, keyspace());
 if (builder.default_time_to_live().count() > 0) {
     throw exceptions::invalid_request_exception(


@@ -34,11 +34,6 @@ drop_service_level_statement::execute(query_processor& qp,
                                       service::query_state &state,
                                       const query_options &,
                                       std::optional<service::group0_guard> guard) const {
-    if (_service_level == qos::service_level_controller::default_service_level_name) {
-        sstring reason = seastar::format("The default service level, {}, cannot be dropped",
-                qos::service_level_controller::default_service_level_name);
-        throw exceptions::invalid_request_exception(std::move(reason));
-    }
     service::group0_batch mc{std::move(guard)};
     auto& sl = state.get_service_level_controller();
     co_await sl.drop_distributed_service_level(_service_level, _if_exists, mc);


@@ -8,7 +8,6 @@
  * SPDX-License-Identifier: (LicenseRef-ScyllaDB-Source-Available-1.0 and Apache-2.0)
  */
-#include "seastar/core/format.hh"
 #include "seastar/core/sstring.hh"
 #include "utils/assert.hh"
 #include "cql3/statements/ks_prop_defs.hh"
@@ -114,17 +113,6 @@ static locator::replication_strategy_config_options prepare_options(
         return options;
     }
-    if (uses_tablets) {
-        for (const auto& opt: old_options) {
-            if (opt.first == ks_prop_defs::REPLICATION_FACTOR_KEY) {
-                on_internal_error(logger, format("prepare_options: old_options contains invalid key '{}'", ks_prop_defs::REPLICATION_FACTOR_KEY));
-            }
-            if (!options.contains(opt.first)) {
-                throw exceptions::configuration_exception(fmt::format("Attempted to implicitly drop replicas in datacenter {}. If this is the desired behavior, set replication factor to 0 in {} explicitly.", opt.first, opt.first));
-            }
-        }
-    }
     // For users' convenience, expand the 'replication_factor' option into a replication factor for each DC.
     // If the user simply switches from another strategy without providing any options,
     // but the other strategy used the 'replication_factor' option, it will also be expanded.


@@ -401,8 +401,7 @@ modification_statement::execute_with_condition(query_processor& qp, service::que
                 type.is_update() ? "update" : "deletion"));
     }
-    auto request = std::make_unique<cas_request>(s, std::move(keys));
-    auto* request_ptr = request.get();
+    auto request = seastar::make_shared<cas_request>(s, std::move(keys));
     // cas_request can be used for batches as well single statements; Here we have just a single
     // modification in the list of CAS commands, since we're handling single-statement execution.
     request->add_row_update(*this, std::move(ranges), std::move(json_cache), options);
@@ -428,9 +427,9 @@ modification_statement::execute_with_condition(query_processor& qp, service::que
         tablet_info = erm->check_locality(token);
     }
-    return qp.proxy().cas(s, std::move(cas_shard), *request_ptr, request->read_command(qp), request->key(),
+    return qp.proxy().cas(s, std::move(cas_shard), request, request->read_command(qp), request->key(),
             {read_timeout, qs.get_permit(), qs.get_client_state(), qs.get_trace_state()},
-            std::move(cl_for_paxos).assume_value(), cl_for_learn, statement_timeout, cas_timeout).then([this, request = std::move(request), tablet_replicas = std::move(tablet_info->tablet_replicas), token_range = tablet_info->token_range] (bool is_applied) {
+            std::move(cl_for_paxos).assume_value(), cl_for_learn, statement_timeout, cas_timeout).then([this, request, tablet_replicas = std::move(tablet_info->tablet_replicas), token_range = tablet_info->token_range] (bool is_applied) {
         auto result = request->build_cas_result_set(_metadata, _columns_of_cas_result_set, is_applied);
         result->add_tablet_info(tablet_replicas, token_range);
         return result;


@@ -21,7 +21,7 @@ namespace cql3 {
 namespace statements {
 static future<> delete_ghost_rows(dht::partition_range_vector partition_ranges, std::vector<query::clustering_range> clustering_bounds, view_ptr view,
-        service::storage_proxy& proxy, service::query_state& state, const query_options& options, cql_stats& stats, db::timeout_clock::duration timeout_duration, size_t concurrency) {
+        service::storage_proxy& proxy, service::query_state& state, const query_options& options, cql_stats& stats, db::timeout_clock::duration timeout_duration) {
     auto key_columns = std::ranges::to<std::vector<const column_definition*>>(
             view->all_columns()
             | std::views::filter([] (const column_definition& cdef) { return cdef.is_primary_key(); })
@@ -35,7 +35,7 @@ static future<> delete_ghost_rows(dht::partition_range_vector partition_ranges,
     tracing::trace(state.get_trace_state(), "Deleting ghost rows from partition ranges {}", partition_ranges);
     auto p = service::pager::query_pagers::ghost_row_deleting_pager(schema_ptr(view), selection, state,
-            options, std::move(command), std::move(partition_ranges), stats, proxy, timeout_duration, concurrency);
+            options, std::move(command), std::move(partition_ranges), stats, proxy, timeout_duration);
     int32_t page_size = std::max(options.get_page_size(), 1000);
     auto now = gc_clock::now();
@@ -62,8 +62,7 @@ future<::shared_ptr<cql_transport::messages::result_message>> prune_materialized
     auto timeout_duration = get_timeout(state.get_client_state(), options);
     dht::partition_range_vector key_ranges = _restrictions->get_partition_key_ranges(options);
     std::vector<query::clustering_range> clustering_bounds = _restrictions->get_clustering_bounds(options);
-    size_t concurrency = _attrs->is_concurrency_set() ? _attrs->get_concurrency(options).value() : 1;
-    return delete_ghost_rows(std::move(key_ranges), std::move(clustering_bounds), view_ptr(_schema), qp.proxy(), state, options, _stats, timeout_duration, concurrency).then([] {
+    return delete_ghost_rows(std::move(key_ranges), std::move(clustering_bounds), view_ptr(_schema), qp.proxy(), state, options, _stats, timeout_duration).then([] {
         return make_ready_future<::shared_ptr<cql_transport::messages::result_message>>(::make_shared<cql_transport::messages::result_message::void_message>());
     });
 }


@@ -2031,16 +2031,14 @@ future<shared_ptr<cql_transport::messages::result_message>> vector_indexed_table
                 fmt::format("Use of ANN OF in an ORDER BY clause requires a LIMIT that is not greater than {}. LIMIT was {}", max_ann_query_limit, limit)));
     }
-    auto timeout = db::timeout_clock::now() + get_timeout(state.get_client_state(), options);
-    auto aoe = abort_on_expiry(timeout);
-    auto pkeys = co_await qp.vector_store_client().ann(
-            _schema->ks_name(), _index.metadata().name(), _schema, get_ann_ordering_vector(options), limit, aoe.abort_source());
+    auto as = abort_source();
+    auto pkeys = co_await qp.vector_store_client().ann(_schema->ks_name(), _index.metadata().name(), _schema, get_ann_ordering_vector(options), limit, as);
     if (!pkeys.has_value()) {
         co_await coroutine::return_exception(
                 exceptions::invalid_request_exception(std::visit(vector_search::vector_store_client::ann_error_visitor{}, pkeys.error())));
     }
-    co_return co_await query_base_table(qp, state, options, pkeys.value(), timeout);
+    co_return co_await query_base_table(qp, state, options, pkeys.value());
 });
 auto page_size = options.get_page_size();
@@ -2075,10 +2073,10 @@ std::vector<float> vector_indexed_table_select_statement::get_ann_ordering_vecto
     return util::to_vector<float>(values);
 }
-future<::shared_ptr<cql_transport::messages::result_message>> vector_indexed_table_select_statement::query_base_table(query_processor& qp,
-        service::query_state& state, const query_options& options, const std::vector<vector_search::primary_key>& pkeys,
-        lowres_clock::time_point timeout) const {
+future<::shared_ptr<cql_transport::messages::result_message>> vector_indexed_table_select_statement::query_base_table(
+        query_processor& qp, service::query_state& state, const query_options& options, const std::vector<vector_search::primary_key>& pkeys) const {
     auto command = prepare_command_for_base_query(qp, state, options);
+    auto timeout = db::timeout_clock::now() + get_timeout(state.get_client_state(), options);
     // For tables without clustering columns, we can optimize by querying
     // partition ranges instead of individual primary keys, since the


@@ -389,8 +389,8 @@ private:
     std::vector<float> get_ann_ordering_vector(const query_options& options) const;
-    future<::shared_ptr<cql_transport::messages::result_message>> query_base_table(query_processor& qp, service::query_state& state,
-            const query_options& options, const std::vector<vector_search::primary_key>& pkeys, lowres_clock::time_point timeout) const;
+    future<::shared_ptr<cql_transport::messages::result_message>> query_base_table(
+            query_processor& qp, service::query_state& state, const query_options& options, const std::vector<vector_search::primary_key>& pkeys) const;
     future<::shared_ptr<cql_transport::messages::result_message>> query_base_table(query_processor& qp, service::query_state& state,
             const query_options& options, lw_shared_ptr<query::read_command> command, lowres_clock::time_point timeout,


@@ -12,8 +12,5 @@ target_link_libraries(data_dictionary
     Seastar::seastar
     xxHash::xxhash)
-if (Scylla_USE_PRECOMPILED_HEADER_USE)
-    target_precompile_headers(data_dictionary REUSE_FROM scylla-precompiled-header)
-endif()
 check_headers(check-headers data_dictionary
     GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)


@@ -10,6 +10,7 @@ target_sources(db
     schema_applier.cc
     schema_tables.cc
     cql_type_parser.cc
+    legacy_schema_migrator.cc
     commitlog/commitlog.cc
     commitlog/commitlog_replayer.cc
     commitlog/commitlog_entry.cc
@@ -59,8 +60,5 @@ target_link_libraries(db
     data_dictionary
     cql3)
-if (Scylla_USE_PRECOMPILED_HEADER_USE)
-    target_precompile_headers(db REUSE_FROM scylla-precompiled-header)
-endif()
 check_headers(check-headers db
     GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)


@@ -3461,15 +3461,12 @@ db::commitlog::read_log_file(const replay_state& state, sstring filename, sstrin
     clogger.debug("Read {} bytes of data ({}, {})", size, pos, rem);
     while (rem < size) {
-        const auto initial_size = initial.size_bytes();
         if (eof) {
-            auto reason = fmt::format("unexpected EOF, pos={}, rem={}, size={}, alignment={}, initial_size={}",
-                    pos, rem, size, alignment, initial_size);
+            auto reason = fmt::format("unexpected EOF, rem={}, size={}", rem, size);
             throw segment_truncation(std::move(reason), block_boundry);
         }
-        auto block_size = alignment - initial_size;
+        auto block_size = alignment - initial.size_bytes();
         // using a stream is perhaps not 100% effective, but we need to
         // potentially address data in pages smaller than the current
         // disk/fs we are reading from can handle (but please no).
@@ -3477,9 +3474,8 @@ db::commitlog::read_log_file(const replay_state& state, sstring filename, sstrin
         if (tmp.size_bytes() == 0) {
             eof = true;
-            auto reason = fmt::format("read 0 bytes, while tried to read {} bytes. "
-                    "pos={}, rem={}, size={}, alignment={}, initial_size={}",
-                    block_size, pos, rem, size, alignment, initial_size);
+            auto reason = fmt::format("read 0 bytes, while tried to read {} bytes. rem={}, size={}",
+                    block_size, rem, size);
             throw segment_truncation(std::move(reason), block_boundry);
         }
@@ -3515,13 +3511,13 @@ db::commitlog::read_log_file(const replay_state& state, sstring filename, sstrin
     auto checksum = crc.checksum();
     if (check != checksum) {
-        auto reason = fmt::format("checksums do not match: {:x} vs. {:x}. pos={}, rem={}, size={}, alignment={}, initial_size={}",
-                check, checksum, pos, rem, size, alignment, initial_size);
+        auto reason = fmt::format("checksums do not match: {:x} vs. {:x}. rem={}, size={}",
+                check, checksum, rem, size);
         throw segment_data_corruption_error(std::move(reason), alignment);
     }
     if (id != this->id) {
-        auto reason = fmt::format("IDs do not match: {} vs. {}. pos={}, rem={}, size={}, alignment={}, initial_size={}",
-                id, this->id, pos, rem, size, alignment, initial_size);
+        auto reason = fmt::format("IDs do not match: {} vs. {}. rem={}, size={}",
+                id, this->id, rem, size);
         throw segment_truncation(std::move(reason), pos + rem);
     }
 }
@@ -3630,10 +3626,6 @@ db::commitlog::read_log_file(const replay_state& state, sstring filename, sstrin
     auto old = pos;
     pos = next_pos(off);
     clogger.trace("Pos {} -> {} ({})", old, pos, off);
-    // #24346 check eof status whenever we move file pos.
-    if (pos >= file_size) {
-        eof = true;
-    }
 }
 future<> read_entry() {


@@ -36,7 +36,6 @@
 #include "sstables/compressor.hh"
 #include "utils/log.hh"
 #include "service/tablet_allocator_fwd.hh"
-#include "backlog_controller_fwd.hh"
 #include "utils/config_file_impl.hh"
 #include "exceptions/exceptions.hh"
 #include <seastar/core/metrics_api.hh>
@@ -631,8 +630,6 @@ db::config::config(std::shared_ptr<db::extensions> exts)
     "If set to higher than 0, ignore the controller's output and set the memtable shares statically. Do not set this unless you know what you are doing and suspect a problem in the controller. This option will be retired when the controller reaches more maturity.")
 , compaction_static_shares(this, "compaction_static_shares", liveness::LiveUpdate, value_status::Used, 0,
     "If set to higher than 0, ignore the controller's output and set the compaction shares statically. Do not set this unless you know what you are doing and suspect a problem in the controller. This option will be retired when the controller reaches more maturity.")
-, compaction_max_shares(this, "compaction_max_shares", liveness::LiveUpdate, value_status::Used, default_compaction_maximum_shares,
-    "Set the maximum shares of regular compaction to the specific value. Do not set this unless you know what you are doing and suspect a problem in the controller. This option will be retired when the controller reaches more maturity.")
 , compaction_enforce_min_threshold(this, "compaction_enforce_min_threshold", liveness::LiveUpdate, value_status::Used, false,
     "If set to true, enforce the min_threshold option for compactions strictly. If false (default), Scylla may decide to compact even if below min_threshold.")
 , compaction_flush_all_tables_before_major_seconds(this, "compaction_flush_all_tables_before_major_seconds", value_status::Used, 86400,
@@ -1038,9 +1035,8 @@ db::config::config(std::shared_ptr<db::extensions> exts)
     "Controls whether traffic between nodes is compressed. The valid values are:\n"
     "* all: All traffic is compressed.\n"
     "* dc : Traffic between data centers is compressed.\n"
-    "* rack : Traffic between racks is compressed.\n"
     "* none : No compression.",
-    {"all", "dc", "rack", "none"})
+    {"all", "dc", "none"})
 , internode_compression_zstd_max_cpu_fraction(this, "internode_compression_zstd_max_cpu_fraction", liveness::LiveUpdate, value_status::Used, 0.000,
     "ZSTD compression of RPC will consume at most this fraction of each internode_compression_zstd_quota_refresh_period_ms time slice.\n"
     "If you wish to try out zstd for RPC compression, 0.05 is a reasonable starting point.")
@@ -1172,17 +1168,6 @@ db::config::config(std::shared_ptr<db::extensions> exts)
     "* default_weight: (Default: 1 **) How many requests are handled during each turn of the RoundRobin.\n"
     "* weights: (Default: Keyspace: 1) Takes a list of keyspaces. It sets how many requests are handled during each turn of the RoundRobin, based on the request_scheduler_id.")
 /**
- * @Group Vector search settings
- * @GroupDescription Settings for configuring and tuning vector search functionality.
- */
-, vector_store_primary_uri(this, "vector_store_primary_uri", liveness::LiveUpdate, value_status::Used, "",
-    "A comma-separated list of primary vector store node URIs. These nodes are preferred for vector search operations.")
-, vector_store_secondary_uri(this, "vector_store_secondary_uri", liveness::LiveUpdate, value_status::Used, "",
-    "A comma-separated list of secondary vector store node URIs. These nodes are used as a fallback when all primary nodes are unavailable, and are typically located in a different availability zone for high availability.")
-, vector_store_encryption_options(this, "vector_store_encryption_options", value_status::Used, {},
-    "Options for encrypted connections to the vector store. These options are used for HTTPS URIs in `vector_store_primary_uri` and `vector_store_secondary_uri`. The available options are:\n"
-    "* truststore: (Default: <not set, use system truststore>) Location of the truststore containing the trusted certificate for authenticating remote servers.")
-/**
  * @Group Security properties
  * @GroupDescription Server and client security settings.
  */
@@ -1444,11 +1429,6 @@ db::config::config(std::shared_ptr<db::extensions> exts)
 , alternator_warn_authorization(this, "alternator_warn_authorization", liveness::LiveUpdate, value_status::Used, false, "Count and log warnings about failed authentication or authorization")
 , alternator_write_isolation(this, "alternator_write_isolation", value_status::Used, "", "Default write isolation policy for Alternator.")
 , alternator_streams_time_window_s(this, "alternator_streams_time_window_s", value_status::Used, 10, "CDC query confidence window for alternator streams.")
-, alternator_streams_increased_compatibility(this, "alternator_streams_increased_compatibility", liveness::LiveUpdate, value_status::Used, false,
-    "Increases compatibility with DynamoDB Streams at the cost of performance. "
-    "If enabled, Alternator compares the existing item with the new one during "
-    "data-modifying operations to determine which event type should be emitted. "
-    "This penalty is incurred only for tables with Alternator Streams enabled.")
 , alternator_timeout_in_ms(this, "alternator_timeout_in_ms", liveness::LiveUpdate, value_status::Used, 10000,
     "The server-side timeout for completing Alternator API requests.")
 , alternator_ttl_period_in_seconds(this, "alternator_ttl_period_in_seconds", value_status::Used,
@@ -1470,6 +1450,7 @@ db::config::config(std::shared_ptr<db::extensions> exts)
 , alternator_max_expression_cache_entries_per_shard(this, "alternator_max_expression_cache_entries_per_shard", liveness::LiveUpdate, value_status::Used, 2000, "Maximum number of cached parsed request expressions, per shard.")
 , alternator_max_users_query_size_in_trace_output(this, "alternator_max_users_query_size_in_trace_output", liveness::LiveUpdate, value_status::Used, uint64_t(4096),
     "Maximum size of user's command in trace output (`alternator_op` entry). Larger traces will be truncated and have `<truncated>` message appended - which doesn't count to the maximum limit.")
+, vector_store_primary_uri(this, "vector_store_primary_uri", liveness::LiveUpdate, value_status::Used, "", "A comma-separated list of vector store node URIs. If not set, vector search is disabled.")
 , abort_on_ebadf(this, "abort_on_ebadf", value_status::Used, true, "Abort the server on incorrect file descriptor access. Throws exception when disabled.")
 , sanitizer_report_backtrace(this, "sanitizer_report_backtrace", value_status::Used, false,
     "In debug mode, report log-structured allocator sanitizer violations with a backtrace. Slow.")


@@ -189,7 +189,6 @@ public:
named_value<bool> auto_adjust_flush_quota; named_value<bool> auto_adjust_flush_quota;
named_value<float> memtable_flush_static_shares; named_value<float> memtable_flush_static_shares;
named_value<float> compaction_static_shares; named_value<float> compaction_static_shares;
named_value<float> compaction_max_shares;
named_value<bool> compaction_enforce_min_threshold; named_value<bool> compaction_enforce_min_threshold;
named_value<uint32_t> compaction_flush_all_tables_before_major_seconds; named_value<uint32_t> compaction_flush_all_tables_before_major_seconds;
named_value<sstring> cluster_name; named_value<sstring> cluster_name;
@@ -344,9 +343,6 @@ public:
named_value<sstring> request_scheduler; named_value<sstring> request_scheduler;
named_value<sstring> request_scheduler_id; named_value<sstring> request_scheduler_id;
named_value<string_map> request_scheduler_options; named_value<string_map> request_scheduler_options;
named_value<sstring> vector_store_primary_uri;
named_value<sstring> vector_store_secondary_uri;
named_value<string_map> vector_store_encryption_options;
named_value<sstring> authenticator; named_value<sstring> authenticator;
named_value<sstring> internode_authenticator; named_value<sstring> internode_authenticator;
named_value<sstring> authorizer; named_value<sstring> authorizer;
@@ -465,7 +461,6 @@ public:
named_value<bool> alternator_warn_authorization; named_value<bool> alternator_warn_authorization;
named_value<sstring> alternator_write_isolation; named_value<sstring> alternator_write_isolation;
named_value<uint32_t> alternator_streams_time_window_s; named_value<uint32_t> alternator_streams_time_window_s;
named_value<bool> alternator_streams_increased_compatibility;
named_value<uint32_t> alternator_timeout_in_ms; named_value<uint32_t> alternator_timeout_in_ms;
named_value<double> alternator_ttl_period_in_seconds; named_value<double> alternator_ttl_period_in_seconds;
named_value<sstring> alternator_describe_endpoints; named_value<sstring> alternator_describe_endpoints;
@@ -474,6 +469,8 @@ public:
named_value<uint32_t> alternator_max_expression_cache_entries_per_shard; named_value<uint32_t> alternator_max_expression_cache_entries_per_shard;
named_value<uint64_t> alternator_max_users_query_size_in_trace_output; named_value<uint64_t> alternator_max_users_query_size_in_trace_output;
named_value<sstring> vector_store_primary_uri;
named_value<bool> abort_on_ebadf;
named_value<bool> sanitizer_report_backtrace;

View File

@@ -0,0 +1,602 @@
/*
* Modified by ScyllaDB
* Copyright (C) 2017-present ScyllaDB
*/
/*
* SPDX-License-Identifier: (LicenseRef-ScyllaDB-Source-Available-1.0 and Apache-2.0)
*/
// Since Scylla 2.0, we use system tables whose schemas were introduced in
// Cassandra 3. If Scylla boots to find a data directory with system tables
// with older schemas - produced by pre-2.0 Scylla or by pre-3.0 Cassandra,
// we need to migrate these old tables to the new format.
//
// We provide here a function, db::legacy_schema_migrator::migrate(),
// for a one-time migration from old to new system tables. The function
// reads the old system tables, writes them back in the new format, and finally
// deletes the old system tables. Scylla's main should call this function and
// wait for the returned future before starting to serve the database.
#include <boost/iterator/filter_iterator.hpp>
#include <seastar/core/future-util.hh>
#include <seastar/util/log.hh>
#include <map>
#include <unordered_set>
#include <chrono>
#include "replica/database.hh"
#include "legacy_schema_migrator.hh"
#include "system_keyspace.hh"
#include "schema_tables.hh"
#include "schema/schema_builder.hh"
#include "service/storage_proxy.hh"
#include "utils/rjson.hh"
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
#include "cql3/util.hh"
#include "cql3/statements/property_definitions.hh"
static seastar::logger mlogger("legacy_schema_migrator");
namespace db {
namespace legacy_schema_migrator {
// local data carriers
class migrator {
public:
static const std::unordered_set<sstring> legacy_schema_tables;
migrator(sharded<service::storage_proxy>& sp, sharded<replica::database>& db, sharded<db::system_keyspace>& sys_ks, cql3::query_processor& qp)
: _sp(sp), _db(db), _sys_ks(sys_ks), _qp(qp) {
}
migrator(migrator&&) = default;
typedef db_clock::time_point time_point;
// TODO: we don't support triggers.
// this is a placeholder.
struct trigger {
time_point timestamp;
sstring name;
std::unordered_map<sstring, sstring> options;
};
struct table {
time_point timestamp;
schema_ptr metadata;
std::vector<trigger> triggers;
};
struct type {
time_point timestamp;
user_type metadata;
};
struct function {
time_point timestamp;
sstring ks_name;
sstring fn_name;
std::vector<sstring> arg_names;
std::vector<sstring> arg_types;
sstring return_type;
bool called_on_null_input;
sstring language;
sstring body;
};
struct aggregate {
time_point timestamp;
sstring ks_name;
sstring fn_name;
std::vector<sstring> arg_names;
std::vector<sstring> arg_types;
sstring return_type;
sstring final_func;
sstring initcond;
sstring state_func;
sstring state_type;
};
struct keyspace {
time_point timestamp;
sstring name;
bool durable_writes;
std::map<sstring, sstring> replication_params;
std::vector<table> tables;
std::vector<type> types;
std::vector<function> functions;
std::vector<aggregate> aggregates;
};
class unsupported_feature : public std::runtime_error {
public:
using runtime_error::runtime_error;
};
static sstring fmt_query(const char* fmt, const char* table) {
return fmt::format(fmt::runtime(fmt), db::system_keyspace::NAME, table);
}
typedef ::shared_ptr<cql3::untyped_result_set> result_set_type;
typedef const cql3::untyped_result_set::row row_type;
future<> read_table(keyspace& dst, sstring cf_name, time_point timestamp) {
auto fmt = "SELECT * FROM {}.{} WHERE keyspace_name = ? AND columnfamily_name = ?";
auto tq = fmt_query(fmt, db::system_keyspace::legacy::COLUMNFAMILIES);
auto cq = fmt_query(fmt, db::system_keyspace::legacy::COLUMNS);
auto zq = fmt_query(fmt, db::system_keyspace::legacy::TRIGGERS);
typedef std::tuple<future<result_set_type>, future<result_set_type>, future<result_set_type>, future<db::schema_tables::legacy::schema_mutations>> result_tuple;
return when_all(_qp.execute_internal(tq, { dst.name, cf_name }, cql3::query_processor::cache_internal::yes),
_qp.execute_internal(cq, { dst.name, cf_name }, cql3::query_processor::cache_internal::yes),
_qp.execute_internal(zq, { dst.name, cf_name }, cql3::query_processor::cache_internal::yes),
db::schema_tables::legacy::read_table_mutations(_sp, dst.name, cf_name, db::system_keyspace::legacy::column_families()))
.then([&dst, cf_name, timestamp](result_tuple&& t) {
result_set_type tables = std::get<0>(t).get();
result_set_type columns = std::get<1>(t).get();
result_set_type triggers = std::get<2>(t).get();
db::schema_tables::legacy::schema_mutations sm = std::get<3>(t).get();
row_type& td = tables->one();
auto ks_name = td.get_as<sstring>("keyspace_name");
auto cf_name = td.get_as<sstring>("columnfamily_name");
auto id = table_id(td.get_or("cf_id", generate_legacy_id(ks_name, cf_name).uuid()));
schema_builder builder(dst.name, cf_name, id);
builder.with_version(sm.digest());
cf_type cf = sstring_to_cf_type(td.get_or("type", sstring("standard")));
if (cf == cf_type::super) {
fail(unimplemented::cause::SUPER);
}
auto comparator = td.get_as<sstring>("comparator");
bool is_compound = cell_comparator::check_compound(comparator);
builder.set_is_compound(is_compound);
cell_comparator::read_collections(builder, comparator);
bool filter_sparse = false;
data_type default_validator = {};
if (td.has("default_validator")) {
default_validator = db::schema_tables::parse_type(td.get_as<sstring>("default_validator"));
if (default_validator->is_counter()) {
builder.set_is_counter(true);
}
builder.set_default_validation_class(default_validator);
}
/*
* Determine whether or not the table is *really* dense
* We cannot trust is_dense value of true (see CASSANDRA-11502, that fixed the issue for 2.2 only, and not retroactively),
* but we can trust is_dense value of false.
*/
auto is_dense = td.get_opt<bool>("is_dense");
if (!is_dense || *is_dense) {
is_dense = [&] {
/*
* As said above, this method is only here because we need to deal with thrift upgrades.
* Once a CF has been "upgraded", i.e. we've rebuilt and save its CQL3 metadata at least once,
* then we'll have saved the "is_dense" value and will be good to go.
*
* But non-upgraded thrift CF (and pre-7744 CF) will have no value for "is_dense", so we need
* to infer that information without relying on it in that case. And for the most part this is
* easy, a CF that has at least one REGULAR definition is not dense. But the subtlety is that not
* having a REGULAR definition may not mean dense because of CQL3 definitions that have only the
* PRIMARY KEY defined.
*
* So we need to recognize those special case CQL3 table with only a primary key. If we have some
* clustering columns, we're fine as said above. So the only problem is that we cannot decide for
* sure if a CF without REGULAR columns nor CLUSTERING_COLUMN definition is meant to be dense, or if it
* has been created in CQL3 by say:
* CREATE TABLE test (k int PRIMARY KEY)
* in which case it should not be dense. However, we can limit our margin of error by assuming we are
* in the latter case only if the comparator is exactly CompositeType(UTF8Type).
*/
std::optional<column_id> max_cl_idx;
const cql3::untyped_result_set::row * regular = nullptr;
for (auto& row : *columns) {
auto kind_str = row.get_as<sstring>("type");
if (kind_str == "compact_value") {
continue;
}
auto kind = db::schema_tables::deserialize_kind(kind_str);
if (kind == column_kind::regular_column) {
if (regular != nullptr) {
return false;
}
regular = &row;
continue;
}
if (kind == column_kind::clustering_key) {
max_cl_idx = std::max(column_id(row.get_or("component_index", 0)), max_cl_idx.value_or(column_id()));
}
}
auto is_cql3_only_pk_comparator = [](const sstring& comparator) {
if (!cell_comparator::check_compound(comparator)) {
return false;
}
// CMH. We don't have composites, nor a parser for them. This is a simple way
// of checking the same.
auto comma = comparator.find(',');
if (comma != sstring::npos) {
return false;
}
auto off = comparator.find('(');
auto end = comparator.find(')');
return comparator.compare(off, end - off, utf8_type->name()) == 0;
};
if (max_cl_idx) {
auto n = std::count(comparator.begin(), comparator.end(), ','); // num comp - 1
return *max_cl_idx == n;
}
if (regular) {
return false;
}
return !is_cql3_only_pk_comparator(comparator);
}();
// now, if switched to sparse, remove redundant compact_value column and the last clustering column,
// directly copying CASSANDRA-11502 logic. See CASSANDRA-11315.
filter_sparse = !*is_dense;
}
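// A few illustrative cases for the inference above (hypothetical comparator
// strings, shown only to clarify the rule; not taken from any real table):
//  - at least one REGULAR column definition (other than compact_value):
//    not dense, regardless of the comparator.
//  - "CompositeType(UTF8Type,Int32Type)" where the highest clustering
//    component_index equals the comma count (1): every comparator component
//    is consumed by clustering columns, leaving no slot for a CQL3 column
//    name -> dense.
//  - no REGULAR and no CLUSTERING_COLUMN rows: dense, unless the comparator
//    is exactly "CompositeType(UTF8Type)", which matches the
//    "CREATE TABLE t (k int PRIMARY KEY)" shape -> not dense.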
builder.set_is_dense(*is_dense);
auto is_cql = !*is_dense && is_compound;
auto is_static_compact = !*is_dense && !is_compound;
// org.apache.cassandra.schema.LegacySchemaMigrator#isEmptyCompactValueColumn
auto is_empty_compact_value = [](const cql3::untyped_result_set::row& column_row) {
auto kind_str = column_row.get_as<sstring>("type");
// Cassandra only checks for "compact_value", but Scylla generates "regular" instead (#2586)
return (kind_str == "compact_value" || kind_str == "regular")
&& column_row.get_as<sstring>("column_name").empty();
};
for (auto& row : *columns) {
auto kind_str = row.get_as<sstring>("type");
auto kind = db::schema_tables::deserialize_kind(kind_str);
auto component_index = kind > column_kind::clustering_key ? 0 : column_id(row.get_or("component_index", 0));
auto name = row.get_or<sstring>("column_name", sstring());
auto validator = db::schema_tables::parse_type(row.get_as<sstring>("validator"));
if (is_empty_compact_value(row)) {
continue;
}
if (filter_sparse) {
if (kind_str == "compact_value") {
continue;
}
if (kind == column_kind::clustering_key) {
if (cf == cf_type::super && component_index != 0) {
continue;
}
if (cf != cf_type::super && !is_compound) {
continue;
}
}
}
std::optional<index_metadata_kind> index_kind;
sstring index_name;
index_options_map options;
if (row.has("index_type")) {
index_kind = schema_tables::deserialize_index_kind(row.get_as<sstring>("index_type"));
}
if (row.has("index_name")) {
index_name = row.get_as<sstring>("index_name");
}
if (row.has("index_options")) {
sstring index_options_str = row.get_as<sstring>("index_options");
options = rjson::parse_to_map<index_options_map>(std::string_view(index_options_str));
sstring type;
auto i = options.find("index_keys");
if (i != options.end()) {
options.erase(i);
type = "KEYS";
}
i = options.find("index_keys_and_values");
if (i != options.end()) {
options.erase(i);
type = "KEYS_AND_VALUES";
}
if (type.empty()) {
if (validator->is_collection() && validator->is_multi_cell()) {
type = "FULL";
} else {
type = "VALUES";
}
}
auto column = cql3::util::maybe_quote(name);
options["target"] = validator->is_collection()
? type + "(" + column + ")"
: column;
}
if (index_kind) {
// Origin assumes index_name is always set, so let's do the same
builder.with_index(index_metadata(index_name, options, *index_kind, index_metadata::is_local_index::no));
}
data_type column_name_type = [&] {
if (is_static_compact && kind == column_kind::regular_column) {
return db::schema_tables::parse_type(comparator);
}
return utf8_type;
}();
auto column_name = [&] {
try {
return column_name_type->from_string(name);
} catch (marshal_exception&) {
// #2597: Scylla < 2.0 writes names in serialized form, try to recover
column_name_type->validate(to_bytes_view(name));
return to_bytes(name);
}
}();
builder.with_column_ordered(column_definition(std::move(column_name), std::move(validator), kind, component_index));
}
if (is_static_compact) {
builder.set_regular_column_name_type(db::schema_tables::parse_type(comparator));
}
if (td.has("gc_grace_seconds")) {
builder.set_gc_grace_seconds(td.get_as<int32_t>("gc_grace_seconds"));
}
if (td.has("min_compaction_threshold")) {
builder.set_min_compaction_threshold(td.get_as<int32_t>("min_compaction_threshold"));
}
if (td.has("max_compaction_threshold")) {
builder.set_max_compaction_threshold(td.get_as<int32_t>("max_compaction_threshold"));
}
if (td.has("comment")) {
builder.set_comment(td.get_as<sstring>("comment"));
}
if (td.has("memtable_flush_period_in_ms")) {
builder.set_memtable_flush_period(td.get_as<int32_t>("memtable_flush_period_in_ms"));
}
if (td.has("caching")) {
builder.set_caching_options(caching_options::from_sstring(td.get_as<sstring>("caching")));
}
if (td.has("default_time_to_live")) {
builder.set_default_time_to_live(gc_clock::duration(td.get_as<int32_t>("default_time_to_live")));
}
if (td.has("speculative_retry")) {
builder.set_speculative_retry(td.get_as<sstring>("speculative_retry"));
}
if (td.has("compaction_strategy_class")) {
auto strategy = td.get_as<sstring>("compaction_strategy_class");
try {
builder.set_compaction_strategy(compaction::compaction_strategy::type(strategy));
} catch (const exceptions::configuration_exception& e) {
// If compaction strategy class isn't supported, fallback to incremental.
mlogger.warn("Falling back to incremental compaction strategy after the problem: {}", e.what());
builder.set_compaction_strategy(compaction::compaction_strategy_type::incremental);
}
}
if (td.has("compaction_strategy_options")) {
sstring strategy_options_str = td.get_as<sstring>("compaction_strategy_options");
builder.set_compaction_strategy_options(rjson::parse_to_map<std::map<sstring, sstring>>(std::string_view(strategy_options_str)));
}
auto comp_param = td.get_as<sstring>("compression_parameters");
compression_parameters cp(rjson::parse_to_map<std::map<sstring, sstring>>(std::string_view(comp_param)));
builder.set_compressor_params(cp);
if (td.has("min_index_interval")) {
builder.set_min_index_interval(td.get_as<int32_t>("min_index_interval"));
} else if (td.has("index_interval")) { // compatibility
builder.set_min_index_interval(td.get_as<int32_t>("index_interval"));
}
if (td.has("max_index_interval")) {
builder.set_max_index_interval(td.get_as<int32_t>("max_index_interval"));
}
if (td.has("bloom_filter_fp_chance")) {
builder.set_bloom_filter_fp_chance(td.get_as<double>("bloom_filter_fp_chance"));
} else {
builder.set_bloom_filter_fp_chance(builder.get_bloom_filter_fp_chance());
}
if (td.has("dropped_columns")) {
auto map = td.get_map<sstring, int64_t>("dropped_columns");
for (auto&& e : map) {
builder.without_column(e.first, api::timestamp_type(e.second));
};
}
// ignore version. we're transient
if (!triggers->empty()) {
throw unsupported_feature("triggers");
}
dst.tables.emplace_back(table{timestamp, builder.build() });
});
}
future<> read_tables(keyspace& dst) {
auto query = fmt_query("SELECT columnfamily_name, writeTime(type) AS timestamp FROM {}.{} WHERE keyspace_name = ?",
db::system_keyspace::legacy::COLUMNFAMILIES);
return _qp.execute_internal(query, {dst.name}, cql3::query_processor::cache_internal::yes).then([this, &dst](result_set_type result) {
return parallel_for_each(*result, [this, &dst](row_type& row) {
return read_table(dst, row.get_as<sstring>("columnfamily_name"), row.get_as<time_point>("timestamp"));
}).finally([result] {});
});
}
future<time_point> read_type_timestamp(keyspace& dst, sstring type_name) {
// TODO: Unfortunately there is not a single REGULAR column in system.schema_usertypes, so annoyingly we cannot
// use the writeTime() CQL function, and must resort to a lower level.
// Origin digs up the actual cells of target partition and gets timestamp from there.
// We should do the same, but that's messy. Let's return a dummy value for now.
return make_ready_future<time_point>(dst.timestamp);
}
future<> read_types(keyspace& dst) {
auto query = fmt_query("SELECT * FROM {}.{} WHERE keyspace_name = ?", db::system_keyspace::legacy::USERTYPES);
return _qp.execute_internal(query, {dst.name}, cql3::query_processor::cache_internal::yes).then([this, &dst](result_set_type result) {
return parallel_for_each(*result, [this, &dst](row_type& row) {
auto name = row.get_blob_unfragmented("type_name");
auto columns = row.get_list<bytes>("field_names");
auto types = row.get_list<sstring>("field_types");
std::vector<data_type> field_types;
for (auto&& value : types) {
field_types.emplace_back(db::schema_tables::parse_type(value));
}
auto ut = user_type_impl::get_instance(dst.name, name, columns, field_types, false);
return read_type_timestamp(dst, value_cast<sstring>(utf8_type->deserialize(name))).then([ut = std::move(ut), &dst](time_point timestamp) {
dst.types.emplace_back(type{timestamp, ut});
});
}).finally([result] {});
});
}
future<> read_functions(keyspace& dst) {
auto query = fmt_query("SELECT * FROM {}.{} WHERE keyspace_name = ?", db::system_keyspace::legacy::FUNCTIONS);
return _qp.execute_internal(query, {dst.name}, cql3::query_processor::cache_internal::yes).then([](result_set_type result) {
if (!result->empty()) {
throw unsupported_feature("functions");
}
});
}
future<> read_aggregates(keyspace& dst) {
auto query = fmt_query("SELECT * FROM {}.{} WHERE keyspace_name = ?", db::system_keyspace::legacy::AGGREGATES);
return _qp.execute_internal(query, {dst.name}, cql3::query_processor::cache_internal::yes).then([](result_set_type result) {
if (!result->empty()) {
throw unsupported_feature("aggregates");
}
});
}
future<keyspace> read_keyspace(sstring ks_name, bool durable_writes, sstring strategy_class, sstring strategy_options, time_point timestamp) {
auto map = rjson::parse_to_map<std::map<sstring, sstring>>(std::string_view(strategy_options));
map.emplace("class", std::move(strategy_class));
auto ks = ::make_lw_shared<keyspace>(keyspace{timestamp, std::move(ks_name), durable_writes, std::move(map) });
return read_tables(*ks).then([this, ks] {
//Collection<Type> types = readTypes(keyspaceName);
return read_types(*ks);
}).then([this, ks] {
return read_functions(*ks);
}).then([this, ks] {
return read_aggregates(*ks);
}).then([ks] {
return make_ready_future<keyspace>(std::move(*ks));
});
}
future<> read_all_keyspaces() {
static auto ks_filter = [](row_type& row) {
auto ks_name = row.get_as<sstring>("keyspace_name");
return ks_name != db::system_keyspace::NAME && ks_name != db::schema_tables::v3::NAME;
};
auto query = fmt_query("SELECT keyspace_name, durable_writes, strategy_options, strategy_class, writeTime(durable_writes) AS timestamp FROM {}.{}",
db::system_keyspace::legacy::KEYSPACES);
return _qp.execute_internal(query, cql3::query_processor::cache_internal::yes).then([this](result_set_type result) {
auto i = boost::make_filter_iterator(ks_filter, result->begin(), result->end());
auto e = boost::make_filter_iterator(ks_filter, result->end(), result->end());
return parallel_for_each(i, e, [this](row_type& row) {
return read_keyspace(row.get_as<sstring>("keyspace_name")
, row.get_as<bool>("durable_writes")
, row.get_as<sstring>("strategy_class")
, row.get_as<sstring>("strategy_options")
, row.get_as<db_clock::time_point>("timestamp")
).then([this](keyspace ks) {
_keyspaces.emplace_back(std::move(ks));
});
}).finally([result] {});
});
}
future<> drop_legacy_tables() {
mlogger.info("Dropping legacy schema tables");
auto with_snapshot = !_keyspaces.empty();
for (const sstring& cfname : legacy_schema_tables) {
co_await replica::database::legacy_drop_table_on_all_shards(_db, _sys_ks, db::system_keyspace::NAME, cfname, with_snapshot);
}
}
future<> store_keyspaces_in_new_schema_tables() {
mlogger.info("Moving {} keyspaces from legacy schema tables to the new schema keyspace ({})",
_keyspaces.size(), db::schema_tables::v3::NAME);
utils::chunked_vector<mutation> mutations;
for (auto& ks : _keyspaces) {
auto ksm = ::make_lw_shared<keyspace_metadata>(ks.name
, ks.replication_params["class"] // TODO, make ksm like c3?
, cql3::statements::property_definitions::to_extended_map(ks.replication_params)
, std::nullopt
, std::nullopt
, ks.durable_writes);
// we want separate timestamps for tables/types, so we cannot bulk them into the ksm.
for (auto&& m : db::schema_tables::make_create_keyspace_mutations(schema_features::full(), ksm, ks.timestamp.time_since_epoch().count(), false)) {
mutations.emplace_back(std::move(m));
}
for (auto& t : ks.tables) {
db::schema_tables::add_table_or_view_to_schema_mutation(t.metadata, t.timestamp.time_since_epoch().count(), true, mutations);
}
for (auto& t : ks.types) {
db::schema_tables::add_type_to_schema_mutation(t.metadata, t.timestamp.time_since_epoch().count(), mutations);
}
}
return _qp.proxy().mutate_locally(std::move(mutations), tracing::trace_state_ptr());
}
future<> flush_schemas() {
auto& db = _qp.db().real_database().container();
return replica::database::flush_tables_on_all_shards(db, db::schema_tables::all_table_infos(schema_features::full()));
}
future<> migrate() {
return read_all_keyspaces().then([this]() {
// write metadata to the new schema tables
return store_keyspaces_in_new_schema_tables()
.then(std::bind(&migrator::flush_schemas, this))
.then(std::bind(&migrator::drop_legacy_tables, this))
.then([] { mlogger.info("Completed migration of legacy schema tables"); });
});
}
sharded<service::storage_proxy>& _sp;
sharded<replica::database>& _db;
sharded<db::system_keyspace>& _sys_ks;
cql3::query_processor& _qp;
std::vector<keyspace> _keyspaces;
};
const std::unordered_set<sstring> migrator::legacy_schema_tables = {
db::system_keyspace::legacy::KEYSPACES,
db::system_keyspace::legacy::COLUMNFAMILIES,
db::system_keyspace::legacy::COLUMNS,
db::system_keyspace::legacy::TRIGGERS,
db::system_keyspace::legacy::USERTYPES,
db::system_keyspace::legacy::FUNCTIONS,
db::system_keyspace::legacy::AGGREGATES,
};
}
}
future<>
db::legacy_schema_migrator::migrate(sharded<service::storage_proxy>& sp, sharded<replica::database>& db, sharded<db::system_keyspace>& sys_ks, cql3::query_processor& qp) {
return do_with(migrator(sp, db, sys_ks, qp), std::bind(&migrator::migrate, std::placeholders::_1));
}

View File

@@ -0,0 +1,37 @@
/*
* Modified by ScyllaDB
* Copyright (C) 2017-present ScyllaDB
*/
/*
* SPDX-License-Identifier: (LicenseRef-ScyllaDB-Source-Available-1.0 and Apache-2.0)
*/
#pragma once
#include <seastar/core/future.hh>
#include <seastar/core/sharded.hh>
#include "seastarx.hh"
namespace replica {
class database;
}
namespace cql3 {
class query_processor;
}
namespace service {
class storage_proxy;
}
namespace db {
class system_keyspace;
namespace legacy_schema_migrator {
future<> migrate(sharded<service::storage_proxy>&, sharded<replica::database>& db, sharded<db::system_keyspace>& sys_ks, cql3::query_processor&);
}
}

View File

@@ -542,7 +542,6 @@ public:
// Returns the range tombstone for the key range adjacent to the cursor's position from the side of smaller keys.
// Excludes the range for the row itself. That information is returned by range_tombstone_for_row().
// It's possible that range_tombstone() is empty and range_tombstone_for_row() is not empty.
// Note that this is different from the meaning of rows_entry::range_tombstone(), which includes the row itself.
tombstone range_tombstone() const { return _range_tombstone; }
// Can be called when cursor is pointing at a row.

View File

@@ -1287,15 +1287,6 @@ row_cache::row_cache(schema_ptr s, snapshot_source src, cache_tracker& tracker,
, _partitions(dht::raw_token_less_comparator{})
, _underlying(src())
, _snapshot_source(std::move(src))
, _update_section(abstract_formatter([this] (fmt::context& ctx) {
fmt::format_to(ctx.out(), "cache.update {}.{}", _schema->ks_name(), _schema->cf_name());
}))
, _populate_section(abstract_formatter([this] (fmt::context& ctx) {
fmt::format_to(ctx.out(), "cache.populate {}.{}", _schema->ks_name(), _schema->cf_name());
}))
, _read_section(abstract_formatter([this] (fmt::context& ctx) {
fmt::format_to(ctx.out(), "cache.read {}.{}", _schema->ks_name(), _schema->cf_name());
}))
{
try {
with_allocator(_tracker.allocator(), [this, cont] {

View File

@@ -404,7 +404,10 @@ const std::unordered_set<table_id>& schema_tables_holding_schema_mutations() {
computed_columns(),
dropped_columns(),
indexes(),
scylla_tables(),
db::system_keyspace::legacy::column_families(),
db::system_keyspace::legacy::columns(),
db::system_keyspace::legacy::triggers()}) {
SCYLLA_ASSERT(s->clustering_key_size() > 0);
auto&& first_column_name = s->clustering_column_at(0).name_as_text();
SCYLLA_ASSERT(first_column_name == "table_name"
@@ -2837,6 +2840,26 @@ void check_no_legacy_secondary_index_mv_schema(replica::database& db, const view
}
namespace legacy {
table_schema_version schema_mutations::digest() const {
md5_hasher h;
const db::schema_features no_features;
db::schema_tables::feed_hash_for_schema_digest(h, _columnfamilies, no_features);
db::schema_tables::feed_hash_for_schema_digest(h, _columns, no_features);
return table_schema_version(utils::UUID_gen::get_name_UUID(h.finalize()));
}
future<schema_mutations> read_table_mutations(sharded<service::storage_proxy>& proxy,
sstring keyspace_name, sstring table_name, schema_ptr s)
{
mutation cf_m = co_await read_schema_partition_for_table(proxy, s, keyspace_name, table_name);
mutation col_m = co_await read_schema_partition_for_table(proxy, db::system_keyspace::legacy::columns(), keyspace_name, table_name);
co_return schema_mutations{std::move(cf_m), std::move(col_m)};
}
} // namespace legacy
static auto GET_COLUMN_MAPPING_QUERY = format("SELECT column_name, clustering_order, column_name_bytes, kind, position, type FROM system.{} WHERE cf_id = ? AND schema_version = ?",
db::schema_tables::SCYLLA_TABLE_SCHEMA_HISTORY);

View File

@@ -155,6 +155,24 @@ schema_ptr scylla_table_schema_history();
const std::unordered_set<table_id>& schema_tables_holding_schema_mutations();
}
namespace legacy {
class schema_mutations {
mutation _columnfamilies;
mutation _columns;
public:
schema_mutations(mutation columnfamilies, mutation columns)
: _columnfamilies(std::move(columnfamilies))
, _columns(std::move(columns))
{ }
table_schema_version digest() const;
};
future<schema_mutations> read_table_mutations(sharded<service::storage_proxy>& proxy,
sstring keyspace_name, sstring table_name, schema_ptr s);
}
struct qualified_name {
sstring keyspace_name;
sstring table_name;

View File

@@ -766,6 +766,9 @@ schema_ptr system_keyspace::size_estimates() {
"partitions larger than specified threshold"
);
builder.set_gc_grace_seconds(0);
// FIXME re-enable caching for this and the other two
// system.large_* tables once
// https://github.com/scylladb/scylla/issues/3288 is fixed
builder.set_caching_options(caching_options::get_disabled_caching_options());
builder.with_hash_version();
return builder.build(schema_builder::compact_storage::no);
@@ -847,6 +850,8 @@ schema_ptr system_keyspace::corrupt_data() {
return corrupt_data;
}
static constexpr auto schema_gc_grace = std::chrono::duration_cast<std::chrono::seconds>(days(7)).count();
/*static*/ schema_ptr system_keyspace::scylla_local() {
static thread_local auto scylla_local = [] {
schema_builder builder(generate_legacy_id(NAME, SCYLLA_LOCAL), NAME, SCYLLA_LOCAL,
@@ -1358,6 +1363,289 @@ schema_ptr system_keyspace::role_permissions() {
return schema;
}
schema_ptr system_keyspace::legacy::hints() {
static thread_local auto schema = [] {
schema_builder builder(generate_legacy_id(NAME, HINTS), NAME, HINTS,
// partition key
{{"target_id", uuid_type}},
// clustering key
{{"hint_id", timeuuid_type}, {"message_version", int32_type}},
// regular columns
{{"mutation", bytes_type}},
// static columns
{},
// regular column name type
utf8_type,
// comment
"*DEPRECATED* hints awaiting delivery"
);
builder.set_gc_grace_seconds(0);
builder.set_compaction_strategy(compaction::compaction_strategy_type::incremental);
builder.set_compaction_strategy_options({{"enabled", "false"}});
builder.with(schema_builder::compact_storage::yes);
builder.with_hash_version();
return builder.build();
}();
return schema;
}
schema_ptr system_keyspace::legacy::batchlog() {
static thread_local auto schema = [] {
schema_builder builder(generate_legacy_id(NAME, BATCHLOG), NAME, BATCHLOG,
// partition key
{{"id", uuid_type}},
// clustering key
{},
// regular columns
{{"data", bytes_type}, {"version", int32_type}, {"written_at", timestamp_type}},
// static columns
{},
// regular column name type
utf8_type,
// comment
"*DEPRECATED* batchlog entries"
);
builder.set_gc_grace_seconds(0);
builder.set_compaction_strategy(compaction::compaction_strategy_type::incremental);
builder.set_compaction_strategy_options({{"min_threshold", "2"}});
builder.with(schema_builder::compact_storage::no);
builder.with_hash_version();
return builder.build();
}();
return schema;
}
schema_ptr system_keyspace::legacy::keyspaces() {
static thread_local auto schema = [] {
schema_builder builder(generate_legacy_id(NAME, KEYSPACES), NAME, KEYSPACES,
// partition key
{{"keyspace_name", utf8_type}},
// clustering key
{},
// regular columns
{
{"durable_writes", boolean_type},
{"strategy_class", utf8_type},
{"strategy_options", utf8_type}
},
// static columns
{},
// regular column name type
utf8_type,
// comment
"*DEPRECATED* keyspace definitions"
);
builder.set_gc_grace_seconds(schema_gc_grace);
builder.with(schema_builder::compact_storage::yes);
builder.with_hash_version();
return builder.build();
}();
return schema;
}
schema_ptr system_keyspace::legacy::column_families() {
static thread_local auto schema = [] {
schema_builder builder(generate_legacy_id(NAME, COLUMNFAMILIES), NAME, COLUMNFAMILIES,
// partition key
{{"keyspace_name", utf8_type}},
// clustering key
{{"columnfamily_name", utf8_type}},
// regular columns
{
{"bloom_filter_fp_chance", double_type},
{"caching", utf8_type},
{"cf_id", uuid_type},
{"comment", utf8_type},
{"compaction_strategy_class", utf8_type},
{"compaction_strategy_options", utf8_type},
{"comparator", utf8_type},
{"compression_parameters", utf8_type},
{"default_time_to_live", int32_type},
{"default_validator", utf8_type},
{"dropped_columns", map_type_impl::get_instance(utf8_type, long_type, true)},
{"gc_grace_seconds", int32_type},
{"is_dense", boolean_type},
{"key_validator", utf8_type},
{"max_compaction_threshold", int32_type},
{"max_index_interval", int32_type},
{"memtable_flush_period_in_ms", int32_type},
{"min_compaction_threshold", int32_type},
{"min_index_interval", int32_type},
{"speculative_retry", utf8_type},
{"subcomparator", utf8_type},
{"type", utf8_type},
// The following 4 columns are only present up until 2.1.8 tables
{"key_aliases", utf8_type},
{"value_alias", utf8_type},
{"column_aliases", utf8_type},
{"index_interval", int32_type},},
// static columns
{},
// regular column name type
utf8_type,
// comment
"*DEPRECATED* table definitions"
);
builder.set_gc_grace_seconds(schema_gc_grace);
builder.with(schema_builder::compact_storage::no);
builder.with_hash_version();
return builder.build();
}();
return schema;
}
schema_ptr system_keyspace::legacy::columns() {
static thread_local auto schema = [] {
schema_builder builder(generate_legacy_id(NAME, COLUMNS), NAME, COLUMNS,
// partition key
{{"keyspace_name", utf8_type}},
// clustering key
{{"columnfamily_name", utf8_type}, {"column_name", utf8_type}},
// regular columns
{
{"component_index", int32_type},
{"index_name", utf8_type},
{"index_options", utf8_type},
{"index_type", utf8_type},
{"type", utf8_type},
{"validator", utf8_type},
},
// static columns
{},
// regular column name type
utf8_type,
// comment
"column definitions"
);
builder.set_gc_grace_seconds(schema_gc_grace);
builder.with(schema_builder::compact_storage::no);
builder.with_hash_version();
return builder.build();
}();
return schema;
}
schema_ptr system_keyspace::legacy::triggers() {
static thread_local auto schema = [] {
schema_builder builder(generate_legacy_id(NAME, TRIGGERS), NAME, TRIGGERS,
// partition key
{{"keyspace_name", utf8_type}},
// clustering key
{{"columnfamily_name", utf8_type}, {"trigger_name", utf8_type}},
// regular columns
{
{"trigger_options", map_type_impl::get_instance(utf8_type, utf8_type, true)},
},
// static columns
{},
// regular column name type
utf8_type,
// comment
"trigger definitions"
);
builder.set_gc_grace_seconds(schema_gc_grace);
builder.with(schema_builder::compact_storage::no);
builder.with_hash_version();
return builder.build();
}();
return schema;
}
schema_ptr system_keyspace::legacy::usertypes() {
static thread_local auto schema = [] {
schema_builder builder(generate_legacy_id(NAME, USERTYPES), NAME, USERTYPES,
// partition key
{{"keyspace_name", utf8_type}},
// clustering key
{{"type_name", utf8_type}},
// regular columns
{
{"field_names", list_type_impl::get_instance(utf8_type, true)},
{"field_types", list_type_impl::get_instance(utf8_type, true)},
},
// static columns
{},
// regular column name type
utf8_type,
// comment
"user defined type definitions"
);
builder.set_gc_grace_seconds(schema_gc_grace);
builder.with(schema_builder::compact_storage::no);
builder.with_hash_version();
return builder.build();
}();
return schema;
}
schema_ptr system_keyspace::legacy::functions() {
/**
* Note: we have our own "legacy" version of this table (in schema_tables),
* but it is (afaik) not used, and differs slightly from the origin one.
* This is based on the origin schema, since we're more likely to encounter
 * installations of that to migrate, rather than our own (if we don't use the table).
*/
static thread_local auto schema = [] {
schema_builder builder(generate_legacy_id(NAME, FUNCTIONS), NAME, FUNCTIONS,
// partition key
{{"keyspace_name", utf8_type}},
// clustering key
{{"function_name", utf8_type},{"signature", list_type_impl::get_instance(utf8_type, false)}},
// regular columns
{
{"argument_names", list_type_impl::get_instance(utf8_type, true)},
{"argument_types", list_type_impl::get_instance(utf8_type, true)},
{"body", utf8_type},
{"language", utf8_type},
{"return_type", utf8_type},
{"called_on_null_input", boolean_type},
},
// static columns
{},
// regular column name type
utf8_type,
// comment
"*DEPRECATED* user defined function definitions"
);
builder.set_gc_grace_seconds(schema_gc_grace);
builder.with(schema_builder::compact_storage::no);
builder.with_hash_version();
return builder.build();
}();
return schema;
}
schema_ptr system_keyspace::legacy::aggregates() {
static thread_local auto schema = [] {
schema_builder builder(generate_legacy_id(NAME, AGGREGATES), NAME, AGGREGATES,
// partition key
{{"keyspace_name", utf8_type}},
// clustering key
{{"aggregate_name", utf8_type},{"signature", list_type_impl::get_instance(utf8_type, false)}},
// regular columns
{
{"argument_types", list_type_impl::get_instance(utf8_type, true)},
{"final_func", utf8_type},
{"initcond", bytes_type},
{"return_type", utf8_type},
{"state_func", utf8_type},
{"state_type", utf8_type},
},
// static columns
{},
// regular column name type
utf8_type,
// comment
"*DEPRECATED* user defined aggregate definitions"
);
builder.set_gc_grace_seconds(schema_gc_grace);
builder.with(schema_builder::compact_storage::no);
builder.with_hash_version();
return builder.build();
}();
return schema;
}
schema_ptr system_keyspace::dicts() {
static thread_local auto schema = [] {
auto id = generate_legacy_id(NAME, DICTS);
@@ -1379,7 +1667,7 @@ schema_ptr system_keyspace::view_building_tasks() {
.with_column("key", utf8_type, column_kind::partition_key)
.with_column("id", timeuuid_type, column_kind::clustering_key)
.with_column("type", utf8_type)
-.with_column("aborted", boolean_type)
+.with_column("state", utf8_type)
.with_column("base_id", uuid_type)
.with_column("view_id", uuid_type)
.with_column("last_token", long_type)
@@ -2330,6 +2618,13 @@ std::vector<schema_ptr> system_keyspace::all_tables(const db::config& cfg) {
if (cfg.check_experimental(db::experimental_features_t::feature::KEYSPACE_STORAGE_OPTIONS)) {
r.insert(r.end(), {sstables_registry()});
}
+// legacy schema
+r.insert(r.end(), {
+// TODO: once we migrate hints/batchlog and add converter
+// legacy::hints(), legacy::batchlog(),
+legacy::keyspaces(), legacy::column_families(),
+legacy::columns(), legacy::triggers(), legacy::usertypes(),
+legacy::functions(), legacy::aggregates(), });
return r;
}
@@ -2767,14 +3062,14 @@ future<mutation> system_keyspace::make_remove_view_build_status_on_host_mutation
static constexpr auto VIEW_BUILDING_KEY = "view_building";
future<db::view::building_tasks> system_keyspace::get_view_building_tasks() {
-static const sstring query = format("SELECT id, type, aborted, base_id, view_id, last_token, host_id, shard FROM {}.{} WHERE key = '{}'", NAME, VIEW_BUILDING_TASKS, VIEW_BUILDING_KEY);
+static const sstring query = format("SELECT id, type, state, base_id, view_id, last_token, host_id, shard FROM {}.{} WHERE key = '{}'", NAME, VIEW_BUILDING_TASKS, VIEW_BUILDING_KEY);
using namespace db::view;
building_tasks tasks;
co_await _qp.query_internal(query, [&] (const cql3::untyped_result_set_row& row) -> future<stop_iteration> {
auto id = row.get_as<utils::UUID>("id");
auto type = task_type_from_string(row.get_as<sstring>("type"));
-auto aborted = row.get_as<bool>("aborted");
+auto state = task_state_from_string(row.get_as<sstring>("state"));
auto base_id = table_id(row.get_as<utils::UUID>("base_id"));
auto view_id = row.get_opt<utils::UUID>("view_id").transform([] (const utils::UUID& uuid) { return table_id(uuid); });
auto last_token = dht::token::from_int64(row.get_as<int64_t>("last_token"));
@@ -2782,7 +3077,7 @@ future<db::view::building_tasks> system_keyspace::get_view_building_tasks() {
auto shard = unsigned(row.get_as<int32_t>("shard"));
locator::tablet_replica replica{host_id, shard};
-view_building_task task{id, type, aborted, base_id, view_id, replica, last_token};
+view_building_task task{id, type, state, base_id, view_id, replica, last_token};
switch (type) {
case db::view::view_building_task::task_type::build_range:
@@ -2801,7 +3096,7 @@ future<db::view::building_tasks> system_keyspace::get_view_building_tasks() {
}
future<mutation> system_keyspace::make_view_building_task_mutation(api::timestamp_type ts, const db::view::view_building_task& task) {
-static const sstring stmt = format("INSERT INTO {}.{}(key, id, type, aborted, base_id, view_id, last_token, host_id, shard) VALUES ('{}', ?, ?, ?, ?, ?, ?, ?, ?)", NAME, VIEW_BUILDING_TASKS, VIEW_BUILDING_KEY);
+static const sstring stmt = format("INSERT INTO {}.{}(key, id, type, state, base_id, view_id, last_token, host_id, shard) VALUES ('{}', ?, ?, ?, ?, ?, ?, ?, ?)", NAME, VIEW_BUILDING_TASKS, VIEW_BUILDING_KEY);
using namespace db::view;
data_value_or_unset view_id = unset_value{};
@@ -2812,7 +3107,7 @@ future<mutation> system_keyspace::make_view_building_task_mutation(api::timestam
view_id = data_value(task.view_id->uuid());
}
auto muts = co_await _qp.get_mutations_internal(stmt, internal_system_query_state(), ts, {
-task.id, task_type_to_sstring(task.type), task.aborted,
+task.id, task_type_to_sstring(task.type), task_state_to_sstring(task.state),
task.base_id.uuid(), view_id, dht::token::to_int64(task.last_token),
task.replica.host.uuid(), int32_t(task.replica.shard)
});
@@ -2822,6 +3117,18 @@ future<mutation> system_keyspace::make_view_building_task_mutation(api::timestam
co_return std::move(muts[0]);
}
+future<mutation> system_keyspace::make_update_view_building_task_state_mutation(api::timestamp_type ts, utils::UUID id, db::view::view_building_task::task_state state) {
+static const sstring stmt = format("UPDATE {}.{} SET state = ? WHERE key = '{}' AND id = ?", NAME, VIEW_BUILDING_TASKS, VIEW_BUILDING_KEY);
+auto muts = co_await _qp.get_mutations_internal(stmt, internal_system_query_state(), ts, {
+task_state_to_sstring(state), id
+});
+if (muts.size() != 1) {
+on_internal_error(slogger, fmt::format("expected 1 mutation got {}", muts.size()));
+}
+co_return std::move(muts[0]);
+}
future<mutation> system_keyspace::make_remove_view_building_task_mutation(api::timestamp_type ts, utils::UUID id) {
static const sstring stmt = format("DELETE FROM {}.{} WHERE key = '{}' AND id = ?", NAME, VIEW_BUILDING_TASKS, VIEW_BUILDING_KEY);

View File

@@ -241,6 +241,28 @@ public:
static schema_ptr cdc_local();
};
+struct legacy {
+static constexpr auto HINTS = "hints";
+static constexpr auto BATCHLOG = "batchlog";
+static constexpr auto KEYSPACES = "schema_keyspaces";
+static constexpr auto COLUMNFAMILIES = "schema_columnfamilies";
+static constexpr auto COLUMNS = "schema_columns";
+static constexpr auto TRIGGERS = "schema_triggers";
+static constexpr auto USERTYPES = "schema_usertypes";
+static constexpr auto FUNCTIONS = "schema_functions";
+static constexpr auto AGGREGATES = "schema_aggregates";
+static schema_ptr keyspaces();
+static schema_ptr column_families();
+static schema_ptr columns();
+static schema_ptr triggers();
+static schema_ptr usertypes();
+static schema_ptr functions();
+static schema_ptr aggregates();
+static schema_ptr hints();
+static schema_ptr batchlog();
+};
// Partition estimates for a given range of tokens.
struct range_estimates {
schema_ptr schema;
@@ -554,6 +576,7 @@ public:
// system.view_building_tasks
future<db::view::building_tasks> get_view_building_tasks();
future<mutation> make_view_building_task_mutation(api::timestamp_type ts, const db::view::view_building_task& task);
+future<mutation> make_update_view_building_task_state_mutation(api::timestamp_type ts, utils::UUID id, db::view::view_building_task::task_state state);
future<mutation> make_remove_view_building_task_mutation(api::timestamp_type ts, utils::UUID id);
// system.scylla_local, view_building_processing_base key

View File

@@ -9,8 +9,6 @@
#include "query/query-result-reader.hh"
#include "replica/database_fwd.hh"
#include "db/timeout_clock.hh"
-#include <seastar/core/future.hh>
-#include <seastar/core/gate.hh>
namespace service {
class storage_proxy;
@@ -27,14 +25,8 @@ class delete_ghost_rows_visitor {
replica::table& _view_table;
schema_ptr _base_schema;
std::optional<partition_key> _view_pk;
-db::timeout_semaphore _concurrency_semaphore;
-seastar::gate _gate;
-std::exception_ptr& _ex;
public:
-delete_ghost_rows_visitor(service::storage_proxy& proxy, service::query_state& state, view_ptr view, db::timeout_clock::duration timeout_duration, size_t concurrency, std::exception_ptr& ex);
-delete_ghost_rows_visitor(delete_ghost_rows_visitor&&) = default;
-~delete_ghost_rows_visitor() noexcept;
+delete_ghost_rows_visitor(service::storage_proxy& proxy, service::query_state& state, view_ptr view, db::timeout_clock::duration timeout_duration);
void add_value(const column_definition& def, query::result_row_view::iterator_type& i) {
}
@@ -53,9 +45,6 @@ public:
uint32_t accept_partition_end(const query::result_row_view& static_row) {
return 0;
}
-private:
-future<> do_accept_new_row(partition_key pk, clustering_key ck);
};
} //namespace db::view

View File

@@ -3597,7 +3597,7 @@ view_updating_consumer::view_updating_consumer(view_update_generator& gen, schem
})
{ }
-delete_ghost_rows_visitor::delete_ghost_rows_visitor(service::storage_proxy& proxy, service::query_state& state, view_ptr view, db::timeout_clock::duration timeout_duration, size_t concurrency, std::exception_ptr& ex)
+delete_ghost_rows_visitor::delete_ghost_rows_visitor(service::storage_proxy& proxy, service::query_state& state, view_ptr view, db::timeout_clock::duration timeout_duration)
: _proxy(proxy)
, _state(state)
, _timeout_duration(timeout_duration)
@@ -3605,20 +3605,8 @@ delete_ghost_rows_visitor::delete_ghost_rows_visitor(service::storage_proxy& pro
, _view_table(_proxy.get_db().local().find_column_family(view))
, _base_schema(_proxy.get_db().local().find_schema(_view->view_info()->base_id()))
, _view_pk()
-, _concurrency_semaphore(concurrency)
-, _ex(ex)
{}
-delete_ghost_rows_visitor::~delete_ghost_rows_visitor() noexcept {
-try {
-_gate.close().get();
-} catch (...) {
-// Closing the gate should never throw, but if it does anyway, capture the exception.
-_ex = std::current_exception();
-}
-}
void delete_ghost_rows_visitor::accept_new_partition(const partition_key& key, uint32_t row_count) {
SCYLLA_ASSERT(thread::running_in_thread());
_view_pk = key;
@@ -3626,18 +3614,7 @@ void delete_ghost_rows_visitor::accept_new_partition(const partition_key& key, u
// Assumes running in seastar::thread
void delete_ghost_rows_visitor::accept_new_row(const clustering_key& ck, const query::result_row_view& static_row, const query::result_row_view& row) {
-auto units = get_units(_concurrency_semaphore, 1).get();
-(void)seastar::try_with_gate(_gate, [this, pk = _view_pk.value(), units = std::move(units), ck] () mutable {
-return do_accept_new_row(std::move(pk), std::move(ck)).then_wrapped([this, units = std::move(units)] (future<>&& f) mutable {
-if (f.failed()) {
-_ex = f.get_exception();
-}
-});
-});
-}
-future<> delete_ghost_rows_visitor::do_accept_new_row(partition_key pk, clustering_key ck) {
-auto view_exploded_pk = pk.explode();
+auto view_exploded_pk = _view_pk->explode();
auto view_exploded_ck = ck.explode();
std::vector<bytes> base_exploded_pk(_base_schema->partition_key_size());
std::vector<bytes> base_exploded_ck(_base_schema->clustering_key_size());
@@ -3672,17 +3649,17 @@ future<> delete_ghost_rows_visitor::do_accept_new_row(partition_key pk, clusteri
_proxy.get_max_result_size(partition_slice), query::tombstone_limit(_proxy.get_tombstone_limit()));
auto timeout = db::timeout_clock::now() + _timeout_duration;
service::storage_proxy::coordinator_query_options opts{timeout, _state.get_permit(), _state.get_client_state(), _state.get_trace_state()};
-auto base_qr = co_await _proxy.query(_base_schema, command, std::move(partition_ranges), db::consistency_level::ALL, opts);
+auto base_qr = _proxy.query(_base_schema, command, std::move(partition_ranges), db::consistency_level::ALL, opts).get();
query::result& result = *base_qr.query_result;
-auto delete_ghost_row = [&]() -> future<> {
-mutation m(_view, pk);
+auto delete_ghost_row = [&]() {
+mutation m(_view, *_view_pk);
auto& row = m.partition().clustered_row(*_view, ck);
row.apply(tombstone(api::new_timestamp(), gc_clock::now()));
timeout = db::timeout_clock::now() + _timeout_duration;
-return _proxy.mutate({m}, db::consistency_level::ALL, timeout, _state.get_trace_state(), empty_service_permit(), db::allow_per_partition_rate_limit::no);
+_proxy.mutate({m}, db::consistency_level::ALL, timeout, _state.get_trace_state(), empty_service_permit(), db::allow_per_partition_rate_limit::no).get();
};
if (result.row_count().value_or(0) == 0) {
-co_await delete_ghost_row();
+delete_ghost_row();
} else if (!view_key_cols_not_in_base_key.empty()) {
if (result.row_count().value_or(0) != 1) {
on_internal_error(vlogger, format("Got multiple base rows corresponding to a single view row when pruning {}.{}", _view->ks_name(), _view->cf_name()));
@@ -3692,7 +3669,7 @@ future<> delete_ghost_rows_visitor::do_accept_new_row(partition_key pk, clusteri
for (const auto& [col_def, col_val] : view_key_cols_not_in_base_key) {
const data_value* base_val = base_row.get_data_value(col_def->name_as_text());
if (!base_val || base_val->is_null() || col_val != base_val->serialize_nonnull()) {
-co_await delete_ghost_row();
+delete_ghost_row();
break;
}
}

View File

@@ -104,8 +104,6 @@ future<> view_building_coordinator::run() {
_vb_sm.event.broadcast();
});
-auto finished_tasks_gc_fiber = finished_task_gc_fiber();
while (!_as.abort_requested()) {
co_await utils::get_local_injector().inject("view_building_coordinator_pause_main_loop", utils::wait_for_message(std::chrono::minutes(2)));
if (utils::get_local_injector().enter("view_building_coordinator_skip_main_loop")) {
@@ -123,7 +121,12 @@ future<> view_building_coordinator::run() {
continue;
}
-co_await work_on_view_building(std::move(*guard_opt));
+auto started_new_work = co_await work_on_view_building(std::move(*guard_opt));
+if (started_new_work) {
+// If any tasks were started, do another iteration, so the coordinator can attach itself to the tasks (via RPC)
+vbc_logger.debug("view building coordinator started new tasks, do next iteration without waiting for event");
+continue;
+}
co_await await_event();
} catch (...) {
handle_coordinator_error(std::current_exception());
@@ -139,66 +142,6 @@ future<> view_building_coordinator::run() {
}
}
}
-co_await std::move(finished_tasks_gc_fiber);
-}
-future<> view_building_coordinator::finished_task_gc_fiber() {
-static auto task_gc_interval = 200ms;
-while (!_as.abort_requested()) {
-try {
-co_await clean_finished_tasks();
-co_await sleep_abortable(task_gc_interval, _as);
-} catch (abort_requested_exception&) {
-vbc_logger.debug("view_building_coordinator::finished_task_gc_fiber got abort_requested_exception");
-} catch (service::group0_concurrent_modification&) {
-vbc_logger.info("view_building_coordinator::finished_task_gc_fiber got group0_concurrent_modification");
-} catch (raft::request_aborted&) {
-vbc_logger.debug("view_building_coordinator::finished_task_gc_fiber got raft::request_aborted");
-} catch (service::term_changed_error&) {
-vbc_logger.debug("view_building_coordinator::finished_task_gc_fiber notices term change {} -> {}", _term, _raft.get_current_term());
-} catch (raft::commit_status_unknown&) {
-vbc_logger.warn("view_building_coordinator::finished_task_gc_fiber got raft::commit_status_unknown");
-} catch (...) {
-vbc_logger.error("view_building_coordinator::finished_task_gc_fiber got error: {}", std::current_exception());
-}
-}
-}
-future<> view_building_coordinator::clean_finished_tasks() {
-// Avoid acquiring a group0 operation if there are no tasks.
-if (_finished_tasks.empty()) {
-co_return;
-}
-auto guard = co_await start_operation();
-auto lock = co_await get_unique_lock(_mutex);
-if (!_vb_sm.building_state.currently_processed_base_table || std::ranges::all_of(_finished_tasks, [] (auto& e) { return e.second.empty(); })) {
-co_return;
-}
-view_building_task_mutation_builder builder(guard.write_timestamp());
-for (auto& [replica, tasks]: _finished_tasks) {
-for (auto& task_id: tasks) {
-// The task might be aborted in the meantime. In this case we cannot remove it because we need it to create a new task.
-//
-// TODO: When we're aborting a view building task (for instance due to tablet migration),
-// we can look if we already finished it (check if it's in `_finished_tasks`).
-// If yes, we can just remove it instead of aborting it.
-auto task_opt = _vb_sm.building_state.get_task(*_vb_sm.building_state.currently_processed_base_table, replica, task_id);
-if (task_opt && !task_opt->get().aborted) {
-builder.del_task(task_id);
-vbc_logger.debug("Removing finished task with ID: {}", task_id);
-}
-}
-}
-co_await commit_mutations(std::move(guard), {builder.build()}, "remove finished view building tasks");
-for (auto& [_, tasks_set]: _finished_tasks) {
-tasks_set.clear();
-}
-}
}
future<std::optional<service::group0_guard>> view_building_coordinator::update_state(service::group0_guard guard) {
@@ -358,16 +301,18 @@ future<> view_building_coordinator::update_views_statuses(const service::group0_
}
}
-future<> view_building_coordinator::work_on_view_building(service::group0_guard guard) {
+future<bool> view_building_coordinator::work_on_view_building(service::group0_guard guard) {
if (!_vb_sm.building_state.currently_processed_base_table) {
vbc_logger.debug("No base table is selected, nothing to do.");
-co_return;
+co_return false;
}
-// Acquire unique lock of `_finished_tasks` to ensure each replica has its own entry in it
-// and to select tasks for them.
-auto lock = co_await get_unique_lock(_mutex);
+utils::chunked_vector<mutation> muts;
+std::unordered_set<locator::tablet_replica> _remote_work_keys_to_erase;
for (auto& replica: get_replicas_with_tasks()) {
+// Check whether the coordinator already waits for the remote work on the replica to be finished.
+// If so: check if the work is done and remove the shared_future; otherwise skip this replica.
+bool skip_work_on_this_replica = false;
if (_remote_work.contains(replica)) {
if (!_remote_work[replica].available()) {
vbc_logger.debug("Replica {} is still doing work", replica);
@@ -375,7 +320,21 @@ future<> view_building_coordinator::work_on_view_building(service::group0_guard
}
auto remote_results_opt = co_await _remote_work[replica].get_future();
-_remote_work.erase(replica);
+if (remote_results_opt) {
+auto results_muts = co_await update_state_after_work_is_done(guard, replica, std::move(*remote_results_opt));
+muts.insert(muts.end(), std::make_move_iterator(results_muts.begin()), std::make_move_iterator(results_muts.end()));
+// If the replica successfully finished its work, we need to commit the mutations generated above before selecting the next task
+skip_work_on_this_replica = !results_muts.empty();
+}
+// If there were no mutations for this replica, we can just remove the entry from `_remote_work` map
+// and start new work in the same iteration.
+// Otherwise, the entry needs to be removed after the mutations are committed successfully.
+if (skip_work_on_this_replica) {
+_remote_work_keys_to_erase.insert(replica);
+} else {
+_remote_work.erase(replica);
+}
}
const bool ignore_gossiper = utils::get_local_injector().enter("view_building_coordinator_ignore_gossiper");
@@ -384,16 +343,31 @@ future<> view_building_coordinator::work_on_view_building(service::group0_guard
continue;
}
-if (!_finished_tasks.contains(replica)) {
-_finished_tasks.insert({replica, {}});
-}
-if (auto todo_ids = select_tasks_for_replica(replica); !todo_ids.empty()) {
-start_remote_worker(replica, std::move(todo_ids));
+if (skip_work_on_this_replica) {
+continue;
+}
+if (auto already_started_ids = _vb_sm.building_state.get_started_tasks(*_vb_sm.building_state.currently_processed_base_table, replica); !already_started_ids.empty()) {
+// If the replica has any task in `STARTED` state, attach the coordinator to the work.
+attach_to_started_tasks(replica, std::move(already_started_ids));
+} else if (auto todo_ids = select_tasks_for_replica(replica); !todo_ids.empty()) {
+// If the replica has no started tasks and there are tasks to do, mark them as started.
+// The coordinator will attach itself to the work in the next iteration.
+auto new_mutations = co_await start_tasks(guard, std::move(todo_ids));
+muts.insert(muts.end(), std::make_move_iterator(new_mutations.begin()), std::make_move_iterator(new_mutations.end()));
} else {
vbc_logger.debug("Nothing to do for replica {}", replica);
}
}
+if (!muts.empty()) {
+co_await commit_mutations(std::move(guard), std::move(muts), "start view building tasks");
+for (auto& key: _remote_work_keys_to_erase) {
+_remote_work.erase(key);
+}
+co_return true;
+}
+co_return false;
}
std::set<locator::tablet_replica> view_building_coordinator::get_replicas_with_tasks() {
@@ -416,7 +390,7 @@ std::vector<utils::UUID> view_building_coordinator::select_tasks_for_replica(loc
// Select only building tasks and return theirs ids // Select only building tasks and return theirs ids
auto filter_building_tasks = [] (const std::vector<view_building_task>& tasks) -> std::vector<utils::UUID> { auto filter_building_tasks = [] (const std::vector<view_building_task>& tasks) -> std::vector<utils::UUID> {
return tasks | std::views::filter([] (const view_building_task& t) { return tasks | std::views::filter([] (const view_building_task& t) {
return t.type == view_building_task::task_type::build_range && !t.aborted; return t.type == view_building_task::task_type::build_range && t.state == view_building_task::task_state::idle;
}) | std::views::transform([] (const view_building_task& t) { }) | std::views::transform([] (const view_building_task& t) {
return t.id; return t.id;
}) | std::ranges::to<std::vector>(); }) | std::ranges::to<std::vector>();
@@ -430,29 +404,7 @@ std::vector<utils::UUID> view_building_coordinator::select_tasks_for_replica(loc
} }
auto& tablet_map = _db.get_token_metadata().tablets().get_tablet_map(*_vb_sm.building_state.currently_processed_base_table); auto& tablet_map = _db.get_token_metadata().tablets().get_tablet_map(*_vb_sm.building_state.currently_processed_base_table);
auto tasks_by_last_token = _vb_sm.building_state.collect_tasks_by_last_token(*_vb_sm.building_state.currently_processed_base_table, replica); for (auto& [token, tasks]: _vb_sm.building_state.collect_tasks_by_last_token(*_vb_sm.building_state.currently_processed_base_table, replica)) {
// Remove completed tasks in `_finished_tasks` from `tasks_by_last_token`
auto it = tasks_by_last_token.begin();
while (it != tasks_by_last_token.end()) {
auto task_it = it->second.begin();
while (task_it != it->second.end()) {
if (_finished_tasks.at(replica).contains(task_it->id)) {
task_it = it->second.erase(task_it);
} else {
++task_it;
}
}
// Remove the entry from `tasks_by_last_token` if its vector is empty
if (it->second.empty()) {
it = tasks_by_last_token.erase(it);
} else {
++it;
}
}
for (auto& [token, tasks]: tasks_by_last_token) {
    auto tid = tablet_map.get_tablet_id(token);
    if (tablet_map.get_tablet_transition_info(tid)) {
        vbc_logger.debug("Tablet {} on replica {} is in transition.", tid, replica);
@@ -464,7 +416,7 @@ std::vector<utils::UUID> view_building_coordinator::select_tasks_for_replica(loc
        return building_tasks;
    } else {
        return tasks | std::views::filter([] (const view_building_task& t) {
-            return !t.aborted;
+            return t.state == view_building_task::task_state::idle;
        }) | std::views::transform([] (const view_building_task& t) {
            return t.id;
        }) | std::ranges::to<std::vector>();
@@ -474,21 +426,32 @@ std::vector<utils::UUID> view_building_coordinator::select_tasks_for_replica(loc
    return {};
}

-void view_building_coordinator::start_remote_worker(const locator::tablet_replica& replica, std::vector<utils::UUID> tasks) {
+future<utils::chunked_vector<mutation>> view_building_coordinator::start_tasks(const service::group0_guard& guard, std::vector<utils::UUID> tasks) {
+    vbc_logger.info("Starting tasks {}", tasks);
+    utils::chunked_vector<mutation> muts;
+    for (auto& t: tasks) {
+        auto mut = co_await _sys_ks.make_update_view_building_task_state_mutation(guard.write_timestamp(), t, view_building_task::task_state::started);
+        muts.push_back(std::move(mut));
+    }
+    co_return muts;
+}
+
+void view_building_coordinator::attach_to_started_tasks(const locator::tablet_replica& replica, std::vector<utils::UUID> tasks) {
    vbc_logger.debug("Attaching to started tasks {} on replica {}", tasks, replica);
-    shared_future<std::optional<std::vector<utils::UUID>>> work = work_on_tasks(replica, std::move(tasks));
+    shared_future<std::optional<remote_work_results>> work = work_on_tasks(replica, std::move(tasks));
    _remote_work.insert({replica, std::move(work)});
}
-future<std::optional<std::vector<utils::UUID>>> view_building_coordinator::work_on_tasks(locator::tablet_replica replica, std::vector<utils::UUID> tasks) {
+future<std::optional<view_building_coordinator::remote_work_results>> view_building_coordinator::work_on_tasks(locator::tablet_replica replica, std::vector<utils::UUID> tasks) {
    constexpr auto backoff_duration = std::chrono::seconds(1);
    static thread_local logger::rate_limit rate_limit{backoff_duration};
-    std::vector<utils::UUID> remote_results;
+    std::vector<view_task_result> remote_results;
    bool rpc_failed = false;
    try {
-        remote_results = co_await ser::view_rpc_verbs::send_work_on_view_building_tasks(&_messaging, replica.host, _as, _term, replica.shard, tasks);
+        remote_results = co_await ser::view_rpc_verbs::send_work_on_view_building_tasks(&_messaging, replica.host, _as, tasks);
    } catch (...) {
        vbc_logger.log(log_level::warn, rate_limit, "Work on tasks {} on replica {}, failed with error: {}",
            tasks, replica, std::current_exception());
@@ -501,14 +464,44 @@ future<std::optional<std::vector<utils::UUID>>> view_building_coordinator::work_
        co_return std::nullopt;
    }

-    // In `view_building_coordinator::work_on_view_building()` we made sure that
-    // each replica has its own entry in the `_finished_tasks`, so now we can just take a shared lock
-    // and insert its set of finished tasks into this replica's bucket, as there is at most one instance of this method for each replica.
-    auto lock = co_await get_shared_lock(_mutex);
-    _finished_tasks.at(replica).insert_range(remote_results);
+    if (tasks.size() != remote_results.size()) {
+        on_internal_error(vbc_logger, fmt::format("Number of tasks ({}) and results ({}) do not match for replica {}", tasks.size(), remote_results.size(), replica));
+    }
+    remote_work_results results;
+    for (size_t i = 0; i < tasks.size(); ++i) {
+        results.push_back({tasks[i], remote_results[i]});
+    }
    _vb_sm.event.broadcast();
-    co_return remote_results;
+    co_return results;
}
+// Mark finished tasks as done (remove them from the table).
+// Retry failed tasks if possible (i.e. if a failed task wasn't aborted).
+future<utils::chunked_vector<mutation>> view_building_coordinator::update_state_after_work_is_done(const service::group0_guard& guard, const locator::tablet_replica& replica, view_building_coordinator::remote_work_results results) {
+    vbc_logger.debug("Got results from replica {}: {}", replica, results);
+    utils::chunked_vector<mutation> muts;
+    for (auto& result: results) {
+        vbc_logger.info("Task {} was finished with result: {}", result.first, result.second);
+        if (!_vb_sm.building_state.currently_processed_base_table) {
+            continue;
+        }
+        // A task can be aborted by deleting it or by setting its state to `ABORTED`.
+        // If the task was aborted by changing the state,
+        // we shouldn't remove it here because it might be needed
+        // to generate updates after the tablet operation (migration/resize)
+        // is finished.
+        auto task_opt = _vb_sm.building_state.get_task(*_vb_sm.building_state.currently_processed_base_table, replica, result.first);
+        if (task_opt && task_opt->get().state != view_building_task::task_state::aborted) {
+            // Otherwise, the task was completed successfully and we can remove it.
+            auto delete_mut = co_await _sys_ks.make_remove_view_building_task_mutation(guard.write_timestamp(), result.first);
+            muts.push_back(std::move(delete_mut));
+        }
+    }
+    co_return muts;
+}
future<> view_building_coordinator::stop() {
@@ -538,7 +531,7 @@ void view_building_coordinator::generate_tablet_migration_updates(utils::chunked
auto create_task_copy_on_pending_replica = [&] (const view_building_task& task) {
    auto new_id = builder.new_id();
    builder.set_type(new_id, task.type)
-           .set_aborted(new_id, false)
+           .set_state(new_id, view_building_task::task_state::idle)
           .set_base_id(new_id, task.base_id)
           .set_last_token(new_id, task.last_token)
           .set_replica(new_id, *trinfo.pending_replica);
@@ -606,7 +599,7 @@ void view_building_coordinator::generate_tablet_resize_updates(utils::chunked_ve
auto create_task_copy = [&] (const view_building_task& task, dht::token last_token) -> utils::UUID {
    auto new_id = builder.new_id();
    builder.set_type(new_id, task.type)
-           .set_aborted(new_id, false)
+           .set_state(new_id, view_building_task::task_state::idle)
           .set_base_id(new_id, task.base_id)
           .set_last_token(new_id, last_token)
           .set_replica(new_id, task.replica);
@@ -675,7 +668,7 @@ void view_building_coordinator::abort_tasks(utils::chunked_vector<canonical_muta
auto abort_task_map = [&] (const task_map& task_map) {
    for (auto& [id, _]: task_map) {
        vbc_logger.debug("Aborting task {}", id);
-        builder.set_aborted(id, true);
+        builder.set_state(id, view_building_task::task_state::aborted);
    }
};
@@ -705,7 +698,7 @@ void abort_view_building_tasks(const view_building_state_machine& vb_sm,
for (auto& [id, task]: task_map) {
    if (task.last_token == last_token) {
        vbc_logger.debug("Aborting task {}", id);
-        builder.set_aborted(id, true);
+        builder.set_state(id, view_building_task::task_state::aborted);
    }
}
};
@@ -721,10 +714,10 @@ void abort_view_building_tasks(const view_building_state_machine& vb_sm,
static void rollback_task_map(view_building_task_mutation_builder& builder, const task_map& task_map) {
    for (auto& [id, task]: task_map) {
-        if (task.aborted) {
+        if (task.state == view_building_task::task_state::aborted) {
            auto new_id = builder.new_id();
            builder.set_type(new_id, task.type)
-                   .set_aborted(new_id, false)
+                   .set_state(new_id, view_building_task::task_state::idle)
                   .set_base_id(new_id, task.base_id)
                   .set_last_token(new_id, task.last_token)
                   .set_replica(new_id, task.replica);


@@ -54,9 +54,9 @@ class view_building_coordinator : public service::endpoint_lifecycle_subscriber
const raft::term_t _term;
abort_source& _as;

-std::unordered_map<locator::tablet_replica, shared_future<std::optional<std::vector<utils::UUID>>>> _remote_work;
-shared_mutex _mutex; // guards `_finished_tasks` field
-std::unordered_map<locator::tablet_replica, std::unordered_set<utils::UUID>> _finished_tasks;
+using remote_work_results = std::vector<std::pair<utils::UUID, db::view::view_task_result>>;
+std::unordered_map<locator::tablet_replica, shared_future<std::optional<remote_work_results>>> _remote_work;

public:
view_building_coordinator(replica::database& db, raft::server& raft, service::raft_group0& group0,
@@ -86,11 +86,9 @@ private:
future<> commit_mutations(service::group0_guard guard, utils::chunked_vector<mutation> mutations, std::string_view description);
void handle_coordinator_error(std::exception_ptr eptr);
-future<> finished_task_gc_fiber();
-future<> clean_finished_tasks();
future<std::optional<service::group0_guard>> update_state(service::group0_guard guard);
-future<> work_on_view_building(service::group0_guard guard);
+// Returns whether any new tasks were started
+future<bool> work_on_view_building(service::group0_guard guard);
future<> mark_view_build_status_started(const service::group0_guard& guard, table_id view_id, utils::chunked_vector<mutation>& out);
future<> mark_all_remaining_view_build_statuses_started(const service::group0_guard& guard, table_id base_id, utils::chunked_vector<mutation>& out);
@@ -99,8 +97,10 @@ private:
std::set<locator::tablet_replica> get_replicas_with_tasks();
std::vector<utils::UUID> select_tasks_for_replica(locator::tablet_replica replica);
-void start_remote_worker(const locator::tablet_replica& replica, std::vector<utils::UUID> tasks);
-future<std::optional<std::vector<utils::UUID>>> work_on_tasks(locator::tablet_replica replica, std::vector<utils::UUID> tasks);
+future<utils::chunked_vector<mutation>> start_tasks(const service::group0_guard& guard, std::vector<utils::UUID> tasks);
+void attach_to_started_tasks(const locator::tablet_replica& replica, std::vector<utils::UUID> tasks);
+future<std::optional<remote_work_results>> work_on_tasks(locator::tablet_replica replica, std::vector<utils::UUID> tasks);
+future<utils::chunked_vector<mutation>> update_state_after_work_is_done(const service::group0_guard& guard, const locator::tablet_replica& replica, remote_work_results results);
};

void abort_view_building_tasks(const db::view::view_building_state_machine& vb_sm,


@@ -13,10 +13,10 @@ namespace db {
namespace view {

-view_building_task::view_building_task(utils::UUID id, task_type type, bool aborted, table_id base_id, std::optional<table_id> view_id, locator::tablet_replica replica, dht::token last_token)
+view_building_task::view_building_task(utils::UUID id, task_type type, task_state state, table_id base_id, std::optional<table_id> view_id, locator::tablet_replica replica, dht::token last_token)
    : id(id)
    , type(type)
-    , aborted(aborted)
+    , state(state)
    , base_id(base_id)
    , view_id(view_id)
    , replica(replica)
@@ -49,6 +49,30 @@ seastar::sstring task_type_to_sstring(view_building_task::task_type type) {
    }
}

+view_building_task::task_state task_state_from_string(std::string_view str) {
+    if (str == "IDLE") {
+        return view_building_task::task_state::idle;
+    }
+    if (str == "STARTED") {
+        return view_building_task::task_state::started;
+    }
+    if (str == "ABORTED") {
+        return view_building_task::task_state::aborted;
+    }
+    throw std::runtime_error(fmt::format("Unknown view building task state: {}", str));
+}
+
+seastar::sstring task_state_to_sstring(view_building_task::task_state state) {
+    switch (state) {
+    case view_building_task::task_state::idle:
+        return "IDLE";
+    case view_building_task::task_state::started:
+        return "STARTED";
+    case view_building_task::task_state::aborted:
+        return "ABORTED";
+    }
+}
std::optional<std::reference_wrapper<const view_building_task>> view_building_state::get_task(table_id base_id, locator::tablet_replica replica, utils::UUID id) const {
    if (!tasks_state.contains(base_id) || !tasks_state.at(base_id).contains(replica)) {
        return {};
@@ -127,6 +151,46 @@ std::map<dht::token, std::vector<view_building_task>> view_building_state::colle
    return tasks;
}

+// Returns the ids of all tasks for `base_table_id` and `replica` in `STARTED` state.
+std::vector<utils::UUID> view_building_state::get_started_tasks(table_id base_table_id, locator::tablet_replica replica) const {
+    if (!tasks_state.contains(base_table_id) || !tasks_state.at(base_table_id).contains(replica)) {
+        // No tasks for this replica
+        return {};
+    }
+
+    std::vector<view_building_task> tasks;
+    auto& replica_tasks = tasks_state.at(base_table_id).at(replica);
+    for (auto& [_, view_tasks]: replica_tasks.view_tasks) {
+        for (auto& [_, task]: view_tasks) {
+            if (task.state == view_building_task::task_state::started) {
+                tasks.push_back(task);
+            }
+        }
+    }
+    for (auto& [_, task]: replica_tasks.staging_tasks) {
+        if (task.state == view_building_task::task_state::started) {
+            tasks.push_back(task);
+        }
+    }
+
+    // All collected tasks should have the same type, base_id and last_token,
+    // so they can be executed in the same view_building_worker::batch.
+#ifdef SEASTAR_DEBUG
+    if (!tasks.empty()) {
+        auto& task = tasks.front();
+        for (auto& t: tasks) {
+            SCYLLA_ASSERT(task.type == t.type);
+            SCYLLA_ASSERT(task.base_id == t.base_id);
+            SCYLLA_ASSERT(task.last_token == t.last_token);
+        }
+    }
+#endif
+
+    return tasks | std::views::transform([] (const view_building_task& t) {
+        return t.id;
+    }) | std::ranges::to<std::vector>();
+}
}
}


@@ -39,17 +39,28 @@ struct view_building_task {
    process_staging,
};

+// When a task is created, it starts in the `IDLE` state.
+// When the view building coordinator decides to execute the task,
+// it sets the state to `STARTED`.
+// When a task is finished, its entry is removed.
+//
+// If a task is in progress when a tablet operation (migration/resize) starts,
+// the task's state is set to `ABORTED`.
+enum class task_state {
+    idle,
+    started,
+    aborted,
+};

utils::UUID id;
task_type type;
-bool aborted;
+task_state state;
table_id base_id;
std::optional<table_id> view_id; // nullopt when task_type is `process_staging`
locator::tablet_replica replica;
dht::token last_token;

-view_building_task(utils::UUID id, task_type type, bool aborted,
+view_building_task(utils::UUID id, task_type type, task_state state,
                   table_id base_id, std::optional<table_id> view_id,
                   locator::tablet_replica replica, dht::token last_token);
};
@@ -81,6 +92,7 @@ struct view_building_state {
std::vector<std::reference_wrapper<const view_building_task>> get_tasks_for_host(table_id base_id, locator::host_id host) const;
std::map<dht::token, std::vector<view_building_task>> collect_tasks_by_last_token(table_id base_table_id) const;
std::map<dht::token, std::vector<view_building_task>> collect_tasks_by_last_token(table_id base_table_id, const locator::tablet_replica& replica) const;
+std::vector<utils::UUID> get_started_tasks(table_id base_table_id, locator::tablet_replica replica) const;
};

// Represents global state of tablet-based views.
@@ -101,8 +113,18 @@ struct view_building_state_machine {
    condition_variable event;
};

+struct view_task_result {
+    enum class command_status: uint8_t {
+        success = 0,
+        abort = 1,
+    };
+
+    db::view::view_task_result::command_status status;
+};

view_building_task::task_type task_type_from_string(std::string_view str);
seastar::sstring task_type_to_sstring(view_building_task::task_type type);
+view_building_task::task_state task_state_from_string(std::string_view str);
+seastar::sstring task_state_to_sstring(view_building_task::task_state state);

} // namespace view_building
@@ -114,11 +136,17 @@ template <> struct fmt::formatter<db::view::view_building_task::task_type> : fmt
    }
};

+template <> struct fmt::formatter<db::view::view_building_task::task_state> : fmt::formatter<string_view> {
+    auto format(db::view::view_building_task::task_state state, fmt::format_context& ctx) const {
+        return fmt::format_to(ctx.out(), "{}", db::view::task_state_to_sstring(state));
+    }
+};

template <> struct fmt::formatter<db::view::view_building_task> : fmt::formatter<string_view> {
    auto format(db::view::view_building_task task, fmt::format_context& ctx) const {
        auto view_id = task.view_id ? fmt::to_string(*task.view_id) : "nullopt";
-        return fmt::format_to(ctx.out(), "view_building_task{{type: {}, aborted: {}, base_id: {}, view_id: {}, last_token: {}}}",
-            task.type, task.aborted, task.base_id, view_id, task.last_token);
+        return fmt::format_to(ctx.out(), "view_building_task{{type: {}, state: {}, base_id: {}, view_id: {}, last_token: {}}}",
+            task.type, task.state, task.base_id, view_id, task.last_token);
    }
};
@@ -133,3 +161,18 @@ template <> struct fmt::formatter<db::view::replica_tasks> : fmt::formatter<stri
        return fmt::format_to(ctx.out(), "{{view_tasks: {}, staging_tasks: {}}}", replica_tasks.view_tasks, replica_tasks.staging_tasks);
    }
};

+template <> struct fmt::formatter<db::view::view_task_result> : fmt::formatter<string_view> {
+    auto format(db::view::view_task_result result, fmt::format_context& ctx) const {
+        std::string_view res;
+        switch (result.status) {
+        case db::view::view_task_result::command_status::success:
+            res = "success";
+            break;
+        case db::view::view_task_result::command_status::abort:
+            res = "abort";
+            break;
+        }
+        return format_to(ctx.out(), "{}", res);
+    }
+};


@@ -25,8 +25,8 @@ view_building_task_mutation_builder& view_building_task_mutation_builder::set_ty
    _m.set_clustered_cell(get_ck(id), "type", data_value(task_type_to_sstring(type)), _ts);
    return *this;
}

-view_building_task_mutation_builder& view_building_task_mutation_builder::set_aborted(utils::UUID id, bool aborted) {
-    _m.set_clustered_cell(get_ck(id), "aborted", data_value(aborted), _ts);
+view_building_task_mutation_builder& view_building_task_mutation_builder::set_state(utils::UUID id, db::view::view_building_task::task_state state) {
+    _m.set_clustered_cell(get_ck(id), "state", data_value(task_state_to_sstring(state)), _ts);
    return *this;
}

view_building_task_mutation_builder& view_building_task_mutation_builder::set_base_id(utils::UUID id, table_id base_id) {


@@ -32,7 +32,7 @@ public:
static utils::UUID new_id();

view_building_task_mutation_builder& set_type(utils::UUID id, db::view::view_building_task::task_type type);
-view_building_task_mutation_builder& set_aborted(utils::UUID id, bool aborted);
+view_building_task_mutation_builder& set_state(utils::UUID id, db::view::view_building_task::task_state state);
view_building_task_mutation_builder& set_base_id(utils::UUID id, table_id base_id);
view_building_task_mutation_builder& set_view_id(utils::UUID id, table_id view_id);
view_building_task_mutation_builder& set_last_token(utils::UUID id, dht::token last_token);


@@ -22,7 +22,6 @@
#include "replica/database.hh"
#include "service/storage_proxy.hh"
#include "service/raft/raft_group0_client.hh"
-#include "service/raft/raft_group0.hh"
#include "schema/schema_fwd.hh"
#include "idl/view.dist.hh"
#include "sstables/sstables.hh"
@@ -115,11 +114,11 @@ static locator::tablet_id get_sstable_tablet_id(const locator::tablet_map& table
    return tablet_id;
}

-view_building_worker::view_building_worker(replica::database& db, db::system_keyspace& sys_ks, service::migration_notifier& mnotifier, service::raft_group0& group0, view_update_generator& vug, netw::messaging_service& ms, view_building_state_machine& vbsm)
+view_building_worker::view_building_worker(replica::database& db, db::system_keyspace& sys_ks, service::migration_notifier& mnotifier, service::raft_group0_client& group0_client, view_update_generator& vug, netw::messaging_service& ms, view_building_state_machine& vbsm)
    : _db(db)
    , _sys_ks(sys_ks)
    , _mnotifier(mnotifier)
-    , _group0(group0)
+    , _group0_client(group0_client)
    , _vug(vug)
    , _messaging(ms)
    , _vb_state_machine(vbsm)
@@ -146,7 +145,6 @@ future<> view_building_worker::drain() {
    if (!_as.abort_requested()) {
        _as.request_abort();
    }
-    _state._mutex.broken();
    _staging_sstables_mutex.broken();
    _sstables_to_register_event.broken();
    if (this_shard_id() == 0) {
@@ -156,7 +154,8 @@ future<> view_building_worker::drain() {
        co_await std::move(state_observer);
        co_await _mnotifier.unregister_listener(this);
    }
-    co_await _state.clear();
+    co_await _state.clear_state();
+    _state.state_updated_cv.broken();
    co_await uninit_messaging_service();
}
@@ -225,22 +224,22 @@ future<> view_building_worker::create_staging_sstable_tasks() {
    utils::chunked_vector<canonical_mutation> cmuts;
-    auto guard = co_await _group0.client().start_operation(_as);
+    auto guard = co_await _group0_client.start_operation(_as);
    auto my_host_id = _db.get_token_metadata().get_topology().my_host_id();
    for (auto& [table_id, sst_infos]: _sstables_to_register) {
        for (auto& sst_info: sst_infos) {
            view_building_task task {
-                utils::UUID_gen::get_time_UUID(), view_building_task::task_type::process_staging, false,
+                utils::UUID_gen::get_time_UUID(), view_building_task::task_type::process_staging, view_building_task::task_state::idle,
                table_id, ::table_id{}, {my_host_id, sst_info.shard}, sst_info.last_token
            };
-            auto mut = co_await _group0.client().sys_ks().make_view_building_task_mutation(guard.write_timestamp(), task);
+            auto mut = co_await _group0_client.sys_ks().make_view_building_task_mutation(guard.write_timestamp(), task);
            cmuts.emplace_back(std::move(mut));
        }
    }

    vbw_logger.debug("Creating {} process_staging view_building_tasks", cmuts.size());
-    auto cmd = _group0.client().prepare_command(service::write_mutations{std::move(cmuts)}, guard, "create view building tasks");
-    co_await _group0.client().add_entry(std::move(cmd), std::move(guard), _as);
+    auto cmd = _group0_client.prepare_command(service::write_mutations{std::move(cmuts)}, guard, "create view building tasks");
+    co_await _group0_client.add_entry(std::move(cmd), std::move(guard), _as);

    // Move staging sstables from `_sstables_to_register` (on shard0) to `_staging_sstables` on corresponding shards.
    // First, reorganize `_sstables_to_register` for easier movement.
@@ -341,16 +340,22 @@ future<> view_building_worker::run_view_building_state_observer() {
while (!_as.abort_requested()) {
    bool sleep = false;
+    _state.some_batch_finished = false;
    try {
        vbw_logger.trace("view_building_state_observer() iteration");
-        auto read_apply_mutex_holder = co_await _group0.client().hold_read_apply_mutex(_as);
+        auto read_apply_mutex_holder = co_await _group0_client.hold_read_apply_mutex(_as);
        co_await update_built_views();
-        co_await check_for_aborted_tasks();
+        co_await update_building_state();
        _as.check();
        read_apply_mutex_holder.return_all();

-        co_await _vb_state_machine.event.wait();
+        // A batch could have finished its work while the worker was
+        // updating the state. In that case we should do another iteration.
+        if (!_state.some_batch_finished) {
+            co_await _vb_state_machine.event.wait();
+        }
    } catch (abort_requested_exception&) {
    } catch (broken_condition_variable&) {
    } catch (...) {
@@ -377,7 +382,7 @@ future<> view_building_worker::update_built_views() {
    auto schema = _db.find_schema(table_id);
    return std::make_pair(schema->ks_name(), schema->cf_name());
};
-auto& sys_ks = _group0.client().sys_ks();
+auto& sys_ks = _group0_client.sys_ks();

std::set<std::pair<sstring, sstring>> built_views;
for (auto& [id, statuses]: _vb_state_machine.views_state.status_map) {
@@ -406,35 +411,22 @@ future<> view_building_worker::update_built_views() {
    }
}

-// Must be executed on shard0
-future<> view_building_worker::check_for_aborted_tasks() {
-    return container().invoke_on_all([building_state = _vb_state_machine.building_state] (view_building_worker& vbw) -> future<> {
-        auto lock = co_await get_units(vbw._state._mutex, 1, vbw._as);
-        co_await vbw._state.update_processing_base_table(vbw._db, building_state, vbw._as);
-        if (!vbw._state._batch) {
-            co_return;
-        }
-        auto my_host_id = vbw._db.get_token_metadata().get_topology().my_host_id();
-        auto my_replica = locator::tablet_replica{my_host_id, this_shard_id()};
-        auto tasks_map = vbw._state._batch->tasks; // Potentially, we'll remove elements from the map, so we need a copy to iterate over it
-        for (auto& [id, t]: tasks_map) {
-            auto task_opt = building_state.get_task(t.base_id, my_replica, id);
-            if (!task_opt || task_opt->get().aborted) {
-                co_await vbw._state._batch->abort_task(id);
-            }
-        }
-        if (vbw._state._batch->tasks.empty()) {
-            co_await vbw._state.clean_up_after_batch();
-        }
-    });
-}
+future<> view_building_worker::update_building_state() {
+    co_await _state.update(*this);
+    co_await _state.finish_completed_tasks();
+    _state.state_updated_cv.broadcast();
+}
+
+bool view_building_worker::is_shard_free(shard_id shard) {
+    return !std::ranges::any_of(_state.tasks_map, [&shard] (auto& task_entry) {
+        return task_entry.second->replica.shard == shard && task_entry.second->state == view_building_worker::batch_state::in_progress;
+    });
+}

void view_building_worker::init_messaging_service() {
-    ser::view_rpc_verbs::register_work_on_view_building_tasks(&_messaging, [this] (raft::term_t term, shard_id shard, std::vector<utils::UUID> ids) -> future<std::vector<utils::UUID>> {
-        return container().invoke_on(shard, [term, ids = std::move(ids)] (auto& vbw) mutable -> future<std::vector<utils::UUID>> {
-            return vbw.work_on_tasks(term, std::move(ids));
+    ser::view_rpc_verbs::register_work_on_view_building_tasks(&_messaging, [this] (std::vector<utils::UUID> ids) -> future<std::vector<view_task_result>> {
+        return container().invoke_on(0, [ids = std::move(ids)] (view_building_worker& vbw) mutable -> future<std::vector<view_task_result>> {
+            return vbw.work_on_tasks(std::move(ids));
        });
    });
}
@@ -443,53 +435,236 @@ future<> view_building_worker::uninit_messaging_service() {
    return ser::view_rpc_verbs::unregister(&_messaging);
}
future<std::vector<view_task_result>> view_building_worker::work_on_tasks(std::vector<utils::UUID> ids) {
vbw_logger.debug("Got request for results of tasks: {}", ids);
auto guard = co_await _group0_client.start_operation(_as, service::raft_timeout{});
auto processing_base_table = _state.processing_base_table;
auto are_tasks_finished = [&] () {
return std::ranges::all_of(ids, [this] (const utils::UUID& id) {
return _state.finished_tasks.contains(id) || _state.aborted_tasks.contains(id);
});
};
auto get_results = [&] () -> std::vector<view_task_result> {
std::vector<view_task_result> results;
for (const auto& id: ids) {
if (_state.finished_tasks.contains(id)) {
results.emplace_back(view_task_result::command_status::success);
} else if (_state.aborted_tasks.contains(id)) {
results.emplace_back(view_task_result::command_status::abort);
} else {
                // The task is neither finished nor tracked as aborted, which means it was aborted
                // and already cleaned up. Throw an error, so the coordinator will refresh
                // its state and retry without aborted IDs.
throw std::runtime_error(fmt::format("No status for task {}", id));
}
}
return results;
};
if (are_tasks_finished()) {
// If the batch is already finished, we can return the results immediately.
vbw_logger.debug("Batch with tasks {} is already finished, returning results", ids);
co_return get_results();
}
// All of the tasks should be executed in the same batch
// (their statuses are set to started in the same group0 operation).
// If any ID is not present in the `tasks_map`, it means that it was aborted and we should fail this RPC call,
// so the coordinator can retry without aborted IDs.
    // That's why we can identify the batch by an arbitrary (.front()) ID from the `ids` vector.
auto id = ids.front();
while (!_state.tasks_map.contains(id) && processing_base_table == _state.processing_base_table) {
vbw_logger.warn("Batch with task {} is not found in tasks map, waiting until worker updates its state", id);
service::release_guard(std::move(guard));
co_await _state.state_updated_cv.wait();
guard = co_await _group0_client.start_operation(_as, service::raft_timeout{});
}
if (processing_base_table != _state.processing_base_table) {
// If the processing base table was changed, we should fail this RPC call because the tasks were aborted.
throw std::runtime_error(fmt::format("Processing base table was changed to {} ", _state.processing_base_table));
}
// Validate that any of the IDs wasn't aborted.
for (const auto& tid: ids) {
if (!_state.tasks_map[id]->tasks.contains(tid)) {
vbw_logger.warn("Task {} is not found in the batch", tid);
throw std::runtime_error(fmt::format("Task {} is not found in the batch", tid));
}
}
if (_state.tasks_map[id]->state == view_building_worker::batch_state::idle) {
vbw_logger.debug("Starting batch with tasks {}", _state.tasks_map[id]->tasks);
if (!is_shard_free(_state.tasks_map[id]->replica.shard)) {
            throw std::runtime_error(fmt::format("Tried to start view building tasks ({}) on shard {} but the shard is busy", _state.tasks_map[id]->tasks, _state.tasks_map[id]->replica.shard));
}
_state.tasks_map[id]->start();
}
service::release_guard(std::move(guard));
while (!_as.abort_requested()) {
auto read_apply_mutex_holder = co_await _group0_client.hold_read_apply_mutex(_as);
if (are_tasks_finished()) {
co_return get_results();
}
// Check if the batch is still alive
if (!_state.tasks_map.contains(id)) {
throw std::runtime_error(fmt::format("Batch with task {} is not found in tasks map anymore.", id));
}
read_apply_mutex_holder.return_all();
co_await _state.tasks_map[id]->batch_done_cv.wait();
}
throw std::runtime_error("View building worker was aborted");
}
// Validates if the task can be executed in a batch on the same shard.
static bool validate_can_be_one_batch(const view_building_task& t1, const view_building_task& t2) {
return t1.type == t2.type && t1.base_id == t2.base_id && t1.replica == t2.replica && t1.last_token == t2.last_token;
}
static std::unordered_set<table_id> get_ids_of_all_views(replica::database& db, table_id table_id) {
    return db.find_column_family(table_id).views() | std::views::transform([] (view_ptr vptr) {
        return vptr->id();
    }) | std::ranges::to<std::unordered_set>();
}
-// If `state::processing_base_table` is different from the `view_building_state::currently_processed_base_table`,
-// clear the state, save and flush new base table
-future<> view_building_worker::state::update_processing_base_table(replica::database& db, const view_building_state& building_state, abort_source& as) {
-    if (processing_base_table != building_state.currently_processed_base_table) {
-        co_await clear();
-        if (building_state.currently_processed_base_table) {
-            co_await flush_base_table(db, *building_state.currently_processed_base_table, as);
-        }
-        processing_base_table = building_state.currently_processed_base_table;
-    }
-}
-
-// If the `_batch` ptr points to a valid object, co_await its `work` future, save completed tasks and delete the object
-future<> view_building_worker::state::clean_up_after_batch() {
-    if (_batch) {
-        co_await std::move(_batch->work);
-        for (auto& [id, _]: _batch->tasks) {
-            completed_tasks.insert(id);
+future<> view_building_worker::local_state::flush_table(view_building_worker& vbw, table_id table_id) {
+    // `table_id` should point to the currently processing base table but
+    // `view_building_worker::local_state::processing_base_table` may not be set to it yet,
+    // so we need to pass it directly
+    co_await vbw.container().invoke_on_all([table_id] (view_building_worker& local_vbw) -> future<> {
+        auto base_cf = local_vbw._db.find_column_family(table_id).shared_from_this();
+        co_await when_all(base_cf->await_pending_writes(), base_cf->await_pending_streams());
+        co_await flush_base(base_cf, local_vbw._as);
+    });
+    flushed_views = get_ids_of_all_views(vbw._db, table_id);
+}
+
+future<> view_building_worker::local_state::update(view_building_worker& vbw) {
+    const auto& vb_state = vbw._vb_state_machine.building_state;
+    // Check if the base table to process was changed.
+    // If so, we clear the state, aborting tasks for the previous base table and starting new ones for the new base table.
+    if (processing_base_table != vb_state.currently_processed_base_table) {
+        co_await clear_state();
+        if (vb_state.currently_processed_base_table) {
+            // When we start to process a new base table, we need to flush its current data, so we can build the view.
+            co_await flush_table(vbw, *vb_state.currently_processed_base_table);
+        }
+        processing_base_table = vb_state.currently_processed_base_table;
+        vbw_logger.info("Processing base table was changed to: {}", processing_base_table);
+    }
+
+    if (!processing_base_table) {
+        vbw_logger.debug("No base table is selected to be processed.");
+        co_return;
+    }
    // Note: std::ranges::set_difference requires sorted input ranges, so filter the
    // unordered sets by membership instead.
    auto all_view_ids = get_ids_of_all_views(vbw._db, *processing_base_table);
    auto new_views = all_view_ids | std::views::filter([&] (const table_id& id) { return !flushed_views.contains(id); }) | std::ranges::to<std::vector>();
if (!new_views.empty()) {
        // Flush the base table again if any new view was created, so the view building tasks will see up-to-date sstables.
// Otherwise, we may lose mutations created after previous flush but before the new view was created.
co_await flush_table(vbw, *processing_base_table);
}
auto erm = vbw._db.find_column_family(*processing_base_table).get_effective_replication_map();
auto my_host_id = erm->get_topology().my_host_id();
auto current_tasks_for_this_host = vb_state.get_tasks_for_host(*processing_base_table, my_host_id);
// scan view building state, collect alive and new (in STARTED state but not started by this worker) tasks
std::unordered_map<shard_id, std::vector<view_building_task>> new_tasks;
std::unordered_set<utils::UUID> alive_tasks; // save information about alive tasks to cleanup done/aborted ones
for (auto& task_ref: current_tasks_for_this_host) {
auto& task = task_ref.get();
auto id = task.id;
if (task.state != view_building_task::task_state::aborted) {
alive_tasks.insert(id);
}
if (tasks_map.contains(id) || finished_tasks.contains(id)) {
continue;
}
else if (task.state == view_building_task::task_state::started) {
auto shard = task.replica.shard;
if (new_tasks.contains(shard) && !validate_can_be_one_batch(new_tasks[shard].front(), task)) {
// Currently we allow only one batch per shard at a time
            on_internal_error(vbw_logger, fmt::format("Got incompatible tasks for the same shard. Task: {}, other: {}", new_tasks[shard].front(), task));
}
new_tasks[shard].push_back(task);
}
co_await coroutine::maybe_yield();
}
auto tasks_map_copy = tasks_map;
// Clear aborted tasks from tasks_map
for (auto it = tasks_map_copy.begin(); it != tasks_map_copy.end();) {
if (!alive_tasks.contains(it->first)) {
vbw_logger.debug("Aborting task {}", it->first);
aborted_tasks.insert(it->first);
co_await it->second->abort_task(it->first);
it = tasks_map_copy.erase(it);
} else {
++it;
}
}
// Create batches for new tasks
for (const auto& [shard, shard_tasks]: new_tasks) {
auto tasks = shard_tasks | std::views::transform([] (const view_building_task& t) {
return std::make_pair(t.id, t);
}) | std::ranges::to<std::unordered_map>();
auto batch = seastar::make_shared<view_building_worker::batch>(vbw.container(), tasks, shard_tasks.front().base_id, shard_tasks.front().replica);
for (auto& [id, _]: tasks) {
tasks_map_copy.insert({id, batch});
}
co_await coroutine::maybe_yield();
}
tasks_map = std::move(tasks_map_copy);
}
future<> view_building_worker::local_state::finish_completed_tasks() {
for (auto it = tasks_map.begin(); it != tasks_map.end();) {
if (it->second->state == view_building_worker::batch_state::idle) {
++it;
} else if (it->second->state == view_building_worker::batch_state::in_progress) {
vbw_logger.debug("Task {} is still in progress", it->first);
++it;
} else {
co_await it->second->work.get_future();
finished_tasks.insert(it->first);
vbw_logger.info("Task {} was completed", it->first);
it->second->batch_done_cv.broadcast();
it = tasks_map.erase(it);
        }
        _batch = nullptr;
    }
}
-// Flush base table, set it as the currently processing base table and save which views exist at the time of flush
-future<> view_building_worker::state::flush_base_table(replica::database& db, table_id base_table_id, abort_source& as) {
-    auto cf = db.find_column_family(base_table_id).shared_from_this();
-    co_await when_all(cf->await_pending_writes(), cf->await_pending_streams());
-    co_await flush_base(cf, as);
-    processing_base_table = base_table_id;
-    flushed_views = get_ids_of_all_views(db, base_table_id);
-}
-
-future<> view_building_worker::state::clear() {
-    if (_batch) {
-        _batch->as.request_abort();
-        co_await std::move(_batch->work);
-        _batch = nullptr;
+future<> view_building_worker::local_state::clear_state() {
+    for (auto& [_, batch]: tasks_map) {
+        co_await batch->abort();
     }
     processing_base_table.reset();
-    completed_tasks.clear();
     flushed_views.clear();
+    tasks_map.clear();
+    finished_tasks.clear();
+    aborted_tasks.clear();
+    state_updated_cv.broadcast();
+    some_batch_finished = false;
+    vbw_logger.debug("View building worker state was cleared.");
 }
view_building_worker::batch::batch(sharded<view_building_worker>& vbw, std::unordered_map<utils::UUID, view_building_task> tasks, table_id base_id, locator::tablet_replica replica)
@@ -499,12 +674,17 @@ view_building_worker::batch::batch(sharded<view_building_worker>& vbw, std::unor
    , _vbw(vbw) {}
 void view_building_worker::batch::start() {
-    if (this_shard_id() != replica.shard) {
-        on_internal_error(vbw_logger, "view_building_worker::batch should be started on replica shard");
+    if (this_shard_id() != 0) {
+        on_internal_error(vbw_logger, "view_building_worker::batch should be started on shard0");
     }
-    work = do_work().finally([this] {
-        promise.set_value();
+    state = batch_state::in_progress;
+    work = smp::submit_to(replica.shard, [this] () -> future<> {
+        return do_work();
+    }).finally([this] () {
+        state = batch_state::finished;
+        _vbw.local()._state.some_batch_finished = true;
+        _vbw.local()._vb_state_machine.event.broadcast();
     });
 }
@@ -519,6 +699,10 @@ future<> view_building_worker::batch::abort() {
    co_await smp::submit_to(replica.shard, [this] () {
        as.request_abort();
    });
    if (work.valid()) {
        co_await work.get_future();
    }
}
future<> view_building_worker::batch::do_work() {
@@ -712,124 +896,6 @@ void view_building_worker::cleanup_staging_sstables(locator::effective_replicati
    _staging_sstables[table_id].erase(first, last);
}
future<view_building_state> view_building_worker::get_latest_view_building_state(raft::term_t term) {
return smp::submit_to(0, [&sharded_vbw = container(), term] () -> future<view_building_state> {
auto& vbw = sharded_vbw.local();
// auto guard = vbw._group0.client().start_operation(vbw._as);
auto& raft_server = vbw._group0.group0_server();
auto group0_holder = vbw._group0.hold_group0_gate();
co_await raft_server.read_barrier(&vbw._as);
if (raft_server.get_current_term() != term) {
throw std::runtime_error(fmt::format("Invalid raft term. Got {} but current term is {}", term, raft_server.get_current_term()));
}
co_return vbw._vb_state_machine.building_state;
});
}
future<std::vector<utils::UUID>> view_building_worker::work_on_tasks(raft::term_t term, std::vector<utils::UUID> ids) {
auto collect_completed_tasks = [&] {
std::vector<utils::UUID> completed;
for (auto& id: ids) {
if (_state.completed_tasks.contains(id)) {
completed.push_back(id);
}
}
return completed;
};
auto lock = co_await get_units(_state._mutex, 1, _as);
// Firstly check if there is any batch that is finished but wasn't cleaned up.
if (_state._batch && _state._batch->promise.available()) {
co_await _state.clean_up_after_batch();
}
// Check if tasks were already completed.
// If only part of the tasks were finished, return the subset and don't execute the remaining tasks.
std::vector<utils::UUID> completed = collect_completed_tasks();
if (!completed.empty()) {
co_return completed;
}
lock.return_all();
auto building_state = co_await get_latest_view_building_state(term);
lock = co_await get_units(_state._mutex, 1, _as);
co_await _state.update_processing_base_table(_db, building_state, _as);
// If there is no running batch, create it.
if (!_state._batch) {
if (!_state.processing_base_table) {
throw std::runtime_error("view_building_worker::state::processing_base_table needs to be set to work on view building");
}
auto my_host_id = _db.get_token_metadata().get_topology().my_host_id();
auto my_replica = locator::tablet_replica{my_host_id, this_shard_id()};
std::unordered_map<utils::UUID, view_building_task> tasks;
for (auto& id: ids) {
auto task_opt = building_state.get_task(*_state.processing_base_table, my_replica, id);
if (!task_opt) {
throw std::runtime_error(fmt::format("Task {} was not found for base table {} on replica {}", id, *building_state.currently_processed_base_table, my_replica));
}
tasks.insert({id, *task_opt});
}
#ifdef SEASTAR_DEBUG
auto& some_task = tasks.begin()->second;
for (auto& [_, t]: tasks) {
SCYLLA_ASSERT(t.base_id == some_task.base_id);
SCYLLA_ASSERT(t.last_token == some_task.last_token);
SCYLLA_ASSERT(t.replica == some_task.replica);
SCYLLA_ASSERT(t.type == some_task.type);
SCYLLA_ASSERT(t.replica.shard == this_shard_id());
}
#endif
// If any view was added after we did the initial flush, we need to do it again
if (std::ranges::any_of(tasks | std::views::values, [&] (const view_building_task& t) {
return t.view_id && !_state.flushed_views.contains(*t.view_id);
})) {
co_await _state.flush_base_table(_db, *_state.processing_base_table, _as);
}
// Create and start the batch
_state._batch = std::make_unique<batch>(container(), std::move(tasks), *building_state.currently_processed_base_table, my_replica);
_state._batch->start();
}
if (std::ranges::all_of(ids, [&] (auto& id) { return !_state._batch->tasks.contains(id); })) {
throw std::runtime_error(fmt::format(
"None of the tasks requested to work on is executed in current view building batch. Batch executes: {}, the RPC requested: {}",
_state._batch->tasks | std::views::keys, ids));
}
auto batch_future = _state._batch->promise.get_shared_future();
lock.return_all();
co_await std::move(batch_future);
lock = co_await get_units(_state._mutex, 1, _as);
co_await _state.clean_up_after_batch();
co_return collect_completed_tasks();
}
}
}

View File

@@ -16,7 +16,6 @@
#include <unordered_set> #include <unordered_set>
#include "locator/abstract_replication_strategy.hh" #include "locator/abstract_replication_strategy.hh"
#include "locator/tablets.hh" #include "locator/tablets.hh"
#include "raft/raft.hh"
#include "seastar/core/gate.hh" #include "seastar/core/gate.hh"
#include "db/view/view_building_state.hh" #include "db/view/view_building_state.hh"
#include "sstables/shared_sstable.hh" #include "sstables/shared_sstable.hh"
@@ -32,7 +31,7 @@ class messaging_service;
 }
 namespace service {
-class raft_group0;
+class raft_group0_client;
 }
 namespace db {
@@ -66,16 +65,27 @@ class view_building_worker : public seastar::peering_sharded_service<view_buildi
  *
  * When `work` future is finished, it means all tasks in `tasks_ids` are done.
  *
- * The batch lives on shard, where it executes its work exclusively.
+ * The batch lives on shard 0 exclusively.
+ * When the batch starts to execute its tasks, it first copies all necessary data
+ * to the designated shard, then the work is done on the local copy of the data only.
  */
enum class batch_state {
idle,
in_progress,
finished,
};
 class batch {
 public:
+    batch_state state = batch_state::idle;
     table_id base_id;
     locator::tablet_replica replica;
     std::unordered_map<utils::UUID, view_building_task> tasks;
-    shared_promise<> promise;
-    future<> work = make_ready_future();
+    shared_future<> work;
+    condition_variable batch_done_cv;
+    // The abort has to be used only on `replica.shard`
     abort_source as;
     batch(sharded<view_building_worker>& vbw, std::unordered_map<utils::UUID, view_building_task> tasks, table_id base_id, locator::tablet_replica replica);
@@ -91,18 +101,35 @@ class view_building_worker : public seastar::peering_sharded_service<view_buildi
     friend class batch;
-    struct state {
+    struct local_state {
         std::optional<table_id> processing_base_table = std::nullopt;
-        std::unordered_set<utils::UUID> completed_tasks;
-        std::unique_ptr<batch> _batch = nullptr;
-        semaphore _mutex = semaphore(1);
-        // All of the methods below should be executed while holding `_mutex` unit!
-        future<> update_processing_base_table(replica::database& db, const view_building_state& building_state, abort_source& as);
-        future<> flush_base_table(replica::database& db, table_id base_table_id, abort_source& as);
-        future<> clean_up_after_batch();
-        future<> clear();
+        // Stores ids of views for which the flush was done.
+        // When a new view is created, we need to flush the base table again,
+        // as data might be inserted.
         std::unordered_set<table_id> flushed_views;
+        std::unordered_map<utils::UUID, shared_ptr<batch>> tasks_map;
+        std::unordered_set<utils::UUID> finished_tasks;
+        std::unordered_set<utils::UUID> aborted_tasks;
+        bool some_batch_finished = false;
+        condition_variable state_updated_cv;
+
+        // Clears completed/aborted tasks and creates batches (without starting them) for started tasks.
+        // Returns a map of tasks per shard to execute.
+        future<> update(view_building_worker& vbw);
+        future<> finish_completed_tasks();
+        // The state can be aborted if, for example, a view is dropped; then all its tasks
+        // are aborted and the coordinator may choose a new base table to process.
+        // This method aborts all batches as we stop processing the current base table.
+        future<> clear_state();
+        // Flush the table with `table_id` on all shards.
+        // This method should be used only on the currently processing base table and
+        // it updates the `flushed_views` field.
+        future<> flush_table(view_building_worker& vbw, table_id table_id);
     };

     // Wrapper which represents information needed to create
@@ -120,14 +147,14 @@ private:
     replica::database& _db;
     db::system_keyspace& _sys_ks;
     service::migration_notifier& _mnotifier;
-    service::raft_group0& _group0;
+    service::raft_group0_client& _group0_client;
     view_update_generator& _vug;
     netw::messaging_service& _messaging;
     view_building_state_machine& _vb_state_machine;
     abort_source _as;
     named_gate _gate;
-    state _state;
+    local_state _state;
     std::unordered_set<table_id> _views_in_progress;
     future<> _view_building_state_observer = make_ready_future<>();
@@ -139,7 +166,7 @@ private:
 public:
     view_building_worker(replica::database& db, db::system_keyspace& sys_ks, service::migration_notifier& mnotifier,
-            service::raft_group0& group0, view_update_generator& vug, netw::messaging_service& ms,
+            service::raft_group0_client& group0_client, view_update_generator& vug, netw::messaging_service& ms,
             view_building_state_machine& vbsm);
     future<> init();
@@ -158,11 +185,10 @@ public:
     void cleanup_staging_sstables(locator::effective_replication_map_ptr erm, table_id table_id, locator::tablet_id tid);
 private:
-    future<view_building_state> get_latest_view_building_state(raft::term_t term);
-    future<> check_for_aborted_tasks();
     future<> run_view_building_state_observer();
     future<> update_built_views();
+    future<> update_building_state();
+    bool is_shard_free(shard_id shard);
     dht::token_range get_tablet_token_range(table_id table_id, dht::token last_token);
     future<> do_build_range(table_id base_id, std::vector<table_id> views_ids, dht::token last_token, abort_source& as);
@@ -176,7 +202,7 @@ private:
     void init_messaging_service();
     future<> uninit_messaging_service();
-    future<std::vector<utils::UUID>> work_on_tasks(raft::term_t term, std::vector<utils::UUID> ids);
+    future<std::vector<view_task_result>> work_on_tasks(std::vector<utils::UUID> ids);
 };
 }

View File

@@ -483,7 +483,7 @@ public:
     });
     co_await add_partition(mutation_sink, "load", [this] () -> future<sstring> {
         return map_reduce_tables<int64_t>([] (replica::table& tbl) {
-            return tbl.get_stats().live_disk_space_used.on_disk;
+            return tbl.get_stats().live_disk_space_used;
         }).then([] (int64_t load) {
             return format("{}", load);
         });
@@ -1158,104 +1158,6 @@ private:
    }
};
class tablet_sizes : public group0_virtual_table {
private:
sharded<service::tablet_allocator>& _talloc;
sharded<replica::database>& _db;
public:
tablet_sizes(sharded<service::tablet_allocator>& talloc,
sharded<replica::database>& db,
sharded<service::raft_group_registry>& raft_gr,
sharded<netw::messaging_service>& ms)
: group0_virtual_table(build_schema(), raft_gr, ms)
, _talloc(talloc)
, _db(db)
{ }
future<> execute_on_leader(std::function<void(mutation)> mutation_sink, reader_permit permit) override {
auto stats = _talloc.local().get_load_stats();
while (!stats) {
// Wait for stats to be refreshed by topology coordinator
{
abort_on_expiry aoe(permit.timeout());
reader_permit::awaits_guard ag(permit);
co_await seastar::sleep_abortable(std::chrono::milliseconds(200), aoe.abort_source());
}
if (!co_await is_leader(permit)) {
co_await redirect_to_leader(std::move(mutation_sink), std::move(permit));
co_return;
}
stats = _talloc.local().get_load_stats();
}
auto tm = _db.local().get_token_metadata_ptr();
auto prepare_replica_sizes = [] (const std::unordered_map<host_id, uint64_t>& replica_sizes) {
map_type_impl::native_type tmp;
for (auto& r: replica_sizes) {
auto replica = r.first.uuid();
int64_t tablet_size = int64_t(r.second);
auto map_element = std::make_pair<data_value, data_value>(data_value(replica), data_value(tablet_size));
tmp.push_back(std::move(map_element));
}
return tmp;
};
auto prepare_missing_replica = [] (const std::unordered_set<host_id>& missing_replicas) {
set_type_impl::native_type tmp;
for (auto& r: missing_replicas) {
tmp.push_back(data_value(r.uuid()));
}
return tmp;
};
auto map_type = map_type_impl::get_instance(uuid_type, long_type, false);
auto set_type = set_type_impl::get_instance(uuid_type, false);
for (auto&& [table, tmap] : tm->tablets().all_tables_ungrouped()) {
mutation m(schema(), make_partition_key(table));
co_await tmap->for_each_tablet([&] (locator::tablet_id tid, const locator::tablet_info& tinfo) -> future<> {
auto trange = tmap->get_token_range(tid);
int64_t last_token = trange.end()->value().raw();
auto& r = m.partition().clustered_row(*schema(), clustering_key::from_single_value(*schema(), data_value(last_token).serialize_nonnull()));
const range_based_tablet_id rb_tid {table, trange};
std::unordered_map<host_id, uint64_t> replica_sizes;
std::unordered_set<host_id> missing_replicas;
for (auto& replica : tinfo.replicas) {
auto tablet_size_opt = stats->get_tablet_size(replica.host, rb_tid);
if (tablet_size_opt) {
replica_sizes[replica.host] = *tablet_size_opt;
} else {
missing_replicas.insert(replica.host);
}
}
set_cell(r.cells(), "replicas", make_map_value(map_type, prepare_replica_sizes(replica_sizes)));
set_cell(r.cells(), "missing_replicas", make_set_value(set_type, prepare_missing_replica(missing_replicas)));
return make_ready_future<>();
});
mutation_sink(m);
}
}
private:
static schema_ptr build_schema() {
auto id = generate_legacy_id(system_keyspace::NAME, "tablet_sizes");
return schema_builder(system_keyspace::NAME, "tablet_sizes", std::make_optional(id))
.with_column("table_id", uuid_type, column_kind::partition_key)
.with_column("last_token", long_type, column_kind::clustering_key)
.with_column("replicas", map_type_impl::get_instance(uuid_type, long_type, false))
.with_column("missing_replicas", set_type_impl::get_instance(uuid_type, false))
.with_sharder(1, 0) // shard0-only
.with_hash_version()
.build();
}
dht::decorated_key make_partition_key(table_id table) {
return dht::decorate_key(*_s, partition_key::from_single_value(
*_s, data_value(table.uuid()).serialize_nonnull()));
}
};
class cdc_timestamps_table : public streaming_virtual_table {
private:
    replica::database& _db;
@@ -1451,7 +1353,6 @@ future<> initialize_virtual_tables(
    co_await add_table(std::make_unique<clients_table>(ss));
    co_await add_table(std::make_unique<raft_state_table>(dist_raft_gr));
    co_await add_table(std::make_unique<load_per_node>(tablet_allocator, dist_db, dist_raft_gr, ms, dist_gossiper));
co_await add_table(std::make_unique<tablet_sizes>(tablet_allocator, dist_db, dist_raft_gr, ms));
    co_await add_table(std::make_unique<cdc_timestamps_table>(db, ss));
    co_await add_table(std::make_unique<cdc_streams_table>(db, ss));

View File

@@ -18,9 +18,6 @@ target_link_libraries(scylla_dht
    PRIVATE
        replica)
if (Scylla_USE_PRECOMPILED_HEADER_USE)
target_precompile_headers(scylla_dht REUSE_FROM scylla-precompiled-header)
endif()
add_whole_archive(dht scylla_dht)
check_headers(check-headers scylla_dht

View File

@@ -6,7 +6,13 @@ Before=slices.target
MemoryAccounting=true
IOAccounting=true
CPUAccounting=true
# Systemd deprecated the settings BlockIOWeight and CPUShares, but they are still the ones used in RHEL7.
# Newer systemd wants IOWeight and CPUWeight instead. Luckily both newer and older systemd seem to
# ignore the unwanted option, so it is safest to set both. Using just the old versions would work too but
# seems less future proof. Using just the new versions does not work at all for RHEL7.
BlockIOWeight=1000
IOWeight=1000
MemorySwapMax=0
CPUShares=1000
CPUWeight=1000

View File

@@ -2,6 +2,7 @@ etc/default/scylla-server
etc/default/scylla-housekeeping
etc/scylla.d/*.conf
etc/bash_completion.d/nodetool-completion
opt/scylladb/share/p11-kit/modules/*
opt/scylladb/share/doc/scylla/*
opt/scylladb/share/doc/scylla/licenses/
usr/lib/systemd/system/*.timer

View File

@@ -122,6 +122,7 @@ ln -sfT /etc/scylla /var/lib/scylla/conf
%config(noreplace) %{_sysconfdir}/sysconfig/scylla-housekeeping
%attr(0755,root,root) %dir %{_sysconfdir}/scylla.d
%config(noreplace) %{_sysconfdir}/scylla.d/*.conf
/opt/scylladb/share/p11-kit/modules/*
/opt/scylladb/share/doc/scylla/*
%{_unitdir}/scylla-fstrim.service
%{_unitdir}/scylla-housekeeping-daily.service
