mirror of
https://github.com/scylladb/scylladb.git
synced 2026-04-28 04:06:59 +00:00
Compare commits: copilot/ad...copilot/re — 249 commits (the per-commit Author/SHA1/Date table rendered empty in this mirror view and is omitted).
86 .github/copilot-instructions.md (vendored, new file)
@@ -0,0 +1,86 @@

# ScyllaDB Development Instructions

## Project Context

High-performance distributed NoSQL database. Core values: performance, correctness, readability.

## Build System

### Modern Build (configure.py + ninja)

```bash
# Configure (run once per mode, or when switching modes)
./configure.py --mode=<mode>   # mode: dev, debug, release, sanitize

# Build everything
ninja <mode>-build             # e.g., ninja dev-build

# Build the Scylla binary only (sufficient for Python integration tests)
ninja build/<mode>/scylla

# Build a specific test
ninja build/<mode>/test/boost/<test_name>
```

## Running Tests

### C++ Unit Tests

```bash
# Run all tests in a file
./test.py --mode=<mode> test/<suite>/<test_name>.cc

# Run a single test case from a file
./test.py --mode=<mode> test/<suite>/<test_name>.cc::<test_case_name>

# Examples
./test.py --mode=dev test/boost/memtable_test.cc
./test.py --mode=dev test/raft/raft_server_test.cc::test_check_abort_on_client_api
```

**Important:**
- Use the full path with the `.cc` extension (e.g., `test/boost/test_name.cc`, not `boost/test_name`)
- To run a single test case, append `::<test_case_name>` to the file path
- If you encounter permission issues with cgroup metric gathering, add the `--no-gather-metrics` flag

**Rebuilding Tests:**
- test.py does NOT automatically rebuild when test source files are modified
- Many tests are part of composite binaries (e.g., `combined_tests` in test/boost contains multiple test files)
- To find which binary contains a test, check `configure.py` in the repository root (primary source) or `test/<suite>/CMakeLists.txt`
- To rebuild a specific test binary: `ninja build/<mode>/test/<suite>/<binary_name>`
- Examples:
  - `ninja build/dev/test/boost/combined_tests` (contains group0_voter_calculator_test.cc and others)
  - `ninja build/dev/test/raft/replication_test` (standalone Raft test)

### Python Integration Tests

```bash
# Only requires the Scylla binary (a full build is usually not needed)
ninja build/<mode>/scylla

# Run all tests in a file
./test.py --mode=<mode> <test_path>

# Run a single test case from a file
./test.py --mode=<mode> <test_path>::<test_function_name>

# Examples
./test.py --mode=dev alternator/
./test.py --mode=dev cluster/test_raft_voters::test_raft_limited_voters_retain_coordinator

# Optional flags
./test.py --mode=dev cluster/test_raft_no_quorum -v          # Verbose output
./test.py --mode=dev cluster/test_raft_no_quorum --repeat 5  # Repeat test 5 times
```

**Important:**
- Use the path without the `.py` extension (e.g., `cluster/test_raft_no_quorum`, not `cluster/test_raft_no_quorum.py`)
- To run a single test case, append `::<test_function_name>` to the file path
- Add `-v` for verbose output
- Add `--repeat <num>` to repeat a test multiple times
- After modifying C++ source files, rebuild only the Scylla binary for Python tests; building the entire repository is unnecessary

## Code Philosophy

- Performance matters in hot paths (data read/write, inner loops)
- Self-documenting code through clear naming
- Comments explain "why", not "what"
- Prefer the standard library over custom implementations
- Strive for simplicity and clarity; add complexity only when clearly justified
- Question requests: don't blindly implement them — evaluate trade-offs, identify issues, and suggest better alternatives when appropriate
- Consider different approaches, weigh pros and cons, and recommend the best fit for the specific context
115 .github/instructions/cpp.instructions.md (vendored, new file)
@@ -0,0 +1,115 @@

---
applyTo: "**/*.{cc,hh}"
---

# C++ Guidelines

**Important:** Always match the style and conventions of existing code in the file and directory.

## Memory Management

- Prefer stack allocation whenever possible
- Use `std::unique_ptr` by default for dynamic allocations
- `new`/`delete` are forbidden (use RAII)
- Use `seastar::lw_shared_ptr` or `seastar::shared_ptr` for shared ownership within the same shard
- Use `seastar::foreign_ptr` for cross-shard sharing
- Avoid `std::shared_ptr` except when interfacing with external C++ APIs
- Avoid raw pointers except for non-owning references or C API interop
## Seastar Asynchronous Programming

- Use `seastar::future<T>` for all async operations
- Prefer coroutines (`co_await`, `co_return`) over `.then()` chains for readability
- Coroutines are preferred over `seastar::do_with()` for managing temporary state
- In hot paths where futures are ready, continuations may be more efficient than coroutines
- Chain futures with `.then()`; don't block with `.get()` (unless in a `seastar::thread` context)
- All I/O must be asynchronous (no blocking calls)
- Use `seastar::gate` for shutdown coordination
- Use `seastar::semaphore` for resource limiting (not `std::mutex`)
- Break long loops with `maybe_yield()` to avoid reactor stalls

## Coroutines

```cpp
seastar::future<T> func() {
    auto result = co_await async_operation();  // suspends without blocking the reactor
    co_return result;
}
```
## Error Handling

- Throw exceptions for errors (futures propagate them automatically)
- In the data path: avoid exceptions; use `std::expected` (or `boost::outcome`) instead
- Use standard exceptions (`std::runtime_error`, `std::invalid_argument`)
- Database-specific: throw appropriate schema/query exceptions
## Performance

- Pass large objects by `const&` or `&&` (move semantics)
- Use `std::string_view` for non-owning string references
- Avoid copies: prefer move semantics
- Use `utils::chunked_vector` instead of `std::vector` for large allocations (>128KB)
- Minimize dynamic allocations in hot paths
## Database-Specific Types

- Use `schema_ptr` for schema references
- Use `mutation` and `mutation_partition` for data modifications
- Use `partition_key` and `clustering_key` for keys
- Use `api::timestamp_type` for database timestamps
- Use `gc_clock` for garbage-collection timing

## Style

- C++23 standard (prefer modern features, especially coroutines)
- Use `auto` when the type is obvious from the RHS
- Avoid `auto` when it obscures the type
- Use range-based for loops: `for (const auto& item : container)`
- Use standard algorithms when they clearly simplify code (e.g., replacing 10-line loops)
- Avoid chaining multiple algorithms if a straightforward loop is clearer
- Mark functions and variables `const` whenever possible
- Use scoped enums: `enum class` (not unscoped `enum`)
## Headers

- Use `#pragma once`
- Include order: own header, C++ std, Seastar, Boost, project headers
- Forward declare when possible
- Never `using namespace` in headers (exception: `using namespace seastar` is globally available via `seastarx.hh`)

## Documentation

- Public APIs require clear documentation
- Implementation details should be self-evident from the code
- Use `///` or Doxygen `/** */` for public documentation and `//` for implementation notes; follow the existing style

## Naming

- `snake_case` for most identifiers (classes, functions, variables, namespaces)
- Template parameters: `CamelCase` (e.g., `template<typename ValueType>`)
- Member variables: prefix with `_` (e.g., `int _count;`)
- Structs (value-only): no `_` prefix on members
- Constants and `constexpr`: `snake_case` (e.g., `static constexpr int max_size = 100;`)
- Files: `.hh` for headers, `.cc` for source
## Formatting

- 4 spaces indentation, never tabs
- Opening braces on the same line as the control structure (except namespaces)
- Space after keywords: `if (`, `while (`, `return `
- Whitespace around operators matches precedence: `*a + *b`, not `* a+* b`
- Line length: keep it reasonable (<160 chars); use continuation lines with double indent if needed
- Brace all nested scopes, even single statements
- Minimal patches: only format code you modify; never reformat entire files

## Logging

- Use structured logging with appropriate levels: DEBUG, INFO, WARN, ERROR
- Include context in log messages (e.g., request IDs)
- Never log sensitive data (credentials, PII)

## Forbidden

- `malloc`/`free`
- The `printf` family (use logging or fmt)
- Raw pointers for ownership
- `using namespace` in headers
- Blocking operations: `std::sleep`, `std::read`, `std::mutex` (use Seastar equivalents)
- `std::atomic` (reserved for very special circumstances only)
- Macros (use `inline`, `constexpr`, or templates instead)

## Testing

When modifying existing code, follow TDD: create/update the test first, then implement.

- Examine existing tests for style and structure
- Use the Boost.Test framework
- Use `SEASTAR_THREAD_TEST_CASE` for Seastar asynchronous tests
- Aim for high code coverage, especially for new features and bug fixes
- Maintain bisectability: all tests must pass in every commit. Mark failing tests with `BOOST_FAIL()` or similar, then fix in a subsequent commit
51 .github/instructions/python.instructions.md (vendored, new file)
@@ -0,0 +1,51 @@

---
applyTo: "**/*.py"
---

# Python Guidelines

**Important:** Match existing code style. Some directories (like `test/cqlpy` and `test/alternator`) prefer simplicity over type hints and docstrings.

## Style

- Follow PEP 8
- Use type hints for function signatures (unless the directory style omits them)
- Use f-strings for formatting
- Line length: 160 characters max
- 4 spaces for indentation

## Imports

Order: standard library, third-party, local imports.

```python
import os
import sys

import pytest
from cassandra.cluster import Cluster

from test.utils import setup_keyspace
```

Never use `from module import *`.

## Documentation

All public functions/classes need docstrings (unless the current directory's conventions omit them):

```python
def my_function(arg1: str, arg2: int) -> bool:
    """
    Brief summary of function purpose.

    Args:
        arg1: Description of first argument.
        arg2: Description of second argument.

    Returns:
        Description of return value.
    """
    pass
```

## Testing Best Practices

- Maintain bisectability: all tests must pass in every commit
- Mark currently failing tests with `@pytest.mark.xfail`; unmark when fixed
- Use descriptive names that convey intent
- Docstrings/comments should explain what the test verifies and why, whether it reproduces a specific issue, and how it fits into the larger test suite
34 .github/workflows/docs-validate-metrics.yml (vendored, new file)
@@ -0,0 +1,34 @@

```yaml
name: Docs / Validate metrics

on:
  pull_request:
    branches:
      - master
      - enterprise
    paths:
      - '**/*.cc'
      - 'scripts/metrics-config.yml'
      - 'scripts/get_description.py'
      - 'docs/_ext/scylladb_metrics.py'

jobs:
  validate-metrics:
    runs-on: ubuntu-latest
    name: Check metrics documentation coverage

    steps:
      - name: Checkout code
        uses: actions/checkout@v4
        with:
          submodules: true

      - name: Set up Python
        uses: actions/setup-python@v6
        with:
          python-version: '3.10'

      - name: Install dependencies
        run: pip install PyYAML

      - name: Validate metrics
        run: python3 scripts/get_description.py --validate -c scripts/metrics-config.yml
```
```diff
@@ -116,6 +116,7 @@ list(APPEND absl_cxx_flags
 if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
   list(APPEND ABSL_GCC_FLAGS ${absl_cxx_flags})
 elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
+  list(APPEND absl_cxx_flags "-Wno-deprecated-builtins")
   list(APPEND ABSL_LLVM_FLAGS ${absl_cxx_flags})
 endif()
 set(ABSL_DEFAULT_LINKOPTS
@@ -163,7 +164,45 @@ file(MAKE_DIRECTORY "${scylla_gen_build_dir}")
 include(add_version_library)
 generate_scylla_version()
+
+option(Scylla_USE_PRECOMPILED_HEADER "Use precompiled header for Scylla" ON)
+add_library(scylla-precompiled-header STATIC exported_templates.cc)
+target_link_libraries(scylla-precompiled-header PRIVATE
+  absl::headers
+  absl::btree
+  absl::hash
+  absl::raw_hash_set
+  Seastar::seastar
+  Snappy::snappy
+  systemd
+  ZLIB::ZLIB
+  lz4::lz4_static
+  zstd::zstd_static)
+if (Scylla_USE_PRECOMPILED_HEADER)
+  set(Scylla_USE_PRECOMPILED_HEADER_USE ON)
+  find_program(DISTCC_EXEC NAMES distcc OPTIONAL)
+  if (DISTCC_EXEC)
+    if(DEFINED ENV{DISTCC_HOSTS})
+      set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
+      message(STATUS "Disabling precompiled header usage because distcc exists and DISTCC_HOSTS is set, assuming you're using distributed compilation.")
+    else()
+      file(REAL_PATH "~/.distcc/hosts" DIST_CC_HOSTS_PATH EXPAND_TILDE)
+      if (EXISTS ${DIST_CC_HOSTS_PATH})
+        set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
+        message(STATUS "Disabling precompiled header usage because distcc and ~/.distcc/hosts exists, assuming you're using distributed compilation.")
+      endif()
+    endif()
+  endif()
+  if (Scylla_USE_PRECOMPILED_HEADER_USE)
+    message(STATUS "Using precompiled header for Scylla - remember to add `sloppiness = pch_defines,time_macros` to ccache.conf, if you're using ccache.")
+    target_precompile_headers(scylla-precompiled-header PRIVATE "stdafx.hh")
+    target_compile_definitions(scylla-precompiled-header PRIVATE SCYLLA_USE_PRECOMPILED_HEADER)
+  endif()
+else()
+  set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
+endif()
+
 add_library(scylla-main STATIC)
 
 target_sources(scylla-main
   PRIVATE
     absl-flat_hash_map.cc
@@ -208,6 +247,7 @@ target_link_libraries(scylla-main
   ZLIB::ZLIB
   lz4::lz4_static
   zstd::zstd_static
+  scylla-precompiled-header
 )
 
 option(Scylla_CHECK_HEADERS
```
109 UNIMPLEMENTED_ENUM_ANALYSIS.md (new file)
@@ -0,0 +1,109 @@

# Analysis of unimplemented::cause Enum Values

This document provides an analysis of the `unimplemented::cause` enum values after cleanup.

## Removed Unused Enum Values (20 values removed)

The following enum values had **zero usages** in the codebase and have been removed:

- `LWT` - Lightweight transactions
- `PAGING` - Query result paging
- `AUTH` - Authentication
- `PERMISSIONS` - Permission checking
- `COUNTERS` - Counter columns
- `MIGRATIONS` - Schema migrations
- `GOSSIP` - Gossip protocol
- `TOKEN_RESTRICTION` - Token-based restrictions
- `LEGACY_COMPOSITE_KEYS` - Legacy composite key handling
- `COLLECTION_RANGE_TOMBSTONES` - Collection range tombstones
- `RANGE_DELETES` - Range deletion operations
- `COMPRESSION` - Compression features
- `NONATOMIC` - Non-atomic operations
- `CONSISTENCY` - Consistency level handling
- `WRAP_AROUND` - Token wrap-around handling
- `STORAGE_SERVICE` - Storage service operations
- `SCHEMA_CHANGE` - Schema change operations
- `MIXED_CF` - Mixed column family operations
- `SSTABLE_FORMAT_M` - SSTable format M

## Remaining Enum Values (8 values kept)

### 1. `API` (4 usages)

**Impact**: REST API features that are not fully implemented.

**Usages**:
- `api/column_family.cc:1052` - Fails when the `split_output` parameter is used in major compaction
- `api/compaction_manager.cc:100,146,216` - Warns when force_user_defined_compaction or related operations are called

**User Impact**: Some REST API endpoints for compaction management are stubs and will warn or fail.

### 2. `INDEXES` (6 usages)

**Impact**: Secondary index features not fully supported.

**Usages**:
- `api/column_family.cc:433,440,449,456` - Warns about index-related operations
- `cql3/restrictions/statement_restrictions.cc:1158` - Fails when attempting filtering on collection columns without proper indexing
- `cql3/statements/update_statement.cc:149` - Warns about index operations

**User Impact**: Some advanced secondary index features (especially filtering on collections) are not available.

### 3. `TRIGGERS` (2 usages)

**Impact**: Trigger support is not implemented.

**Usages**:
- `db/schema_tables.cc:2017` - Warns when loading trigger metadata from schema tables
- `service/storage_proxy.cc:4166` - Warns when processing trigger-related operations

**User Impact**: Cassandra triggers (stored procedures that execute on data changes) are not supported.

### 4. `METRICS` (1 usage)

**Impact**: Some query processor metrics are not collected.

**Usages**:
- `cql3/query_processor.cc:585` - Warns about missing metrics implementation

**User Impact**: Minor - some internal metrics may not be available.

### 5. `VALIDATION` (4 usages)

**Impact**: Schema validation checks are partially implemented.

**Usages**:
- `cql3/functions/token_fct.hh:38` - Warns about validation in token functions
- `cql3/statements/drop_keyspace_statement.cc:40` - Warns when dropping a keyspace
- `cql3/statements/truncate_statement.cc:87` - Warns when truncating a table
- `service/migration_manager.cc:750` - Warns during schema migrations

**User Impact**: Some schema validation checks are skipped (with warnings logged).

### 6. `REVERSED` (1 usage)

**Impact**: Reversed type support in the CQL protocol.

**Usages**:
- `transport/server.cc:2085` - Fails when trying to use reversed types in the CQL protocol

**User Impact**: Reversed types are not supported in the CQL protocol implementation.

### 7. `HINT` (1 usage)

**Impact**: Hint replaying is not implemented.

**Usages**:
- `db/batchlog_manager.cc:251` - Warns when attempting to replay hints

**User Impact**: Cassandra hints (temporary storage of writes when nodes are down) are not supported.

### 8. `SUPER` (2 usages)

**Impact**: Super column families are not supported.

**Usages**:
- `db/legacy_schema_migrator.cc:157` - Fails when encountering a super column family in a legacy schema
- `db/schema_tables.cc:2288` - Fails when encountering a super column family in schema tables

**User Impact**: Super column families (a legacy Cassandra feature) will cause errors if encountered in legacy data or schema migrations.

## Summary

- **Removed**: 20 unused enum values (76% reduction)
- **Kept**: 8 actively used enum values (24% remaining)
- **Total lines removed**: ~40 lines from the enum definition and switch statement

The remaining enum values represent actual unimplemented features that users may encounter, with impacts ranging from warnings (TRIGGERS, METRICS, VALIDATION, HINT) to failures (API split_output, INDEXES on collections, REVERSED types, SUPER tables).
```diff
@@ -34,5 +34,8 @@ target_link_libraries(alternator
   idl
   absl::headers)
 
+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(alternator REUSE_FROM scylla-precompiled-header)
+endif()
 check_headers(check-headers alternator
   GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
```
```diff
@@ -888,7 +888,7 @@ future<executor::request_return_type> executor::describe_table(client_state& cli
 
     schema_ptr schema = get_table(_proxy, request);
     get_stats_from_schema(_proxy, *schema)->api_operations.describe_table++;
-    tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
+    tracing::add_alternator_table_name(trace_state, schema->cf_name());
 
     rjson::value table_description = co_await fill_table_description(schema, table_status::active, _proxy, client_state, trace_state, permit);
     rjson::value response = rjson::empty_object();
@@ -989,7 +989,7 @@ future<executor::request_return_type> executor::delete_table(client_state& clien
     std::string table_name = get_table_name(request);
 
     std::string keyspace_name = executor::KEYSPACE_NAME_PREFIX + table_name;
-    tracing::add_table_name(trace_state, keyspace_name, table_name);
+    tracing::add_alternator_table_name(trace_state, table_name);
     auto& p = _proxy.container();
 
     schema_ptr schema = get_table(_proxy, request);
@@ -1008,8 +1008,8 @@ future<executor::request_return_type> executor::delete_table(client_state& clien
         throw api_error::resource_not_found(fmt::format("Requested resource not found: Table: {} not found", table_name));
     }
 
-    auto m = co_await service::prepare_column_family_drop_announcement(_proxy, keyspace_name, table_name, group0_guard.write_timestamp(), service::drop_views::yes);
-    auto m2 = co_await service::prepare_keyspace_drop_announcement(_proxy, keyspace_name, group0_guard.write_timestamp());
+    auto m = co_await service::prepare_column_family_drop_announcement(p.local(), keyspace_name, table_name, group0_guard.write_timestamp(), service::drop_views::yes);
+    auto m2 = co_await service::prepare_keyspace_drop_announcement(p.local(), keyspace_name, group0_guard.write_timestamp());
 
     std::move(m2.begin(), m2.end(), std::back_inserter(m));
 
@@ -1583,7 +1583,7 @@ static future<executor::request_return_type> create_table_on_shard0(service::cli
     std::unordered_set<std::string> unused_attribute_definitions =
         validate_attribute_definitions("", *attribute_definitions);
 
-    tracing::add_table_name(trace_state, keyspace_name, table_name);
+    tracing::add_alternator_table_name(trace_state, table_name);
 
     schema_builder builder(keyspace_name, table_name);
     auto [hash_key, range_key] = parse_key_schema(request, "");
@@ -1865,10 +1865,10 @@ future<executor::request_return_type> executor::create_table(client_state& clien
     _stats.api_operations.create_table++;
     elogger.trace("Creating table {}", request);
 
-    co_return co_await _mm.container().invoke_on(0, [&, tr = tracing::global_trace_state_ptr(trace_state), request = std::move(request), &sp = _proxy.container(), &g = _gossiper.container(), client_state_other_shard = client_state.move_to_other_shard(), enforce_authorization = bool(_enforce_authorization), warn_authorization = bool(_warn_authorization)]
+    co_return co_await _mm.container().invoke_on(0, [&, tr = tracing::global_trace_state_ptr(trace_state), request = std::move(request), &sp = _proxy.container(), &g = _gossiper.container(), &e = this->container(), client_state_other_shard = client_state.move_to_other_shard(), enforce_authorization = bool(_enforce_authorization), warn_authorization = bool(_warn_authorization)]
             (service::migration_manager& mm) mutable -> future<executor::request_return_type> {
         const db::tablets_mode_t::mode tablets_mode = _proxy.data_dictionary().get_config().tablets_mode_for_new_keyspaces(); // type cast
         co_return co_await create_table_on_shard0(client_state_other_shard.get(), tr, std::move(request), sp.local(), mm, g.local(), enforce_authorization, warn_authorization, _stats, std::move(tablets_mode));
```
|
co_return co_await create_table_on_shard0(client_state_other_shard.get(), tr, std::move(request), sp.local(), mm, g.local(), enforce_authorization, warn_authorization, e.local()._stats, std::move(tablets_mode));
|
||||||
});
|
});
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -1930,7 +1930,7 @@ future<executor::request_return_type> executor::update_table(client_state& clien
|
|||||||
|
|
||||||
schema_ptr tab = get_table(p.local(), request);
|
schema_ptr tab = get_table(p.local(), request);
|
||||||
|
|
||||||
tracing::add_table_name(gt, tab->ks_name(), tab->cf_name());
|
tracing::add_alternator_table_name(gt, tab->cf_name());
|
||||||
|
|
||||||
// the ugly but harmless conversion to string_view here is because
|
// the ugly but harmless conversion to string_view here is because
|
||||||
// Seastar's sstring is missing a find(std::string_view) :-()
|
// Seastar's sstring is missing a find(std::string_view) :-()
|
||||||
@@ -2624,14 +2624,14 @@ std::optional<service::cas_shard> rmw_operation::shard_for_execute(bool needs_re
|
|||||||
// Build the return value from the different RMW operations (UpdateItem,
|
// Build the return value from the different RMW operations (UpdateItem,
|
||||||
// PutItem, DeleteItem). All these return nothing by default, but can
|
// PutItem, DeleteItem). All these return nothing by default, but can
|
||||||
// optionally return Attributes if requested via the ReturnValues option.
|
// optionally return Attributes if requested via the ReturnValues option.
|
||||||
static future<executor::request_return_type> rmw_operation_return(rjson::value&& attributes, const consumed_capacity_counter& consumed_capacity, uint64_t& metric) {
|
static executor::request_return_type rmw_operation_return(rjson::value&& attributes, const consumed_capacity_counter& consumed_capacity, uint64_t& metric) {
|
||||||
rjson::value ret = rjson::empty_object();
|
rjson::value ret = rjson::empty_object();
|
||||||
consumed_capacity.add_consumed_capacity_to_response_if_needed(ret);
|
consumed_capacity.add_consumed_capacity_to_response_if_needed(ret);
|
||||||
metric += consumed_capacity.get_consumed_capacity_units();
|
metric += consumed_capacity.get_consumed_capacity_units();
|
||||||
if (!attributes.IsNull()) {
|
if (!attributes.IsNull()) {
|
||||||
rjson::add(ret, "Attributes", std::move(attributes));
|
rjson::add(ret, "Attributes", std::move(attributes));
|
||||||
}
|
}
|
||||||
return make_ready_future<executor::request_return_type>(rjson::print(std::move(ret)));
|
return rjson::print(std::move(ret));
|
||||||
}
|
}
|
||||||
|
|
||||||
static future<std::unique_ptr<rjson::value>> get_previous_item(
|
static future<std::unique_ptr<rjson::value>> get_previous_item(
|
||||||
@@ -2697,7 +2697,10 @@ future<executor::request_return_type> rmw_operation::execute(service::storage_pr
         stats& global_stats,
         stats& per_table_stats,
         uint64_t& wcu_total) {
-    auto cdc_opts = cdc::per_request_options{};
+    auto cdc_opts = cdc::per_request_options{
+        .alternator = true,
+        .alternator_streams_increased_compatibility = schema()->cdc_options().enabled() && proxy.data_dictionary().get_config().alternator_streams_increased_compatibility(),
+    };
     if (needs_read_before_write) {
         if (_write_isolation == write_isolation::FORBID_RMW) {
             throw api_error::validation("Read-modify-write operations are disabled by 'forbid_rmw' write isolation policy. Refer to https://github.com/scylladb/scylla/blob/master/docs/alternator/alternator.md#write-isolation-policies for more information.");
@@ -2742,7 +2745,7 @@ future<executor::request_return_type> rmw_operation::execute(service::storage_pr
         if (!is_applied) {
             return make_ready_future<executor::request_return_type>(api_error::conditional_check_failed("The conditional request failed", std::move(_return_attributes)));
         }
-        return rmw_operation_return(std::move(_return_attributes), _consumed_capacity, wcu_total);
+        return make_ready_future<executor::request_return_type>(rmw_operation_return(std::move(_return_attributes), _consumed_capacity, wcu_total));
     });
 }
 
@@ -2856,7 +2859,7 @@ future<executor::request_return_type> executor::put_item(client_state& client_st
     elogger.trace("put_item {}", request);
 
     auto op = make_shared<put_item_operation>(*_parsed_expression_cache, _proxy, std::move(request));
-    tracing::add_table_name(trace_state, op->schema()->ks_name(), op->schema()->cf_name());
+    tracing::add_alternator_table_name(trace_state, op->schema()->cf_name());
     const bool needs_read_before_write = op->needs_read_before_write();
 
     co_await verify_permission(_enforce_authorization, _warn_authorization, client_state, op->schema(), auth::permission::MODIFY, _stats);
@@ -2960,7 +2963,7 @@ future<executor::request_return_type> executor::delete_item(client_state& client
 
     auto op = make_shared<delete_item_operation>(*_parsed_expression_cache, _proxy, std::move(request));
     lw_shared_ptr<stats> per_table_stats = get_stats_from_schema(_proxy, *(op->schema()));
-    tracing::add_table_name(trace_state, op->schema()->ks_name(), op->schema()->cf_name());
+    tracing::add_alternator_table_name(trace_state, op->schema()->cf_name());
     const bool needs_read_before_write = _proxy.data_dictionary().get_config().alternator_force_read_before_write() || op->needs_read_before_write();
 
     co_await verify_permission(_enforce_authorization, _warn_authorization, client_state, op->schema(), auth::permission::MODIFY, _stats);
@@ -3054,6 +3057,9 @@ static future<> cas_write(service::storage_proxy& proxy, schema_ptr schema, serv
     auto timeout = executor::default_timeout();
     auto op = seastar::make_shared<put_or_delete_item_cas_request>(schema, std::move(mutation_builders));
     auto cdc_opts = cdc::per_request_options{
+        .alternator = true,
+        .alternator_streams_increased_compatibility =
+            schema->cdc_options().enabled() && proxy.data_dictionary().get_config().alternator_streams_increased_compatibility(),
     };
     return proxy.cas(schema, std::move(cas_shard), op, nullptr, to_partition_ranges(dk),
             {timeout, std::move(permit), client_state, trace_state},
@@ -3104,8 +3110,10 @@ static future<> do_batch_write(service::storage_proxy& proxy,
         utils::chunked_vector<mutation> mutations;
         mutations.reserve(mutation_builders.size());
         api::timestamp_type now = api::new_timestamp();
+        bool any_cdc_enabled = false;
         for (auto& b : mutation_builders) {
             mutations.push_back(b.second.build(b.first, now));
+            any_cdc_enabled |= b.first->cdc_options().enabled();
         }
         return proxy.mutate(std::move(mutations),
                 db::consistency_level::LOCAL_QUORUM,
@@ -3114,7 +3122,10 @@ static future<> do_batch_write(service::storage_proxy& proxy,
                 std::move(permit),
                 db::allow_per_partition_rate_limit::yes,
                 false,
-                cdc::per_request_options{});
+                cdc::per_request_options{
+                    .alternator = true,
+                    .alternator_streams_increased_compatibility = any_cdc_enabled && proxy.data_dictionary().get_config().alternator_streams_increased_compatibility(),
+                });
     } else {
         // Do the write via LWT:
         // Multiple mutations may be destined for the same partition, adding
@@ -3204,7 +3215,7 @@ future<executor::request_return_type> executor::batch_write_item(client_state& c
         per_table_stats->api_operations.batch_write_item++;
         per_table_stats->api_operations.batch_write_item_batch_total += it->value.Size();
         per_table_stats->api_operations.batch_write_item_histogram.add(it->value.Size());
-        tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
+        tracing::add_alternator_table_name(trace_state, schema->cf_name());
 
         std::unordered_set<primary_key, primary_key_hash, primary_key_equal> used_keys(
                 1, primary_key_hash{schema}, primary_key_equal{schema});
@@ -4464,7 +4475,7 @@ future<executor::request_return_type> executor::update_item(client_state& client
     elogger.trace("update_item {}", request);
 
     auto op = make_shared<update_item_operation>(*_parsed_expression_cache, _proxy, std::move(request));
-    tracing::add_table_name(trace_state, op->schema()->ks_name(), op->schema()->cf_name());
+    tracing::add_alternator_table_name(trace_state, op->schema()->cf_name());
     const bool needs_read_before_write = _proxy.data_dictionary().get_config().alternator_force_read_before_write() || op->needs_read_before_write();
 
     co_await verify_permission(_enforce_authorization, _warn_authorization, client_state, op->schema(), auth::permission::MODIFY, _stats);
@@ -4545,7 +4556,7 @@ future<executor::request_return_type> executor::get_item(client_state& client_st
     schema_ptr schema = get_table(_proxy, request);
     lw_shared_ptr<stats> per_table_stats = get_stats_from_schema(_proxy, *schema);
     per_table_stats->api_operations.get_item++;
-    tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
+    tracing::add_alternator_table_name(trace_state, schema->cf_name());
 
     rjson::value& query_key = request["Key"];
     db::consistency_level cl = get_read_consistency(request);
@@ -4694,7 +4705,7 @@ future<executor::request_return_type> executor::batch_get_item(client_state& cli
     uint batch_size = 0;
     for (auto it = request_items.MemberBegin(); it != request_items.MemberEnd(); ++it) {
         table_requests rs(get_table_from_batch_request(_proxy, it));
-        tracing::add_table_name(trace_state, sstring(executor::KEYSPACE_NAME_PREFIX) + rs.schema->cf_name(), rs.schema->cf_name());
+        tracing::add_alternator_table_name(trace_state, rs.schema->cf_name());
         rs.cl = get_read_consistency(it->value);
         std::unordered_set<std::string> used_attribute_names;
         rs.attrs_to_get = ::make_shared<const std::optional<attrs_to_get>>(calculate_attrs_to_get(it->value, *_parsed_expression_cache, used_attribute_names));
@@ -5130,13 +5141,15 @@ static rjson::value encode_paging_state(const schema& schema, const service::pag
     }
     auto pos = paging_state.get_position_in_partition();
     if (pos.has_key()) {
-        auto exploded_ck = pos.key().explode();
-        auto exploded_ck_it = exploded_ck.begin();
-        for (const column_definition& cdef : schema.clustering_key_columns()) {
-            rjson::add_with_string_name(last_evaluated_key, std::string_view(cdef.name_as_text()), rjson::empty_object());
-            rjson::value& key_entry = last_evaluated_key[cdef.name_as_text()];
-            rjson::add_with_string_name(key_entry, type_to_string(cdef.type), json_key_column_value(*exploded_ck_it, cdef));
-            ++exploded_ck_it;
+        // Alternator itself allows at most one column in clustering key, but
+        // user can use Alternator api to access system tables which might have
+        // multiple clustering key columns. So we need to handle that case here.
+        auto cdef_it = schema.clustering_key_columns().begin();
+        for (const auto& exploded_ck : pos.key().explode()) {
+            rjson::add_with_string_name(last_evaluated_key, std::string_view(cdef_it->name_as_text()), rjson::empty_object());
+            rjson::value& key_entry = last_evaluated_key[cdef_it->name_as_text()];
+            rjson::add_with_string_name(key_entry, type_to_string(cdef_it->type), json_key_column_value(exploded_ck, *cdef_it));
+            ++cdef_it;
         }
     }
     // To avoid possible conflicts (and thus having to reserve these names) we
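The reworked encode_paging_state loop iterates over the key values actually present in the paging position (which may be a prefix of the full clustering key) and pairs each one with its column definition. A minimal Python sketch of that pairing, with illustrative names of my own (DynamoDB-style `{name: {type: value}}` entries):

```python
def last_evaluated_key(exploded_ck, clustering_columns):
    # exploded_ck: key values present in the paging position, in order.
    # clustering_columns: (name, type_string) pairs in clustering order.
    # zip() stops at the shorter sequence, mirroring the cdef_it iteration
    # that advances one column definition per present value.
    key = {}
    for value, (name, type_str) in zip(exploded_ck, clustering_columns):
        key[name] = {type_str: value}
    return key
```

With a two-column clustering key and only one value present, only the first column appears in the result, just as the C++ loop emits one JSON entry per exploded value.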
@@ -5296,6 +5309,7 @@ future<executor::request_return_type> executor::scan(client_state& client_state,
     elogger.trace("Scanning {}", request);
 
     auto [schema, table_type] = get_table_or_view(_proxy, request);
+    tracing::add_alternator_table_name(trace_state, schema->cf_name());
     get_stats_from_schema(_proxy, *schema)->api_operations.scan++;
     auto segment = get_int_attribute(request, "Segment");
     auto total_segments = get_int_attribute(request, "TotalSegments");
@@ -5775,7 +5789,7 @@ future<executor::request_return_type> executor::query(client_state& client_state
 
     auto [schema, table_type] = get_table_or_view(_proxy, request);
     get_stats_from_schema(_proxy, *schema)->api_operations.query++;
-    tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
+    tracing::add_alternator_table_name(trace_state, schema->cf_name());
 
     rjson::value* exclusive_start_key = rjson::find(request, "ExclusiveStartKey");
     db::consistency_level cl = get_read_consistency(request);
@@ -282,15 +282,23 @@ std::string type_to_string(data_type type) {
     return it->second;
 }
 
-bytes get_key_column_value(const rjson::value& item, const column_definition& column) {
+std::optional<bytes> try_get_key_column_value(const rjson::value& item, const column_definition& column) {
     std::string column_name = column.name_as_text();
     const rjson::value* key_typed_value = rjson::find(item, column_name);
     if (!key_typed_value) {
-        throw api_error::validation(fmt::format("Key column {} not found", column_name));
+        return std::nullopt;
     }
     return get_key_from_typed_value(*key_typed_value, column);
 }
 
+bytes get_key_column_value(const rjson::value& item, const column_definition& column) {
+    auto value = try_get_key_column_value(item, column);
+    if (!value) {
+        throw api_error::validation(fmt::format("Key column {} not found", column.name_as_text()));
+    }
+    return std::move(*value);
+}
+
 // Parses the JSON encoding for a key value, which is a map with a single
 // entry whose key is the type and the value is the encoded value.
 // If this type does not match the desired "type_str", an api_error::validation
@@ -380,20 +388,38 @@ clustering_key ck_from_json(const rjson::value& item, schema_ptr schema) {
         return clustering_key::make_empty();
     }
     std::vector<bytes> raw_ck;
-    // FIXME: this is a loop, but we really allow only one clustering key column.
+    // Note: it's possible to get more than one clustering column here, as
+    // Alternator can be used to read scylla internal tables.
     for (const column_definition& cdef : schema->clustering_key_columns()) {
-        bytes raw_value = get_key_column_value(item, cdef);
+        auto raw_value = get_key_column_value(item, cdef);
         raw_ck.push_back(std::move(raw_value));
     }
 
     return clustering_key::from_exploded(raw_ck);
 }
 
-position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema) {
-    auto ck = ck_from_json(item, schema);
-    if (is_alternator_keyspace(schema->ks_name())) {
-        return position_in_partition::for_key(std::move(ck));
+clustering_key_prefix ck_prefix_from_json(const rjson::value& item, schema_ptr schema) {
+    if (schema->clustering_key_size() == 0) {
+        return clustering_key_prefix::make_empty();
     }
+    std::vector<bytes> raw_ck;
+    for (const column_definition& cdef : schema->clustering_key_columns()) {
+        auto raw_value = try_get_key_column_value(item, cdef);
+        if (!raw_value) {
+            break;
+        }
+        raw_ck.push_back(std::move(*raw_value));
+    }
+
+    return clustering_key_prefix::from_exploded(raw_ck);
+}
+
+position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema) {
+    const bool is_alternator_ks = is_alternator_keyspace(schema->ks_name());
+    if (is_alternator_ks) {
+        return position_in_partition::for_key(ck_from_json(item, schema));
+    }
+
     const auto region_item = rjson::find(item, scylla_paging_region);
     const auto weight_item = rjson::find(item, scylla_paging_weight);
     if (bool(region_item) != bool(weight_item)) {
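The new ck_prefix_from_json differs from ck_from_json in one rule: it stops at the first clustering column missing from the item, yielding a key prefix, where ck_from_json would throw a validation error. That stop-at-first-missing rule can be sketched in Python (names here are illustrative, not Scylla's):

```python
def ck_prefix_from_item(item, clustering_columns):
    # Collect key column values in clustering order; the first missing
    # column ends the prefix. Note a later column present in the item
    # (without the earlier ones) is simply ignored - a prefix must be
    # contiguous from the first clustering column.
    prefix = []
    for name in clustering_columns:
        if name not in item:
            break
        prefix.append(item[name])
    return prefix
```

So an item carrying only the first and third of three clustering columns produces a one-column prefix.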
@@ -413,8 +439,9 @@ position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema)
         } else {
             throw std::runtime_error(fmt::format("Invalid value for weight: {}", weight_view));
         }
-        return position_in_partition(region, weight, region == partition_region::clustered ? std::optional(std::move(ck)) : std::nullopt);
+        return position_in_partition(region, weight, region == partition_region::clustered ? std::optional(ck_prefix_from_json(item, schema)) : std::nullopt);
     }
+    auto ck = ck_from_json(item, schema);
     if (ck.is_empty()) {
         return position_in_partition::for_partition_start();
     }
@@ -13,6 +13,7 @@
 #include <seastar/http/function_handlers.hh>
 #include <seastar/http/short_streams.hh>
 #include <seastar/core/coroutine.hh>
+#include <seastar/coroutine/maybe_yield.hh>
 #include <seastar/util/defer.hh>
 #include <seastar/util/short_streams.hh>
 #include "seastarx.hh"
@@ -32,6 +33,7 @@
 #include "utils/aws_sigv4.hh"
 #include "client_data.hh"
 #include "utils/updateable_value.hh"
+#include <zlib.h>
 
 static logging::logger slogger("alternator-server");
 
@@ -551,6 +553,106 @@ read_entire_stream(input_stream<char>& inp, size_t length_limit) {
     co_return ret;
 }
 
+// safe_gzip_zstream is an exception-safe wrapper for zlib's z_stream.
+// The "z_stream" struct is used by zlib to hold state while decompressing a
+// stream of data. It allocates memory which must be freed with inflateEnd(),
+// which the destructor of this class does.
+class safe_gzip_zstream {
+    z_stream _zs;
+public:
+    safe_gzip_zstream() {
+        memset(&_zs, 0, sizeof(_zs));
+        // The strange 16 + MAX_WBITS tells zlib to expect and decode
+        // a gzip header, not a zlib header.
+        if (inflateInit2(&_zs, 16 + MAX_WBITS) != Z_OK) {
+            // Should only happen if memory allocation fails
+            throw std::bad_alloc();
+        }
+    }
+    ~safe_gzip_zstream() {
+        inflateEnd(&_zs);
+    }
+    z_stream* operator->() {
+        return &_zs;
+    }
+    z_stream* get() {
+        return &_zs;
+    }
+    void reset() {
+        inflateReset(&_zs);
+    }
+};
+
+// ungzip() takes a chunked_content with a gzip-compressed request body,
+// uncompresses it, and returns the uncompressed content as a chunked_content.
+// If the uncompressed content exceeds length_limit, an error is thrown.
+static future<chunked_content>
+ungzip(chunked_content&& compressed_body, size_t length_limit) {
+    chunked_content ret;
+    // output_buf can be any size - when uncompressing input_buf, it doesn't
+    // need to fit in a single output_buf, we'll use multiple output_buf for
+    // a single input_buf if needed.
+    constexpr size_t OUTPUT_BUF_SIZE = 4096;
+    temporary_buffer<char> output_buf;
+    safe_gzip_zstream strm;
+    bool complete_stream = false; // empty input is not a valid gzip
+    size_t total_out_bytes = 0;
+    for (const temporary_buffer<char>& input_buf : compressed_body) {
+        if (input_buf.empty()) {
+            continue;
+        }
+        complete_stream = false;
+        strm->next_in = (Bytef*) input_buf.get();
+        strm->avail_in = (uInt) input_buf.size();
+        do {
+            co_await coroutine::maybe_yield();
+            if (output_buf.empty()) {
+                output_buf = temporary_buffer<char>(OUTPUT_BUF_SIZE);
+            }
+            strm->next_out = (Bytef*) output_buf.get();
+            strm->avail_out = OUTPUT_BUF_SIZE;
+            int e = inflate(strm.get(), Z_NO_FLUSH);
+            size_t out_bytes = OUTPUT_BUF_SIZE - strm->avail_out;
+            if (out_bytes > 0) {
+                // If output_buf is nearly full, we save it as-is in ret. But
+                // if it only has little data, better copy to a small buffer.
+                if (out_bytes > OUTPUT_BUF_SIZE/2) {
+                    ret.push_back(std::move(output_buf).prefix(out_bytes));
+                    // output_buf is now empty. if this loop finds more input,
+                    // we'll allocate a new output buffer.
+                } else {
+                    ret.push_back(temporary_buffer<char>(output_buf.get(), out_bytes));
+                }
+                total_out_bytes += out_bytes;
+                if (total_out_bytes > length_limit) {
+                    throw api_error::payload_too_large(fmt::format("Request content length limit of {} bytes exceeded", length_limit));
+                }
+            }
+            if (e == Z_STREAM_END) {
+                // There may be more input after the first gzip stream - in
+                // either this input_buf or the next one. The additional input
+                // should be a second concatenated gzip. We need to allow that
+                // by resetting the gzip stream and continuing the input loop
+                // until there's no more input.
+                strm.reset();
+                if (strm->avail_in == 0) {
+                    complete_stream = true;
+                    break;
+                }
+            } else if (e != Z_OK && e != Z_BUF_ERROR) {
+                // DynamoDB returns an InternalServerError when given a bad
+                // gzip request body. See test test_broken_gzip_content
+                throw api_error::internal("Error during gzip decompression of request body");
+            }
+        } while (strm->avail_in > 0 || strm->avail_out == 0);
+    }
+    if (!complete_stream) {
+        // The gzip stream was not properly finished with Z_STREAM_END
+        throw api_error::internal("Truncated gzip in request body");
+    }
+    co_return ret;
+}
+
 future<executor::request_return_type> server::handle_api_request(std::unique_ptr<request> req) {
     _executor._stats.total_operations++;
     sstring target = req->get_header("X-Amz-Target");
@@ -588,6 +690,21 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr
|
|||||||
units.return_units(mem_estimate - new_mem_estimate);
|
units.return_units(mem_estimate - new_mem_estimate);
|
||||||
}
|
}
|
||||||
auto username = co_await verify_signature(*req, content);
|
auto username = co_await verify_signature(*req, content);
|
||||||
|
// If the request is compressed, uncompress it now, after we checked
|
||||||
|
// the signature (the signature is computed on the compressed content).
|
||||||
|
// We apply the request_content_length_limit again to the uncompressed
|
||||||
|
// content - we don't want to allow a tiny compressed request to
|
||||||
|
// expand to a huge uncompressed request.
|
||||||
|
sstring content_encoding = req->get_header("Content-Encoding");
|
||||||
|
if (content_encoding == "gzip") {
|
||||||
|
content = co_await ungzip(std::move(content), request_content_length_limit);
|
||||||
|
} else if (!content_encoding.empty()) {
|
||||||
|
// DynamoDB returns a 500 error for unsupported Content-Encoding.
|
||||||
|
// I'm not sure if this is the best error code, but let's do it too.
|
||||||
|
// See the test test_garbage_content_encoding confirming this case.
|
||||||
|
co_return api_error::internal("Unsupported Content-Encoding");
|
||||||
|
}
|
||||||
|
|
||||||
// As long as the system_clients_entry object is alive, this request will
|
// As long as the system_clients_entry object is alive, this request will
|
||||||
// be visible in the "system.clients" virtual table. When requested, this
|
// be visible in the "system.clients" virtual table. When requested, this
|
||||||
// entry will be formatted by server::ongoing_request::make_client_data().
|
// entry will be formatted by server::ongoing_request::make_client_data().
|
||||||
|
|||||||
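The `Content-Encoding` handling added in this hunk is a three-way dispatch: no header means pass-through, `gzip` means decompress, anything else is rejected with a DynamoDB-style 500. A self-contained sketch of that rule (enum and function names are illustrative, not the server's API):

```cpp
#include <cassert>
#include <string>

// Hypothetical classification of the Content-Encoding header, mirroring
// the dispatch in handle_api_request(): empty -> identity, "gzip" ->
// decompress, anything else -> reject (as DynamoDB does).
enum class encoding_action { pass_through, decompress_gzip, reject };

encoding_action classify_content_encoding(const std::string& header) {
    if (header.empty()) {
        return encoding_action::pass_through;
    }
    if (header == "gzip") {
        return encoding_action::decompress_gzip;
    }
    return encoding_action::reject; // e.g. "br", "deflate", garbage values
}
```

Note the ordering matters: the empty-header check must come first, since an empty string is the common case and is not an "unsupported" encoding.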
@@ -106,5 +106,8 @@ target_link_libraries(api
     wasmtime_bindings
     absl::headers)

+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(api REUSE_FROM scylla-precompiled-header)
+endif()
 check_headers(check-headers api
   GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
@@ -66,6 +66,13 @@ static future<json::json_return_type> get_cf_stats(sharded<replica::database>&
     }, std::plus<int64_t>());
 }

+static future<json::json_return_type> get_cf_stats(sharded<replica::database>& db,
+        std::function<int64_t(const replica::column_family_stats&)> f) {
+    return map_reduce_cf(db, int64_t(0), [f](const replica::column_family& cf) {
+        return f(cf.get_stats());
+    }, std::plus<int64_t>());
+}
+
 static future<json::json_return_type> for_tables_on_all_shards(sharded<replica::database>& db, std::vector<table_info> tables, std::function<future<>(replica::table&)> set) {
     return do_with(std::move(tables), [&db, set] (const std::vector<table_info>& tables) {
         return db.invoke_on_all([&tables, set] (replica::database& db) {
@@ -1066,10 +1073,14 @@ void set_column_family(http_context& ctx, routes& r, sharded<replica::database>&
     });

     ss::get_load.set(r, [&db] (std::unique_ptr<http::request> req) {
-        return get_cf_stats(db, &replica::column_family_stats::live_disk_space_used);
+        return get_cf_stats(db, [](const replica::column_family_stats& stats) {
+            return stats.live_disk_space_used.on_disk;
+        });
     });
     ss::get_metrics_load.set(r, [&db] (std::unique_ptr<http::request> req) {
-        return get_cf_stats(db, &replica::column_family_stats::live_disk_space_used);
+        return get_cf_stats(db, [](const replica::column_family_stats& stats) {
+            return stats.live_disk_space_used.on_disk;
+        });
     });

     ss::get_keyspaces.set(r, [&db] (const_req req) {
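The hunk above replaces a pointer-to-member stat selector with a `std::function` extractor: once `live_disk_space_used` becomes a struct, a plain member pointer can no longer name the nested `.on_disk` field, but a lambda can. A reduced illustration of the pattern (the struct and function names here are illustrative stand-ins, not the real `replica::` types):

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Illustrative stand-ins for replica::column_family_stats and friends.
struct disk_usage { int64_t on_disk; int64_t in_memory; };
struct cf_stats { disk_usage live_disk_space_used; };

// Sum one extracted statistic over all tables, analogous to
// map_reduce_cf(db, 0, f, std::plus<int64_t>()).
int64_t sum_stat(const std::vector<cf_stats>& tables,
                 std::function<int64_t(const cf_stats&)> f) {
    int64_t total = 0;
    for (const auto& s : tables) {
        total += f(s); // the extractor picks which (possibly nested) field to sum
    }
    return total;
}
```

The `std::function` signature keeps the reduce side generic while the caller chooses the nested field at the call site.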
@@ -17,4 +17,7 @@ target_link_libraries(scylla_audit
   PRIVATE
     cql3)

+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(scylla_audit REUSE_FROM scylla-precompiled-header)
+endif()
 add_whole_archive(audit scylla_audit)
@@ -9,6 +9,7 @@ target_sources(scylla_auth
     allow_all_authorizer.cc
     authenticated_user.cc
     authenticator.cc
+    cache.cc
     certificate_authenticator.cc
     common.cc
     default_authorizer.cc
@@ -44,5 +45,8 @@ target_link_libraries(scylla_auth

 add_whole_archive(auth scylla_auth)

+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(scylla_auth REUSE_FROM scylla-precompiled-header)
+endif()
 check_headers(check-headers scylla_auth
   GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
@@ -23,6 +23,7 @@ static const class_registrator<
     cql3::query_processor&,
     ::service::raft_group0_client&,
     ::service::migration_manager&,
+    cache&,
     utils::alien_worker&> registration("org.apache.cassandra.auth.AllowAllAuthenticator");

 }
@@ -12,6 +12,7 @@

 #include "auth/authenticated_user.hh"
 #include "auth/authenticator.hh"
+#include "auth/cache.hh"
 #include "auth/common.hh"
 #include "utils/alien_worker.hh"

@@ -29,7 +30,7 @@ extern const std::string_view allow_all_authenticator_name;

 class allow_all_authenticator final : public authenticator {
 public:
-    allow_all_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&) {
+    allow_all_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&, utils::alien_worker&) {
     }

     virtual future<> start() override {
auth/cache.cc (new file, 180 lines)
@@ -0,0 +1,180 @@
+/*
+ * Copyright (C) 2017-present ScyllaDB
+ */
+
+/*
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ */
+
+#include "auth/cache.hh"
+#include "auth/common.hh"
+#include "auth/roles-metadata.hh"
+#include "cql3/query_processor.hh"
+#include "cql3/untyped_result_set.hh"
+#include "db/consistency_level_type.hh"
+#include "db/system_keyspace.hh"
+#include "schema/schema.hh"
+#include <iterator>
+#include <seastar/coroutine/maybe_yield.hh>
+#include <seastar/core/format.hh>
+
+namespace auth {
+
+logging::logger logger("auth-cache");
+
+cache::cache(cql3::query_processor& qp) noexcept
+    : _current_version(0)
+    , _qp(qp) {
+}
+
+lw_shared_ptr<const cache::role_record> cache::get(const role_name_t& role) const noexcept {
+    auto it = _roles.find(role);
+    if (it == _roles.end()) {
+        return {};
+    }
+    return it->second;
+}
+
+future<lw_shared_ptr<cache::role_record>> cache::fetch_role(const role_name_t& role) const {
+    auto rec = make_lw_shared<role_record>();
+    rec->version = _current_version;
+
+    auto fetch = [this, &role](const sstring& q) {
+        return _qp.execute_internal(q, db::consistency_level::LOCAL_ONE,
+                internal_distributed_query_state(), {role},
+                cql3::query_processor::cache_internal::yes);
+    };
+    // roles
+    {
+        static const sstring q = format("SELECT * FROM {}.{} WHERE role = ?", db::system_keyspace::NAME, meta::roles_table::name);
+        auto rs = co_await fetch(q);
+        if (!rs->empty()) {
+            auto& r = rs->one();
+            rec->is_superuser = r.get_or<bool>("is_superuser", false);
+            rec->can_login = r.get_or<bool>("can_login", false);
+            rec->salted_hash = r.get_or<sstring>("salted_hash", "");
+            if (r.has("member_of")) {
+                auto mo = r.get_set<sstring>("member_of");
+                rec->member_of.insert(
+                        std::make_move_iterator(mo.begin()),
+                        std::make_move_iterator(mo.end()));
+            }
+        } else {
+            // role got deleted
+            co_return nullptr;
+        }
+    }
+    // members
+    {
+        static const sstring q = format("SELECT role, member FROM {}.{} WHERE role = ?", db::system_keyspace::NAME, ROLE_MEMBERS_CF);
+        auto rs = co_await fetch(q);
+        for (const auto& r : *rs) {
+            rec->members.insert(r.get_as<sstring>("member"));
+            co_await coroutine::maybe_yield();
+        }
+    }
+    // attributes
+    {
+        static const sstring q = format("SELECT role, name, value FROM {}.{} WHERE role = ?", db::system_keyspace::NAME, ROLE_ATTRIBUTES_CF);
+        auto rs = co_await fetch(q);
+        for (const auto& r : *rs) {
+            rec->attributes[r.get_as<sstring>("name")] =
+                    r.get_as<sstring>("value");
+            co_await coroutine::maybe_yield();
+        }
+    }
+    // permissions
+    {
+        static const sstring q = format("SELECT role, resource, permissions FROM {}.{} WHERE role = ?", db::system_keyspace::NAME, PERMISSIONS_CF);
+        auto rs = co_await fetch(q);
+        for (const auto& r : *rs) {
+            auto resource = r.get_as<sstring>("resource");
+            auto perms_strings = r.get_set<sstring>("permissions");
+            std::unordered_set<sstring> perms_set(perms_strings.begin(), perms_strings.end());
+            auto pset = permissions::from_strings(perms_set);
+            rec->permissions[std::move(resource)] = std::move(pset);
+            co_await coroutine::maybe_yield();
+        }
+    }
+    co_return rec;
+}
+
+future<> cache::prune_all() noexcept {
+    for (auto it = _roles.begin(); it != _roles.end(); ) {
+        if (it->second->version != _current_version) {
+            _roles.erase(it++);
+            co_await coroutine::maybe_yield();
+        } else {
+            ++it;
+        }
+    }
+    co_return;
+}
+
+future<> cache::load_all() {
+    if (legacy_mode(_qp)) {
+        co_return;
+    }
+    SCYLLA_ASSERT(this_shard_id() == 0);
+    ++_current_version;
+
+    logger.info("Loading all roles");
+    const uint32_t page_size = 128;
+    auto loader = [this](const cql3::untyped_result_set::row& r) -> future<stop_iteration> {
+        const auto name = r.get_as<sstring>("role");
+        auto role = co_await fetch_role(name);
+        if (role) {
+            _roles[name] = role;
+        }
+        co_return stop_iteration::no;
+    };
+    co_await _qp.query_internal(format("SELECT * FROM {}.{}",
+            db::system_keyspace::NAME, meta::roles_table::name),
+            db::consistency_level::LOCAL_ONE, {}, page_size, loader);
+
+    co_await prune_all();
+    for (const auto& [name, role] : _roles) {
+        co_await distribute_role(name, role);
+    }
+    co_await container().invoke_on_others([this](cache& c) -> future<> {
+        c._current_version = _current_version;
+        co_await c.prune_all();
+    });
+}
+
+future<> cache::load_roles(std::unordered_set<role_name_t> roles) {
+    if (legacy_mode(_qp)) {
+        co_return;
+    }
+    for (const auto& name : roles) {
+        logger.info("Loading role {}", name);
+        auto role = co_await fetch_role(name);
+        if (role) {
+            _roles[name] = role;
+        } else {
+            _roles.erase(name);
+        }
+        co_await distribute_role(name, role);
+    }
+}
+
+future<> cache::distribute_role(const role_name_t& name, lw_shared_ptr<role_record> role) {
+    auto role_ptr = role.get();
+    co_await container().invoke_on_others([&name, role_ptr](cache& c) {
+        if (!role_ptr) {
+            c._roles.erase(name);
+            return;
+        }
+        auto role_copy = make_lw_shared<role_record>(*role_ptr);
+        c._roles[name] = std::move(role_copy);
+    });
+}
+
+bool cache::includes_table(const table_id& id) noexcept {
+    return id == db::system_keyspace::roles()->id()
+        || id == db::system_keyspace::role_members()->id()
+        || id == db::system_keyspace::role_attributes()->id()
+        || id == db::system_keyspace::role_permissions()->id();
+}
+
+} // namespace auth
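The reload scheme in `cache::load_all()` and `prune_all()` is a generation-count sweep: each full reload bumps a version tag, freshly fetched records carry the new tag, and any record still holding the old tag afterwards is pruned as stale. A minimal single-threaded model of that idea (names are illustrative; the real cache is sharded and yields between erasures):

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Illustrative model of the version-tagged record and role map.
struct record { int version; };
using roles_map = std::map<std::string, std::shared_ptr<record>>;

// After a reload that stamped surviving records with current_version,
// remove everything that was not refreshed (i.e. deleted roles).
void prune_stale(roles_map& roles, int current_version) {
    for (auto it = roles.begin(); it != roles.end(); ) {
        if (it->second->version != current_version) {
            it = roles.erase(it); // role disappeared since the last reload
        } else {
            ++it;
        }
    }
}
```

The benefit of the scheme is a seamless swap: readers never see an empty map during reload, only the old snapshot until pruning removes what vanished.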
auth/cache.hh (new file, 61 lines)
@@ -0,0 +1,61 @@
+/*
+ * Copyright (C) 2025-present ScyllaDB
+ */
+
+/*
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ */
+
+#pragma once
+
+#include <unordered_set>
+#include <unordered_map>
+
+#include <seastar/core/sstring.hh>
+#include <seastar/core/future.hh>
+#include <seastar/core/sharded.hh>
+#include <seastar/core/shared_ptr.hh>
+
+#include <absl/container/flat_hash_map.h>
+
+#include "auth/permission.hh"
+#include "auth/common.hh"
+
+namespace cql3 { class query_processor; }
+
+namespace auth {
+
+class cache : public peering_sharded_service<cache> {
+public:
+    using role_name_t = sstring;
+    using version_tag_t = char;
+
+    struct role_record {
+        bool can_login = false;
+        bool is_superuser = false;
+        std::unordered_set<role_name_t> member_of;
+        std::unordered_set<role_name_t> members;
+        sstring salted_hash;
+        std::unordered_map<sstring, sstring> attributes;
+        std::unordered_map<sstring, permission_set> permissions;
+        version_tag_t version; // used for seamless cache reloads
+    };
+
+    explicit cache(cql3::query_processor& qp) noexcept;
+    lw_shared_ptr<const role_record> get(const role_name_t& role) const noexcept;
+    future<> load_all();
+    future<> load_roles(std::unordered_set<role_name_t> roles);
+    static bool includes_table(const table_id&) noexcept;
+
+private:
+    using roles_map = absl::flat_hash_map<role_name_t, lw_shared_ptr<role_record>>;
+    roles_map _roles;
+    version_tag_t _current_version;
+    cql3::query_processor& _qp;
+
+    future<lw_shared_ptr<role_record>> fetch_role(const role_name_t& role) const;
+    future<> prune_all() noexcept;
+    future<> distribute_role(const role_name_t& name, const lw_shared_ptr<role_record> role);
+};
+
+} // namespace auth
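A detail worth noting in the header above: `get()` returns `lw_shared_ptr<const role_record>`, so a reader holds an immutable, reference-counted snapshot, and a concurrent reload can swap the map entry without invalidating it. A sketch of that lookup contract, with `std::shared_ptr` standing in for `seastar::lw_shared_ptr` (all names illustrative):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>

// Illustrative stand-in for the cached role record and role map.
struct role_record { bool can_login = false; bool is_superuser = false; };
using roles_map = std::unordered_map<std::string, std::shared_ptr<role_record>>;

// A hit returns a shared, const snapshot; a miss returns an empty pointer.
// Replacing the map entry later does not disturb snapshots already handed out.
std::shared_ptr<const role_record> get_role(const roles_map& roles,
                                            const std::string& name) {
    auto it = roles.find(name);
    if (it == roles.end()) {
        return {}; // role unknown (or deleted)
    }
    return it->second;
}
```

Returning a const pointer (rather than a reference into the map) is what makes the "seamless reload" in the version-tag scheme safe for readers.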
@@ -48,6 +48,10 @@ extern constinit const std::string_view AUTH_PACKAGE_NAME;

 } // namespace meta

+constexpr std::string_view PERMISSIONS_CF = "role_permissions";
+constexpr std::string_view ROLE_MEMBERS_CF = "role_members";
+constexpr std::string_view ROLE_ATTRIBUTES_CF = "role_attributes";
+
 // This is a helper to check whether auth-v2 is on.
 bool legacy_mode(cql3::query_processor& qp);
@@ -37,7 +37,6 @@ std::string_view default_authorizer::qualified_java_name() const {
 static constexpr std::string_view ROLE_NAME = "role";
 static constexpr std::string_view RESOURCE_NAME = "resource";
 static constexpr std::string_view PERMISSIONS_NAME = "permissions";
-static constexpr std::string_view PERMISSIONS_CF = "role_permissions";

 static logging::logger alogger("default_authorizer");
@@ -83,17 +83,18 @@ static const class_registrator<
     ldap_role_manager,
     cql3::query_processor&,
     ::service::raft_group0_client&,
-    ::service::migration_manager&> registration(ldap_role_manager_full_name);
+    ::service::migration_manager&,
+    cache&> registration(ldap_role_manager_full_name);

 ldap_role_manager::ldap_role_manager(
         std::string_view query_template, std::string_view target_attr, std::string_view bind_name, std::string_view bind_password,
-        cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm)
-    : _std_mgr(qp, rg0c, mm), _group0_client(rg0c), _query_template(query_template), _target_attr(target_attr), _bind_name(bind_name)
+        cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm, cache& cache)
+    : _std_mgr(qp, rg0c, mm, cache), _group0_client(rg0c), _query_template(query_template), _target_attr(target_attr), _bind_name(bind_name)
     , _bind_password(bind_password)
     , _connection_factory(bind(std::mem_fn(&ldap_role_manager::reconnect), std::ref(*this))) {
 }

-ldap_role_manager::ldap_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm)
+ldap_role_manager::ldap_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm, cache& cache)
     : ldap_role_manager(
         qp.db().get_config().ldap_url_template(),
         qp.db().get_config().ldap_attr_role(),
@@ -101,7 +102,8 @@ ldap_role_manager::ldap_role_manager(cql3::query_processor& qp, ::service::raft_
         qp.db().get_config().ldap_bind_passwd(),
         qp,
         rg0c,
-        mm) {
+        mm,
+        cache) {
 }

 std::string_view ldap_role_manager::qualified_java_name() const noexcept {
@@ -14,6 +14,7 @@

 #include "ent/ldap/ldap_connection.hh"
 #include "standard_role_manager.hh"
+#include "auth/cache.hh"

 namespace auth {

@@ -43,12 +44,13 @@ class ldap_role_manager : public role_manager {
         std::string_view bind_password, ///< LDAP bind credentials.
         cql3::query_processor& qp, ///< Passed to standard_role_manager.
         ::service::raft_group0_client& rg0c, ///< Passed to standard_role_manager.
-        ::service::migration_manager& mm ///< Passed to standard_role_manager.
+        ::service::migration_manager& mm, ///< Passed to standard_role_manager.
+        cache& cache ///< Passed to standard_role_manager.
     );

     /// Retrieves LDAP configuration entries from qp and invokes the other constructor. Required by
     /// class_registrator<role_manager>.
-    ldap_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm);
+    ldap_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& rg0c, ::service::migration_manager& mm, cache& cache);

     /// Thrown when query-template parsing fails.
     struct url_error : public std::runtime_error {
@@ -11,6 +11,7 @@
 #include <seastar/core/future.hh>
 #include <stdexcept>
 #include <string_view>
+#include "auth/cache.hh"
 #include "cql3/description.hh"
 #include "utils/class_registrator.hh"

@@ -23,7 +24,8 @@ static const class_registrator<
     maintenance_socket_role_manager,
     cql3::query_processor&,
     ::service::raft_group0_client&,
-    ::service::migration_manager&> registration(sstring{maintenance_socket_role_manager_name});
+    ::service::migration_manager&,
+    cache&> registration(sstring{maintenance_socket_role_manager_name});


 std::string_view maintenance_socket_role_manager::qualified_java_name() const noexcept {
@@ -8,6 +8,7 @@

 #pragma once

+#include "auth/cache.hh"
 #include "auth/resource.hh"
 #include "auth/role_manager.hh"
 #include <seastar/core/future.hh>
@@ -29,7 +30,7 @@ extern const std::string_view maintenance_socket_role_manager_name;
 // system_auth keyspace, which may be not yet created when the maintenance socket starts listening.
 class maintenance_socket_role_manager final : public role_manager {
 public:
-    maintenance_socket_role_manager(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&) {}
+    maintenance_socket_role_manager(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&) {}

     virtual std::string_view qualified_java_name() const noexcept override;
@@ -49,6 +49,7 @@ static const class_registrator<
     cql3::query_processor&,
     ::service::raft_group0_client&,
     ::service::migration_manager&,
+    cache&,
     utils::alien_worker&> password_auth_reg("org.apache.cassandra.auth.PasswordAuthenticator");

 static thread_local auto rng_for_salt = std::default_random_engine(std::random_device{}());
@@ -63,10 +64,11 @@ std::string password_authenticator::default_superuser(const db::config& cfg) {
 password_authenticator::~password_authenticator() {
 }

-password_authenticator::password_authenticator(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, utils::alien_worker& hashing_worker)
+password_authenticator::password_authenticator(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, cache& cache, utils::alien_worker& hashing_worker)
     : _qp(qp)
     , _group0_client(g0)
     , _migration_manager(mm)
+    , _cache(cache)
     , _stopped(make_ready_future<>())
     , _superuser(default_superuser(qp.db().get_config()))
     , _hashing_worker(hashing_worker)
@@ -315,11 +317,20 @@ future<authenticated_user> password_authenticator::authenticate(
     const sstring password = credentials.at(PASSWORD_KEY);

     try {
-        const std::optional<sstring> salted_hash = co_await get_password_hash(username);
+        std::optional<sstring> salted_hash;
+        if (legacy_mode(_qp)) {
+            salted_hash = co_await get_password_hash(username);
             if (!salted_hash) {
                 throw exceptions::authentication_exception("Username and/or password are incorrect");
             }
-        const bool password_match = co_await _hashing_worker.submit<bool>([password = std::move(password), salted_hash = std::move(salted_hash)]{
+        } else {
+            auto role = _cache.get(username);
+            if (!role || role->salted_hash.empty()) {
+                throw exceptions::authentication_exception("Username and/or password are incorrect");
+            }
+            salted_hash = role->salted_hash;
+        }
+        const bool password_match = co_await _hashing_worker.submit<bool>([password = std::move(password), salted_hash] {
             return passwords::check(password, *salted_hash);
         });
         if (!password_match) {
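The `authenticate()` change above splits hash retrieval into two paths: legacy mode reads the salted hash from the roles table, while the new mode takes it from the in-memory role cache; either way, a missing or empty hash rejects the login before any hashing work is submitted. The branch logic, reduced to a pure function (all names here are illustrative):

```cpp
#include <cassert>
#include <optional>
#include <string>

// Illustrative stand-in for the cached role record.
struct cached_role { std::string salted_hash; };

// Pick the salted hash to verify against, or nullopt to reject the
// credentials outright (unknown role, deleted role, or no password set).
std::optional<std::string> pick_salted_hash(bool legacy,
        const std::optional<std::string>& table_hash,
        const cached_role* role) {
    if (legacy) {
        return table_hash;               // may be nullopt -> reject
    }
    if (!role || role->salted_hash.empty()) {
        return std::nullopt;             // cache miss or empty hash -> reject
    }
    return role->salted_hash;
}
```

Keeping the rejection decision ahead of the expensive hash check matters: the worker thread is only engaged once a candidate hash actually exists.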
@@ -16,6 +16,7 @@
 #include "db/consistency_level_type.hh"
 #include "auth/authenticator.hh"
 #include "auth/passwords.hh"
+#include "auth/cache.hh"
 #include "service/raft/raft_group0_client.hh"
 #include "utils/alien_worker.hh"

@@ -41,6 +42,7 @@ class password_authenticator : public authenticator {
     cql3::query_processor& _qp;
     ::service::raft_group0_client& _group0_client;
     ::service::migration_manager& _migration_manager;
+    cache& _cache;
     future<> _stopped;
     abort_source _as;
     std::string _superuser; // default superuser name from the config (may or may not be present in roles table)
@@ -53,7 +55,7 @@ public:
     static db::consistency_level consistency_for_user(std::string_view role_name);
     static std::string default_superuser(const db::config&);

-    password_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&);
+    password_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&, utils::alien_worker&);

     ~password_authenticator();
@@ -35,9 +35,10 @@ static const class_registrator<
     cql3::query_processor&,
     ::service::raft_group0_client&,
     ::service::migration_manager&,
+    cache&,
     utils::alien_worker&> saslauthd_auth_reg("com.scylladb.auth.SaslauthdAuthenticator");

-saslauthd_authenticator::saslauthd_authenticator(cql3::query_processor& qp, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&)
+saslauthd_authenticator::saslauthd_authenticator(cql3::query_processor& qp, ::service::raft_group0_client&, ::service::migration_manager&, cache&, utils::alien_worker&)
     : _socket_path(qp.db().get_config().saslauthd_socket_path())
 {}
@@ -11,6 +11,7 @@
 #pragma once

 #include "auth/authenticator.hh"
+#include "auth/cache.hh"
 #include "utils/alien_worker.hh"

 namespace cql3 {
@@ -29,7 +30,7 @@ namespace auth {
 class saslauthd_authenticator : public authenticator {
     sstring _socket_path; ///< Path to the domain socket on which saslauthd is listening.
 public:
-    saslauthd_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, utils::alien_worker&);
+    saslauthd_authenticator(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&, utils::alien_worker&);

     future<> start() override;
@@ -17,6 +17,7 @@
|
|||||||
#include <chrono>
|
#include <chrono>
|
||||||
|
|
||||||
#include <seastar/core/future-util.hh>
|
#include <seastar/core/future-util.hh>
|
||||||
|
#include <seastar/core/shard_id.hh>
|
||||||
#include <seastar/core/sharded.hh>
|
#include <seastar/core/sharded.hh>
|
||||||
#include <seastar/core/shared_ptr.hh>
|
#include <seastar/core/shared_ptr.hh>
|
||||||
|
|
||||||
@@ -157,6 +158,7 @@ static future<> validate_role_exists(const service& ser, std::string_view role_n

 service::service(
         utils::loading_cache_config c,
+        cache& cache,
         cql3::query_processor& qp,
         ::service::raft_group0_client& g0,
         ::service::migration_notifier& mn,
@@ -166,6 +168,7 @@ service::service(
         maintenance_socket_enabled used_by_maintenance_socket)
     : _loading_cache_config(std::move(c))
     , _permissions_cache(nullptr)
+    , _cache(cache)
     , _qp(qp)
     , _group0_client(g0)
     , _mnotifier(mn)
||||||
@@ -188,15 +191,17 @@ service::service(
|
|||||||
::service::migration_manager& mm,
|
::service::migration_manager& mm,
|
||||||
const service_config& sc,
|
const service_config& sc,
|
||||||
maintenance_socket_enabled used_by_maintenance_socket,
|
maintenance_socket_enabled used_by_maintenance_socket,
|
||||||
|
cache& cache,
|
||||||
utils::alien_worker& hashing_worker)
|
utils::alien_worker& hashing_worker)
|
||||||
: service(
|
: service(
|
||||||
std::move(c),
|
std::move(c),
|
||||||
|
cache,
|
||||||
qp,
|
qp,
|
||||||
g0,
|
g0,
|
||||||
mn,
|
mn,
|
||||||
create_object<authorizer>(sc.authorizer_java_name, qp, g0, mm),
|
create_object<authorizer>(sc.authorizer_java_name, qp, g0, mm),
|
||||||
create_object<authenticator>(sc.authenticator_java_name, qp, g0, mm, hashing_worker),
|
create_object<authenticator>(sc.authenticator_java_name, qp, g0, mm, cache, hashing_worker),
|
||||||
create_object<role_manager>(sc.role_manager_java_name, qp, g0, mm),
|
create_object<role_manager>(sc.role_manager_java_name, qp, g0, mm, cache),
|
||||||
used_by_maintenance_socket) {
|
used_by_maintenance_socket) {
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -232,6 +237,9 @@ future<> service::start(::service::migration_manager& mm, db::system_keyspace& s
     auto auth_version = co_await sys_ks.get_auth_version();
     // version is set in query processor to be easily available in various places we call auth::legacy_mode check.
     _qp.auth_version = auth_version;
+    if (this_shard_id() == 0) {
+        co_await _cache.load_all();
+    }
     if (!_used_by_maintenance_socket) {
         // this legacy keyspace is only used by cqlsh
         // it's needed when executing `list roles` or `list users`
|||||||
@@ -21,6 +21,7 @@
|
|||||||
#include "auth/authorizer.hh"
|
#include "auth/authorizer.hh"
|
||||||
#include "auth/permission.hh"
|
#include "auth/permission.hh"
|
||||||
#include "auth/permissions_cache.hh"
|
#include "auth/permissions_cache.hh"
|
||||||
|
#include "auth/cache.hh"
|
||||||
#include "auth/role_manager.hh"
|
#include "auth/role_manager.hh"
|
||||||
#include "auth/common.hh"
|
#include "auth/common.hh"
|
||||||
#include "cql3/description.hh"
|
#include "cql3/description.hh"
|
||||||
@@ -77,6 +78,7 @@ public:
 class service final : public seastar::peering_sharded_service<service> {
     utils::loading_cache_config _loading_cache_config;
     std::unique_ptr<permissions_cache> _permissions_cache;
+    cache& _cache;

     cql3::query_processor& _qp;

@@ -107,6 +109,7 @@ class service final : public seastar::peering_sharded_service<service> {
 public:
     service(
         utils::loading_cache_config,
+        cache& cache,
         cql3::query_processor&,
         ::service::raft_group0_client&,
         ::service::migration_notifier&,
@@ -128,6 +131,7 @@ public:
         ::service::migration_manager&,
         const service_config&,
         maintenance_socket_enabled,
+        cache&,
         utils::alien_worker&);

     future<> start(::service::migration_manager&, db::system_keyspace&);
|||||||
@@ -41,21 +41,6 @@
|
|||||||
|
|
||||||
namespace auth {
|
namespace auth {
|
||||||
|
|
||||||
namespace meta {
|
|
||||||
|
|
||||||
namespace role_members_table {
|
|
||||||
|
|
||||||
constexpr std::string_view name{"role_members" , 12};
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
namespace role_attributes_table {
|
|
||||||
|
|
||||||
constexpr std::string_view name{"role_attributes", 15};
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
}
|
|
||||||
|
|
||||||
static logging::logger log("standard_role_manager");
|
static logging::logger log("standard_role_manager");
|
||||||
|
|
||||||
@@ -64,7 +49,8 @@ static const class_registrator<
     standard_role_manager,
     cql3::query_processor&,
     ::service::raft_group0_client&,
-    ::service::migration_manager&> registration("org.apache.cassandra.auth.CassandraRoleManager");
+    ::service::migration_manager&,
+    cache&> registration("org.apache.cassandra.auth.CassandraRoleManager");

 struct record final {
     sstring name;
@@ -121,10 +107,11 @@ static bool has_can_login(const cql3::untyped_result_set_row& row) {
     return row.has("can_login") && !(boolean_type->deserialize(row.get_blob_unfragmented("can_login")).is_null());
 }

-standard_role_manager::standard_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm)
+standard_role_manager::standard_role_manager(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, cache& cache)
     : _qp(qp)
     , _group0_client(g0)
     , _migration_manager(mm)
+    , _cache(cache)
     , _stopped(make_ready_future<>())
     , _superuser(password_authenticator::default_superuser(qp.db().get_config()))
 {}
@@ -136,7 +123,7 @@ std::string_view standard_role_manager::qualified_java_name() const noexcept {
 const resource_set& standard_role_manager::protected_resources() const {
     static const resource_set resources({
             make_data_resource(meta::legacy::AUTH_KS, meta::roles_table::name),
-            make_data_resource(meta::legacy::AUTH_KS, meta::role_members_table::name)});
+            make_data_resource(meta::legacy::AUTH_KS, ROLE_MEMBERS_CF)});

     return resources;
 }
@@ -160,7 +147,7 @@ future<> standard_role_manager::create_legacy_metadata_tables_if_missing() const
             " PRIMARY KEY (role, member)"
             ")",
             meta::legacy::AUTH_KS,
-            meta::role_members_table::name);
+            ROLE_MEMBERS_CF);
     static const sstring create_role_attributes_query = seastar::format(
             "CREATE TABLE {}.{} ("
             " role text,"
@@ -169,7 +156,7 @@ future<> standard_role_manager::create_legacy_metadata_tables_if_missing() const
             " PRIMARY KEY(role, name)"
             ")",
             meta::legacy::AUTH_KS,
-            meta::role_attributes_table::name);
+            ROLE_ATTRIBUTES_CF);
     return when_all_succeed(
             create_legacy_metadata_table_if_missing(
                     meta::roles_table::name,
@@ -177,12 +164,12 @@ future<> standard_role_manager::create_legacy_metadata_tables_if_missing() const
                     create_roles_query,
                     _migration_manager),
             create_legacy_metadata_table_if_missing(
-                    meta::role_members_table::name,
+                    ROLE_MEMBERS_CF,
                     _qp,
                     create_role_members_query,
                     _migration_manager),
             create_legacy_metadata_table_if_missing(
-                    meta::role_attributes_table::name,
+                    ROLE_ATTRIBUTES_CF,
                     _qp,
                     create_role_attributes_query,
                     _migration_manager)).discard_result();
@@ -429,7 +416,7 @@ future<> standard_role_manager::drop(std::string_view role_name, ::service::grou
     const auto revoke_from_members = [this, role_name, &mc] () -> future<> {
         const sstring query = seastar::format("SELECT member FROM {}.{} WHERE role = ?",
                 get_auth_ks_name(_qp),
-                meta::role_members_table::name);
+                ROLE_MEMBERS_CF);
         const auto members = co_await _qp.execute_internal(
                 query,
                 consistency_for_role(role_name),
@@ -461,7 +448,7 @@ future<> standard_role_manager::drop(std::string_view role_name, ::service::grou
     const auto remove_attributes_of = [this, role_name, &mc] () -> future<> {
         const sstring query = seastar::format("DELETE FROM {}.{} WHERE role = ?",
                 get_auth_ks_name(_qp),
-                meta::role_attributes_table::name);
+                ROLE_ATTRIBUTES_CF);
         if (legacy_mode(_qp)) {
             co_await _qp.execute_internal(query, {sstring(role_name)},
                     cql3::query_processor::cache_internal::yes).discard_result();
@@ -517,7 +504,7 @@ standard_role_manager::legacy_modify_membership(
     case membership_change::add: {
         const sstring insert_query = seastar::format("INSERT INTO {}.{} (role, member) VALUES (?, ?)",
                 get_auth_ks_name(_qp),
-                meta::role_members_table::name);
+                ROLE_MEMBERS_CF);
         co_return co_await _qp.execute_internal(
                 insert_query,
                 consistency_for_role(role_name),
@@ -529,7 +516,7 @@ standard_role_manager::legacy_modify_membership(
     case membership_change::remove: {
         const sstring delete_query = seastar::format("DELETE FROM {}.{} WHERE role = ? AND member = ?",
                 get_auth_ks_name(_qp),
-                meta::role_members_table::name);
+                ROLE_MEMBERS_CF);
         co_return co_await _qp.execute_internal(
                 delete_query,
                 consistency_for_role(role_name),
@@ -567,12 +554,12 @@ standard_role_manager::modify_membership(
     case membership_change::add:
         modify_role_members = seastar::format("INSERT INTO {}.{} (role, member) VALUES (?, ?)",
                 get_auth_ks_name(_qp),
-                meta::role_members_table::name);
+                ROLE_MEMBERS_CF);
         break;
     case membership_change::remove:
         modify_role_members = seastar::format("DELETE FROM {}.{} WHERE role = ? AND member = ?",
                 get_auth_ks_name(_qp),
-                meta::role_members_table::name);
+                ROLE_MEMBERS_CF);
         break;
     default:
         on_internal_error(log, format("unknown membership_change value: {}", int(ch)));
@@ -666,7 +653,7 @@ future<role_set> standard_role_manager::query_granted(std::string_view grantee_n
 future<role_to_directly_granted_map> standard_role_manager::query_all_directly_granted(::service::query_state& qs) {
     const sstring query = seastar::format("SELECT * FROM {}.{}",
             get_auth_ks_name(_qp),
-            meta::role_members_table::name);
+            ROLE_MEMBERS_CF);

     const auto results = co_await _qp.execute_internal(
             query,
@@ -731,15 +718,21 @@ future<bool> standard_role_manager::is_superuser(std::string_view role_name) {
 }

 future<bool> standard_role_manager::can_login(std::string_view role_name) {
-    return require_record(_qp, role_name).then([](record r) {
-        return r.can_login;
-    });
+    if (legacy_mode(_qp)) {
+        const auto r = co_await require_record(_qp, role_name);
+        co_return r.can_login;
+    }
+    auto role = _cache.get(sstring(role_name));
+    if (!role) {
+        throw nonexistant_role(role_name);
+    }
+    co_return role->can_login;
 }

 future<std::optional<sstring>> standard_role_manager::get_attribute(std::string_view role_name, std::string_view attribute_name, ::service::query_state& qs) {
     const sstring query = seastar::format("SELECT name, value FROM {}.{} WHERE role = ? AND name = ?",
             get_auth_ks_name(_qp),
-            meta::role_attributes_table::name);
+            ROLE_ATTRIBUTES_CF);
     const auto result_set = co_await _qp.execute_internal(query, db::consistency_level::ONE, qs, {sstring(role_name), sstring(attribute_name)}, cql3::query_processor::cache_internal::yes);
     if (!result_set->empty()) {
         const cql3::untyped_result_set_row& row = result_set->one();
@@ -770,7 +763,7 @@ future<> standard_role_manager::set_attribute(std::string_view role_name, std::s
     }
     const sstring query = seastar::format("INSERT INTO {}.{} (role, name, value) VALUES (?, ?, ?)",
             get_auth_ks_name(_qp),
-            meta::role_attributes_table::name);
+            ROLE_ATTRIBUTES_CF);
     if (legacy_mode(_qp)) {
         co_await _qp.execute_internal(query, {sstring(role_name), sstring(attribute_name), sstring(attribute_value)}, cql3::query_processor::cache_internal::yes).discard_result();
     } else {
@@ -785,7 +778,7 @@ future<> standard_role_manager::remove_attribute(std::string_view role_name, std
     }
     const sstring query = seastar::format("DELETE FROM {}.{} WHERE role = ? AND name = ?",
             get_auth_ks_name(_qp),
-            meta::role_attributes_table::name);
+            ROLE_ATTRIBUTES_CF);
     if (legacy_mode(_qp)) {
         co_await _qp.execute_internal(query, {sstring(role_name), sstring(attribute_name)}, cql3::query_processor::cache_internal::yes).discard_result();
     } else {
|
|||||||
@@ -10,6 +10,7 @@
|
|||||||
|
|
||||||
#include "auth/common.hh"
|
#include "auth/common.hh"
|
||||||
#include "auth/role_manager.hh"
|
#include "auth/role_manager.hh"
|
||||||
|
#include "auth/cache.hh"
|
||||||
|
|
||||||
#include <string_view>
|
#include <string_view>
|
||||||
|
|
||||||
@@ -36,13 +37,14 @@ class standard_role_manager final : public role_manager {
     cql3::query_processor& _qp;
     ::service::raft_group0_client& _group0_client;
     ::service::migration_manager& _migration_manager;
+    cache& _cache;
     future<> _stopped;
     abort_source _as;
     std::string _superuser;
     shared_promise<> _superuser_created_promise;

 public:
-    standard_role_manager(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&);
+    standard_role_manager(cql3::query_processor&, ::service::raft_group0_client&, ::service::migration_manager&, cache&);

     virtual std::string_view qualified_java_name() const noexcept override;

|
|||||||
@@ -13,6 +13,7 @@
|
|||||||
#include "auth/authorizer.hh"
|
#include "auth/authorizer.hh"
|
||||||
#include "auth/default_authorizer.hh"
|
#include "auth/default_authorizer.hh"
|
||||||
#include "auth/password_authenticator.hh"
|
#include "auth/password_authenticator.hh"
|
||||||
|
#include "auth/cache.hh"
|
||||||
#include "auth/permission.hh"
|
#include "auth/permission.hh"
|
||||||
#include "service/raft/raft_group0_client.hh"
|
#include "service/raft/raft_group0_client.hh"
|
||||||
#include "utils/class_registrator.hh"
|
#include "utils/class_registrator.hh"
|
||||||
@@ -37,8 +38,8 @@ class transitional_authenticator : public authenticator {
 public:
     static const sstring PASSWORD_AUTHENTICATOR_NAME;

-    transitional_authenticator(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, utils::alien_worker& hashing_worker)
-        : transitional_authenticator(std::make_unique<password_authenticator>(qp, g0, mm, hashing_worker)) {
+    transitional_authenticator(cql3::query_processor& qp, ::service::raft_group0_client& g0, ::service::migration_manager& mm, cache& cache, utils::alien_worker& hashing_worker)
+        : transitional_authenticator(std::make_unique<password_authenticator>(qp, g0, mm, cache, hashing_worker)) {
     }
     transitional_authenticator(std::unique_ptr<authenticator> a)
         : _authenticator(std::move(a)) {
@@ -240,6 +241,7 @@ static const class_registrator<
     cql3::query_processor&,
     ::service::raft_group0_client&,
     ::service::migration_manager&,
+    auth::cache&,
     utils::alien_worker&> transitional_authenticator_reg(auth::PACKAGE_NAME + "TransitionalAuthenticator");

 static const class_registrator<
@@ -15,6 +15,7 @@
 #include <cmath>

 #include "seastarx.hh"
+#include "backlog_controller_fwd.hh"

 // Simple proportional controller to adjust shares for processes for which a backlog can be clearly
 // defined.
@@ -128,11 +129,21 @@ public:
     static constexpr unsigned normalization_factor = 30;
     static constexpr float disable_backlog = std::numeric_limits<double>::infinity();
     static constexpr float backlog_disabled(float backlog) { return std::isinf(backlog); }
-    compaction_controller(backlog_controller::scheduling_group sg, float static_shares, std::chrono::milliseconds interval, std::function<float()> current_backlog)
+    static inline const std::vector<backlog_controller::control_point> default_control_points = {
+        backlog_controller::control_point{0.0, 50}, {1.5, 100}, {normalization_factor, default_compaction_maximum_shares}};
+    compaction_controller(backlog_controller::scheduling_group sg, float static_shares, std::optional<float> max_shares,
+            std::chrono::milliseconds interval, std::function<float()> current_backlog)
         : backlog_controller(std::move(sg), std::move(interval),
-            std::vector<backlog_controller::control_point>({{0.0, 50}, {1.5, 100}, {normalization_factor, 1000}}),
+            default_control_points,
             std::move(current_backlog),
             static_shares
         )
-    {}
+    {
+        if (max_shares) {
+            set_max_shares(*max_shares);
+        }
+    }
+
+    // Updates the maximum output value for control points.
+    void set_max_shares(float max_shares);
 };
backlog_controller_fwd.hh (new file, 13 lines)
@@ -0,0 +1,13 @@
+/*
+ * Copyright (C) 2025-present ScyllaDB
+ */
+
+/*
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ */
+
+#pragma once
+
+#include <cstdint>
+
+static constexpr uint64_t default_compaction_maximum_shares = 1000;
@@ -17,5 +17,8 @@ target_link_libraries(cdc
   PRIVATE
     replica)

+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(cdc REUSE_FROM scylla-precompiled-header)
+endif()
 check_headers(check-headers cdc
   GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
cdc/log.cc (79 lines changed)
@@ -25,6 +25,7 @@
 #include "locator/abstract_replication_strategy.hh"
 #include "locator/topology.hh"
 #include "replica/database.hh"
+#include "db/config.hh"
 #include "db/schema_tables.hh"
 #include "gms/feature_service.hh"
 #include "schema/schema.hh"
@@ -586,11 +587,9 @@ bytes log_data_column_deleted_elements_name_bytes(const bytes& column_name) {
     return to_bytes(cdc_deleted_elements_column_prefix) + column_name;
 }

-static schema_ptr create_log_schema(const schema& s, const replica::database& db,
-        const keyspace_metadata& ksm, api::timestamp_type timestamp, std::optional<table_id> uuid, schema_ptr old)
+static void set_default_properties_log_table(schema_builder& b, const schema& s,
+        const replica::database& db, const keyspace_metadata& ksm)
 {
-    schema_builder b(s.ks_name(), log_name(s.cf_name()));
-    b.with_partitioner(cdc::cdc_partitioner::classname);
     b.set_compaction_strategy(compaction::compaction_strategy_type::time_window);
     b.set_comment(fmt::format("CDC log for {}.{}", s.ks_name(), s.cf_name()));
     auto ttl_seconds = s.cdc_options().ttl();
@@ -616,13 +615,22 @@ static schema_ptr create_log_schema(const schema& s, const replica::database& db
                 std::to_string(std::max(1, window_seconds / 2))},
         });
     }
+    b.set_caching_options(caching_options::get_disabled_caching_options());
+
+    auto rs = generate_replication_strategy(ksm, db.get_token_metadata().get_topology());
+    auto tombstone_gc_ext = seastar::make_shared<tombstone_gc_extension>(get_default_tombstone_gc_mode(*rs, db.get_token_metadata(), false));
+    b.add_extension(tombstone_gc_extension::NAME, std::move(tombstone_gc_ext));
+}
+
+static void add_columns_to_cdc_log(schema_builder& b, const schema& s,
+        const api::timestamp_type timestamp, const schema_ptr old)
+{
     b.with_column(log_meta_column_name_bytes("stream_id"), bytes_type, column_kind::partition_key);
     b.with_column(log_meta_column_name_bytes("time"), timeuuid_type, column_kind::clustering_key);
     b.with_column(log_meta_column_name_bytes("batch_seq_no"), int32_type, column_kind::clustering_key);
     b.with_column(log_meta_column_name_bytes("operation"), data_type_for<operation_native_type>());
     b.with_column(log_meta_column_name_bytes("ttl"), long_type);
     b.with_column(log_meta_column_name_bytes("end_of_batch"), boolean_type);
-    b.set_caching_options(caching_options::get_disabled_caching_options());

     auto validate_new_column = [&] (const sstring& name) {
         // When dropping a column from a CDC log table, we set the drop timestamp to be
@@ -692,15 +700,28 @@ static schema_ptr create_log_schema(const schema& s, const replica::database& db
     add_columns(s.clustering_key_columns());
     add_columns(s.static_columns(), true);
     add_columns(s.regular_columns(), true);
+}
+
+static schema_ptr create_log_schema(const schema& s, const replica::database& db,
+        const keyspace_metadata& ksm, api::timestamp_type timestamp, std::optional<table_id> uuid, schema_ptr old)
+{
+    schema_builder b(s.ks_name(), log_name(s.cf_name()));
+
+    b.with_partitioner(cdc::cdc_partitioner::classname);
+
+    if (old) {
+        // If the user reattaches the log table, do not change its properties.
+        b.set_properties(old->get_properties());
+    } else {
+        set_default_properties_log_table(b, s, db, ksm);
+    }
+
+    add_columns_to_cdc_log(b, s, timestamp, old);
+
     if (uuid) {
         b.set_uuid(*uuid);
     }

-    auto rs = generate_replication_strategy(ksm, db.get_token_metadata().get_topology());
-    auto tombstone_gc_ext = seastar::make_shared<tombstone_gc_extension>(get_default_tombstone_gc_mode(*rs, db.get_token_metadata()));
-    b.add_extension(tombstone_gc_extension::NAME, std::move(tombstone_gc_ext));

     /**
      * #10473 - if we are redefining the log table, we need to ensure any dropped
      * columns are registered in "dropped_columns" table, otherwise clients will not
@@ -931,9 +952,6 @@ static managed_bytes merge(const abstract_type& type, const managed_bytes_opt& p
     throw std::runtime_error(format("cdc merge: unknown type {}", type.name()));
 }

-using cell_map = std::unordered_map<const column_definition*, managed_bytes_opt>;
-using row_states_map = std::unordered_map<clustering_key, cell_map, clustering_key::hashing, clustering_key::equality>;
-
 static managed_bytes_opt get_col_from_row_state(const cell_map* state, const column_definition& cdef) {
     if (state) {
         if (auto it = state->find(&cdef); it != state->end()) {
@@ -943,7 +961,12 @@ static managed_bytes_opt get_col_from_row_state(const cell_map* state, const col
     return std::nullopt;
 }
 
-static cell_map* get_row_state(row_states_map& row_states, const clustering_key& ck) {
+cell_map* get_row_state(row_states_map& row_states, const clustering_key& ck) {
+    auto it = row_states.find(ck);
+    return it == row_states.end() ? nullptr : &it->second;
+}
+
+const cell_map* get_row_state(const row_states_map& row_states, const clustering_key& ck) {
     auto it = row_states.find(ck);
     return it == row_states.end() ? nullptr : &it->second;
 }
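The hunk above turns `get_row_state` into a pair of mutable/const overloads so read-only callers (such as the new skip check) can look up cached row state through a `const` reference. A minimal standalone sketch of the same overload pattern, using plain `std::string` stand-ins for Scylla's clustering-key and cell types:

```cpp
#include <string>
#include <unordered_map>

// Stand-ins for Scylla's clustering_key -> cell_map caching structure.
using cell_map = std::unordered_map<std::string, std::string>;
using row_states_map = std::unordered_map<std::string, cell_map>;

// Mutable lookup: callers may update the cached row state in place.
cell_map* get_row_state(row_states_map& row_states, const std::string& ck) {
    auto it = row_states.find(ck);
    return it == row_states.end() ? nullptr : &it->second;
}

// Const overload: read-only callers get a const pointer without
// needing a non-const reference to the whole map.
const cell_map* get_row_state(const row_states_map& row_states, const std::string& ck) {
    auto it = row_states.find(ck);
    return it == row_states.end() ? nullptr : &it->second;
}
```

Overload resolution picks the const version automatically whenever the map is reached through a `const` reference, which is what lets a `const` accessor like `clustering_row_states()` feed the lookup.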
@@ -1413,6 +1436,8 @@ struct process_change_visitor {
     row_states_map& _clustering_row_states;
     cell_map& _static_row_state;
 
+    const bool _is_update = false;
+
     const bool _generate_delta_values = true;
 
     void static_row_cells(auto&& visit_row_cells) {
@@ -1436,12 +1461,13 @@ struct process_change_visitor {
 
     struct clustering_row_cells_visitor : public process_row_visitor {
         operation _cdc_op = operation::update;
+        operation _marker_op = operation::insert;
 
         using process_row_visitor::process_row_visitor;
 
         void marker(const row_marker& rm) {
             _ttl_column = get_ttl(rm);
-            _cdc_op = operation::insert;
+            _cdc_op = _marker_op;
         }
     };
 
@@ -1449,6 +1475,9 @@ struct process_change_visitor {
             log_ck, _touched_parts, _builder,
             _enable_updating_state, &ckey, get_row_state(_clustering_row_states, ckey),
             _clustering_row_states, _generate_delta_values);
+        if (_is_update && _request_options.alternator) {
+            v._marker_op = operation::update;
+        }
         visit_row_cells(v);
 
         if (_enable_updating_state) {
@@ -1602,6 +1631,11 @@ private:
 
     row_states_map _clustering_row_states;
     cell_map _static_row_state;
+    // True if the mutated row existed before applying the mutation. In other
+    // words, if the preimage is enabled and it isn't empty (otherwise, we
+    // assume that the row is non-existent). Used for Alternator Streams (see
+    // #6918).
+    bool _is_update = false;
 
     const bool _uses_tablets;
 
@@ -1728,6 +1762,7 @@ public:
             ._enable_updating_state = _enable_updating_state,
             ._clustering_row_states = _clustering_row_states,
             ._static_row_state = _static_row_state,
+            ._is_update = _is_update,
             ._generate_delta_values = generate_delta_values(_builder->base_schema())
         };
         cdc::inspect_mutation(m, v);
@@ -1738,6 +1773,10 @@ public:
         _builder->end_record();
     }
 
+    const row_states_map& clustering_row_states() const override {
+        return _clustering_row_states;
+    }
+
     // Takes and returns generated cdc log mutations and associated statistics about parts touched during transformer's lifetime.
     // The `transformer` object on which this method was called on should not be used anymore.
     std::tuple<utils::chunked_vector<mutation>, stats::part_type_set> finish() && {
@@ -1861,6 +1900,7 @@ public:
                 _static_row_state[&c] = std::move(*maybe_cell_view);
             }
         }
+        _is_update = true;
     }
 
     if (static_only) {
@@ -1948,6 +1988,7 @@ cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout,
         return make_ready_future<>();
     }
 
+    const bool alternator_increased_compatibility = options.alternator && options.alternator_streams_increased_compatibility;
     transformer trans(_ctxt, s, m.decorated_key(), options);
 
     auto f = make_ready_future<lw_shared_ptr<cql3::untyped_result_set>>(nullptr);
@@ -1955,7 +1996,7 @@ cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout,
         // Preimage has been fetched by upper layers.
         tracing::trace(tr_state, "CDC: Using a prefetched preimage");
         f = make_ready_future<lw_shared_ptr<cql3::untyped_result_set>>(options.preimage);
-    } else if (s->cdc_options().preimage() || s->cdc_options().postimage()) {
+    } else if (s->cdc_options().preimage() || s->cdc_options().postimage() || alternator_increased_compatibility) {
         // Note: further improvement here would be to coalesce the pre-image selects into one
         // if a batch contains several modifications to the same table. Otoh, batch is rare(?)
         // so this is premature.
@@ -1972,7 +2013,7 @@ cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout,
         tracing::trace(tr_state, "CDC: Preimage not enabled for the table, not querying current value of {}", m.decorated_key());
     }
 
-    return f.then([trans = std::move(trans), &mutations, idx, tr_state, &details] (lw_shared_ptr<cql3::untyped_result_set> rs) mutable {
+    return f.then([alternator_increased_compatibility, trans = std::move(trans), &mutations, idx, tr_state, &details, &options] (lw_shared_ptr<cql3::untyped_result_set> rs) mutable {
         auto& m = mutations[idx];
         auto& s = m.schema();
 
@@ -1987,13 +2028,13 @@ cdc::cdc_service::impl::augment_mutation_call(lowres_clock::time_point timeout,
         details.had_preimage |= preimage;
         details.had_postimage |= postimage;
         tracing::trace(tr_state, "CDC: Generating log mutations for {}", m.decorated_key());
-        if (should_split(m)) {
+        if (should_split(m, options)) {
             tracing::trace(tr_state, "CDC: Splitting {}", m.decorated_key());
             details.was_split = true;
-            process_changes_with_splitting(m, trans, preimage, postimage);
+            process_changes_with_splitting(m, trans, preimage, postimage, alternator_increased_compatibility);
         } else {
             tracing::trace(tr_state, "CDC: No need to split {}", m.decorated_key());
-            process_changes_without_splitting(m, trans, preimage, postimage);
+            process_changes_without_splitting(m, trans, preimage, postimage, alternator_increased_compatibility);
         }
         auto [log_mut, touched_parts] = std::move(trans).finish();
         const int generated_count = log_mut.size();
cdc/log.hh (14 changes)

@@ -52,6 +52,9 @@ class database;
 
 namespace cdc {
 
+using cell_map = std::unordered_map<const column_definition*, managed_bytes_opt>;
+using row_states_map = std::unordered_map<clustering_key, cell_map, clustering_key::hashing, clustering_key::equality>;
+
 // cdc log table operation
 enum class operation : int8_t {
     // note: these values will eventually be read by a third party, probably not privvy to this
@@ -73,6 +76,14 @@ struct per_request_options {
     // Scylla. Currently, only TTL expiration implementation for Alternator
     // uses this.
     const bool is_system_originated = false;
+    // True if this mutation was emitted by Alternator.
+    const bool alternator = false;
+    // Sacrifice performance for the sake of better compatibility with DynamoDB
+    // Streams. It's important for correctness that
+    // alternator_streams_increased_compatibility config flag be read once per
+    // request, because it's live-updateable. As a result, the flag may change
+    // between reads.
+    const bool alternator_streams_increased_compatibility = false;
 };
 
 struct operation_result_tracker;
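The new comment stresses that the live-updateable flag must be read exactly once per request and then held constant. A hedged standalone sketch of that snapshot pattern (a plain `std::atomic` stands in for Scylla's `utils::updateable_value`; all names here are illustrative, not Scylla's API):

```cpp
#include <atomic>

// A live-updateable config knob; operators may flip it at any time.
std::atomic<bool> increased_compatibility_flag{false};

struct per_request_options {
    // Snapshot the flag once at request start. Every later check in this
    // request sees the same value, even if the config changes mid-request.
    const bool increased_compat = increased_compatibility_flag.load();
};

// Example consumer: both the decision to fetch a preimage and the later
// processing read the same snapshot, so they can never disagree.
inline bool needs_preimage_query(const per_request_options& o, bool preimage_enabled) {
    return preimage_enabled || o.increased_compat;
}
```

Without the snapshot, one code path could see the flag as true (and fetch state) while a later path sees it as false (and skips using it), which is exactly the inconsistency the comment warns about.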
@@ -142,4 +153,7 @@ bool is_cdc_metacolumn_name(const sstring& name);
 
 utils::UUID generate_timeuuid(api::timestamp_type t);
 
+cell_map* get_row_state(row_states_map& row_states, const clustering_key& ck);
+const cell_map* get_row_state(const row_states_map& row_states, const clustering_key& ck);
+
 } // namespace cdc
cdc/split.cc (163 changes)

@@ -6,15 +6,28 @@
  * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
  */
 
+#include "bytes.hh"
+#include "bytes_fwd.hh"
+#include "mutation/atomic_cell.hh"
+#include "mutation/atomic_cell_or_collection.hh"
+#include "mutation/collection_mutation.hh"
 #include "mutation/mutation.hh"
+#include "mutation/tombstone.hh"
 #include "schema/schema.hh"
 
+#include "seastar/core/sstring.hh"
 #include "types/concrete_types.hh"
+#include "types/types.hh"
 #include "types/user.hh"
 
 #include "split.hh"
 #include "log.hh"
 #include "change_visitor.hh"
+#include "utils/managed_bytes.hh"
+#include <string_view>
+#include <unordered_map>
+
+extern logging::logger cdc_log;
 
 struct atomic_column_update {
     column_id id;
@@ -490,6 +503,8 @@ struct should_split_visitor {
     // Otherwise we store the change's ttl.
     std::optional<gc_clock::duration> _ttl = std::nullopt;
 
+    virtual ~should_split_visitor() = default;
+
     inline bool finished() const { return _result; }
     inline void stop() { _result = true; }
 
@@ -512,7 +527,7 @@ struct should_split_visitor {
 
     void collection_tombstone(const tombstone& t) { visit(t.timestamp + 1); }
 
-    void live_collection_cell(bytes_view, const atomic_cell_view& cell) {
+    virtual void live_collection_cell(bytes_view, const atomic_cell_view& cell) {
         if (_had_row_marker) {
             // nonatomic updates cannot be expressed with an INSERT.
             return stop();
@@ -522,7 +537,7 @@ struct should_split_visitor {
     void dead_collection_cell(bytes_view, const atomic_cell_view& cell) { visit(cell); }
     void collection_column(const column_definition&, auto&& visit_collection) { visit_collection(*this); }
 
-    void marker(const row_marker& rm) {
+    virtual void marker(const row_marker& rm) {
         _had_row_marker = true;
         visit(rm.timestamp(), get_ttl(rm));
     }
@@ -563,7 +578,29 @@ struct should_split_visitor {
     }
 };
 
-bool should_split(const mutation& m) {
+// This is the same as the above, but it doesn't split a row marker away from
+// an update. As a result, updates that create an item appear as a single log
+// row.
+class alternator_should_split_visitor : public should_split_visitor {
+public:
+    ~alternator_should_split_visitor() override = default;
+
+    void live_collection_cell(bytes_view, const atomic_cell_view& cell) override {
+        visit(cell.timestamp());
+    }
+
+    void marker(const row_marker& rm) override {
+        visit(rm.timestamp());
+    }
+};
+
+bool should_split(const mutation& m, const per_request_options& options) {
+    if (options.alternator) {
+        alternator_should_split_visitor v;
+        cdc::inspect_mutation(m, v);
+        return v._result || v._ts == api::missing_timestamp;
+    }
+
     should_split_visitor v;
 
     cdc::inspect_mutation(m, v);
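The hunk above relaxes the splitting rules for Alternator by overriding the freshly virtualized `marker` and `live_collection_cell` hooks. A toy sketch of that shape — a base visitor encoding the strict CQL rule, a subclass relaxing it, and selection by a per-request flag (all members here are simplified stand-ins, not the real visitor interface):

```cpp
// Simplified stand-in for cdc's should_split_visitor: decides whether a
// change must be split into several log rows.
struct should_split_visitor {
    bool result = false;
    bool had_row_marker = false;
    virtual ~should_split_visitor() = default;

    // CQL semantics: a row marker next to a nonatomic (collection) update
    // forces a split, because an INSERT cannot express both.
    virtual void marker() { had_row_marker = true; }
    virtual void live_collection_cell() {
        if (had_row_marker) {
            result = true; // must split
        }
    }
};

// Alternator semantics: an update that creates an item should appear as a
// single log row, so the marker never forces collection cells apart.
struct alternator_should_split_visitor : should_split_visitor {
    void marker() override {}               // don't record the marker
    void live_collection_cell() override {} // never split on it
};

// Visitor selection mirrors the new should_split(m, options) entry point.
bool should_split(bool is_alternator) {
    alternator_should_split_visitor alt;
    should_split_visitor cql;
    should_split_visitor& v = is_alternator ? alt : cql;
    v.marker();
    v.live_collection_cell();
    return v.result;
}
```

Making the hooks `virtual` (and adding the virtual destructor, as the earlier hunk does) is what lets the two dialects share all the remaining visitation logic.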
@@ -573,8 +610,109 @@ bool should_split(const mutation& m) {
         || v._ts == api::missing_timestamp;
 }
 
+// Returns true if the row state and the atomic and nonatomic entries represent
+// an equivalent item.
+static bool entries_match_row_state(const schema_ptr& base_schema, const cell_map& row_state, const std::vector<atomic_column_update>& atomic_entries,
+        std::vector<nonatomic_column_update>& nonatomic_entries) {
+    for (const auto& update : atomic_entries) {
+        const column_definition& cdef = base_schema->column_at(column_kind::regular_column, update.id);
+        const auto it = row_state.find(&cdef);
+        if (it == row_state.end()) {
+            return false;
+        }
+        if (to_managed_bytes_opt(update.cell.value().linearize()) != it->second) {
+            return false;
+        }
+    }
+    if (nonatomic_entries.empty()) {
+        return true;
+    }
+
+    for (const auto& update : nonatomic_entries) {
+        const column_definition& cdef = base_schema->column_at(column_kind::regular_column, update.id);
+        const auto it = row_state.find(&cdef);
+        if (it == row_state.end()) {
+            return false;
+        }
+
+        // The only collection used by Alternator is a non-frozen map.
+        auto current_raw_map = cdef.type->deserialize(*it->second);
+        map_type_impl::native_type current_values = value_cast<map_type_impl::native_type>(current_raw_map);
+
+        if (current_values.size() != update.cells.size()) {
+            return false;
+        }
+
+        std::unordered_map<sstring_view, bytes> current_values_map;
+        for (const auto& entry : current_values) {
+            const auto attr_name = std::string_view(value_cast<sstring>(entry.first));
+            current_values_map[attr_name] = value_cast<bytes>(entry.second);
+        }
+
+        for (const auto& [key, value] : update.cells) {
+            const auto key_str = to_string_view(key);
+            if (!value.is_live()) {
+                if (current_values_map.contains(key_str)) {
+                    return false;
+                }
+            } else if (current_values_map[key_str] != value.value().linearize()) {
+                return false;
+            }
+        }
+    }
+    return true;
+}
+
+bool should_skip(batch& changes, const mutation& base_mutation, change_processor& processor) {
+    const schema_ptr& base_schema = base_mutation.schema();
+    // Alternator doesn't use static updates and clustered range deletions.
+    if (!changes.static_updates.empty() || !changes.clustered_range_deletions.empty()) {
+        return false;
+    }
+
+    for (clustered_row_insert& u : changes.clustered_inserts) {
+        const cell_map* row_state = get_row_state(processor.clustering_row_states(), u.key);
+        if (!row_state) {
+            return false;
+        }
+        if (!entries_match_row_state(base_schema, *row_state, u.atomic_entries, u.nonatomic_entries)) {
+            return false;
+        }
+    }
+
+    for (clustered_row_update& u : changes.clustered_updates) {
+        const cell_map* row_state = get_row_state(processor.clustering_row_states(), u.key);
+        if (!row_state) {
+            return false;
+        }
+        if (!entries_match_row_state(base_schema, *row_state, u.atomic_entries, u.nonatomic_entries)) {
+            return false;
+        }
+    }
+
+    // Skip only if the row being deleted does not exist (i.e. the deletion is a no-op).
+    for (const auto& row_deletion : changes.clustered_row_deletions) {
+        if (processor.clustering_row_states().contains(row_deletion.key)) {
+            return false;
+        }
+    }
+
+    // Don't skip if the item exists.
+    //
+    // Increased DynamoDB Streams compatibility guarantees that single-item
+    // operations will read the item and store it in the clustering row states.
+    // If it is not found there, we may skip CDC. This is safe as long as the
+    // assumptions of this operation's write isolation are not violated.
+    if (changes.partition_deletions && processor.clustering_row_states().contains(clustering_key::make_empty())) {
+        return false;
+    }
+
+    cdc_log.trace("Skipping CDC log for mutation {}", base_mutation);
+    return true;
+}
+
 void process_changes_with_splitting(const mutation& base_mutation, change_processor& processor,
-        bool enable_preimage, bool enable_postimage) {
+        bool enable_preimage, bool enable_postimage, bool alternator_strict_compatibility) {
     const auto base_schema = base_mutation.schema();
     auto changes = extract_changes(base_mutation);
     auto pk = base_mutation.key();
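The core idea of `should_skip` above is: compare every cell the write would set against the cached row state, and skip the CDC log entry when nothing would actually change. A reduced sketch of that comparison, with flat string cells instead of serialized Alternator maps (types and names here are simplifications, not the real `batch`/`change_processor` interfaces):

```cpp
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

using cell_map = std::unordered_map<std::string, std::string>;

// True if applying `updates` to a row in state `row_state` would change
// nothing, i.e. every written cell already holds the written value.
bool entries_match_row_state(const cell_map& row_state,
                             const std::vector<std::pair<std::string, std::string>>& updates) {
    for (const auto& [column, value] : updates) {
        auto it = row_state.find(column);
        if (it == row_state.end() || it->second != value) {
            return false; // new column or changed value: a real change
        }
    }
    return true;
}

// Skip the CDC log entry only when the row is known (cached) and the whole
// write is a no-op overwrite; an unknown row must conservatively be logged.
bool should_skip(const cell_map* row_state,
                 const std::vector<std::pair<std::string, std::string>>& updates) {
    return row_state != nullptr && entries_match_row_state(*row_state, updates);
}
```

The "return false when the row state is missing" branch mirrors the real code: skipping is only safe when the preceding read (guaranteed by the increased-compatibility mode) has populated the cached state.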
@@ -586,9 +724,6 @@ void process_changes_with_splitting(const mutation& base_mutation, change_proces
     const auto last_timestamp = changes.rbegin()->first;
 
     for (auto& [change_ts, btch] : changes) {
-        const bool is_last = change_ts == last_timestamp;
-        processor.begin_timestamp(change_ts, is_last);
-
         clustered_column_set affected_clustered_columns_per_row{clustering_key::less_compare(*base_schema)};
         one_kind_column_set affected_static_columns{base_schema->static_columns_count()};
 
@@ -597,6 +732,12 @@ void process_changes_with_splitting(const mutation& base_mutation, change_proces
             affected_clustered_columns_per_row = btch.get_affected_clustered_columns_per_row(*base_mutation.schema());
         }
 
+        if (alternator_strict_compatibility && should_skip(btch, base_mutation, processor)) {
+            continue;
+        }
+
+        const bool is_last = change_ts == last_timestamp;
+        processor.begin_timestamp(change_ts, is_last);
         if (enable_preimage) {
             if (affected_static_columns.count() > 0) {
                 processor.produce_preimage(nullptr, affected_static_columns);
@@ -684,7 +825,13 @@ void process_changes_with_splitting(const mutation& base_mutation, change_proces
 }
 
 void process_changes_without_splitting(const mutation& base_mutation, change_processor& processor,
-        bool enable_preimage, bool enable_postimage) {
+        bool enable_preimage, bool enable_postimage, bool alternator_strict_compatibility) {
+    if (alternator_strict_compatibility) {
+        auto changes = extract_changes(base_mutation);
+        if (should_skip(changes.begin()->second, base_mutation, processor)) {
+            return;
+        }
+    }
     auto ts = find_timestamp(base_mutation);
     processor.begin_timestamp(ts, true);
 
@@ -9,6 +9,7 @@
 #pragma once
 
 #include <boost/dynamic_bitset.hpp> // IWYU pragma: keep
+#include "cdc/log.hh"
 #include "replica/database_fwd.hh"
 #include "mutation/timestamp.hh"
 
@@ -65,12 +66,14 @@ public:
     // Tells processor we have reached end of record - last part
     // of a given timestamp batch
     virtual void end_record() = 0;
+
+    virtual const row_states_map& clustering_row_states() const = 0;
 };
 
-bool should_split(const mutation& base_mutation);
+bool should_split(const mutation& base_mutation, const per_request_options& options);
 void process_changes_with_splitting(const mutation& base_mutation, change_processor& processor,
-        bool enable_preimage, bool enable_postimage);
+        bool enable_preimage, bool enable_postimage, bool alternator_strict_compatibility);
 void process_changes_without_splitting(const mutation& base_mutation, change_processor& processor,
-        bool enable_preimage, bool enable_postimage);
+        bool enable_preimage, bool enable_postimage, bool alternator_strict_compatibility);
 
 }
@@ -21,5 +21,8 @@ target_link_libraries(compaction
   mutation_writer
   replica)
 
+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(compaction REUSE_FROM scylla-precompiled-header)
+endif()
 check_headers(check-headers compaction
   GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
@@ -867,8 +867,8 @@ auto fmt::formatter<compaction::compaction_task_executor>::format(const compacti
 
 namespace compaction {
 
-inline compaction_controller make_compaction_controller(const compaction_manager::scheduling_group& csg, uint64_t static_shares, std::function<double()> fn) {
-    return compaction_controller(csg, static_shares, 250ms, std::move(fn));
+inline compaction_controller make_compaction_controller(const compaction_manager::scheduling_group& csg, uint64_t static_shares, std::optional<float> max_shares, std::function<double()> fn) {
+    return compaction_controller(csg, static_shares, max_shares, 250ms, std::move(fn));
 }
 
 compaction::compaction_state::~compaction_state() {
@@ -1014,7 +1014,7 @@ compaction_manager::compaction_manager(config cfg, abort_source& as, tasks::task
     , _sys_ks("compaction_manager::system_keyspace")
     , _cfg(std::move(cfg))
     , _compaction_submission_timer(compaction_sg(), compaction_submission_callback())
-    , _compaction_controller(make_compaction_controller(compaction_sg(), static_shares(), [this] () -> float {
+    , _compaction_controller(make_compaction_controller(compaction_sg(), static_shares(), _cfg.max_shares.get(), [this] () -> float {
         _last_backlog = backlog();
         auto b = _last_backlog / available_memory();
         // This means we are using an unimplemented strategy
@@ -1033,6 +1033,10 @@ compaction_manager::compaction_manager(config cfg, abort_source& as, tasks::task
     , _throughput_updater(serialized_action([this] { return update_throughput(throughput_mbs()); }))
     , _update_compaction_static_shares_action([this] { return update_static_shares(static_shares()); })
     , _compaction_static_shares_observer(_cfg.static_shares.observe(_update_compaction_static_shares_action.make_observer()))
+    , _compaction_max_shares_observer(_cfg.max_shares.observe([this] (const float& max_shares) {
+        cmlog.info("Updating max shares to {}", max_shares);
+        _compaction_controller.set_max_shares(max_shares);
+    }))
     , _strategy_control(std::make_unique<strategy_control>(*this))
     , _tombstone_gc_state(_shared_tombstone_gc_state) {
     tm.register_module(_task_manager_module->get_name(), _task_manager_module);
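The new `_compaction_max_shares_observer` above follows the observe-and-push pattern: whenever the live-updateable `max_shares` option changes, the observer callback forwards the new value into the running controller. A minimal self-contained sketch of that wiring (this `updateable_value` is a toy stand-in for Scylla's `utils::updateable_value`, and `controller`/`wire` are hypothetical names):

```cpp
#include <functional>
#include <utility>
#include <vector>

// Minimal stand-in for utils::updateable_value: stores a value and
// notifies every registered observer on each update.
template <typename T>
class updateable_value {
    T _value{};
    std::vector<std::function<void(const T&)>> _observers;
public:
    void set(T v) {
        _value = std::move(v);
        for (auto& f : _observers) {
            f(_value); // push the new value to each observer
        }
    }
    const T& get() const { return _value; }
    void observe(std::function<void(const T&)> f) { _observers.push_back(std::move(f)); }
};

// Controller whose share cap can be re-clamped at runtime, mirroring the
// set_max_shares call the diff adds to compaction_controller.
struct controller {
    float max_shares = 1000;
    void set_max_shares(float m) { max_shares = m; }
};

// Wiring as in the compaction_manager initializer list: config changes
// flow into the controller without any polling.
void wire(updateable_value<float>& opt, controller& c) {
    opt.observe([&c] (const float& max_shares) { c.set_max_shares(max_shares); });
}
```

The second constructor in the following hunk registers a no-op observer for the same option, which keeps the observer member initialized in test-only instances where there is no controller to update.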
@@ -1051,11 +1055,12 @@ compaction_manager::compaction_manager(tasks::task_manager& tm)
     , _sys_ks("compaction_manager::system_keyspace")
     , _cfg(config{ .available_memory = 1 })
     , _compaction_submission_timer(compaction_sg(), compaction_submission_callback())
-    , _compaction_controller(make_compaction_controller(compaction_sg(), 1, [] () -> float { return 1.0; }))
+    , _compaction_controller(make_compaction_controller(compaction_sg(), 1, std::nullopt, [] () -> float { return 1.0; }))
     , _backlog_manager(_compaction_controller)
     , _throughput_updater(serialized_action([this] { return update_throughput(throughput_mbs()); }))
     , _update_compaction_static_shares_action([] { return make_ready_future<>(); })
     , _compaction_static_shares_observer(_cfg.static_shares.observe(_update_compaction_static_shares_action.make_observer()))
+    , _compaction_max_shares_observer(_cfg.max_shares.observe([] (const float& max_shares) {}))
     , _strategy_control(std::make_unique<strategy_control>(*this))
     , _tombstone_gc_state(_shared_tombstone_gc_state) {
     tm.register_module(_task_manager_module->get_name(), _task_manager_module);
@@ -80,6 +80,7 @@ public:
     scheduling_group maintenance_sched_group;
     size_t available_memory = 0;
     utils::updateable_value<float> static_shares = utils::updateable_value<float>(0);
+    utils::updateable_value<float> max_shares = utils::updateable_value<float>(0);
    utils::updateable_value<uint32_t> throughput_mb_per_sec = utils::updateable_value<uint32_t>(0);
     std::chrono::seconds flush_all_tables_before_major = std::chrono::duration_cast<std::chrono::seconds>(std::chrono::days(1));
 };
@@ -159,6 +160,7 @@ private:
     std::optional<utils::observer<uint32_t>> _throughput_option_observer;
     serialized_action _update_compaction_static_shares_action;
     utils::observer<float> _compaction_static_shares_observer;
+    utils::observer<float> _compaction_max_shares_observer;
     uint64_t _validation_errors = 0;

     class strategy_control;
@@ -291,6 +293,10 @@ public:
         return _cfg.static_shares.get();
     }

+    float max_shares() const noexcept {
+        return _cfg.max_shares.get();
+    }
+
     uint32_t throughput_mbs() const noexcept {
         return _cfg.throughput_mb_per_sec.get();
     }
@@ -227,7 +227,7 @@ future<> run_table_tasks(replica::database& db, std::vector<table_tasks_info> ta
     // Tables will be kept in descending order.
     std::ranges::sort(table_tasks, std::greater<>(), [&] (const table_tasks_info& tti) {
         try {
-            return db.find_column_family(tti.ti.id).get_stats().live_disk_space_used;
+            return db.find_column_family(tti.ti.id).get_stats().live_disk_space_used.on_disk;
         } catch (const replica::no_such_column_family& e) {
             return int64_t(-1);
         }
@@ -281,7 +281,7 @@ future<> run_keyspace_tasks(replica::database& db, std::vector<keyspace_tasks_in
         try {
             return std::accumulate(kti.table_infos.begin(), kti.table_infos.end(), int64_t(0), [&] (int64_t sum, const table_info& t) {
                 try {
-                    sum += db.find_column_family(t.id).get_stats().live_disk_space_used;
+                    sum += db.find_column_family(t.id).get_stats().live_disk_space_used.on_disk;
                 } catch (const replica::no_such_column_family&) {
                     // ignore
                 }
@@ -888,9 +888,18 @@ rf_rack_valid_keyspaces: false
 #
 # Vector Store options
 #
-# A comma-separated list of URIs for the vector store using DNS name. Only HTTP schema is supported. Port number is mandatory.
-# Default is empty, which means that the vector store is not used.
+# HTTP and HTTPS schemes are supported. Port number is mandatory.
+# If both `vector_store_primary_uri` and `vector_store_secondary_uri` are unset or empty, vector search is disabled.
+#
+# A comma-separated list of primary vector store node URIs. These nodes are preferred for vector search operations.
 # vector_store_primary_uri: http://vector-store.dns.name:{port}
+#
+# A comma-separated list of secondary vector store node URIs. These nodes are used as a fallback when all primary nodes are unavailable, and are typically located in a different availability zone for high availability.
+# vector_store_secondary_uri: http://vector-store.dns.name:{port}
+#
+# Options for encrypted connections to the vector store. These options are used for HTTPS URIs in vector_store_primary_uri and vector_store_secondary_uri.
+# vector_store_encryption_options:
+#     truststore: <not set, use system trust>

 #
 # io-streaming rate limiting
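The fallback policy these comments describe (primary URIs preferred, secondary URIs used only when no primary node is reachable, vector search disabled when both lists are empty) can be sketched in Python. `pick_vector_store_uri` and `is_available` are hypothetical names for illustration only; the actual client lives in the C++ sources under `vector_search/`:

```python
# Minimal sketch of the documented fallback policy: prefer primary URIs,
# fall back to secondary URIs only when no primary node is available.
def pick_vector_store_uri(primary_uris, secondary_uris, is_available):
    for uri in primary_uris:
        if is_available(uri):
            return uri
    for uri in secondary_uris:
        if is_available(uri):
            return uri
    # Both lists empty or unreachable: vector search is disabled.
    return None

primary = ["http://vs-a.dns.name:6080", "http://vs-b.dns.name:6080"]
secondary = ["http://vs-dr.dns.name:6080"]
up = {"http://vs-dr.dns.name:6080"}  # only the secondary node is reachable
print(pick_vector_store_uri(primary, secondary, lambda u: u in up))
```

Returning `None` mirrors the documented behavior that vector search is disabled when neither option is set.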
configure.py (99 changes)
@@ -445,6 +445,7 @@ ldap_tests = set([
 scylla_tests = set([
     'test/boost/combined_tests',
     'test/boost/UUID_test',
+    'test/boost/url_parse_test',
     'test/boost/advanced_rpc_compressor_test',
     'test/boost/allocation_strategy_test',
     'test/boost/alternator_unit_test',
@@ -646,6 +647,28 @@ vector_search_tests = set([
     'test/vector_search/client_test'
 ])

+vector_search_validator_bin = 'vector-search-validator/bin/vector-search-validator'
+vector_search_validator_deps = set([
+    'test/vector_search_validator/build-validator',
+    'test/vector_search_validator/Cargo.toml',
+    'test/vector_search_validator/crates/validator/Cargo.toml',
+    'test/vector_search_validator/crates/validator/src/main.rs',
+    'test/vector_search_validator/crates/validator-scylla/Cargo.toml',
+    'test/vector_search_validator/crates/validator-scylla/src/lib.rs',
+    'test/vector_search_validator/crates/validator-scylla/src/cql.rs',
+])
+
+vector_store_bin = 'vector-search-validator/bin/vector-store'
+vector_store_deps = set([
+    'test/vector_search_validator/build-env',
+    'test/vector_search_validator/build-vector-store',
+])
+
+vector_search_validator_bins = set([
+    vector_search_validator_bin,
+    vector_store_bin,
+])
+
 wasms = set([
     'wasm/return_input.wat',
     'wasm/test_complex_null_values.wat',
@@ -679,7 +702,7 @@ other = set([
     'iotune',
 ])

-all_artifacts = apps | cpp_apps | tests | other | wasms
+all_artifacts = apps | cpp_apps | tests | other | wasms | vector_search_validator_bins

 arg_parser = argparse.ArgumentParser('Configure scylla', add_help=False, formatter_class=argparse.ArgumentDefaultsHelpFormatter)
 arg_parser.add_argument('--out', dest='buildfile', action='store', default='build.ninja',
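The effect of adding `vector_search_validator_bins` to `all_artifacts` while later subtracting it from the per-mode artifact list can be shown with a toy, self-contained version of the same set arithmetic (the values below are made-up examples, not the real artifact lists):

```python
# Toy stand-ins for the artifact groups in configure.py.
apps = {'scylla'}
cpp_apps = set()
tests = {'test/boost/UUID_test'}
other = {'iotune'}
wasms = {'wasm/return_input.wat'}
vector_search_validator_bins = {'vector-search-validator/bin/vector-store'}

# Same union as configure.py builds for all_artifacts.
all_artifacts = apps | cpp_apps | tests | other | wasms | vector_search_validator_bins

# Per-mode build statements exclude the groups that get dedicated rules.
regular = sorted(all_artifacts - wasms - vector_search_validator_bins)
print(regular)
```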
@@ -763,6 +786,7 @@ arg_parser.add_argument('--use-cmake', action=argparse.BooleanOptionalAction, de
 arg_parser.add_argument('--coverage', action = 'store_true', help = 'Compile scylla with coverage instrumentation')
 arg_parser.add_argument('--build-dir', action='store', default='build',
                         help='Build directory path')
+arg_parser.add_argument('--disable-precompiled-header', action='store_true', default=False, help='Disable precompiled header for scylla binary')
 arg_parser.add_argument('-h', '--help', action='store_true', help='show this help message and exit')
 args = arg_parser.parse_args()
 if args.help:
@@ -1172,6 +1196,7 @@ scylla_core = (['message/messaging_service.cc',
     'auth/allow_all_authorizer.cc',
     'auth/authenticated_user.cc',
     'auth/authenticator.cc',
+    'auth/cache.cc',
     'auth/common.cc',
     'auth/default_authorizer.cc',
     'auth/resource.cc',
@@ -1268,7 +1293,8 @@ scylla_core = (['message/messaging_service.cc',
     'vector_search/vector_store_client.cc',
     'vector_search/dns.cc',
     'vector_search/client.cc',
-    'vector_search/clients.cc'
+    'vector_search/clients.cc',
+    'vector_search/truststore.cc'
     ] + [Antlr3Grammar('cql3/Cql.g')] \
     + scylla_raft_core
 )
@@ -1579,6 +1605,7 @@ deps['test/boost/combined_tests'] += [
     'test/boost/query_processor_test.cc',
     'test/boost/reader_concurrency_semaphore_test.cc',
     'test/boost/repair_test.cc',
+    'test/boost/replicator_test.cc',
     'test/boost/restrictions_test.cc',
     'test/boost/role_manager_test.cc',
     'test/boost/row_cache_test.cc',
@@ -1621,6 +1648,7 @@ deps['test/boost/bytes_ostream_test'] = [
 ]
 deps['test/boost/input_stream_test'] = ['test/boost/input_stream_test.cc']
 deps['test/boost/UUID_test'] = ['clocks-impl.cc', 'utils/UUID_gen.cc', 'test/boost/UUID_test.cc', 'utils/uuid.cc', 'utils/dynamic_bitset.cc', 'utils/hashers.cc', 'utils/on_internal_error.cc']
+deps['test/boost/url_parse_test'] = ['utils/http.cc', 'test/boost/url_parse_test.cc', ]
 deps['test/boost/murmur_hash_test'] = ['bytes.cc', 'utils/murmur_hash.cc', 'test/boost/murmur_hash_test.cc']
 deps['test/boost/allocation_strategy_test'] = ['test/boost/allocation_strategy_test.cc', 'utils/logalloc.cc', 'utils/dynamic_bitset.cc', 'utils/labels.cc']
 deps['test/boost/log_heap_test'] = ['test/boost/log_heap_test.cc']
@@ -2185,7 +2213,15 @@ if os.path.exists(kmipc_lib):
     user_cflags += f' -I{kmipc_dir}/include -DHAVE_KMIP'

 def get_extra_cxxflags(mode, mode_config, cxx, debuginfo):
-    cxxflags = []
+    cxxflags = [
+        # we need this flag for correct precompiled header handling in connection with ccache (or similar)
+        # `git` tools don't preserve timestamps, so when using ccache it might be possible to add pch to ccache
+        # and then later (after for example rebase) get `stdafx.hh` with different timestamp, but the same content.
+        # this will tell ccache to bring pch from its cache. Later on clang will check if timestamps match and complain.
+        # Adding `-fpch-validate-input-files-content` tells clang to check content of stdafx.hh if timestamps don't match.
+        # The flag seems to be present in gcc as well.
+        "" if args.disable_precompiled_header else '-fpch-validate-input-files-content'
+    ]

     optimization_level = mode_config['optimization-level']
     cxxflags.append(f'-O{optimization_level}')
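The conditional-flag pattern above can be exercised standalone. Note that when the precompiled header is disabled an empty string is left in the list; the sketch below (with a hypothetical `Args` stand-in for the argparse namespace) also filters it out before handing flags to a compiler:

```python
class Args:
    # stand-in for the parsed argparse namespace in configure.py
    disable_precompiled_header = False

args = Args()

def extra_cxxflags_sketch(optimization_level):
    # Mirrors the pattern above: seed the list with the conditional
    # pch-validation flag, then append mode-dependent options.
    cxxflags = [
        "" if args.disable_precompiled_header else '-fpch-validate-input-files-content'
    ]
    cxxflags.append(f'-O{optimization_level}')
    # Drop the placeholder empty string before use.
    return [f for f in cxxflags if f]

print(extra_cxxflags_sketch(2))
```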
@@ -2250,6 +2286,7 @@ def write_build_file(f,
                      scylla_version,
                      scylla_release,
                      args):
+    use_precompiled_header = not args.disable_precompiled_header
     warnings = get_warning_options(args.cxx)
     rustc_target = pick_rustc_target('wasm32-wasi', 'wasm32-wasip1')
     f.write(textwrap.dedent('''\
@@ -2356,7 +2393,10 @@ def write_build_file(f,

     for mode in build_modes:
         modeval = modes[mode]
+        seastar_lib_ext = 'so' if modeval['build_seastar_shared_libs'] else 'a'
+        seastar_dep = f'$builddir/{mode}/seastar/libseastar.{seastar_lib_ext}'
+        seastar_testing_dep = f'$builddir/{mode}/seastar/libseastar_testing.{seastar_lib_ext}'
+        abseil_dep = ' '.join(f'$builddir/{mode}/abseil/{lib}' for lib in abseil_libs)
         fmt_lib = 'fmt'
         f.write(textwrap.dedent('''\
             cxx_ld_flags_{mode} = {cxx_ld_flags}
@@ -2369,6 +2409,14 @@ def write_build_file(f,
                 command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags_{mode} $cxxflags $obj_cxxflags -c -o $out $in
                 description = CXX $out
                 depfile = $out.d
+            rule cxx_build_precompiled_header.{mode}
+                command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags_{mode} $cxxflags $obj_cxxflags -c -o $out $in -Winvalid-pch -fpch-instantiate-templates -Xclang -emit-pch -DSCYLLA_USE_PRECOMPILED_HEADER
+                description = CXX-PRECOMPILED-HEADER $out
+                depfile = $out.d
+            rule cxx_with_pch.{mode}
+                command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags_{mode} $cxxflags $obj_cxxflags -c -o $out $in -Winvalid-pch -Xclang -include-pch -Xclang $builddir/{mode}/stdafx.hh.pch
+                description = CXX $out
+                depfile = $out.d
             rule link.{mode}
                 command = $cxx $ld_flags_{mode} $ldflags -o $out $in $libs $libs_{mode}
                 description = LINK $out
@@ -2402,7 +2450,7 @@ def write_build_file(f,
                     $builddir/{mode}/gen/${{stem}}Parser.cpp
                 description = ANTLR3 $in
             rule checkhh.{mode}
-                command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags --include $in -c -o $out $builddir/{mode}/gen/empty.cc
+                command = $cxx -MD -MT $out -MF $out.d {seastar_cflags} $cxxflags $cxxflags_{mode} $obj_cxxflags --include $in -c -o $out $builddir/{mode}/gen/empty.cc -USCYLLA_USE_PRECOMPILED_HEADER
                 description = CHECKHH $in
                 depfile = $out.d
             rule test.{mode}
@@ -2416,10 +2464,11 @@ def write_build_file(f,
                 description = RUST_LIB $out
             ''').format(mode=mode, antlr3_exec=args.antlr3_exec, fmt_lib=fmt_lib, test_repeat=args.test_repeat, test_timeout=args.test_timeout, **modeval))
         f.write(
-            'build {mode}-build: phony {artifacts} {wasms}\n'.format(
+            'build {mode}-build: phony {artifacts} {wasms} {vector_search_validator_bins}\n'.format(
                 mode=mode,
-                artifacts=str.join(' ', ['$builddir/' + mode + '/' + x for x in sorted(build_artifacts - wasms)]),
+                artifacts=str.join(' ', ['$builddir/' + mode + '/' + x for x in sorted(build_artifacts - wasms - vector_search_validator_bins)]),
                 wasms = str.join(' ', ['$builddir/' + x for x in sorted(build_artifacts & wasms)]),
+                vector_search_validator_bins=str.join(' ', ['$builddir/' + x for x in sorted(build_artifacts & vector_search_validator_bins)]),
             )
         )
         if profile_recipe := modes[mode].get('profile_recipe'):
@@ -2428,6 +2477,7 @@ def write_build_file(f,
         include_dist_target = f'dist-{mode}' if args.enable_dist is None or args.enable_dist else ''
         f.write(f'build {mode}: phony {include_cxx_target} {include_dist_target}\n')
         compiles = {}
+        compiles_with_pch = set()
         swaggers = set()
         serializers = {}
         ragels = {}
@@ -2442,16 +2492,16 @@ def write_build_file(f,
         # object code. And we enable LTO when linking the main Scylla executable, while disable
         # it when linking anything else.

-        seastar_lib_ext = 'so' if modeval['build_seastar_shared_libs'] else 'a'
         for binary in sorted(build_artifacts):
             if modeval['is_profile'] and binary != "scylla":
                 # Just to avoid clutter in build.ninja
                 continue
             profile_dep = modes[mode].get('profile_target', "")

-            if binary in other or binary in wasms:
+            if binary in other or binary in wasms or binary in vector_search_validator_bins:
                 continue
             srcs = deps[binary]
+            # 'scylla'
             objs = ['$builddir/' + mode + '/' + src.replace('.cc', '.o')
                     for src in srcs
                     if src.endswith('.cc')]
@@ -2487,9 +2537,6 @@ def write_build_file(f,
                 continue

             do_lto = modes[mode]['has_lto'] and binary in lto_binaries
-            seastar_dep = f'$builddir/{mode}/seastar/libseastar.{seastar_lib_ext}'
-            seastar_testing_dep = f'$builddir/{mode}/seastar/libseastar_testing.{seastar_lib_ext}'
-            abseil_dep = ' '.join(f'$builddir/{mode}/abseil/{lib}' for lib in abseil_libs)
             seastar_testing_libs = f'$seastar_testing_libs_{mode}'

             local_libs = f'$seastar_libs_{mode} $libs'
@@ -2499,6 +2546,7 @@ def write_build_file(f,
                 local_libs += ' -flto=thin -ffat-lto-objects'
             else:
                 local_libs += ' -fno-lto'
+            use_pch = use_precompiled_header and binary == 'scylla'
             if binary in tests:
                 if binary in pure_boost_tests:
                     local_libs += ' ' + maybe_static(args.staticboost, '-lboost_unit_test_framework')
@@ -2527,6 +2575,8 @@ def write_build_file(f,
             if src.endswith('.cc'):
                 obj = '$builddir/' + mode + '/' + src.replace('.cc', '.o')
                 compiles[obj] = src
+                if use_pch:
+                    compiles_with_pch.add(obj)
             elif src.endswith('.idl.hh'):
                 hh = '$builddir/' + mode + '/gen/' + src.replace('.idl.hh', '.dist.hh')
                 serializers[hh] = src
@@ -2559,10 +2609,11 @@ def write_build_file(f,
         )

         f.write(
-            'build {mode}-test: test.{mode} {test_executables} $builddir/{mode}/scylla {wasms}\n'.format(
+            'build {mode}-test: test.{mode} {test_executables} $builddir/{mode}/scylla {wasms} {vector_search_validator_bins} \n'.format(
                 mode=mode,
                 test_executables=' '.join(['$builddir/{}/{}'.format(mode, binary) for binary in sorted(tests)]),
                 wasms=' '.join([f'$builddir/{binary}' for binary in sorted(wasms)]),
+                vector_search_validator_bins=' '.join([f'$builddir/{binary}' for binary in sorted(vector_search_validator_bins)]),
             )
         )
         f.write(
@@ -2605,7 +2656,9 @@ def write_build_file(f,
             src = compiles[obj]
             seastar_dep = f'$builddir/{mode}/seastar/libseastar.{seastar_lib_ext}'
             abseil_dep = ' '.join(f'$builddir/{mode}/abseil/{lib}' for lib in abseil_libs)
-            f.write(f'build {obj}: cxx.{mode} {src} | {profile_dep} || {seastar_dep} {abseil_dep} {gen_headers_dep}\n')
+            pch_dep = f'$builddir/{mode}/stdafx.hh.pch' if obj in compiles_with_pch else ''
+            cxx_cmd = 'cxx_with_pch' if obj in compiles_with_pch else 'cxx'
+            f.write(f'build {obj}: {cxx_cmd}.{mode} {src} | {profile_dep} {seastar_dep} {abseil_dep} {gen_headers_dep} {pch_dep}\n')
             if src in modeval['per_src_extra_cxxflags']:
                 f.write(' cxxflags = {seastar_cflags} $cxxflags $cxxflags_{mode} {extra_cxxflags}\n'.format(mode=mode, extra_cxxflags=modeval["per_src_extra_cxxflags"][src], **modeval))
         for swagger in swaggers:
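The per-object rule selection above can be sketched as a small pure function (hypothetical names; the real code writes the statement straight into build.ninja): objects marked for the precompiled header use the `cxx_with_pch` rule and depend on the `.pch` file, everything else keeps the plain `cxx` rule.

```python
def build_statement(mode, obj, src, compiles_with_pch, deps):
    # Same two-way choice as configure.py: pch objects switch both the
    # rule name and add the precompiled header as an extra dependency.
    pch_dep = f'$builddir/{mode}/stdafx.hh.pch' if obj in compiles_with_pch else ''
    cxx_cmd = 'cxx_with_pch' if obj in compiles_with_pch else 'cxx'
    return f'build {obj}: {cxx_cmd}.{mode} {src} | {deps} {pch_dep}'.rstrip()

with_pch = {'$builddir/release/main.o'}
print(build_statement('release', '$builddir/release/main.o', 'main.cc', with_pch, '$seastar'))
print(build_statement('release', '$builddir/release/other.o', 'other.cc', set(), '$seastar'))
```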
@@ -2666,6 +2719,8 @@ def write_build_file(f,
         f.write(' target = {lib}\n'.format(**locals()))
         f.write(' profile_dep = {profile_dep}\n'.format(**locals()))

+        f.write(f'build $builddir/{mode}/stdafx.hh.pch: cxx_build_precompiled_header.{mode} stdafx.hh | {profile_dep} {seastar_dep} {abseil_dep} {gen_headers_dep} {pch_dep}\n')
+
         f.write('build $builddir/{mode}/seastar/apps/iotune/iotune: ninja $builddir/{mode}/seastar/build.ninja | $builddir/{mode}/seastar/libseastar.{seastar_lib_ext}\n'
                 .format(**locals()))
         f.write(' pool = submodule_pool\n')
@@ -2729,6 +2784,19 @@ def write_build_file(f,
         'build compiler-training: phony {}\n'.format(' '.join(['{mode}-compiler-training'.format(mode=mode) for mode in default_modes]))
     )

+    f.write(textwrap.dedent(f'''\
+        rule build-vector-search-validator
+            command = test/vector_search_validator/build-validator $builddir
+        rule build-vector-store
+            command = test/vector_search_validator/build-vector-store $builddir
+        '''))
+    f.write(
+        'build $builddir/{vector_search_validator_bin}: build-vector-search-validator {}\n'.format(' '.join([dep for dep in sorted(vector_search_validator_deps)]), vector_search_validator_bin=vector_search_validator_bin)
+    )
+    f.write(
+        'build $builddir/{vector_store_bin}: build-vector-store {}\n'.format(' '.join([dep for dep in sorted(vector_store_deps)]), vector_store_bin=vector_store_bin)
+    )
+
     f.write(textwrap.dedent(f'''\
         build dist-unified-tar: phony {' '.join([f'$builddir/{mode}/dist/tar/{scylla_product}-unified-{scylla_version}-{scylla_release}.{arch}.tar.gz' for mode in default_modes])}
         build dist-unified: phony dist-unified-tar
@@ -2942,7 +3010,7 @@ def configure_using_cmake(args):
         'CMAKE_DEFAULT_CONFIGS': selected_configs,
         'CMAKE_C_COMPILER': args.cc,
         'CMAKE_CXX_COMPILER': args.cxx,
-        'CMAKE_CXX_FLAGS': args.user_cflags,
+        'CMAKE_CXX_FLAGS': args.user_cflags + ("" if args.disable_precompiled_header else " -fpch-validate-input-files-content"),
         'CMAKE_EXE_LINKER_FLAGS': args.user_ldflags,
         'CMAKE_EXPORT_COMPILE_COMMANDS': 'ON',
         'Scylla_CHECK_HEADERS': 'ON',
@@ -2951,6 +3019,7 @@ def configure_using_cmake(args):
         'Scylla_TEST_REPEAT': args.test_repeat,
         'Scylla_ENABLE_LTO': 'ON' if args.lto else 'OFF',
         'Scylla_WITH_DEBUG_INFO' : 'ON' if args.debuginfo else 'OFF',
+        'Scylla_USE_PRECOMPILED_HEADER': 'OFF' if args.disable_precompiled_header else 'ON',
     }
     if args.date_stamp:
         settings['Scylla_DATE_STAMP'] = args.date_stamp
@@ -138,5 +138,8 @@ target_link_libraries(cql3
     lang
     transport)

+if (Scylla_USE_PRECOMPILED_HEADER)
+    target_precompile_headers(cql3 REUSE_FROM scylla-precompiled-header)
+endif()
 check_headers(check-headers cql3
     GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
cql3/Cql.g (16 changes)
@@ -575,6 +575,15 @@ usingTimeoutServiceLevelClauseObjective[std::unique_ptr<cql3::attributes::raw>&
     | serviceLevel sl_name=serviceLevelOrRoleName { attrs->service_level = std::move(sl_name); }
     ;

+usingTimeoutConcurrencyClause[std::unique_ptr<cql3::attributes::raw>& attrs]
+    : K_USING usingTimeoutConcurrencyClauseObjective[attrs] ( K_AND usingTimeoutConcurrencyClauseObjective[attrs] )*
+    ;
+
+usingTimeoutConcurrencyClauseObjective[std::unique_ptr<cql3::attributes::raw>& attrs]
+    : K_TIMEOUT to=term { attrs->timeout = std::move(to); }
+    | K_CONCURRENCY c=term { attrs->concurrency = std::move(c); }
+    ;
+
 /**
  * UPDATE <CF>
  * USING TIMESTAMP <long>
@@ -666,7 +675,7 @@ pruneMaterializedViewStatement returns [std::unique_ptr<raw::select_statement> e
         auto attrs = std::make_unique<cql3::attributes::raw>();
         expression wclause = conjunction{};
     }
-    : K_PRUNE K_MATERIALIZED K_VIEW cf=columnFamilyName (K_WHERE w=whereClause { wclause = std::move(w); } )? ( usingClause[attrs] )?
+    : K_PRUNE K_MATERIALIZED K_VIEW cf=columnFamilyName (K_WHERE w=whereClause { wclause = std::move(w); } )? ( usingTimeoutConcurrencyClause[attrs] )?
     {
         auto params = make_lw_shared<raw::select_statement::parameters>(std::move(orderings), is_distinct, allow_filtering, statement_subtype, bypass_cache);
         return std::make_unique<raw::select_statement>(std::move(cf), std::move(params),
@@ -1560,6 +1569,10 @@ serviceLevelOrRoleName returns [sstring name]
     | t=QUOTED_NAME { $name = sstring($t.text); }
     | k=unreserved_keyword { $name = k;
                              std::transform($name.begin(), $name.end(), $name.begin(), ::tolower);}
+    // The literal `default` will not be parsed by any of the previous
+    // rules, so we need to cover it manually. Needed by CREATE SERVICE
+    // LEVEL and ATTACH SERVICE LEVEL.
+    | t=K_DEFAULT { $name = sstring("default"); }
     | QMARK {add_recognition_error("Bind variables cannot be used for service levels or role names");}
     ;

@@ -2366,6 +2379,7 @@ K_LIKE: L I K E;

 K_TIMEOUT: T I M E O U T;
 K_PRUNE: P R U N E;
+K_CONCURRENCY: C O N C U R R E N C Y;

 K_EXECUTE: E X E C U T E;

@@ -20,19 +20,21 @@
 namespace cql3 {

 std::unique_ptr<attributes> attributes::none() {
-    return std::unique_ptr<attributes>{new attributes{{}, {}, {}, {}}};
+    return std::unique_ptr<attributes>{new attributes{{}, {}, {}, {}, {}}};
 }

 attributes::attributes(std::optional<cql3::expr::expression>&& timestamp,
                        std::optional<cql3::expr::expression>&& time_to_live,
                        std::optional<cql3::expr::expression>&& timeout,
-                       std::optional<sstring> service_level)
+                       std::optional<sstring> service_level,
+                       std::optional<cql3::expr::expression>&& concurrency)
     : _timestamp_unset_guard(timestamp)
     , _timestamp{std::move(timestamp)}
     , _time_to_live_unset_guard(time_to_live)
     , _time_to_live{std::move(time_to_live)}
     , _timeout{std::move(timeout)}
     , _service_level(std::move(service_level))
+    , _concurrency{std::move(concurrency)}
 { }

 bool attributes::is_timestamp_set() const {
@@ -51,6 +53,10 @@ bool attributes::is_service_level_set() const {
     return bool(_service_level);
 }

+bool attributes::is_concurrency_set() const {
+    return bool(_concurrency);
+}
+
 int64_t attributes::get_timestamp(int64_t now, const query_options& options) {
     if (!_timestamp.has_value() || _timestamp_unset_guard.is_unset(options)) {
         return now;
@@ -123,6 +129,27 @@ qos::service_level_options attributes::get_service_level(qos::service_level_cont
     return sl_controller.get_service_level(sl_name).slo;
 }

+std::optional<int32_t> attributes::get_concurrency(const query_options& options) const {
+    if (!_concurrency.has_value()) {
+        return std::nullopt;
+    }
+
+    cql3::raw_value concurrency_raw = expr::evaluate(*_concurrency, options);
+    if (concurrency_raw.is_null()) {
+        throw exceptions::invalid_request_exception("Invalid null value of concurrency");
+    }
+    int32_t concurrency;
+    try {
+        concurrency = concurrency_raw.view().validate_and_deserialize<int32_t>(*int32_type);
+    } catch (marshal_exception& e) {
+        throw exceptions::invalid_request_exception("Invalid concurrency value");
+    }
+    if (concurrency <= 0) {
+        throw exceptions::invalid_request_exception("Concurrency must be a positive integer");
+    }
+    return concurrency;
+}
+
 void attributes::fill_prepare_context(prepare_context& ctx) {
     if (_timestamp.has_value()) {
         expr::fill_prepare_context(*_timestamp, ctx);
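Outside Scylla's internals, the validation that `get_concurrency` performs in the hunk above reduces to a few checks. This standalone sketch mirrors them, with `std::optional` standing in for the evaluated `raw_value` and plain `std::invalid_argument` in place of `invalid_request_exception` (names here are illustrative, not Scylla's actual types):

```cpp
#include <cstdint>
#include <optional>
#include <stdexcept>

// Mirrors attributes::get_concurrency(): an unset USING CONCURRENCY attribute
// yields no value; a null or non-positive value is rejected with an error.
std::optional<int32_t> validate_concurrency(bool attribute_set, std::optional<int32_t> value) {
    if (!attribute_set) {
        return std::nullopt;  // USING CONCURRENCY was not given
    }
    if (!value.has_value()) {
        throw std::invalid_argument("Invalid null value of concurrency");
    }
    if (*value <= 0) {
        throw std::invalid_argument("Concurrency must be a positive integer");
    }
    return value;
}
```

The deserialization step (`validate_and_deserialize<int32_t>`) is elided here, since it depends on Scylla's serialization layer; only the nullability and positivity rules are reproduced.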
@@ -133,10 +160,13 @@ void attributes::fill_prepare_context(prepare_context& ctx) {
     if (_timeout.has_value()) {
         expr::fill_prepare_context(*_timeout, ctx);
     }
+    if (_concurrency.has_value()) {
+        expr::fill_prepare_context(*_concurrency, ctx);
+    }
 }

 std::unique_ptr<attributes> attributes::raw::prepare(data_dictionary::database db, const sstring& ks_name, const sstring& cf_name) const {
-    std::optional<expr::expression> ts, ttl, to;
+    std::optional<expr::expression> ts, ttl, to, conc;

     if (timestamp.has_value()) {
         ts = prepare_expression(*timestamp, db, ks_name, nullptr, timestamp_receiver(ks_name, cf_name));
@@ -153,7 +183,12 @@ std::unique_ptr<attributes> attributes::raw::prepare(data_dictionary::database d
         verify_no_aggregate_functions(*timeout, "USING clause");
     }

-    return std::unique_ptr<attributes>{new attributes{std::move(ts), std::move(ttl), std::move(to), std::move(service_level)}};
+    if (concurrency.has_value()) {
+        conc = prepare_expression(*concurrency, db, ks_name, nullptr, concurrency_receiver(ks_name, cf_name));
+        verify_no_aggregate_functions(*concurrency, "USING clause");
+    }
+
+    return std::unique_ptr<attributes>{new attributes{std::move(ts), std::move(ttl), std::move(to), std::move(service_level), std::move(conc)}};
 }

 lw_shared_ptr<column_specification> attributes::raw::timestamp_receiver(const sstring& ks_name, const sstring& cf_name) const {
@@ -168,4 +203,8 @@ lw_shared_ptr<column_specification> attributes::raw::timeout_receiver(const sstr
     return make_lw_shared<column_specification>(ks_name, cf_name, ::make_shared<column_identifier>("[timeout]", true), duration_type);
 }

+lw_shared_ptr<column_specification> attributes::raw::concurrency_receiver(const sstring& ks_name, const sstring& cf_name) const {
+    return make_lw_shared<column_specification>(ks_name, cf_name, ::make_shared<column_identifier>("[concurrency]", true), data_type_for<int32_t>());
+}
+
 }
@@ -36,13 +36,15 @@ private:
     std::optional<cql3::expr::expression> _time_to_live;
     std::optional<cql3::expr::expression> _timeout;
     std::optional<sstring> _service_level;
+    std::optional<cql3::expr::expression> _concurrency;
 public:
     static std::unique_ptr<attributes> none();
 private:
     attributes(std::optional<cql3::expr::expression>&& timestamp,
             std::optional<cql3::expr::expression>&& time_to_live,
             std::optional<cql3::expr::expression>&& timeout,
-            std::optional<sstring> service_level);
+            std::optional<sstring> service_level,
+            std::optional<cql3::expr::expression>&& concurrency);
 public:
     bool is_timestamp_set() const;

@@ -52,6 +54,8 @@ public:

     bool is_service_level_set() const;

+    bool is_concurrency_set() const;
+
     int64_t get_timestamp(int64_t now, const query_options& options);

     std::optional<int32_t> get_time_to_live(const query_options& options);
@@ -60,6 +64,8 @@ public:

     qos::service_level_options get_service_level(qos::service_level_controller& sl_controller) const;

+    std::optional<int32_t> get_concurrency(const query_options& options) const;
+
     void fill_prepare_context(prepare_context& ctx);

     class raw final {
@@ -68,6 +74,7 @@ public:
         std::optional<cql3::expr::expression> time_to_live;
         std::optional<cql3::expr::expression> timeout;
         std::optional<sstring> service_level;
+        std::optional<cql3::expr::expression> concurrency;

         std::unique_ptr<attributes> prepare(data_dictionary::database db, const sstring& ks_name, const sstring& cf_name) const;
     private:
@@ -76,6 +83,8 @@ public:
         lw_shared_ptr<column_specification> time_to_live_receiver(const sstring& ks_name, const sstring& cf_name) const;

         lw_shared_ptr<column_specification> timeout_receiver(const sstring& ks_name, const sstring& cf_name) const;
+
+        lw_shared_ptr<column_specification> concurrency_receiver(const sstring& ks_name, const sstring& cf_name) const;
     };
 };

@@ -37,6 +37,12 @@ future<::shared_ptr<cql_transport::messages::result_message>>
 alter_service_level_statement::execute(query_processor& qp,
         service::query_state &state,
         const query_options &, std::optional<service::group0_guard> guard) const {
+    if (_service_level == qos::service_level_controller::default_service_level_name) {
+        sstring reason = seastar::format("The default service level, {}, cannot be altered",
+                qos::service_level_controller::default_service_level_name);
+        throw exceptions::invalid_request_exception(std::move(reason));
+    }
+
     service::group0_batch mc{std::move(guard)};
     validate_shares_option(qp, _slo);
     qos::service_level& sl = state.get_service_level_controller().get_service_level(_service_level);
@@ -186,10 +186,6 @@ void alter_table_statement::add_column(const query_options&, const schema& schem
         if (!schema.is_compound()) {
             throw exceptions::invalid_request_exception("Cannot use non-frozen collections with a non-composite PRIMARY KEY");
         }
-        if (schema.is_super()) {
-            throw exceptions::invalid_request_exception("Cannot use non-frozen collections with super column families");
-        }
-
         // If there used to be a non-frozen collection column with the same name (that has been dropped),
         // we could still have some data using the old type, and so we can't allow adding a collection
@@ -422,7 +418,14 @@ std::pair<schema_ptr, std::vector<view_ptr>> alter_table_statement::prepare_sche
             throw exceptions::invalid_request_exception(format("The synchronous_updates option is only applicable to materialized views, not to base tables"));
         }

-        _properties->apply_to_builder(cfm, std::move(schema_extensions), db, keyspace());
+        if (is_cdc_log_table) {
+            auto gc_opts = _properties->get_tombstone_gc_options(schema_extensions);
+            if (gc_opts && gc_opts->mode() == tombstone_gc_mode::repair) {
+                throw exceptions::invalid_request_exception("The 'repair' mode for tombstone_gc is not allowed on CDC log tables.");
+            }
+        }
+
+        _properties->apply_to_builder(cfm, std::move(schema_extensions), db, keyspace(), !is_cdc_log_table);
     }
     break;

@@ -55,8 +55,29 @@ view_ptr alter_view_statement::prepare_view(data_dictionary::database db) const
     auto schema_extensions = _properties->make_schema_extensions(db.extensions());
     _properties->validate(db, keyspace(), schema_extensions);

+    bool is_colocated = [&] {
+        if (!db.find_keyspace(keyspace()).get_replication_strategy().uses_tablets()) {
+            return false;
+        }
+        auto base_schema = db.find_schema(schema->view_info()->base_id());
+        if (!base_schema) {
+            return false;
+        }
+        return std::ranges::equal(
+                schema->partition_key_columns(),
+                base_schema->partition_key_columns(),
+                [](const column_definition& a, const column_definition& b) { return a.name() == b.name(); });
+    }();
+
+    if (is_colocated) {
+        auto gc_opts = _properties->get_tombstone_gc_options(schema_extensions);
+        if (gc_opts && gc_opts->mode() == tombstone_gc_mode::repair) {
+            throw exceptions::invalid_request_exception("The 'repair' mode for tombstone_gc is not allowed on co-located materialized view tables.");
+        }
+    }
+
     auto builder = schema_builder(schema);
-    _properties->apply_to_builder(builder, std::move(schema_extensions), db, keyspace());
+    _properties->apply_to_builder(builder, std::move(schema_extensions), db, keyspace(), !is_colocated);

     if (builder.get_gc_grace_seconds() == 0) {
         throw exceptions::invalid_request_exception(
@@ -43,6 +43,14 @@ attach_service_level_statement::execute(query_processor& qp,
         service::query_state &state,
         const query_options &,
         std::optional<service::group0_guard> guard) const {
+    if (_service_level == qos::service_level_controller::default_service_level_name) {
+        sstring reason = seastar::format("The default service level, {}, cannot be "
+                "attached to a role. If you want to detach an attached service level, "
+                "use the DETACH SERVICE LEVEL statement",
+                qos::service_level_controller::default_service_level_name);
+        throw exceptions::invalid_request_exception(std::move(reason));
+    }
+
     auto sli = co_await state.get_service_level_controller().get_distributed_service_level(_service_level);
     if (sli.empty()) {
         throw qos::nonexistant_service_level_exception(_service_level);
@@ -293,7 +293,7 @@ std::optional<db::tablet_options::map_type> cf_prop_defs::get_tablet_options() c
     return std::nullopt;
 }

-void cf_prop_defs::apply_to_builder(schema_builder& builder, schema::extensions_map schema_extensions, const data_dictionary::database& db, sstring ks_name) const {
+void cf_prop_defs::apply_to_builder(schema_builder& builder, schema::extensions_map schema_extensions, const data_dictionary::database& db, sstring ks_name, bool supports_repair) const {
     if (has_property(KW_COMMENT)) {
         builder.set_comment(get_string(KW_COMMENT, ""));
     }
@@ -379,7 +379,7 @@ void cf_prop_defs::apply_to_builder(schema_builder& builder, schema::extensions_
     }
     // Set default tombstone_gc mode.
     if (!schema_extensions.contains(tombstone_gc_extension::NAME)) {
-        auto ext = seastar::make_shared<tombstone_gc_extension>(get_default_tombstone_gc_mode(db, ks_name));
+        auto ext = seastar::make_shared<tombstone_gc_extension>(get_default_tombstone_gc_mode(db, ks_name, supports_repair));
         schema_extensions.emplace(tombstone_gc_extension::NAME, std::move(ext));
     }
     builder.set_extensions(std::move(schema_extensions));
@@ -110,7 +110,7 @@ public:
     bool get_synchronous_updates_flag() const;
     std::optional<db::tablet_options::map_type> get_tablet_options() const;

-    void apply_to_builder(schema_builder& builder, schema::extensions_map schema_extensions, const data_dictionary::database& db, sstring ks_name) const;
+    void apply_to_builder(schema_builder& builder, schema::extensions_map schema_extensions, const data_dictionary::database& db, sstring ks_name, bool supports_repair) const;
     void validate_minimum_int(const sstring& field, int32_t minimum_value, int32_t default_value) const;
 };

@@ -201,7 +201,14 @@ view_ptr create_index_statement::create_view_for_index(const schema_ptr schema,
         "";
     builder.with_view_info(schema, false, where_clause);

-    auto tombstone_gc_ext = seastar::make_shared<tombstone_gc_extension>(get_default_tombstone_gc_mode(db, schema->ks_name()));
+    bool is_colocated = [&] {
+        if (!db.find_keyspace(keyspace()).get_replication_strategy().uses_tablets()) {
+            return false;
+        }
+        return im.local();
+    }();
+
+    auto tombstone_gc_ext = seastar::make_shared<tombstone_gc_extension>(get_default_tombstone_gc_mode(db, schema->ks_name(), !is_colocated));
     builder.add_extension(tombstone_gc_extension::NAME, std::move(tombstone_gc_ext));

     // A local secondary index should be backed by a *synchronous* view,
@@ -272,12 +279,16 @@ std::vector<::shared_ptr<index_target>> create_index_statement::validate_while_e
         throw exceptions::invalid_request_exception(format("index names shouldn't be more than {:d} characters long (got \"{}\")", schema::NAME_LENGTH, _index_name.c_str()));
     }

+    // Regular secondary indexes require rf-rack-validity.
+    // Custom indexes need to validate this property themselves, if they need it.
+    if (!_properties || !_properties->custom_class) {
     try {
         db::view::validate_view_keyspace(db, keyspace());
     } catch (const std::exception& e) {
         // The type of the thrown exception is not specified, so we need to wrap it here.
         throw exceptions::invalid_request_exception(e.what());
     }
+    }

     validate_for_local_index(*schema);

@@ -292,7 +303,7 @@ std::vector<::shared_ptr<index_target>> create_index_statement::validate_while_e
             throw exceptions::invalid_request_exception(format("Non-supported custom class \'{}\' provided", *(_properties->custom_class)));
         }
         auto custom_index = (*custom_index_factory)();
-        custom_index->validate(*schema, *_properties, targets, db.features());
+        custom_index->validate(*schema, *_properties, targets, db.features(), db);
         _properties->index_version = custom_index->index_version(*schema);
     }

@@ -45,6 +45,12 @@ create_service_level_statement::execute(query_processor& qp,
         throw exceptions::invalid_request_exception("Names starting with '$' are reserved for internal tenants. Use a different name.");
     }

+    if (_service_level == qos::service_level_controller::default_service_level_name) {
+        sstring reason = seastar::format("The default service level, {}, already exists "
+                "and cannot be created", qos::service_level_controller::default_service_level_name);
+        throw exceptions::invalid_request_exception(std::move(reason));
+    }
+
     service::group0_batch mc{std::move(guard)};
     validate_shares_option(qp, _slo);

@@ -128,7 +128,7 @@ void create_table_statement::apply_properties_to(schema_builder& builder, const
         builder.set_compressor_params(db.get_config().sstable_compression_user_table_options());
     }

-    _properties->apply_to_builder(builder, _properties->make_schema_extensions(db.extensions()), db, keyspace());
+    _properties->apply_to_builder(builder, _properties->make_schema_extensions(db.extensions()), db, keyspace(), true);
 }

 void create_table_statement::add_column_metadata_from_aliases(schema_builder& builder, std::vector<bytes> aliases, const std::vector<data_type>& types, column_kind kind) const
@@ -373,7 +373,30 @@ std::pair<view_ptr, cql3::cql_warnings_vec> create_view_statement::prepare_view(
             db::view::create_virtual_column(builder, def->name(), def->type);
         }
     }
-    _properties.properties()->apply_to_builder(builder, std::move(schema_extensions), db, keyspace());
+
+    bool is_colocated = [&] {
+        if (!db.find_keyspace(keyspace()).get_replication_strategy().uses_tablets()) {
+            return false;
+        }
+        if (target_partition_keys.size() != schema->partition_key_columns().size()) {
+            return false;
+        }
+        for (size_t i = 0; i < target_partition_keys.size(); ++i) {
+            if (target_partition_keys[i] != &schema->partition_key_columns()[i]) {
+                return false;
+            }
+        }
+        return true;
+    }();
+
+    if (is_colocated) {
+        auto gc_opts = _properties.properties()->get_tombstone_gc_options(schema_extensions);
+        if (gc_opts && gc_opts->mode() == tombstone_gc_mode::repair) {
+            throw exceptions::invalid_request_exception("The 'repair' mode for tombstone_gc is not allowed on co-located materialized view tables.");
+        }
+    }
+
+    _properties.properties()->apply_to_builder(builder, std::move(schema_extensions), db, keyspace(), !is_colocated);

     if (builder.default_time_to_live().count() > 0) {
         throw exceptions::invalid_request_exception(
@@ -34,6 +34,11 @@ drop_service_level_statement::execute(query_processor& qp,
         service::query_state &state,
         const query_options &,
         std::optional<service::group0_guard> guard) const {
+    if (_service_level == qos::service_level_controller::default_service_level_name) {
+        sstring reason = seastar::format("The default service level, {}, cannot be dropped",
+                qos::service_level_controller::default_service_level_name);
+        throw exceptions::invalid_request_exception(std::move(reason));
+    }
     service::group0_batch mc{std::move(guard)};
     auto& sl = state.get_service_level_controller();
     co_await sl.drop_distributed_service_level(_service_level, _if_exists, mc);
@@ -8,6 +8,7 @@
  * SPDX-License-Identifier: (LicenseRef-ScyllaDB-Source-Available-1.0 and Apache-2.0)
  */

+#include "seastar/core/format.hh"
 #include "seastar/core/sstring.hh"
 #include "utils/assert.hh"
 #include "cql3/statements/ks_prop_defs.hh"
@@ -113,6 +114,17 @@ static locator::replication_strategy_config_options prepare_options(
         return options;
     }

+    if (uses_tablets) {
+        for (const auto& opt: old_options) {
+            if (opt.first == ks_prop_defs::REPLICATION_FACTOR_KEY) {
+                on_internal_error(logger, format("prepare_options: old_options contains invalid key '{}'", ks_prop_defs::REPLICATION_FACTOR_KEY));
+            }
+            if (!options.contains(opt.first)) {
+                throw exceptions::configuration_exception(fmt::format("Attempted to implicitly drop replicas in datacenter {}. If this is the desired behavior, set replication factor to 0 in {} explicitly.", opt.first, opt.first));
+            }
+        }
+    }
+
     // For users' convenience, expand the 'replication_factor' option into a replication factor for each DC.
     // If the user simply switches from another strategy without providing any options,
     // but the other strategy used the 'replication_factor' option, it will also be expanded.
@@ -21,7 +21,7 @@ namespace cql3 {
 namespace statements {

 static future<> delete_ghost_rows(dht::partition_range_vector partition_ranges, std::vector<query::clustering_range> clustering_bounds, view_ptr view,
-        service::storage_proxy& proxy, service::query_state& state, const query_options& options, cql_stats& stats, db::timeout_clock::duration timeout_duration) {
+        service::storage_proxy& proxy, service::query_state& state, const query_options& options, cql_stats& stats, db::timeout_clock::duration timeout_duration, size_t concurrency) {
     auto key_columns = std::ranges::to<std::vector<const column_definition*>>(
         view->all_columns()
         | std::views::filter([] (const column_definition& cdef) { return cdef.is_primary_key(); })
@@ -35,7 +35,7 @@ static future<> delete_ghost_rows(dht::partition_range_vector partition_ranges,
     tracing::trace(state.get_trace_state(), "Deleting ghost rows from partition ranges {}", partition_ranges);

     auto p = service::pager::query_pagers::ghost_row_deleting_pager(schema_ptr(view), selection, state,
-            options, std::move(command), std::move(partition_ranges), stats, proxy, timeout_duration);
+            options, std::move(command), std::move(partition_ranges), stats, proxy, timeout_duration, concurrency);

     int32_t page_size = std::max(options.get_page_size(), 1000);
     auto now = gc_clock::now();
@@ -62,7 +62,8 @@ future<::shared_ptr<cql_transport::messages::result_message>> prune_materialized
     auto timeout_duration = get_timeout(state.get_client_state(), options);
     dht::partition_range_vector key_ranges = _restrictions->get_partition_key_ranges(options);
     std::vector<query::clustering_range> clustering_bounds = _restrictions->get_clustering_bounds(options);
-    return delete_ghost_rows(std::move(key_ranges), std::move(clustering_bounds), view_ptr(_schema), qp.proxy(), state, options, _stats, timeout_duration).then([] {
+    size_t concurrency = _attrs->is_concurrency_set() ? _attrs->get_concurrency(options).value() : 1;
+    return delete_ghost_rows(std::move(key_ranges), std::move(clustering_bounds), view_ptr(_schema), qp.proxy(), state, options, _stats, timeout_duration, concurrency).then([] {
         return make_ready_future<::shared_ptr<cql_transport::messages::result_message>>(::make_shared<cql_transport::messages::result_message::void_message>());
     });
 }
@@ -2031,14 +2031,16 @@ future<shared_ptr<cql_transport::messages::result_message>> vector_indexed_table
             fmt::format("Use of ANN OF in an ORDER BY clause requires a LIMIT that is not greater than {}. LIMIT was {}", max_ann_query_limit, limit)));
     }
 
-    auto as = abort_source();
-    auto pkeys = co_await qp.vector_store_client().ann(_schema->ks_name(), _index.metadata().name(), _schema, get_ann_ordering_vector(options), limit, as);
+    auto timeout = db::timeout_clock::now() + get_timeout(state.get_client_state(), options);
+    auto aoe = abort_on_expiry(timeout);
+    auto pkeys = co_await qp.vector_store_client().ann(
+            _schema->ks_name(), _index.metadata().name(), _schema, get_ann_ordering_vector(options), limit, aoe.abort_source());
     if (!pkeys.has_value()) {
         co_await coroutine::return_exception(
             exceptions::invalid_request_exception(std::visit(vector_search::vector_store_client::ann_error_visitor{}, pkeys.error())));
     }
 
-    co_return co_await query_base_table(qp, state, options, pkeys.value());
+    co_return co_await query_base_table(qp, state, options, pkeys.value(), timeout);
 });
 
 auto page_size = options.get_page_size();
@@ -2073,10 +2075,10 @@ std::vector<float> vector_indexed_table_select_statement::get_ann_ordering_vecto
     return util::to_vector<float>(values);
 }
 
-future<::shared_ptr<cql_transport::messages::result_message>> vector_indexed_table_select_statement::query_base_table(
-        query_processor& qp, service::query_state& state, const query_options& options, const std::vector<vector_search::primary_key>& pkeys) const {
+future<::shared_ptr<cql_transport::messages::result_message>> vector_indexed_table_select_statement::query_base_table(query_processor& qp,
+        service::query_state& state, const query_options& options, const std::vector<vector_search::primary_key>& pkeys,
+        lowres_clock::time_point timeout) const {
     auto command = prepare_command_for_base_query(qp, state, options);
-    auto timeout = db::timeout_clock::now() + get_timeout(state.get_client_state(), options);
 
     // For tables without clustering columns, we can optimize by querying
     // partition ranges instead of individual primary keys, since the
@@ -389,8 +389,8 @@ private:
 
     std::vector<float> get_ann_ordering_vector(const query_options& options) const;
 
-    future<::shared_ptr<cql_transport::messages::result_message>> query_base_table(
-            query_processor& qp, service::query_state& state, const query_options& options, const std::vector<vector_search::primary_key>& pkeys) const;
+    future<::shared_ptr<cql_transport::messages::result_message>> query_base_table(query_processor& qp, service::query_state& state,
+            const query_options& options, const std::vector<vector_search::primary_key>& pkeys, lowres_clock::time_point timeout) const;
 
     future<::shared_ptr<cql_transport::messages::result_message>> query_base_table(query_processor& qp, service::query_state& state,
             const query_options& options, lw_shared_ptr<query::read_command> command, lowres_clock::time_point timeout,
@@ -12,5 +12,8 @@ target_link_libraries(data_dictionary
     Seastar::seastar
     xxHash::xxhash)
 
+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(data_dictionary REUSE_FROM scylla-precompiled-header)
+endif()
 check_headers(check-headers data_dictionary
   GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
@@ -60,5 +60,8 @@ target_link_libraries(db
     data_dictionary
     cql3)
 
+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(db REUSE_FROM scylla-precompiled-header)
+endif()
 check_headers(check-headers db
   GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)
@@ -3461,12 +3461,15 @@ db::commitlog::read_log_file(const replay_state& state, sstring filename, sstrin
             clogger.debug("Read {} bytes of data ({}, {})", size, pos, rem);
 
             while (rem < size) {
+                const auto initial_size = initial.size_bytes();
 
                 if (eof) {
-                    auto reason = fmt::format("unexpected EOF, rem={}, size={}", rem, size);
+                    auto reason = fmt::format("unexpected EOF, pos={}, rem={}, size={}, alignment={}, initial_size={}",
+                        pos, rem, size, alignment, initial_size);
                     throw segment_truncation(std::move(reason), block_boundry);
                 }
 
-                auto block_size = alignment - initial.size_bytes();
+                auto block_size = alignment - initial_size;
                 // using a stream is perhaps not 100% effective, but we need to
                 // potentially address data in pages smaller than the current
                 // disk/fs we are reading from can handle (but please no).
@@ -3474,8 +3477,9 @@ db::commitlog::read_log_file(const replay_state& state, sstring filename, sstrin
 
                 if (tmp.size_bytes() == 0) {
                     eof = true;
-                    auto reason = fmt::format("read 0 bytes, while tried to read {} bytes. rem={}, size={}",
-                        block_size, rem, size);
+                    auto reason = fmt::format("read 0 bytes, while tried to read {} bytes. "
+                        "pos={}, rem={}, size={}, alignment={}, initial_size={}",
+                        block_size, pos, rem, size, alignment, initial_size);
                     throw segment_truncation(std::move(reason), block_boundry);
                 }
 
@@ -3511,13 +3515,13 @@ db::commitlog::read_log_file(const replay_state& state, sstring filename, sstrin
                 auto checksum = crc.checksum();
 
                 if (check != checksum) {
-                    auto reason = fmt::format("checksums do not match: {:x} vs. {:x}. rem={}, size={}",
-                        check, checksum, rem, size);
+                    auto reason = fmt::format("checksums do not match: {:x} vs. {:x}. pos={}, rem={}, size={}, alignment={}, initial_size={}",
+                        check, checksum, pos, rem, size, alignment, initial_size);
                     throw segment_data_corruption_error(std::move(reason), alignment);
                 }
                 if (id != this->id) {
-                    auto reason = fmt::format("IDs do not match: {} vs. {}. rem={}, size={}",
-                        id, this->id, rem, size);
+                    auto reason = fmt::format("IDs do not match: {} vs. {}. pos={}, rem={}, size={}, alignment={}, initial_size={}",
+                        id, this->id, pos, rem, size, alignment, initial_size);
                     throw segment_truncation(std::move(reason), pos + rem);
                 }
             }
@@ -3626,6 +3630,10 @@ db::commitlog::read_log_file(const replay_state& state, sstring filename, sstrin
             auto old = pos;
             pos = next_pos(off);
             clogger.trace("Pos {} -> {} ({})", old, pos, off);
+            // #24346 check eof status whenever we move file pos.
+            if (pos >= file_size) {
+                eof = true;
+            }
         }
 
         future<> read_entry() {
db/config.cc
@@ -36,6 +36,7 @@
 #include "sstables/compressor.hh"
 #include "utils/log.hh"
 #include "service/tablet_allocator_fwd.hh"
+#include "backlog_controller_fwd.hh"
 #include "utils/config_file_impl.hh"
 #include "exceptions/exceptions.hh"
 #include <seastar/core/metrics_api.hh>
@@ -630,6 +631,8 @@ db::config::config(std::shared_ptr<db::extensions> exts)
         "If set to higher than 0, ignore the controller's output and set the memtable shares statically. Do not set this unless you know what you are doing and suspect a problem in the controller. This option will be retired when the controller reaches more maturity.")
     , compaction_static_shares(this, "compaction_static_shares", liveness::LiveUpdate, value_status::Used, 0,
         "If set to higher than 0, ignore the controller's output and set the compaction shares statically. Do not set this unless you know what you are doing and suspect a problem in the controller. This option will be retired when the controller reaches more maturity.")
+    , compaction_max_shares(this, "compaction_max_shares", liveness::LiveUpdate, value_status::Used, default_compaction_maximum_shares,
+        "Set the maximum shares of regular compaction to the specific value. Do not set this unless you know what you are doing and suspect a problem in the controller. This option will be retired when the controller reaches more maturity.")
     , compaction_enforce_min_threshold(this, "compaction_enforce_min_threshold", liveness::LiveUpdate, value_status::Used, false,
         "If set to true, enforce the min_threshold option for compactions strictly. If false (default), Scylla may decide to compact even if below min_threshold.")
     , compaction_flush_all_tables_before_major_seconds(this, "compaction_flush_all_tables_before_major_seconds", value_status::Used, 86400,
@@ -1035,8 +1038,9 @@ db::config::config(std::shared_ptr<db::extensions> exts)
         "Controls whether traffic between nodes is compressed. The valid values are:\n"
         "* all: All traffic is compressed.\n"
         "* dc : Traffic between data centers is compressed.\n"
+        "* rack : Traffic between racks is compressed.\n"
         "* none : No compression.",
-        {"all", "dc", "none"})
+        {"all", "dc", "rack", "none"})
     , internode_compression_zstd_max_cpu_fraction(this, "internode_compression_zstd_max_cpu_fraction", liveness::LiveUpdate, value_status::Used, 0.000,
         "ZSTD compression of RPC will consume at most this fraction of each internode_compression_zstd_quota_refresh_period_ms time slice.\n"
         "If you wish to try out zstd for RPC compression, 0.05 is a reasonable starting point.")
@@ -1168,6 +1172,17 @@ db::config::config(std::shared_ptr<db::extensions> exts)
         "* default_weight: (Default: 1 **) How many requests are handled during each turn of the RoundRobin.\n"
         "* weights: (Default: Keyspace: 1) Takes a list of keyspaces. It sets how many requests are handled during each turn of the RoundRobin, based on the request_scheduler_id.")
     /**
+     * @Group Vector search settings
+     * @GroupDescription Settings for configuring and tuning vector search functionality.
+     */
+    , vector_store_primary_uri(this, "vector_store_primary_uri", liveness::LiveUpdate, value_status::Used, "",
+        "A comma-separated list of primary vector store node URIs. These nodes are preferred for vector search operations.")
+    , vector_store_secondary_uri(this, "vector_store_secondary_uri", liveness::LiveUpdate, value_status::Used, "",
+        "A comma-separated list of secondary vector store node URIs. These nodes are used as a fallback when all primary nodes are unavailable, and are typically located in a different availability zone for high availability.")
+    , vector_store_encryption_options(this, "vector_store_encryption_options", value_status::Used, {},
+        "Options for encrypted connections to the vector store. These options are used for HTTPS URIs in `vector_store_primary_uri` and `vector_store_secondary_uri`. The available options are:\n"
+        "* truststore: (Default: <not set, use system truststore>) Location of the truststore containing the trusted certificate for authenticating remote servers.")
+    /**
      * @Group Security properties
      * @GroupDescription Server and client security settings.
      */
@@ -1429,6 +1444,11 @@ db::config::config(std::shared_ptr<db::extensions> exts)
     , alternator_warn_authorization(this, "alternator_warn_authorization", liveness::LiveUpdate, value_status::Used, false, "Count and log warnings about failed authentication or authorization")
     , alternator_write_isolation(this, "alternator_write_isolation", value_status::Used, "", "Default write isolation policy for Alternator.")
     , alternator_streams_time_window_s(this, "alternator_streams_time_window_s", value_status::Used, 10, "CDC query confidence window for alternator streams.")
+    , alternator_streams_increased_compatibility(this, "alternator_streams_increased_compatibility", liveness::LiveUpdate, value_status::Used, false,
+        "Increases compatibility with DynamoDB Streams at the cost of performance. "
+        "If enabled, Alternator compares the existing item with the new one during "
+        "data-modifying operations to determine which event type should be emitted. "
+        "This penalty is incurred only for tables with Alternator Streams enabled.")
     , alternator_timeout_in_ms(this, "alternator_timeout_in_ms", liveness::LiveUpdate, value_status::Used, 10000,
         "The server-side timeout for completing Alternator API requests.")
     , alternator_ttl_period_in_seconds(this, "alternator_ttl_period_in_seconds", value_status::Used,
@@ -1450,7 +1470,6 @@ db::config::config(std::shared_ptr<db::extensions> exts)
     , alternator_max_expression_cache_entries_per_shard(this, "alternator_max_expression_cache_entries_per_shard", liveness::LiveUpdate, value_status::Used, 2000, "Maximum number of cached parsed request expressions, per shard.")
     , alternator_max_users_query_size_in_trace_output(this, "alternator_max_users_query_size_in_trace_output", liveness::LiveUpdate, value_status::Used, uint64_t(4096),
         "Maximum size of user's command in trace output (`alternator_op` entry). Larger traces will be truncated and have `<truncated>` message appended - which doesn't count to the maximum limit.")
-    , vector_store_primary_uri(this, "vector_store_primary_uri", liveness::LiveUpdate, value_status::Used, "", "A comma-separated list of vector store node URIs. If not set, vector search is disabled.")
     , abort_on_ebadf(this, "abort_on_ebadf", value_status::Used, true, "Abort the server on incorrect file descriptor access. Throws exception when disabled.")
     , sanitizer_report_backtrace(this, "sanitizer_report_backtrace", value_status::Used, false,
         "In debug mode, report log-structured allocator sanitizer violations with a backtrace. Slow.")
@@ -189,6 +189,7 @@ public:
     named_value<bool> auto_adjust_flush_quota;
     named_value<float> memtable_flush_static_shares;
     named_value<float> compaction_static_shares;
+    named_value<float> compaction_max_shares;
     named_value<bool> compaction_enforce_min_threshold;
     named_value<uint32_t> compaction_flush_all_tables_before_major_seconds;
     named_value<sstring> cluster_name;
@@ -343,6 +344,9 @@ public:
     named_value<sstring> request_scheduler;
     named_value<sstring> request_scheduler_id;
     named_value<string_map> request_scheduler_options;
+    named_value<sstring> vector_store_primary_uri;
+    named_value<sstring> vector_store_secondary_uri;
+    named_value<string_map> vector_store_encryption_options;
     named_value<sstring> authenticator;
     named_value<sstring> internode_authenticator;
     named_value<sstring> authorizer;
@@ -461,6 +465,7 @@ public:
     named_value<bool> alternator_warn_authorization;
     named_value<sstring> alternator_write_isolation;
     named_value<uint32_t> alternator_streams_time_window_s;
+    named_value<bool> alternator_streams_increased_compatibility;
     named_value<uint32_t> alternator_timeout_in_ms;
     named_value<double> alternator_ttl_period_in_seconds;
     named_value<sstring> alternator_describe_endpoints;
@@ -469,8 +474,6 @@ public:
     named_value<uint32_t> alternator_max_expression_cache_entries_per_shard;
     named_value<uint64_t> alternator_max_users_query_size_in_trace_output;
 
-    named_value<sstring> vector_store_primary_uri;
-
     named_value<bool> abort_on_ebadf;
 
     named_value<bool> sanitizer_report_backtrace;
@@ -152,8 +152,8 @@ public:
 
     builder.with_version(sm.digest());
 
-    cf_type cf = sstring_to_cf_type(td.get_or("type", sstring("standard")));
-    if (cf == cf_type::super) {
+    auto type_str = td.get_or("type", sstring("standard"));
+    if (type_str == "Super") {
         fail(unimplemented::cause::SUPER);
     }
 
@@ -284,14 +284,9 @@ public:
         if (kind_str == "compact_value") {
             continue;
         }
-        if (kind == column_kind::clustering_key) {
-            if (cf == cf_type::super && component_index != 0) {
+        if (kind == column_kind::clustering_key && !is_compound) {
             continue;
         }
-            if (cf != cf_type::super && !is_compound) {
-                continue;
-            }
-        }
         }
 
         std::optional<index_metadata_kind> index_kind;
@@ -143,7 +143,7 @@ static computed_columns_map get_computed_columns(const schema_mutations& sm);
 
 static std::vector<column_definition> create_columns_from_column_rows(
     const query::result_set& rows, const sstring& keyspace,
-    const sstring& table, bool is_super, column_view_virtual is_view_virtual, const computed_columns_map& computed_columns,
+    const sstring& table, column_view_virtual is_view_virtual, const computed_columns_map& computed_columns,
     const data_dictionary::user_types_storage& user_types);
 
 
@@ -1804,9 +1804,6 @@ static schema_mutations make_table_mutations(schema_ptr table, api::timestamp_ty
     auto scylla_tables_mutation = make_scylla_tables_mutation(table, timestamp);
 
     list_type_impl::native_type flags;
-    if (table->is_super()) {
-        flags.emplace_back("super");
-    }
     if (table->is_dense()) {
         flags.emplace_back("dense");
     }
@@ -2280,7 +2277,6 @@ schema_ptr create_table_from_mutations(const schema_ctxt& ctxt, schema_mutations
     auto id = table_id(table_row.get_nonnull<utils::UUID>("id"));
     schema_builder builder{ks_name, cf_name, id};
 
-    auto cf = cf_type::standard;
     auto is_dense = false;
     auto is_counter = false;
     auto is_compound = false;
@@ -2289,7 +2285,6 @@ schema_ptr create_table_from_mutations(const schema_ctxt& ctxt, schema_mutations
     if (flags) {
         for (auto& s : *flags) {
             if (s == "super") {
-                // cf = cf_type::super;
                 fail(unimplemented::cause::SUPER);
             } else if (s == "dense") {
                 is_dense = true;
@@ -2305,9 +2300,7 @@ schema_ptr create_table_from_mutations(const schema_ctxt& ctxt, schema_mutations
     std::vector<column_definition> column_defs = create_columns_from_column_rows(
             query::result_set(sm.columns_mutation()),
             ks_name,
-            cf_name,/*,
-            fullRawComparator, */
-            cf == cf_type::super,
+            cf_name,
             column_view_virtual::no,
             computed_columns,
             user_types);
@@ -2486,9 +2479,7 @@ static computed_columns_map get_computed_columns(const schema_mutations& sm) {
 static std::vector<column_definition> create_columns_from_column_rows(
         const query::result_set& rows,
         const sstring& keyspace,
-        const sstring& table, /*,
-        AbstractType<?> rawComparator, */
-        bool is_super,
+        const sstring& table,
         column_view_virtual is_view_virtual,
         const computed_columns_map& computed_columns,
         const data_dictionary::user_types_storage& user_types)
@@ -2565,12 +2556,12 @@ static schema_builder prepare_view_schema_builder_from_mutations(const schema_ct
     }
 
     auto computed_columns = get_computed_columns(sm);
-    auto column_defs = create_columns_from_column_rows(query::result_set(sm.columns_mutation()), ks_name, cf_name, false, column_view_virtual::no, computed_columns, user_types);
+    auto column_defs = create_columns_from_column_rows(query::result_set(sm.columns_mutation()), ks_name, cf_name, column_view_virtual::no, computed_columns, user_types);
     for (auto&& cdef : column_defs) {
         builder.with_column_ordered(cdef);
     }
     if (sm.view_virtual_columns_mutation()) {
-        column_defs = create_columns_from_column_rows(query::result_set(*sm.view_virtual_columns_mutation()), ks_name, cf_name, false, column_view_virtual::yes, computed_columns, user_types);
+        column_defs = create_columns_from_column_rows(query::result_set(*sm.view_virtual_columns_mutation()), ks_name, cf_name, column_view_virtual::yes, computed_columns, user_types);
         for (auto&& cdef : column_defs) {
             builder.with_column_ordered(cdef);
         }
     }
@@ -766,9 +766,6 @@ schema_ptr system_keyspace::size_estimates() {
         "partitions larger than specified threshold"
         );
     builder.set_gc_grace_seconds(0);
-    // FIXME re-enable caching for this and the other two
-    // system.large_* tables once
-    // https://github.com/scylladb/scylla/issues/3288 is fixed
     builder.set_caching_options(caching_options::get_disabled_caching_options());
     builder.with_hash_version();
     return builder.build(schema_builder::compact_storage::no);
@@ -1667,7 +1664,7 @@ schema_ptr system_keyspace::view_building_tasks()
         .with_column("key", utf8_type, column_kind::partition_key)
         .with_column("id", timeuuid_type, column_kind::clustering_key)
         .with_column("type", utf8_type)
-        .with_column("state", utf8_type)
+        .with_column("aborted", boolean_type)
         .with_column("base_id", uuid_type)
         .with_column("view_id", uuid_type)
         .with_column("last_token", long_type)
@@ -3062,14 +3059,14 @@ future<mutation> system_keyspace::make_remove_view_build_status_on_host_mutation
 static constexpr auto VIEW_BUILDING_KEY = "view_building";

 future<db::view::building_tasks> system_keyspace::get_view_building_tasks() {
-    static const sstring query = format("SELECT id, type, state, base_id, view_id, last_token, host_id, shard FROM {}.{} WHERE key = '{}'", NAME, VIEW_BUILDING_TASKS, VIEW_BUILDING_KEY);
+    static const sstring query = format("SELECT id, type, aborted, base_id, view_id, last_token, host_id, shard FROM {}.{} WHERE key = '{}'", NAME, VIEW_BUILDING_TASKS, VIEW_BUILDING_KEY);
     using namespace db::view;

     building_tasks tasks;
     co_await _qp.query_internal(query, [&] (const cql3::untyped_result_set_row& row) -> future<stop_iteration> {
         auto id = row.get_as<utils::UUID>("id");
         auto type = task_type_from_string(row.get_as<sstring>("type"));
-        auto state = task_state_from_string(row.get_as<sstring>("state"));
+        auto aborted = row.get_as<bool>("aborted");
         auto base_id = table_id(row.get_as<utils::UUID>("base_id"));
         auto view_id = row.get_opt<utils::UUID>("view_id").transform([] (const utils::UUID& uuid) { return table_id(uuid); });
         auto last_token = dht::token::from_int64(row.get_as<int64_t>("last_token"));
@@ -3077,7 +3074,7 @@ future<db::view::building_tasks> system_keyspace::get_view_building_tasks() {
         auto shard = unsigned(row.get_as<int32_t>("shard"));

         locator::tablet_replica replica{host_id, shard};
-        view_building_task task{id, type, state, base_id, view_id, replica, last_token};
+        view_building_task task{id, type, aborted, base_id, view_id, replica, last_token};

         switch (type) {
         case db::view::view_building_task::task_type::build_range:
@@ -3096,7 +3093,7 @@ future<db::view::building_tasks> system_keyspace::get_view_building_tasks() {
 }

 future<mutation> system_keyspace::make_view_building_task_mutation(api::timestamp_type ts, const db::view::view_building_task& task) {
-    static const sstring stmt = format("INSERT INTO {}.{}(key, id, type, state, base_id, view_id, last_token, host_id, shard) VALUES ('{}', ?, ?, ?, ?, ?, ?, ?, ?)", NAME, VIEW_BUILDING_TASKS, VIEW_BUILDING_KEY);
+    static const sstring stmt = format("INSERT INTO {}.{}(key, id, type, aborted, base_id, view_id, last_token, host_id, shard) VALUES ('{}', ?, ?, ?, ?, ?, ?, ?, ?)", NAME, VIEW_BUILDING_TASKS, VIEW_BUILDING_KEY);
     using namespace db::view;

     data_value_or_unset view_id = unset_value{};
@@ -3107,7 +3104,7 @@ future<mutation> system_keyspace::make_view_building_task_mutation(api::timestam
         view_id = data_value(task.view_id->uuid());
     }
     auto muts = co_await _qp.get_mutations_internal(stmt, internal_system_query_state(), ts, {
-        task.id, task_type_to_sstring(task.type), task_state_to_sstring(task.state),
+        task.id, task_type_to_sstring(task.type), task.aborted,
         task.base_id.uuid(), view_id, dht::token::to_int64(task.last_token),
         task.replica.host.uuid(), int32_t(task.replica.shard)
     });
@@ -3117,18 +3114,6 @@ future<mutation> system_keyspace::make_view_building_task_mutation(api::timestam
     co_return std::move(muts[0]);
 }

-future<mutation> system_keyspace::make_update_view_building_task_state_mutation(api::timestamp_type ts, utils::UUID id, db::view::view_building_task::task_state state) {
-    static const sstring stmt = format("UPDATE {}.{} SET state = ? WHERE key = '{}' AND id = ?", NAME, VIEW_BUILDING_TASKS, VIEW_BUILDING_KEY);
-
-    auto muts = co_await _qp.get_mutations_internal(stmt, internal_system_query_state(), ts, {
-        task_state_to_sstring(state), id
-    });
-    if (muts.size() != 1) {
-        on_internal_error(slogger, fmt::format("expected 1 mutation got {}", muts.size()));
-    }
-    co_return std::move(muts[0]);
-}
-
 future<mutation> system_keyspace::make_remove_view_building_task_mutation(api::timestamp_type ts, utils::UUID id) {
     static const sstring stmt = format("DELETE FROM {}.{} WHERE key = '{}' AND id = ?", NAME, VIEW_BUILDING_TASKS, VIEW_BUILDING_KEY);

@@ -576,7 +576,6 @@ public:
     // system.view_building_tasks
    future<db::view::building_tasks> get_view_building_tasks();
    future<mutation> make_view_building_task_mutation(api::timestamp_type ts, const db::view::view_building_task& task);
-    future<mutation> make_update_view_building_task_state_mutation(api::timestamp_type ts, utils::UUID id, db::view::view_building_task::task_state state);
    future<mutation> make_remove_view_building_task_mutation(api::timestamp_type ts, utils::UUID id);

    // system.scylla_local, view_building_processing_base key
@@ -9,6 +9,8 @@
 #include "query/query-result-reader.hh"
 #include "replica/database_fwd.hh"
 #include "db/timeout_clock.hh"
+#include <seastar/core/future.hh>
+#include <seastar/core/gate.hh>

 namespace service {
 class storage_proxy;
@@ -25,8 +27,14 @@ class delete_ghost_rows_visitor {
     replica::table& _view_table;
     schema_ptr _base_schema;
     std::optional<partition_key> _view_pk;
+    db::timeout_semaphore _concurrency_semaphore;
+    seastar::gate _gate;
+    std::exception_ptr& _ex;

 public:
-    delete_ghost_rows_visitor(service::storage_proxy& proxy, service::query_state& state, view_ptr view, db::timeout_clock::duration timeout_duration);
+    delete_ghost_rows_visitor(service::storage_proxy& proxy, service::query_state& state, view_ptr view, db::timeout_clock::duration timeout_duration, size_t concurrency, std::exception_ptr& ex);
+    delete_ghost_rows_visitor(delete_ghost_rows_visitor&&) = default;
+    ~delete_ghost_rows_visitor() noexcept;
+
     void add_value(const column_definition& def, query::result_row_view::iterator_type& i) {
     }
@@ -45,6 +53,9 @@ public:
     uint32_t accept_partition_end(const query::result_row_view& static_row) {
         return 0;
     }

+private:
+    future<> do_accept_new_row(partition_key pk, clustering_key ck);
 };

 } //namespace db::view
@@ -3597,7 +3597,7 @@ view_updating_consumer::view_updating_consumer(view_update_generator& gen, schem
     })
 { }

-delete_ghost_rows_visitor::delete_ghost_rows_visitor(service::storage_proxy& proxy, service::query_state& state, view_ptr view, db::timeout_clock::duration timeout_duration)
+delete_ghost_rows_visitor::delete_ghost_rows_visitor(service::storage_proxy& proxy, service::query_state& state, view_ptr view, db::timeout_clock::duration timeout_duration, size_t concurrency, std::exception_ptr& ex)
     : _proxy(proxy)
     , _state(state)
     , _timeout_duration(timeout_duration)
@@ -3605,8 +3605,20 @@ delete_ghost_rows_visitor::delete_ghost_rows_visitor(service::storage_proxy& pro
     , _view_table(_proxy.get_db().local().find_column_family(view))
     , _base_schema(_proxy.get_db().local().find_schema(_view->view_info()->base_id()))
     , _view_pk()
+    , _concurrency_semaphore(concurrency)
+    , _ex(ex)
 {}

+delete_ghost_rows_visitor::~delete_ghost_rows_visitor() noexcept {
+    try {
+        _gate.close().get();
+    } catch (...) {
+        // Closing the gate should never throw, but if it does anyway, capture the exception.
+        _ex = std::current_exception();
+    }
+}
+
 void delete_ghost_rows_visitor::accept_new_partition(const partition_key& key, uint32_t row_count) {
     SCYLLA_ASSERT(thread::running_in_thread());
     _view_pk = key;
@@ -3614,7 +3626,18 @@ void delete_ghost_rows_visitor::accept_new_partition(const partition_key& key, u

 // Assumes running in seastar::thread
 void delete_ghost_rows_visitor::accept_new_row(const clustering_key& ck, const query::result_row_view& static_row, const query::result_row_view& row) {
-    auto view_exploded_pk = _view_pk->explode();
+    auto units = get_units(_concurrency_semaphore, 1).get();
+    (void)seastar::try_with_gate(_gate, [this, pk = _view_pk.value(), units = std::move(units), ck] () mutable {
+        return do_accept_new_row(std::move(pk), std::move(ck)).then_wrapped([this, units = std::move(units)] (future<>&& f) mutable {
+            if (f.failed()) {
+                _ex = f.get_exception();
+            }
+        });
+    });
+}
+
+future<> delete_ghost_rows_visitor::do_accept_new_row(partition_key pk, clustering_key ck) {
+    auto view_exploded_pk = pk.explode();
     auto view_exploded_ck = ck.explode();
     std::vector<bytes> base_exploded_pk(_base_schema->partition_key_size());
     std::vector<bytes> base_exploded_ck(_base_schema->clustering_key_size());
@@ -3649,17 +3672,17 @@ void delete_ghost_rows_visitor::accept_new_row(const clustering_key& ck, const q
         _proxy.get_max_result_size(partition_slice), query::tombstone_limit(_proxy.get_tombstone_limit()));
     auto timeout = db::timeout_clock::now() + _timeout_duration;
     service::storage_proxy::coordinator_query_options opts{timeout, _state.get_permit(), _state.get_client_state(), _state.get_trace_state()};
-    auto base_qr = _proxy.query(_base_schema, command, std::move(partition_ranges), db::consistency_level::ALL, opts).get();
+    auto base_qr = co_await _proxy.query(_base_schema, command, std::move(partition_ranges), db::consistency_level::ALL, opts);
     query::result& result = *base_qr.query_result;
-    auto delete_ghost_row = [&]() {
-        mutation m(_view, *_view_pk);
+    auto delete_ghost_row = [&]() -> future<> {
+        mutation m(_view, pk);
         auto& row = m.partition().clustered_row(*_view, ck);
         row.apply(tombstone(api::new_timestamp(), gc_clock::now()));
         timeout = db::timeout_clock::now() + _timeout_duration;
-        _proxy.mutate({m}, db::consistency_level::ALL, timeout, _state.get_trace_state(), empty_service_permit(), db::allow_per_partition_rate_limit::no).get();
+        return _proxy.mutate({m}, db::consistency_level::ALL, timeout, _state.get_trace_state(), empty_service_permit(), db::allow_per_partition_rate_limit::no);
     };
     if (result.row_count().value_or(0) == 0) {
-        delete_ghost_row();
+        co_await delete_ghost_row();
     } else if (!view_key_cols_not_in_base_key.empty()) {
         if (result.row_count().value_or(0) != 1) {
             on_internal_error(vlogger, format("Got multiple base rows corresponding to a single view row when pruning {}.{}", _view->ks_name(), _view->cf_name()));
@@ -3669,7 +3692,7 @@ void delete_ghost_rows_visitor::accept_new_row(const clustering_key& ck, const q
         for (const auto& [col_def, col_val] : view_key_cols_not_in_base_key) {
             const data_value* base_val = base_row.get_data_value(col_def->name_as_text());
             if (!base_val || base_val->is_null() || col_val != base_val->serialize_nonnull()) {
-                delete_ghost_row();
+                co_await delete_ghost_row();
                 break;
             }
         }
@@ -104,6 +104,8 @@ future<> view_building_coordinator::run() {
         _vb_sm.event.broadcast();
     });

+    auto finished_tasks_gc_fiber = finished_task_gc_fiber();
+
     while (!_as.abort_requested()) {
         co_await utils::get_local_injector().inject("view_building_coordinator_pause_main_loop", utils::wait_for_message(std::chrono::minutes(2)));
         if (utils::get_local_injector().enter("view_building_coordinator_skip_main_loop")) {
@@ -121,12 +123,7 @@ future<> view_building_coordinator::run() {
                 continue;
             }

-            auto started_new_work = co_await work_on_view_building(std::move(*guard_opt));
-            if (started_new_work) {
-                // If any tasks were started, do another iteration, so the coordinator can attach itself to the tasks (via RPC)
-                vbc_logger.debug("view building coordinator started new tasks, do next iteration without waiting for event");
-                continue;
-            }
+            co_await work_on_view_building(std::move(*guard_opt));
             co_await await_event();
         } catch (...) {
             handle_coordinator_error(std::current_exception());
@@ -142,6 +139,66 @@ future<> view_building_coordinator::run() {
             }
         }
     }

+    co_await std::move(finished_tasks_gc_fiber);
+}
+
+future<> view_building_coordinator::finished_task_gc_fiber() {
+    static auto task_gc_interval = 200ms;
+
+    while (!_as.abort_requested()) {
+        try {
+            co_await clean_finished_tasks();
+            co_await sleep_abortable(task_gc_interval, _as);
+        } catch (abort_requested_exception&) {
+            vbc_logger.debug("view_building_coordinator::finished_task_gc_fiber got abort_requested_exception");
+        } catch (service::group0_concurrent_modification&) {
+            vbc_logger.info("view_building_coordinator::finished_task_gc_fiber got group0_concurrent_modification");
+        } catch (raft::request_aborted&) {
+            vbc_logger.debug("view_building_coordinator::finished_task_gc_fiber got raft::request_aborted");
+        } catch (service::term_changed_error&) {
+            vbc_logger.debug("view_building_coordinator::finished_task_gc_fiber notices term change {} -> {}", _term, _raft.get_current_term());
+        } catch (raft::commit_status_unknown&) {
+            vbc_logger.warn("view_building_coordinator::finished_task_gc_fiber got raft::commit_status_unknown");
+        } catch (...) {
+            vbc_logger.error("view_building_coordinator::finished_task_gc_fiber got error: {}", std::current_exception());
+        }
+    }
+}
+
+future<> view_building_coordinator::clean_finished_tasks() {
+    // Avoid acquiring a group0 operation if there are no tasks.
+    if (_finished_tasks.empty()) {
+        co_return;
+    }
+
+    auto guard = co_await start_operation();
+    auto lock = co_await get_unique_lock(_mutex);
+
+    if (!_vb_sm.building_state.currently_processed_base_table || std::ranges::all_of(_finished_tasks, [] (auto& e) { return e.second.empty(); })) {
+        co_return;
+    }
+
+    view_building_task_mutation_builder builder(guard.write_timestamp());
+    for (auto& [replica, tasks]: _finished_tasks) {
+        for (auto& task_id: tasks) {
+            // The task might be aborted in the meantime. In this case we cannot remove it because we need it to create a new task.
+            //
+            // TODO: When we're aborting a view building task (for instance due to tablet migration),
+            // we can look if we already finished it (check if it's in `_finished_tasks`).
+            // If yes, we can just remove it instead of aborting it.
+            auto task_opt = _vb_sm.building_state.get_task(*_vb_sm.building_state.currently_processed_base_table, replica, task_id);
+            if (task_opt && !task_opt->get().aborted) {
+                builder.del_task(task_id);
+                vbc_logger.debug("Removing finished task with ID: {}", task_id);
+            }
+        }
+    }
+
+    co_await commit_mutations(std::move(guard), {builder.build()}, "remove finished view building tasks");
+    for (auto& [_, tasks_set]: _finished_tasks) {
+        tasks_set.clear();
+    }
 }

 future<std::optional<service::group0_guard>> view_building_coordinator::update_state(service::group0_guard guard) {
@@ -301,18 +358,16 @@ future<> view_building_coordinator::update_views_statuses(const service::group0_
     }
 }

-future<bool> view_building_coordinator::work_on_view_building(service::group0_guard guard) {
+future<> view_building_coordinator::work_on_view_building(service::group0_guard guard) {
     if (!_vb_sm.building_state.currently_processed_base_table) {
         vbc_logger.debug("No base table is selected, nothing to do.");
-        co_return false;
+        co_return;
     }

-    utils::chunked_vector<mutation> muts;
-    std::unordered_set<locator::tablet_replica> _remote_work_keys_to_erase;
+    // Acquire unique lock of `_finished_tasks` to ensure each replica has its own entry in it
+    // and to select tasks for them.
+    auto lock = co_await get_unique_lock(_mutex);
     for (auto& replica: get_replicas_with_tasks()) {
-        // Check whether the coordinator already waits for the remote work on the replica to be finished.
-        // If so: check if the work is done and and remove the shared_future, skip this replica otherwise.
-        bool skip_work_on_this_replica = false;
         if (_remote_work.contains(replica)) {
             if (!_remote_work[replica].available()) {
                 vbc_logger.debug("Replica {} is still doing work", replica);
@@ -320,22 +375,8 @@ future<bool> view_building_coordinator::work_on_view_building(service::group0_gu
             }

             auto remote_results_opt = co_await _remote_work[replica].get_future();
-            if (remote_results_opt) {
-                auto results_muts = co_await update_state_after_work_is_done(guard, replica, std::move(*remote_results_opt));
-                muts.insert(muts.end(), std::make_move_iterator(results_muts.begin()), std::make_move_iterator(results_muts.end()));
-                // If the replica successfully finished its work, we need to commit mutations generated above before selecting next task
-                skip_work_on_this_replica = !results_muts.empty();
-            }
-
-            // If there were no mutations for this replica, we can just remove the entry from `_remote_work` map
-            // and start new work in the same iteration.
-            // Otherwise, the entry needs to be removed after the mutations are committed successfully.
-            if (skip_work_on_this_replica) {
-                _remote_work_keys_to_erase.insert(replica);
-            } else {
             _remote_work.erase(replica);
-            }
         }
-        }

         const bool ignore_gossiper = utils::get_local_injector().enter("view_building_coordinator_ignore_gossiper");
         if (!_gossiper.is_alive(replica.host) && !ignore_gossiper) {
@@ -343,31 +384,16 @@ future<bool> view_building_coordinator::work_on_view_building(service::group0_gu
             continue;
         }

-        if (skip_work_on_this_replica) {
-            continue;
+        if (!_finished_tasks.contains(replica)) {
+            _finished_tasks.insert({replica, {}});
         }

-        if (auto already_started_ids = _vb_sm.building_state.get_started_tasks(*_vb_sm.building_state.currently_processed_base_table, replica); !already_started_ids.empty()) {
-            // If the replica has any task in `STARTED` state, attach the coordinator to the work.
-            attach_to_started_tasks(replica, std::move(already_started_ids));
-        } else if (auto todo_ids = select_tasks_for_replica(replica); !todo_ids.empty()) {
-            // If the replica has no started tasks and there are tasks to do, mark them as started.
-            // The coordinator will attach itself to the work in next iteration.
-            auto new_mutations = co_await start_tasks(guard, std::move(todo_ids));
-            muts.insert(muts.end(), std::make_move_iterator(new_mutations.begin()), std::make_move_iterator(new_mutations.end()));
+        if (auto todo_ids = select_tasks_for_replica(replica); !todo_ids.empty()) {
+            start_remote_worker(replica, std::move(todo_ids));
         } else {
             vbc_logger.debug("Nothing to do for replica {}", replica);
         }
     }
-
-    if (!muts.empty()) {
-        co_await commit_mutations(std::move(guard), std::move(muts), "start view building tasks");
-        for (auto& key: _remote_work_keys_to_erase) {
-            _remote_work.erase(key);
-        }
-        co_return true;
-    }
-    co_return false;
 }

 std::set<locator::tablet_replica> view_building_coordinator::get_replicas_with_tasks() {
@@ -390,7 +416,7 @@ std::vector<utils::UUID> view_building_coordinator::select_tasks_for_replica(loc
     // Select only building tasks and return theirs ids
     auto filter_building_tasks = [] (const std::vector<view_building_task>& tasks) -> std::vector<utils::UUID> {
         return tasks | std::views::filter([] (const view_building_task& t) {
-            return t.type == view_building_task::task_type::build_range && t.state == view_building_task::task_state::idle;
+            return t.type == view_building_task::task_type::build_range && !t.aborted;
         }) | std::views::transform([] (const view_building_task& t) {
             return t.id;
         }) | std::ranges::to<std::vector>();
@@ -404,7 +430,29 @@ std::vector<utils::UUID> view_building_coordinator::select_tasks_for_replica(loc
     }

     auto& tablet_map = _db.get_token_metadata().tablets().get_tablet_map(*_vb_sm.building_state.currently_processed_base_table);
-    for (auto& [token, tasks]: _vb_sm.building_state.collect_tasks_by_last_token(*_vb_sm.building_state.currently_processed_base_table, replica)) {
+    auto tasks_by_last_token = _vb_sm.building_state.collect_tasks_by_last_token(*_vb_sm.building_state.currently_processed_base_table, replica);
+
+    // Remove completed tasks in `_finished_tasks` from `tasks_by_last_token`
+    auto it = tasks_by_last_token.begin();
+    while (it != tasks_by_last_token.end()) {
+        auto task_it = it->second.begin();
+        while (task_it != it->second.end()) {
+            if (_finished_tasks.at(replica).contains(task_it->id)) {
+                task_it = it->second.erase(task_it);
+            } else {
+                ++task_it;
+            }
+        }
+
+        // Remove the entry from `tasks_by_last_token` if its vector is empty
+        if (it->second.empty()) {
+            it = tasks_by_last_token.erase(it);
+        } else {
+            ++it;
+        }
+    }
+
+    for (auto& [token, tasks]: tasks_by_last_token) {
         auto tid = tablet_map.get_tablet_id(token);
         if (tablet_map.get_tablet_transition_info(tid)) {
             vbc_logger.debug("Tablet {} on replica {} is in transition.", tid, replica);
@@ -416,7 +464,7 @@ std::vector<utils::UUID> view_building_coordinator::select_tasks_for_replica(loc
             return building_tasks;
         } else {
             return tasks | std::views::filter([] (const view_building_task& t) {
-                return t.state == view_building_task::task_state::idle;
+                return !t.aborted;
             }) | std::views::transform([] (const view_building_task& t) {
                 return t.id;
             }) | std::ranges::to<std::vector>();
@@ -426,32 +474,21 @@ std::vector<utils::UUID> view_building_coordinator::select_tasks_for_replica(loc
     return {};
 }

-future<utils::chunked_vector<mutation>> view_building_coordinator::start_tasks(const service::group0_guard& guard, std::vector<utils::UUID> tasks) {
-    vbc_logger.info("Starting tasks {}", tasks);
-
-    utils::chunked_vector<mutation> muts;
-    for (auto& t: tasks) {
-        auto mut = co_await _sys_ks.make_update_view_building_task_state_mutation(guard.write_timestamp(), t, view_building_task::task_state::started);
-        muts.push_back(std::move(mut));
-    }
-    co_return muts;
-}
-
-void view_building_coordinator::attach_to_started_tasks(const locator::tablet_replica& replica, std::vector<utils::UUID> tasks) {
+void view_building_coordinator::start_remote_worker(const locator::tablet_replica& replica, std::vector<utils::UUID> tasks) {
     vbc_logger.debug("Attaching to started tasks {} on replica {}", tasks, replica);
-    shared_future<std::optional<remote_work_results>> work = work_on_tasks(replica, std::move(tasks));
+    shared_future<std::optional<std::vector<utils::UUID>>> work = work_on_tasks(replica, std::move(tasks));
     _remote_work.insert({replica, std::move(work)});
 }

-future<std::optional<view_building_coordinator::remote_work_results>> view_building_coordinator::work_on_tasks(locator::tablet_replica replica, std::vector<utils::UUID> tasks) {
+future<std::optional<std::vector<utils::UUID>>> view_building_coordinator::work_on_tasks(locator::tablet_replica replica, std::vector<utils::UUID> tasks) {
     constexpr auto backoff_duration = std::chrono::seconds(1);
     static thread_local logger::rate_limit rate_limit{backoff_duration};

-    std::vector<view_task_result> remote_results;
+    std::vector<utils::UUID> remote_results;
     bool rpc_failed = false;

     try {
-        remote_results = co_await ser::view_rpc_verbs::send_work_on_view_building_tasks(&_messaging, replica.host, _as, tasks);
+        remote_results = co_await ser::view_rpc_verbs::send_work_on_view_building_tasks(&_messaging, replica.host, _as, _term, replica.shard, tasks);
     } catch (...) {
         vbc_logger.log(log_level::warn, rate_limit, "Work on tasks {} on replica {}, failed with error: {}",
             tasks, replica, std::current_exception());
@@ -464,44 +501,14 @@ future<std::optional<view_building_coordinator::remote_work_results>> view_build
         co_return std::nullopt;
     }
 
-    if (tasks.size() != remote_results.size()) {
-        on_internal_error(vbc_logger, fmt::format("Number of tasks ({}) and results ({}) do not match for replica {}", tasks.size(), remote_results.size(), replica));
-    }
+    // In `view_building_coordinator::work_on_view_building()` we made sure that,
+    // each replica has its own entry in the `_finished_tasks`, so now we can just take a shared lock
+    // and insert its of finished tasks to this replica bucket as there is at most one instance of this method for each replica.
+    auto lock = co_await get_shared_lock(_mutex);
+    _finished_tasks.at(replica).insert_range(remote_results);
 
-    remote_work_results results;
-    for (size_t i = 0; i < tasks.size(); ++i) {
-        results.push_back({tasks[i], remote_results[i]});
-    }
     _vb_sm.event.broadcast();
-    co_return results;
+    co_return remote_results;
 }
 
-// Mark finished task as done (remove them from the table).
-// Retry failed tasks if possible (if failed tasks wasn't aborted).
-future<utils::chunked_vector<mutation>> view_building_coordinator::update_state_after_work_is_done(const service::group0_guard& guard, const locator::tablet_replica& replica, view_building_coordinator::remote_work_results results) {
-    vbc_logger.debug("Got results from replica {}: {}", replica, results);
-
-    utils::chunked_vector<mutation> muts;
-    for (auto& result: results) {
-        vbc_logger.info("Task {} was finished with result: {}", result.first, result.second);
-
-        if (!_vb_sm.building_state.currently_processed_base_table) {
-            continue;
-        }
-
-        // A task can be aborted by deleting it or by setting its state to `ABORTED`.
-        // If the task was aborted by changing the state,
-        // we shouldn't remove it here because it might be needed
-        // to generate updated after tablet operation (migration/resize)
-        // is finished.
-        auto task_opt = _vb_sm.building_state.get_task(*_vb_sm.building_state.currently_processed_base_table, replica, result.first);
-        if (task_opt && task_opt->get().state != view_building_task::task_state::aborted) {
-            // Otherwise, the task was completed successfully and we can remove it.
-            auto delete_mut = co_await _sys_ks.make_remove_view_building_task_mutation(guard.write_timestamp(), result.first);
-            muts.push_back(std::move(delete_mut));
-        }
-    }
-    co_return muts;
-}
 
 future<> view_building_coordinator::stop() {
@@ -531,7 +538,7 @@ void view_building_coordinator::generate_tablet_migration_updates(utils::chunked
     auto create_task_copy_on_pending_replica = [&] (const view_building_task& task) {
         auto new_id = builder.new_id();
         builder.set_type(new_id, task.type)
-                .set_state(new_id, view_building_task::task_state::idle)
+                .set_aborted(new_id, false)
                 .set_base_id(new_id, task.base_id)
                 .set_last_token(new_id, task.last_token)
                 .set_replica(new_id, *trinfo.pending_replica);
@@ -599,7 +606,7 @@ void view_building_coordinator::generate_tablet_resize_updates(utils::chunked_ve
     auto create_task_copy = [&] (const view_building_task& task, dht::token last_token) -> utils::UUID {
         auto new_id = builder.new_id();
         builder.set_type(new_id, task.type)
-                .set_state(new_id, view_building_task::task_state::idle)
+                .set_aborted(new_id, false)
                 .set_base_id(new_id, task.base_id)
                 .set_last_token(new_id, last_token)
                 .set_replica(new_id, task.replica);
@@ -668,7 +675,7 @@ void view_building_coordinator::abort_tasks(utils::chunked_vector<canonical_muta
     auto abort_task_map = [&] (const task_map& task_map) {
         for (auto& [id, _]: task_map) {
             vbc_logger.debug("Aborting task {}", id);
-            builder.set_state(id, view_building_task::task_state::aborted);
+            builder.set_aborted(id, true);
         }
     };
 
@@ -698,7 +705,7 @@ void abort_view_building_tasks(const view_building_state_machine& vb_sm,
         for (auto& [id, task]: task_map) {
             if (task.last_token == last_token) {
                 vbc_logger.debug("Aborting task {}", id);
-                builder.set_state(id, view_building_task::task_state::aborted);
+                builder.set_aborted(id, true);
             }
         }
     };
@@ -714,10 +721,10 @@ void abort_view_building_tasks(const view_building_state_machine& vb_sm,
 
 static void rollback_task_map(view_building_task_mutation_builder& builder, const task_map& task_map) {
     for (auto& [id, task]: task_map) {
-        if (task.state == view_building_task::task_state::aborted) {
+        if (task.aborted) {
             auto new_id = builder.new_id();
             builder.set_type(new_id, task.type)
-                    .set_state(new_id, view_building_task::task_state::idle)
+                    .set_aborted(new_id, false)
                     .set_base_id(new_id, task.base_id)
                     .set_last_token(new_id, task.last_token)
                     .set_replica(new_id, task.replica);
@@ -54,9 +54,9 @@ class view_building_coordinator : public service::endpoint_lifecycle_subscriber
     const raft::term_t _term;
     abort_source& _as;
 
-    using remote_work_results = std::vector<std::pair<utils::UUID, db::view::view_task_result>>;
-    std::unordered_map<locator::tablet_replica, shared_future<std::optional<remote_work_results>>> _remote_work;
+    std::unordered_map<locator::tablet_replica, shared_future<std::optional<std::vector<utils::UUID>>>> _remote_work;
+    shared_mutex _mutex; // guards `_finished_tasks` field
+    std::unordered_map<locator::tablet_replica, std::unordered_set<utils::UUID>> _finished_tasks;
 
 public:
     view_building_coordinator(replica::database& db, raft::server& raft, service::raft_group0& group0,
@@ -86,9 +86,11 @@ private:
     future<> commit_mutations(service::group0_guard guard, utils::chunked_vector<mutation> mutations, std::string_view description);
     void handle_coordinator_error(std::exception_ptr eptr);
 
+    future<> finished_task_gc_fiber();
+    future<> clean_finished_tasks();
+
     future<std::optional<service::group0_guard>> update_state(service::group0_guard guard);
-    // Returns if any new tasks were started
-    future<bool> work_on_view_building(service::group0_guard guard);
+    future<> work_on_view_building(service::group0_guard guard);
 
     future<> mark_view_build_status_started(const service::group0_guard& guard, table_id view_id, utils::chunked_vector<mutation>& out);
     future<> mark_all_remaining_view_build_statuses_started(const service::group0_guard& guard, table_id base_id, utils::chunked_vector<mutation>& out);
@@ -97,10 +99,8 @@ private:
     std::set<locator::tablet_replica> get_replicas_with_tasks();
     std::vector<utils::UUID> select_tasks_for_replica(locator::tablet_replica replica);
 
-    future<utils::chunked_vector<mutation>> start_tasks(const service::group0_guard& guard, std::vector<utils::UUID> tasks);
+    void start_remote_worker(const locator::tablet_replica& replica, std::vector<utils::UUID> tasks);
     void attach_to_started_tasks(const locator::tablet_replica& replica, std::vector<utils::UUID> tasks);
-    future<std::optional<remote_work_results>> work_on_tasks(locator::tablet_replica replica, std::vector<utils::UUID> tasks);
-    future<utils::chunked_vector<mutation>> update_state_after_work_is_done(const service::group0_guard& guard, const locator::tablet_replica& replica, remote_work_results results);
+    future<std::optional<std::vector<utils::UUID>>> work_on_tasks(locator::tablet_replica replica, std::vector<utils::UUID> tasks);
 };
 
 void abort_view_building_tasks(const db::view::view_building_state_machine& vb_sm,
@@ -13,10 +13,10 @@ namespace db {
 
 namespace view {
 
-view_building_task::view_building_task(utils::UUID id, task_type type, task_state state, table_id base_id, std::optional<table_id> view_id, locator::tablet_replica replica, dht::token last_token)
+view_building_task::view_building_task(utils::UUID id, task_type type, bool aborted, table_id base_id, std::optional<table_id> view_id, locator::tablet_replica replica, dht::token last_token)
     : id(id)
     , type(type)
-    , state(state)
+    , aborted(aborted)
     , base_id(base_id)
     , view_id(view_id)
     , replica(replica)
@@ -49,30 +49,6 @@ seastar::sstring task_type_to_sstring(view_building_task::task_type type) {
     }
 }
 
-view_building_task::task_state task_state_from_string(std::string_view str) {
-    if (str == "IDLE") {
-        return view_building_task::task_state::idle;
-    }
-    if (str == "STARTED") {
-        return view_building_task::task_state::started;
-    }
-    if (str == "ABORTED") {
-        return view_building_task::task_state::aborted;
-    }
-    throw std::runtime_error(fmt::format("Unknown view building task state: {}", str));
-}
-
-seastar::sstring task_state_to_sstring(view_building_task::task_state state) {
-    switch (state) {
-    case view_building_task::task_state::idle:
-        return "IDLE";
-    case view_building_task::task_state::started:
-        return "STARTED";
-    case view_building_task::task_state::aborted:
-        return "ABORTED";
-    }
-}
-
 std::optional<std::reference_wrapper<const view_building_task>> view_building_state::get_task(table_id base_id, locator::tablet_replica replica, utils::UUID id) const {
     if (!tasks_state.contains(base_id) || !tasks_state.at(base_id).contains(replica)) {
         return {};
@@ -151,46 +127,6 @@ std::map<dht::token, std::vector<view_building_task>> view_building_state::colle
     return tasks;
 }
 
-// Returns all tasks for `_vb_sm.building_state.currently_processed_base_table` and `replica` with `STARTED` state.
-std::vector<utils::UUID> view_building_state::get_started_tasks(table_id base_table_id, locator::tablet_replica replica) const {
-    if (!tasks_state.contains(base_table_id) || !tasks_state.at(base_table_id).contains(replica)) {
-        // No tasks for this replica
-        return {};
-    }
-
-    std::vector<view_building_task> tasks;
-    auto& replica_tasks = tasks_state.at(base_table_id).at(replica);
-    for (auto& [_, view_tasks]: replica_tasks.view_tasks) {
-        for (auto& [_, task]: view_tasks) {
-            if (task.state == view_building_task::task_state::started) {
-                tasks.push_back(task);
-            }
-        }
-    }
-    for (auto& [_, task]: replica_tasks.staging_tasks) {
-        if (task.state == view_building_task::task_state::started) {
-            tasks.push_back(task);
-        }
-    }
-
-    // All collected tasks should have the same: type, base_id and last_token,
-    // so they can be executed in the same view_building_worker::batch.
-#ifdef SEASTAR_DEBUG
-    if (!tasks.empty()) {
-        auto& task = tasks.front();
-        for (auto& t: tasks) {
-            SCYLLA_ASSERT(task.type == t.type);
-            SCYLLA_ASSERT(task.base_id == t.base_id);
-            SCYLLA_ASSERT(task.last_token == t.last_token);
-        }
-    }
-#endif
-
-    return tasks | std::views::transform([] (const view_building_task& t) {
-        return t.id;
-    }) | std::ranges::to<std::vector>();
-}
-
 }
 
 }
@@ -39,28 +39,17 @@ struct view_building_task {
         process_staging,
     };
 
-    // When a task is created, it starts with `IDLE` state.
-    // Then, the view building coordinator will decide to do the task and it will
-    // set the state to `STARTED`.
-    // When a task is finished the entry is removed.
-    //
-    // If a task is in progress when a tablet operation (migration/resize) starts,
-    // the task's state is set to `ABORTED`.
-    enum class task_state {
-        idle,
-        started,
-        aborted,
-    };
-
     utils::UUID id;
     task_type type;
-    task_state state;
+    bool aborted;
 
     table_id base_id;
     std::optional<table_id> view_id; // nullopt when task_type is `process_staging`
     locator::tablet_replica replica;
     dht::token last_token;
 
-    view_building_task(utils::UUID id, task_type type, task_state state,
+    view_building_task(utils::UUID id, task_type type, bool aborted,
             table_id base_id, std::optional<table_id> view_id,
             locator::tablet_replica replica, dht::token last_token);
 };
@@ -92,7 +81,6 @@ struct view_building_state {
     std::vector<std::reference_wrapper<const view_building_task>> get_tasks_for_host(table_id base_id, locator::host_id host) const;
     std::map<dht::token, std::vector<view_building_task>> collect_tasks_by_last_token(table_id base_table_id) const;
     std::map<dht::token, std::vector<view_building_task>> collect_tasks_by_last_token(table_id base_table_id, const locator::tablet_replica& replica) const;
-    std::vector<utils::UUID> get_started_tasks(table_id base_table_id, locator::tablet_replica replica) const;
 };
 
 // Represents global state of tablet-based views.
@@ -113,18 +101,8 @@ struct view_building_state_machine {
     condition_variable event;
 };
 
-struct view_task_result {
-    enum class command_status: uint8_t {
-        success = 0,
-        abort = 1,
-    };
-    db::view::view_task_result::command_status status;
-};
-
 view_building_task::task_type task_type_from_string(std::string_view str);
 seastar::sstring task_type_to_sstring(view_building_task::task_type type);
-view_building_task::task_state task_state_from_string(std::string_view str);
-seastar::sstring task_state_to_sstring(view_building_task::task_state state);
 
 } // namespace view_building
 
@@ -136,17 +114,11 @@ template <> struct fmt::formatter<db::view::view_building_task::task_type> : fmt
     }
 };
 
-template <> struct fmt::formatter<db::view::view_building_task::task_state> : fmt::formatter<string_view> {
-    auto format(db::view::view_building_task::task_state state, fmt::format_context& ctx) const {
-        return fmt::format_to(ctx.out(), "{}", db::view::task_state_to_sstring(state));
-    }
-};
-
 template <> struct fmt::formatter<db::view::view_building_task> : fmt::formatter<string_view> {
     auto format(db::view::view_building_task task, fmt::format_context& ctx) const {
         auto view_id = task.view_id ? fmt::to_string(*task.view_id) : "nullopt";
-        return fmt::format_to(ctx.out(), "view_building_task{{type: {}, state: {}, base_id: {}, view_id: {}, last_token: {}}}",
-                task.type, task.state, task.base_id, view_id, task.last_token);
+        return fmt::format_to(ctx.out(), "view_building_task{{type: {}, aborted: {}, base_id: {}, view_id: {}, last_token: {}}}",
+                task.type, task.aborted, task.base_id, view_id, task.last_token);
     }
 };
 
@@ -161,18 +133,3 @@ template <> struct fmt::formatter<db::view::replica_tasks> : fmt::formatter<stri
         return fmt::format_to(ctx.out(), "{{view_tasks: {}, staging_tasks: {}}}", replica_tasks.view_tasks, replica_tasks.staging_tasks);
     }
 };
-
-template <> struct fmt::formatter<db::view::view_task_result> : fmt::formatter<string_view> {
-    auto format(db::view::view_task_result result, fmt::format_context& ctx) const {
-        std::string_view res;
-        switch (result.status) {
-        case db::view::view_task_result::command_status::success:
-            res = "success";
-            break;
-        case db::view::view_task_result::command_status::abort:
-            res = "abort";
-            break;
-        }
-        return format_to(ctx.out(), "{}", res);
-    }
-};
@@ -25,8 +25,8 @@ view_building_task_mutation_builder& view_building_task_mutation_builder::set_ty
     _m.set_clustered_cell(get_ck(id), "type", data_value(task_type_to_sstring(type)), _ts);
     return *this;
 }
-view_building_task_mutation_builder& view_building_task_mutation_builder::set_state(utils::UUID id, db::view::view_building_task::task_state state) {
-    _m.set_clustered_cell(get_ck(id), "state", data_value(task_state_to_sstring(state)), _ts);
+view_building_task_mutation_builder& view_building_task_mutation_builder::set_aborted(utils::UUID id, bool aborted) {
+    _m.set_clustered_cell(get_ck(id), "aborted", data_value(aborted), _ts);
     return *this;
 }
 view_building_task_mutation_builder& view_building_task_mutation_builder::set_base_id(utils::UUID id, table_id base_id) {
@@ -32,7 +32,7 @@ public:
     static utils::UUID new_id();
 
     view_building_task_mutation_builder& set_type(utils::UUID id, db::view::view_building_task::task_type type);
-    view_building_task_mutation_builder& set_state(utils::UUID id, db::view::view_building_task::task_state state);
+    view_building_task_mutation_builder& set_aborted(utils::UUID id, bool aborted);
     view_building_task_mutation_builder& set_base_id(utils::UUID id, table_id base_id);
     view_building_task_mutation_builder& set_view_id(utils::UUID id, table_id view_id);
     view_building_task_mutation_builder& set_last_token(utils::UUID id, dht::token last_token);
@@ -22,6 +22,7 @@
|
|||||||
#include "replica/database.hh"
|
#include "replica/database.hh"
|
||||||
#include "service/storage_proxy.hh"
|
#include "service/storage_proxy.hh"
|
||||||
#include "service/raft/raft_group0_client.hh"
|
#include "service/raft/raft_group0_client.hh"
|
||||||
|
#include "service/raft/raft_group0.hh"
|
||||||
#include "schema/schema_fwd.hh"
|
#include "schema/schema_fwd.hh"
|
||||||
#include "idl/view.dist.hh"
|
#include "idl/view.dist.hh"
|
||||||
#include "sstables/sstables.hh"
|
#include "sstables/sstables.hh"
|
||||||
@@ -114,11 +115,11 @@ static locator::tablet_id get_sstable_tablet_id(const locator::tablet_map& table
|
|||||||
return tablet_id;
|
return tablet_id;
|
||||||
}
|
}
|
||||||
|
|
||||||
view_building_worker::view_building_worker(replica::database& db, db::system_keyspace& sys_ks, service::migration_notifier& mnotifier, service::raft_group0_client& group0_client, view_update_generator& vug, netw::messaging_service& ms, view_building_state_machine& vbsm)
|
view_building_worker::view_building_worker(replica::database& db, db::system_keyspace& sys_ks, service::migration_notifier& mnotifier, service::raft_group0& group0, view_update_generator& vug, netw::messaging_service& ms, view_building_state_machine& vbsm)
|
||||||
: _db(db)
|
: _db(db)
|
||||||
, _sys_ks(sys_ks)
|
, _sys_ks(sys_ks)
|
||||||
, _mnotifier(mnotifier)
|
, _mnotifier(mnotifier)
|
||||||
, _group0_client(group0_client)
|
, _group0(group0)
|
||||||
, _vug(vug)
|
, _vug(vug)
|
||||||
, _messaging(ms)
|
, _messaging(ms)
|
||||||
, _vb_state_machine(vbsm)
|
, _vb_state_machine(vbsm)
|
||||||
@@ -145,6 +146,7 @@ future<> view_building_worker::drain() {
|
|||||||
if (!_as.abort_requested()) {
|
if (!_as.abort_requested()) {
|
||||||
_as.request_abort();
|
_as.request_abort();
|
||||||
}
|
}
|
||||||
|
_state._mutex.broken();
|
||||||
_staging_sstables_mutex.broken();
|
_staging_sstables_mutex.broken();
|
||||||
_sstables_to_register_event.broken();
|
_sstables_to_register_event.broken();
|
||||||
if (this_shard_id() == 0) {
|
if (this_shard_id() == 0) {
|
||||||
@@ -154,8 +156,7 @@ future<> view_building_worker::drain() {
|
|||||||
co_await std::move(state_observer);
|
co_await std::move(state_observer);
|
||||||
co_await _mnotifier.unregister_listener(this);
|
co_await _mnotifier.unregister_listener(this);
|
||||||
}
|
}
|
||||||
co_await _state.clear_state();
|
co_await _state.clear();
|
||||||
_state.state_updated_cv.broken();
|
|
||||||
co_await uninit_messaging_service();
|
co_await uninit_messaging_service();
|
||||||
}
|
}
|
||||||
|
|
||||||
@@ -224,22 +225,22 @@ future<> view_building_worker::create_staging_sstable_tasks() {
|
|||||||
|
|
||||||
utils::chunked_vector<canonical_mutation> cmuts;
|
utils::chunked_vector<canonical_mutation> cmuts;
|
||||||
|
|
||||||
auto guard = co_await _group0_client.start_operation(_as);
|
auto guard = co_await _group0.client().start_operation(_as);
|
||||||
auto my_host_id = _db.get_token_metadata().get_topology().my_host_id();
|
auto my_host_id = _db.get_token_metadata().get_topology().my_host_id();
|
||||||
for (auto& [table_id, sst_infos]: _sstables_to_register) {
|
for (auto& [table_id, sst_infos]: _sstables_to_register) {
|
||||||
for (auto& sst_info: sst_infos) {
|
for (auto& sst_info: sst_infos) {
|
||||||
view_building_task task {
|
view_building_task task {
|
||||||
utils::UUID_gen::get_time_UUID(), view_building_task::task_type::process_staging, view_building_task::task_state::idle,
|
utils::UUID_gen::get_time_UUID(), view_building_task::task_type::process_staging, false,
|
||||||
table_id, ::table_id{}, {my_host_id, sst_info.shard}, sst_info.last_token
|
table_id, ::table_id{}, {my_host_id, sst_info.shard}, sst_info.last_token
|
||||||
};
|
};
|
||||||
auto mut = co_await _group0_client.sys_ks().make_view_building_task_mutation(guard.write_timestamp(), task);
|
auto mut = co_await _group0.client().sys_ks().make_view_building_task_mutation(guard.write_timestamp(), task);
|
||||||
cmuts.emplace_back(std::move(mut));
|
cmuts.emplace_back(std::move(mut));
|
||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
vbw_logger.debug("Creating {} process_staging view_building_tasks", cmuts.size());
|
vbw_logger.debug("Creating {} process_staging view_building_tasks", cmuts.size());
|
||||||
auto cmd = _group0_client.prepare_command(service::write_mutations{std::move(cmuts)}, guard, "create view building tasks");
|
auto cmd = _group0.client().prepare_command(service::write_mutations{std::move(cmuts)}, guard, "create view building tasks");
|
||||||
co_await _group0_client.add_entry(std::move(cmd), std::move(guard), _as);
|
co_await _group0.client().add_entry(std::move(cmd), std::move(guard), _as);
|
||||||
|
|
||||||
// Move staging sstables from `_sstables_to_register` (on shard0) to `_staging_sstables` on corresponding shards.
|
// Move staging sstables from `_sstables_to_register` (on shard0) to `_staging_sstables` on corresponding shards.
|
||||||
// Firstly reorgenize `_sstables_to_register` for easier movement.
|
// Firstly reorgenize `_sstables_to_register` for easier movement.
|
||||||
@@ -340,22 +341,16 @@ future<> view_building_worker::run_view_building_state_observer() {
|
|||||||
|
|
||||||
while (!_as.abort_requested()) {
|
while (!_as.abort_requested()) {
|
||||||
bool sleep = false;
|
bool sleep = false;
|
||||||
_state.some_batch_finished = false;
|
|
||||||
try {
|
try {
|
||||||
vbw_logger.trace("view_building_state_observer() iteration");
|
vbw_logger.trace("view_building_state_observer() iteration");
|
||||||
auto read_apply_mutex_holder = co_await _group0_client.hold_read_apply_mutex(_as);
|
auto read_apply_mutex_holder = co_await _group0.client().hold_read_apply_mutex(_as);
|
||||||
|
|
||||||
co_await update_built_views();
|
co_await update_built_views();
|
||||||
co_await update_building_state();
|
co_await check_for_aborted_tasks();
|
||||||
_as.check();
|
_as.check();
|
||||||
|
|
||||||
read_apply_mutex_holder.return_all();
|
read_apply_mutex_holder.return_all();
|
||||||
|
|
||||||
// A batch could finished its work while the worker was
|
|
||||||
// updating the state. In that case we should do another iteration.
|
|
||||||
if (!_state.some_batch_finished) {
|
|
||||||
co_await _vb_state_machine.event.wait();
|
co_await _vb_state_machine.event.wait();
|
||||||
}
|
|
||||||
} catch (abort_requested_exception&) {
|
} catch (abort_requested_exception&) {
|
||||||
} catch (broken_condition_variable&) {
|
} catch (broken_condition_variable&) {
|
||||||
} catch (...) {
|
} catch (...) {
|
||||||
@@ -382,7 +377,7 @@ future<> view_building_worker::update_built_views() {
|
|||||||
auto schema = _db.find_schema(table_id);
|
auto schema = _db.find_schema(table_id);
|
||||||
return std::make_pair(schema->ks_name(), schema->cf_name());
|
return std::make_pair(schema->ks_name(), schema->cf_name());
|
||||||
};
|
};
|
||||||
auto& sys_ks = _group0_client.sys_ks();
|
auto& sys_ks = _group0.client().sys_ks();
|
||||||
|
|
||||||
std::set<std::pair<sstring, sstring>> built_views;
|
std::set<std::pair<sstring, sstring>> built_views;
|
||||||
for (auto& [id, statuses]: _vb_state_machine.views_state.status_map) {
|
for (auto& [id, statuses]: _vb_state_machine.views_state.status_map) {
|
||||||
@@ -411,22 +406,35 @@ future<> view_building_worker::update_built_views() {
     }
 }
 
-future<> view_building_worker::update_building_state() {
-    co_await _state.update(*this);
-    co_await _state.finish_completed_tasks();
-    _state.state_updated_cv.broadcast();
-}
-
-bool view_building_worker::is_shard_free(shard_id shard) {
-    return !std::ranges::any_of(_state.tasks_map, [&shard] (auto& task_entry) {
-        return task_entry.second->replica.shard == shard && task_entry.second->state == view_building_worker::batch_state::in_progress;
+// Must be executed on shard0
+future<> view_building_worker::check_for_aborted_tasks() {
+    return container().invoke_on_all([building_state = _vb_state_machine.building_state] (view_building_worker& vbw) -> future<> {
+        auto lock = co_await get_units(vbw._state._mutex, 1, vbw._as);
+        co_await vbw._state.update_processing_base_table(vbw._db, building_state, vbw._as);
+        if (!vbw._state._batch) {
+            co_return;
+        }
+
+        auto my_host_id = vbw._db.get_token_metadata().get_topology().my_host_id();
+        auto my_replica = locator::tablet_replica{my_host_id, this_shard_id()};
+        auto tasks_map = vbw._state._batch->tasks; // Potentially, we'll remove elements from the map, so we need a copy to iterate over it
+        for (auto& [id, t]: tasks_map) {
+            auto task_opt = building_state.get_task(t.base_id, my_replica, id);
+            if (!task_opt || task_opt->get().aborted) {
+                co_await vbw._state._batch->abort_task(id);
+            }
+        }
+
+        if (vbw._state._batch->tasks.empty()) {
+            co_await vbw._state.clean_up_after_batch();
+        }
     });
 }
 
 void view_building_worker::init_messaging_service() {
-    ser::view_rpc_verbs::register_work_on_view_building_tasks(&_messaging, [this] (std::vector<utils::UUID> ids) -> future<std::vector<view_task_result>> {
-        return container().invoke_on(0, [ids = std::move(ids)] (view_building_worker& vbw) mutable -> future<std::vector<view_task_result>> {
-            return vbw.work_on_tasks(std::move(ids));
+    ser::view_rpc_verbs::register_work_on_view_building_tasks(&_messaging, [this] (raft::term_t term, shard_id shard, std::vector<utils::UUID> ids) -> future<std::vector<utils::UUID>> {
+        return container().invoke_on(shard, [term, ids = std::move(ids)] (auto& vbw) mutable -> future<std::vector<utils::UUID>> {
+            return vbw.work_on_tasks(term, std::move(ids));
         });
     });
 }
@@ -435,236 +443,53 @@ future<> view_building_worker::uninit_messaging_service() {
     return ser::view_rpc_verbs::unregister(&_messaging);
 }
 
-future<std::vector<view_task_result>> view_building_worker::work_on_tasks(std::vector<utils::UUID> ids) {
-    vbw_logger.debug("Got request for results of tasks: {}", ids);
-    auto guard = co_await _group0_client.start_operation(_as, service::raft_timeout{});
-    auto processing_base_table = _state.processing_base_table;
-
-    auto are_tasks_finished = [&] () {
-        return std::ranges::all_of(ids, [this] (const utils::UUID& id) {
-            return _state.finished_tasks.contains(id) || _state.aborted_tasks.contains(id);
-        });
-    };
-
-    auto get_results = [&] () -> std::vector<view_task_result> {
-        std::vector<view_task_result> results;
-        for (const auto& id: ids) {
-            if (_state.finished_tasks.contains(id)) {
-                results.emplace_back(view_task_result::command_status::success);
-            } else if (_state.aborted_tasks.contains(id)) {
-                results.emplace_back(view_task_result::command_status::abort);
-            } else {
-                // This means that the task was aborted. Throw an error,
-                // so the coordinator will refresh its state and retry without aborted IDs.
-                throw std::runtime_error(fmt::format("No status for task {}", id));
-            }
-        }
-        return results;
-    };
-
-    if (are_tasks_finished()) {
-        // If the batch is already finished, we can return the results immediately.
-        vbw_logger.debug("Batch with tasks {} is already finished, returning results", ids);
-        co_return get_results();
-    }
-
-    // All of the tasks should be executed in the same batch
-    // (their statuses are set to started in the same group0 operation).
-    // If any ID is not present in the `tasks_map`, it means that it was aborted and we should fail this RPC call,
-    // so the coordinator can retry without aborted IDs.
-    // That's why we can identify the batch by random (.front()) ID from the `ids` vector.
-    auto id = ids.front();
-    while (!_state.tasks_map.contains(id) && processing_base_table == _state.processing_base_table) {
-        vbw_logger.warn("Batch with task {} is not found in tasks map, waiting until worker updates its state", id);
-        service::release_guard(std::move(guard));
-        co_await _state.state_updated_cv.wait();
-        guard = co_await _group0_client.start_operation(_as, service::raft_timeout{});
-    }
-
-    if (processing_base_table != _state.processing_base_table) {
-        // If the processing base table was changed, we should fail this RPC call because the tasks were aborted.
-        throw std::runtime_error(fmt::format("Processing base table was changed to {} ", _state.processing_base_table));
-    }
-
-    // Validate that any of the IDs wasn't aborted.
-    for (const auto& tid: ids) {
-        if (!_state.tasks_map[id]->tasks.contains(tid)) {
-            vbw_logger.warn("Task {} is not found in the batch", tid);
-            throw std::runtime_error(fmt::format("Task {} is not found in the batch", tid));
-        }
-    }
-
-    if (_state.tasks_map[id]->state == view_building_worker::batch_state::idle) {
-        vbw_logger.debug("Starting batch with tasks {}", _state.tasks_map[id]->tasks);
-        if (!is_shard_free(_state.tasks_map[id]->replica.shard)) {
-            throw std::runtime_error(fmt::format("Tried to start view building tasks ({}) on shard {} but the shard is busy", _state.tasks_map[id]->tasks, _state.tasks_map[id]->replica.shard, _state.tasks_map[id]->tasks));
-        }
-        _state.tasks_map[id]->start();
-    }
-
-    service::release_guard(std::move(guard));
-    while (!_as.abort_requested()) {
-        auto read_apply_mutex_holder = co_await _group0_client.hold_read_apply_mutex(_as);
-
-        if (are_tasks_finished()) {
-            co_return get_results();
-        }
-
-        // Check if the batch is still alive
-        if (!_state.tasks_map.contains(id)) {
-            throw std::runtime_error(fmt::format("Batch with task {} is not found in tasks map anymore.", id));
-        }
-
-        read_apply_mutex_holder.return_all();
-        co_await _state.tasks_map[id]->batch_done_cv.wait();
-    }
-    throw std::runtime_error("View building worker was aborted");
-}
-
-// Validates if the task can be executed in a batch on the same shard.
-static bool validate_can_be_one_batch(const view_building_task& t1, const view_building_task& t2) {
-    return t1.type == t2.type && t1.base_id == t2.base_id && t1.replica == t2.replica && t1.last_token == t2.last_token;
-}
-
 static std::unordered_set<table_id> get_ids_of_all_views(replica::database& db, table_id table_id) {
     return db.find_column_family(table_id).views() | std::views::transform([] (view_ptr vptr) {
         return vptr->id();
     }) | std::ranges::to<std::unordered_set>();;
 }
 
-future<> view_building_worker::local_state::flush_table(view_building_worker& vbw, table_id table_id) {
-    // `table_id` should point to currently processing base table but
-    // `view_building_worker::local_state::processing_base_table` may not be set to it yet,
-    // so we need to pass it directly
-    co_await vbw.container().invoke_on_all([table_id] (view_building_worker& local_vbw) -> future<> {
-        auto base_cf = local_vbw._db.find_column_family(table_id).shared_from_this();
-        co_await when_all(base_cf->await_pending_writes(), base_cf->await_pending_streams());
-        co_await flush_base(base_cf, local_vbw._as);
-    });
-
-    flushed_views = get_ids_of_all_views(vbw._db, table_id);
-}
-
-future<> view_building_worker::local_state::update(view_building_worker& vbw) {
-    const auto& vb_state = vbw._vb_state_machine.building_state;
-
-    // Check if the base table to process was changed.
-    // If so, we clear the state, aborting tasks for previous base table and starting new ones for the new base table.
-    if (processing_base_table != vb_state.currently_processed_base_table) {
-        co_await clear_state();
-
-        if (vb_state.currently_processed_base_table) {
-            // When we start to process new base table, we need to flush its current data, so we can build the view.
-            co_await flush_table(vbw, *vb_state.currently_processed_base_table);
-        }
-
-        processing_base_table = vb_state.currently_processed_base_table;
-        vbw_logger.info("Processing base table was changed to: {}", processing_base_table);
-    }
-
-    if (!processing_base_table) {
-        vbw_logger.debug("No base table is selected to be processed.");
-        co_return;
-    }
-
-    std::vector<table_id> new_views;
-    auto all_view_ids = get_ids_of_all_views(vbw._db, *processing_base_table);
-    std::ranges::set_difference(all_view_ids, flushed_views, std::back_inserter(new_views));
-    if (!new_views.empty()) {
-        // Flush base table again in any new view was created, so the view building tasks will see up-to-date sstables.
-        // Otherwise, we may lose mutations created after previous flush but before the new view was created.
-        co_await flush_table(vbw, *processing_base_table);
-    }
-
-    auto erm = vbw._db.find_column_family(*processing_base_table).get_effective_replication_map();
-    auto my_host_id = erm->get_topology().my_host_id();
-    auto current_tasks_for_this_host = vb_state.get_tasks_for_host(*processing_base_table, my_host_id);
-
-    // scan view building state, collect alive and new (in STARTED state but not started by this worker) tasks
-    std::unordered_map<shard_id, std::vector<view_building_task>> new_tasks;
-    std::unordered_set<utils::UUID> alive_tasks; // save information about alive tasks to cleanup done/aborted ones
-    for (auto& task_ref: current_tasks_for_this_host) {
-        auto& task = task_ref.get();
-        auto id = task.id;
-
-        if (task.state != view_building_task::task_state::aborted) {
-            alive_tasks.insert(id);
-        }
-
-        if (tasks_map.contains(id) || finished_tasks.contains(id)) {
-            continue;
-        }
-        else if (task.state == view_building_task::task_state::started) {
-            auto shard = task.replica.shard;
-            if (new_tasks.contains(shard) && !validate_can_be_one_batch(new_tasks[shard].front(), task)) {
-                // Currently we allow only one batch per shard at a time
-                on_internal_error(vbw_logger, fmt::format("Got not-compatible tasks for the same shard. Task: {}, other: {}", new_tasks[shard].front(), task));
-            }
-            new_tasks[shard].push_back(task);
-        }
-        co_await coroutine::maybe_yield();
-    }
-
-    auto tasks_map_copy = tasks_map;
-
-    // Clear aborted tasks from tasks_map
-    for (auto it = tasks_map_copy.begin(); it != tasks_map_copy.end();) {
-        if (!alive_tasks.contains(it->first)) {
-            vbw_logger.debug("Aborting task {}", it->first);
-            aborted_tasks.insert(it->first);
-            co_await it->second->abort_task(it->first);
-            it = tasks_map_copy.erase(it);
-        } else {
-            ++it;
-        }
-    }
-
-    // Create batches for new tasks
-    for (const auto& [shard, shard_tasks]: new_tasks) {
-        auto tasks = shard_tasks | std::views::transform([] (const view_building_task& t) {
-            return std::make_pair(t.id, t);
-        }) | std::ranges::to<std::unordered_map>();
-        auto batch = seastar::make_shared<view_building_worker::batch>(vbw.container(), tasks, shard_tasks.front().base_id, shard_tasks.front().replica);
-
-        for (auto& [id, _]: tasks) {
-            tasks_map_copy.insert({id, batch});
-        }
-        co_await coroutine::maybe_yield();
-    }
-
-    tasks_map = std::move(tasks_map_copy);
-}
-
-future<> view_building_worker::local_state::finish_completed_tasks() {
-    for (auto it = tasks_map.begin(); it != tasks_map.end();) {
-        if (it->second->state == view_building_worker::batch_state::idle) {
-            ++it;
-        } else if (it->second->state == view_building_worker::batch_state::in_progress) {
-            vbw_logger.debug("Task {} is still in progress", it->first);
-            ++it;
-        } else {
-            co_await it->second->work.get_future();
-            finished_tasks.insert(it->first);
-            vbw_logger.info("Task {} was completed", it->first);
-            it->second->batch_done_cv.broadcast();
-            it = tasks_map.erase(it);
-        }
-    }
-}
-
-future<> view_building_worker::local_state::clear_state() {
-    for (auto& [_, batch]: tasks_map) {
-        co_await batch->abort();
-    }
-    processing_base_table.reset();
-    flushed_views.clear();
-    tasks_map.clear();
-    finished_tasks.clear();
-    aborted_tasks.clear();
-    state_updated_cv.broadcast();
-    some_batch_finished = false;
-    vbw_logger.debug("View building worker state was cleared.");
-}
+// If `state::processing_base_table` is diffrent that the `view_building_state::currently_processed_base_table`,
+// clear the state, save and flush new base table
+future<> view_building_worker::state::update_processing_base_table(replica::database& db, const view_building_state& building_state, abort_source& as) {
+    if (processing_base_table != building_state.currently_processed_base_table) {
+        co_await clear();
+        if (building_state.currently_processed_base_table) {
+            co_await flush_base_table(db, *building_state.currently_processed_base_table, as);
+        }
+        processing_base_table = building_state.currently_processed_base_table;
+    }
+}
+
+// If `_batch` ptr points to valid object, co_await its `work` future, save completed tasks and delete the object
+future<> view_building_worker::state::clean_up_after_batch() {
+    if (_batch) {
+        co_await std::move(_batch->work);
+        for (auto& [id, _]: _batch->tasks) {
+            completed_tasks.insert(id);
+        }
+        _batch = nullptr;
+    }
+}
+
+// Flush base table, set is as currently processing base table and save which views exist at the time of flush
+future<> view_building_worker::state::flush_base_table(replica::database& db, table_id base_table_id, abort_source& as) {
+    auto cf = db.find_column_family(base_table_id).shared_from_this();
+    co_await when_all(cf->await_pending_writes(), cf->await_pending_streams());
+    co_await flush_base(cf, as);
+    processing_base_table = base_table_id;
+    flushed_views = get_ids_of_all_views(db, base_table_id);
+}
+
+future<> view_building_worker::state::clear() {
+    if (_batch) {
+        _batch->as.request_abort();
+        co_await std::move(_batch->work);
+        _batch = nullptr;
+    }
+    processing_base_table.reset();
+    completed_tasks.clear();
+    flushed_views.clear();
+}
 
 view_building_worker::batch::batch(sharded<view_building_worker>& vbw, std::unordered_map<utils::UUID, view_building_task> tasks, table_id base_id, locator::tablet_replica replica)
@@ -674,17 +499,12 @@ view_building_worker::batch::batch(sharded<view_building_worker>& vbw, std::unor
     , _vbw(vbw) {}
 
 void view_building_worker::batch::start() {
-    if (this_shard_id() != 0) {
-        on_internal_error(vbw_logger, "view_building_worker::batch should be started on shard0");
+    if (this_shard_id() != replica.shard) {
+        on_internal_error(vbw_logger, "view_building_worker::batch should be started on replica shard");
     }
 
-    state = batch_state::in_progress;
-    work = smp::submit_to(replica.shard, [this] () -> future<> {
-        return do_work();
-    }).finally([this] () {
-        state = batch_state::finished;
-        _vbw.local()._state.some_batch_finished = true;
-        _vbw.local()._vb_state_machine.event.broadcast();
+    work = do_work().finally([this] {
+        promise.set_value();
     });
 }
 
@@ -699,10 +519,6 @@ future<> view_building_worker::batch::abort() {
     co_await smp::submit_to(replica.shard, [this] () {
         as.request_abort();
     });
-
-    if (work.valid()) {
-        co_await work.get_future();
-    }
 }
 
 future<> view_building_worker::batch::do_work() {
@@ -896,6 +712,124 @@ void view_building_worker::cleanup_staging_sstables(locator::effective_replicati
     _staging_sstables[table_id].erase(first, last);
 }
 
+future<view_building_state> view_building_worker::get_latest_view_building_state(raft::term_t term) {
+    return smp::submit_to(0, [&sharded_vbw = container(), term] () -> future<view_building_state> {
+        auto& vbw = sharded_vbw.local();
+        // auto guard = vbw._group0.client().start_operation(vbw._as);
+
+        auto& raft_server = vbw._group0.group0_server();
+        auto group0_holder = vbw._group0.hold_group0_gate();
+        co_await raft_server.read_barrier(&vbw._as);
+        if (raft_server.get_current_term() != term) {
+            throw std::runtime_error(fmt::format("Invalid raft term. Got {} but current term is {}", term, raft_server.get_current_term()));
+        }
+
+        co_return vbw._vb_state_machine.building_state;
+    });
+}
+
+future<std::vector<utils::UUID>> view_building_worker::work_on_tasks(raft::term_t term, std::vector<utils::UUID> ids) {
+    auto collect_completed_tasks = [&] {
+        std::vector<utils::UUID> completed;
+        for (auto& id: ids) {
+            if (_state.completed_tasks.contains(id)) {
+                completed.push_back(id);
+            }
+        }
+        return completed;
+    };
+
+    auto lock = co_await get_units(_state._mutex, 1, _as);
+    // Firstly check if there is any batch that is finished but wasn't cleaned up.
+    if (_state._batch && _state._batch->promise.available()) {
+        co_await _state.clean_up_after_batch();
+    }
+
+    // Check if tasks were already completed.
+    // If only part of the tasks were finished, return the subset and don't execute the remaining tasks.
+    std::vector<utils::UUID> completed = collect_completed_tasks();
+    if (!completed.empty()) {
+        co_return completed;
+    }
+    lock.return_all();
+
+    auto building_state = co_await get_latest_view_building_state(term);
+
+    lock = co_await get_units(_state._mutex, 1, _as);
+    co_await _state.update_processing_base_table(_db, building_state, _as);
+    // If there is no running batch, create it.
+    if (!_state._batch) {
+        if (!_state.processing_base_table) {
+            throw std::runtime_error("view_building_worker::state::processing_base_table needs to be set to work on view building");
+        }
+
+        auto my_host_id = _db.get_token_metadata().get_topology().my_host_id();
+        auto my_replica = locator::tablet_replica{my_host_id, this_shard_id()};
+        std::unordered_map<utils::UUID, view_building_task> tasks;
+        for (auto& id: ids) {
+            auto task_opt = building_state.get_task(*_state.processing_base_table, my_replica, id);
+            if (!task_opt) {
+                throw std::runtime_error(fmt::format("Task {} was not found for base table {} on replica {}", id, *building_state.currently_processed_base_table, my_replica));
+            }
+            tasks.insert({id, *task_opt});
+        }
+#ifdef SEASTAR_DEBUG
+        auto& some_task = tasks.begin()->second;
+        for (auto& [_, t]: tasks) {
+            SCYLLA_ASSERT(t.base_id == some_task.base_id);
+            SCYLLA_ASSERT(t.last_token == some_task.last_token);
+            SCYLLA_ASSERT(t.replica == some_task.replica);
+            SCYLLA_ASSERT(t.type == some_task.type);
+            SCYLLA_ASSERT(t.replica.shard == this_shard_id());
+        }
+#endif
+
+        // If any view was added after we did the initial flush, we need to do it again
+        if (std::ranges::any_of(tasks | std::views::values, [&] (const view_building_task& t) {
+            return t.view_id && !_state.flushed_views.contains(*t.view_id);
+        })) {
+            co_await _state.flush_base_table(_db, *_state.processing_base_table, _as);
+        }
+
+        // Create and start the batch
+        _state._batch = std::make_unique<batch>(container(), std::move(tasks), *building_state.currently_processed_base_table, my_replica);
+        _state._batch->start();
+    }
+
+    if (std::ranges::all_of(ids, [&] (auto& id) { return !_state._batch->tasks.contains(id); })) {
+        throw std::runtime_error(fmt::format(
+                "None of the tasks requested to work on is executed in current view building batch. Batch executes: {}, the RPC requested: {}",
+                _state._batch->tasks | std::views::keys, ids));
+    }
+    auto batch_future = _state._batch->promise.get_shared_future();
+    lock.return_all();
+
+    co_await std::move(batch_future);
+
+    lock = co_await get_units(_state._mutex, 1, _as);
+    co_await _state.clean_up_after_batch();
+    co_return collect_completed_tasks();
+}
+
 }
 
 }
@@ -16,6 +16,7 @@
 #include <unordered_set>
 #include "locator/abstract_replication_strategy.hh"
 #include "locator/tablets.hh"
+#include "raft/raft.hh"
 #include "seastar/core/gate.hh"
 #include "db/view/view_building_state.hh"
 #include "sstables/shared_sstable.hh"
@@ -31,7 +32,7 @@ class messaging_service;
 }
 
 namespace service {
-class raft_group0_client;
+class raft_group0;
 }
 
 namespace db {
@@ -65,27 +66,16 @@ class view_building_worker : public seastar::peering_sharded_service<view_buildi
      *
      * When `work` future is finished, it means all tasks in `tasks_ids` are done.
      *
-     * The batch lives on shard 0 exclusively.
-     * When the batch starts to execute its tasks, it firstly copies all necessary data
-     * to the designated shard, then the work is done on the local copy of the data only.
+     * The batch lives on shard, where its executing its work exclusively.
      */
 
-    enum class batch_state {
-        idle,
-        in_progress,
-        finished,
-    };
-
     class batch {
     public:
-        batch_state state = batch_state::idle;
         table_id base_id;
         locator::tablet_replica replica;
         std::unordered_map<utils::UUID, view_building_task> tasks;
 
-        shared_future<> work;
-        condition_variable batch_done_cv;
-        // The abort has to be used only on `replica.shard`
+        shared_promise<> promise;
+        future<> work = make_ready_future();
         abort_source as;
 
         batch(sharded<view_building_worker>& vbw, std::unordered_map<utils::UUID, view_building_task> tasks, table_id base_id, locator::tablet_replica replica);
@@ -101,35 +91,18 @@ class view_building_worker : public seastar::peering_sharded_service<view_buildi
 
     friend class batch;
 
-    struct local_state {
+    struct state {
         std::optional<table_id> processing_base_table = std::nullopt;
-        // Stores ids of views for which the flush was done.
-        // When a new view is created, we need to flush the base table again,
-        // as data might be inserted.
+        std::unordered_set<utils::UUID> completed_tasks;
+        std::unique_ptr<batch> _batch = nullptr;
         std::unordered_set<table_id> flushed_views;
-        std::unordered_map<utils::UUID, shared_ptr<batch>> tasks_map;
 
-        std::unordered_set<utils::UUID> finished_tasks;
-        std::unordered_set<utils::UUID> aborted_tasks;
-        bool some_batch_finished = false;
-        condition_variable state_updated_cv;
-
-        // Clears completed/aborted tasks and creates batches (without starting them) for started tasks.
-        // Returns a map of tasks per shard to execute.
-        future<> update(view_building_worker& vbw);
-
-        future<> finish_completed_tasks();
-
-        // The state can be aborted if, for example, a view is dropped, then all its tasks
-        // are aborted and the coordinator may choose new base table to process.
-        // This method aborts all batches as we stop to processing the current base table.
-        future<> clear_state();
-
-        // Flush table with `table_id` on all shards.
-        // This method should be used only on currently processing base table and
-        // it updates `flushed_views` field.
-        future<> flush_table(view_building_worker& vbw, table_id table_id);
+        semaphore _mutex = semaphore(1);
+        // All of the methods below should be executed while holding `_mutex` unit!
+        future<> update_processing_base_table(replica::database& db, const view_building_state& building_state, abort_source& as);
+        future<> flush_base_table(replica::database& db, table_id base_table_id, abort_source& as);
+        future<> clean_up_after_batch();
+        future<> clear();
     };
 
     // Wrapper which represents information needed to create
@@ -147,14 +120,14 @@ private:
     replica::database& _db;
     db::system_keyspace& _sys_ks;
     service::migration_notifier& _mnotifier;
-    service::raft_group0_client& _group0_client;
+    service::raft_group0& _group0;
     view_update_generator& _vug;
     netw::messaging_service& _messaging;
     view_building_state_machine& _vb_state_machine;
     abort_source _as;
     named_gate _gate;
 
-    local_state _state;
+    state _state;
     std::unordered_set<table_id> _views_in_progress;
     future<> _view_building_state_observer = make_ready_future<>();
 
@@ -166,7 +139,7 @@ private:
 
 public:
     view_building_worker(replica::database& db, db::system_keyspace& sys_ks, service::migration_notifier& mnotifier,
-            service::raft_group0_client& group0_client, view_update_generator& vug, netw::messaging_service& ms,
+            service::raft_group0& group0, view_update_generator& vug, netw::messaging_service& ms,
             view_building_state_machine& vbsm);
     future<> init();
 
@@ -185,10 +158,11 @@ public:
     void cleanup_staging_sstables(locator::effective_replication_map_ptr erm, table_id table_id, locator::tablet_id tid);
 
 private:
+    future<view_building_state> get_latest_view_building_state(raft::term_t term);
+    future<> check_for_aborted_tasks();
+
     future<> run_view_building_state_observer();
     future<> update_built_views();
-    future<> update_building_state();
-    bool is_shard_free(shard_id shard);
 
     dht::token_range get_tablet_token_range(table_id table_id, dht::token last_token);
     future<> do_build_range(table_id base_id, std::vector<table_id> views_ids, dht::token last_token, abort_source& as);
@@ -202,7 +176,7 @@ private:
 
     void init_messaging_service();
     future<> uninit_messaging_service();
-    future<std::vector<view_task_result>> work_on_tasks(std::vector<utils::UUID> ids);
+    future<std::vector<utils::UUID>> work_on_tasks(raft::term_t term, std::vector<utils::UUID> ids);
 };
 
 }
@@ -483,7 +483,7 @@ public:
         });
         co_await add_partition(mutation_sink, "load", [this] () -> future<sstring> {
             return map_reduce_tables<int64_t>([] (replica::table& tbl) {
-                return tbl.get_stats().live_disk_space_used;
+                return tbl.get_stats().live_disk_space_used.on_disk;
             }).then([] (int64_t load) {
                 return format("{}", load);
             });
@@ -1158,6 +1158,104 @@ private:
     }
 };

+class tablet_sizes : public group0_virtual_table {
+private:
+    sharded<service::tablet_allocator>& _talloc;
+    sharded<replica::database>& _db;
+public:
+    tablet_sizes(sharded<service::tablet_allocator>& talloc,
+            sharded<replica::database>& db,
+            sharded<service::raft_group_registry>& raft_gr,
+            sharded<netw::messaging_service>& ms)
+        : group0_virtual_table(build_schema(), raft_gr, ms)
+        , _talloc(talloc)
+        , _db(db)
+    { }
+
+    future<> execute_on_leader(std::function<void(mutation)> mutation_sink, reader_permit permit) override {
+        auto stats = _talloc.local().get_load_stats();
+        while (!stats) {
+            // Wait for stats to be refreshed by topology coordinator
+            {
+                abort_on_expiry aoe(permit.timeout());
+                reader_permit::awaits_guard ag(permit);
+                co_await seastar::sleep_abortable(std::chrono::milliseconds(200), aoe.abort_source());
+            }
+            if (!co_await is_leader(permit)) {
+                co_await redirect_to_leader(std::move(mutation_sink), std::move(permit));
+                co_return;
+            }
+            stats = _talloc.local().get_load_stats();
+        }
+
+        auto tm = _db.local().get_token_metadata_ptr();
+
+        auto prepare_replica_sizes = [] (const std::unordered_map<host_id, uint64_t>& replica_sizes) {
+            map_type_impl::native_type tmp;
+            for (auto& r: replica_sizes) {
+                auto replica = r.first.uuid();
+                int64_t tablet_size = int64_t(r.second);
+                auto map_element = std::make_pair<data_value, data_value>(data_value(replica), data_value(tablet_size));
+                tmp.push_back(std::move(map_element));
+            }
+            return tmp;
+        };
+
+        auto prepare_missing_replica = [] (const std::unordered_set<host_id>& missing_replicas) {
+            set_type_impl::native_type tmp;
+            for (auto& r: missing_replicas) {
+                tmp.push_back(data_value(r.uuid()));
+            }
+            return tmp;
+        };
+
+        auto map_type = map_type_impl::get_instance(uuid_type, long_type, false);
+        auto set_type = set_type_impl::get_instance(uuid_type, false);
+        for (auto&& [table, tmap] : tm->tablets().all_tables_ungrouped()) {
+            mutation m(schema(), make_partition_key(table));
+            co_await tmap->for_each_tablet([&] (locator::tablet_id tid, const locator::tablet_info& tinfo) -> future<> {
+                auto trange = tmap->get_token_range(tid);
+                int64_t last_token = trange.end()->value().raw();
+                auto& r = m.partition().clustered_row(*schema(), clustering_key::from_single_value(*schema(), data_value(last_token).serialize_nonnull()));
+                const range_based_tablet_id rb_tid {table, trange};
+                std::unordered_map<host_id, uint64_t> replica_sizes;
+                std::unordered_set<host_id> missing_replicas;
+                for (auto& replica : tinfo.replicas) {
+                    auto tablet_size_opt = stats->get_tablet_size(replica.host, rb_tid);
+                    if (tablet_size_opt) {
+                        replica_sizes[replica.host] = *tablet_size_opt;
+                    } else {
+                        missing_replicas.insert(replica.host);
+                    }
+                }
+                set_cell(r.cells(), "replicas", make_map_value(map_type, prepare_replica_sizes(replica_sizes)));
+                set_cell(r.cells(), "missing_replicas", make_set_value(set_type, prepare_missing_replica(missing_replicas)));
+                return make_ready_future<>();
+            });
+
+            mutation_sink(m);
+        }
+    }
+
+private:
+    static schema_ptr build_schema() {
+        auto id = generate_legacy_id(system_keyspace::NAME, "tablet_sizes");
+        return schema_builder(system_keyspace::NAME, "tablet_sizes", std::make_optional(id))
+            .with_column("table_id", uuid_type, column_kind::partition_key)
+            .with_column("last_token", long_type, column_kind::clustering_key)
+            .with_column("replicas", map_type_impl::get_instance(uuid_type, long_type, false))
+            .with_column("missing_replicas", set_type_impl::get_instance(uuid_type, false))
+            .with_sharder(1, 0) // shard0-only
+            .with_hash_version()
+            .build();
+    }
+
+    dht::decorated_key make_partition_key(table_id table) {
+        return dht::decorate_key(*_s, partition_key::from_single_value(
+                *_s, data_value(table.uuid()).serialize_nonnull()));
+    }
+};
+
 class cdc_timestamps_table : public streaming_virtual_table {
 private:
     replica::database& _db;
@@ -1353,6 +1451,7 @@ future<> initialize_virtual_tables(
     co_await add_table(std::make_unique<clients_table>(ss));
     co_await add_table(std::make_unique<raft_state_table>(dist_raft_gr));
     co_await add_table(std::make_unique<load_per_node>(tablet_allocator, dist_db, dist_raft_gr, ms, dist_gossiper));
+    co_await add_table(std::make_unique<tablet_sizes>(tablet_allocator, dist_db, dist_raft_gr, ms));
    co_await add_table(std::make_unique<cdc_timestamps_table>(db, ss));
    co_await add_table(std::make_unique<cdc_streams_table>(db, ss));
@@ -18,6 +18,9 @@ target_link_libraries(scylla_dht
   PRIVATE
     replica)

+if (Scylla_USE_PRECOMPILED_HEADER_USE)
+  target_precompile_headers(scylla_dht REUSE_FROM scylla-precompiled-header)
+endif()
 add_whole_archive(dht scylla_dht)

 check_headers(check-headers scylla_dht
6 dist/common/systemd/scylla-server.slice vendored
@@ -6,13 +6,7 @@ Before=slices.target
 MemoryAccounting=true
 IOAccounting=true
 CPUAccounting=true
-# Systemd deprecated settings BlockIOWeight and CPUShares. But they are still the ones used in RHEL7
-# Newer SystemD wants IOWeight and CPUWeight instead. Luckily both newer and older SystemD seem to
-# ignore the unwanted option so safest to get both. Using just the old versions would work too but
-# seems less future proof. Using just the new versions does not work at all for RHEL7/
-BlockIOWeight=1000
 IOWeight=1000
 MemorySwapMax=0
-CPUShares=1000
 CPUWeight=1000
1 dist/debian/debian/scylla-server.install vendored
@@ -2,7 +2,6 @@ etc/default/scylla-server
 etc/default/scylla-housekeeping
 etc/scylla.d/*.conf
 etc/bash_completion.d/nodetool-completion
-opt/scylladb/share/p11-kit/modules/*
 opt/scylladb/share/doc/scylla/*
 opt/scylladb/share/doc/scylla/licenses/
 usr/lib/systemd/system/*.timer
1 dist/redhat/scylla.spec vendored
@@ -122,7 +122,6 @@ ln -sfT /etc/scylla /var/lib/scylla/conf
 %config(noreplace) %{_sysconfdir}/sysconfig/scylla-housekeeping
 %attr(0755,root,root) %dir %{_sysconfdir}/scylla.d
 %config(noreplace) %{_sysconfdir}/scylla.d/*.conf
-/opt/scylladb/share/p11-kit/modules/*
 /opt/scylladb/share/doc/scylla/*
 %{_unitdir}/scylla-fstrim.service
 %{_unitdir}/scylla-housekeeping-daily.service
65 docs/README-metrics.md Normal file
@@ -0,0 +1,65 @@
+# ScyllaDB metrics docs scripts
+
+The following files extract metrics from C++ source files and generate documentation:
+
+- **`scripts/get_description.py`** - Metrics parser and extractor
+- **`scripts/metrics-config.yml`** - Configuration for special cases only
+- **`docs/_ext/scylladb_metrics.py`** - Sphinx extension for rendering
+
+## Configuration
+
+The system automatically handles most metrics extraction. You only need configuration in the `metrics-config.yml` file for:
+
+**Complex parameter combinations:**
+```yaml
+"cdc/log.cc":
+  params:
+    part_name;suffix: [["static_row", "total"], ["clustering_row", "failed"]]
+    kind: ["total", "failed"]
+```
+
+**Multiple parameter values:**
+```yaml
+"service/storage_proxy.cc":
+  params:
+    _short_description_prefix: ["total_write_attempts", "write_errors"]
+```
+
+**Complex expressions:**
+```yaml
+"tracing/tracing.cc":
+  params:
+    "max_pending_trace_records + write_event_records_threshold": "max_pending_trace_records + write_event_records_threshold"
+```
+
+**Group assignments:**
+```yaml
+"cql3/query_processor.cc":
+  groups:
+    "80": query_processor
+```
+
+**Skip files:**
+```yaml
+"seastar/tests/unit/metrics_test.cc": skip
+```
+
+## Validation
+
+Use the built-in validation to check all metrics files:
+
+```bash
+# Validate all metrics files
+python scripts/get_description.py --validate -c scripts/metrics-config.yml
+
+# Validate with verbose output
+python scripts/get_description.py --validate -c scripts/metrics-config.yml -v
+```
+
+The GitHub workflow `docs-validate-metrics.yml` automatically runs validation on PRs to `master` that modify `.cc` files or metrics configuration.
+
+## Common fixes
+
+- **"Parameter not found"**: Add parameter mapping to config `params` section
+- **"Could not resolve param"**: Check parameter name matches C++ code exactly
+- **"No group found"**: Add group mapping or verify `add_group()` calls
@@ -27,38 +27,48 @@ class MetricsProcessor:
         os.makedirs(output_directory, exist_ok=True)
         return output_directory

-    def _process_single_file(self, file_path, destination_path, metrics_config_path):
+    def _process_single_file(self, file_path, destination_path, metrics_config_path, strict=False):
         with open(file_path, 'r', encoding='utf-8') as f:
             content = f.read()
         if self.MARKER in content and not os.path.exists(destination_path):
             try:
-                metrics_file = metrics.get_metrics_from_file(file_path, "scylla", metrics.get_metrics_information(metrics_config_path))
+                metrics_info = metrics.get_metrics_information(metrics_config_path)
+                # Get relative path to the repo root
+                relative_path = os.path.relpath(file_path, os.path.dirname(os.path.dirname(os.path.dirname(__file__))))
+                repo_root = os.path.dirname(os.path.dirname(os.path.dirname(__file__)))
+                old_cwd = os.getcwd()
+                os.chdir(repo_root)
+                # Get metrics from the file
+                try:
+                    metrics_file = metrics.get_metrics_from_file(relative_path, "scylla_", metrics_info, strict=strict)
+                finally:
+                    os.chdir(old_cwd)
+                if metrics_file:
                     with open(destination_path, 'w+', encoding='utf-8') as f:
                         json.dump(metrics_file, f, indent=4)
-            except SystemExit:
-                LOGGER.info(f'Skipping file: {file_path}')
+                    LOGGER.info(f'Generated {len(metrics_file)} metrics for {file_path}')
+                else:
+                    LOGGER.info(f'No metrics generated for {file_path}')
             except Exception as error:
-                # Remove [Errno X] prefix from error message
-                error_msg = str(error)
-                if '[Errno' in error_msg:
-                    error_msg = error_msg.split('] ', 1)[1]
-                LOGGER.info(error_msg)
+                LOGGER.info(f'Error processing {file_path}: {str(error)}')

-    def _process_metrics_files(self, repo_dir, output_directory, metrics_config_path):
+    def _process_metrics_files(self, repo_dir, output_directory, metrics_config_path, strict=False):
         for root, _, files in os.walk(repo_dir):
             for file in files:
                 if file.endswith(".cc"):
                     file_path = os.path.join(root, file)
                     file_name = os.path.splitext(file)[0] + ".json"
                     destination_path = os.path.join(output_directory, file_name)
-                    self._process_single_file(file_path, destination_path, metrics_config_path)
+                    self._process_single_file(file_path, destination_path, metrics_config_path, strict)

     def run(self, app, exception=None):
         repo_dir = os.path.abspath(os.path.join(app.srcdir, ".."))
         metrics_config_path = os.path.join(repo_dir, app.config.scylladb_metrics_config_path)
         output_directory = self._create_output_directory(app, app.config.scylladb_metrics_directory)

-        self._process_metrics_files(repo_dir, output_directory, metrics_config_path)
+        strict_mode = getattr(app.config, 'scylladb_metrics_strict_mode', False) or False
+
+        self._process_metrics_files(repo_dir, output_directory, metrics_config_path, strict_mode)


 class MetricsTemplateDirective(DataTemplateJSON):
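The patch to `_process_single_file` runs the parser from the repository root by saving the working directory, calling `os.chdir`, and restoring it in a `finally` block. That pattern can be packaged as a reusable context manager; a minimal sketch (the `pushd` helper name is my own, not part of this change):

```python
import os
from contextlib import contextmanager

@contextmanager
def pushd(path):
    """Temporarily change the working directory, restoring it even on error."""
    old_cwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old_cwd)
```

With such a helper, the explicit `old_cwd` bookkeeping around `get_metrics_from_file` collapses to a single `with pushd(repo_root):` block, and the restore is guaranteed on both success and exception.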
@@ -163,7 +173,7 @@ class MetricsDirective(Directive):
         output = []
         try:
             relative_path_from_current_rst = self._get_relative_path(metrics_directory, app, docname)
-            files = os.listdir(metrics_directory)
+            files = sorted(os.listdir(metrics_directory))
             for _, file in enumerate(files):
                 output.extend(self._process_file(file, relative_path_from_current_rst))
         except Exception as error:
@@ -174,6 +184,7 @@ def setup(app):
     app.add_config_value("scylladb_metrics_directory", default="_data/metrics", rebuild="html")
     app.add_config_value("scylladb_metrics_config_path", default='scripts/metrics-config.yml', rebuild="html")
     app.add_config_value('scylladb_metrics_option_template', default='metrics_option.tmpl', rebuild='html', types=[str])
+    app.add_config_value('scylladb_metrics_strict_mode', default=None, rebuild='html', types=[bool])
     app.connect("builder-inited", MetricsProcessor().run)
     app.add_object_type(
         'metrics_option',
@@ -29,6 +29,7 @@ def readable_desc_rst(description):

         cleaned_line = cleaned_line.lstrip()
         cleaned_line = cleaned_line.replace('"', '')
+        cleaned_line = cleaned_line.replace('`', '\\`')

         if cleaned_line != '':
             cleaned_line = indent + cleaned_line
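The hunk above inserts backtick escaping into the per-line description cleanup. Combined with the existing steps visible in the diff (trim leading whitespace, drop double quotes, re-indent non-empty lines), the transformation looks roughly like this standalone sketch — a hedged illustration, not the actual `readable_desc_rst` body:

```python
def clean_description_line(line, indent='   '):
    """Mirror the per-line cleanup from readable_desc_rst: trim leading
    whitespace, drop double quotes, and escape backticks so they survive
    reStructuredText rendering, then re-indent non-empty lines."""
    cleaned = line.lstrip()
    cleaned = cleaned.replace('"', '')
    cleaned = cleaned.replace('`', '\\`')
    if cleaned != '':
        cleaned = indent + cleaned
    return cleaned
```

Escaping backticks matters in RST because an unmatched `` ` `` in a metric description would otherwise be parsed as interpreted text and break the build.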
@@ -1,6 +1,18 @@
 ### a dictionary of redirections
 #old path: new path

+# Move the driver information to another project
+
+/stable/using-scylla/drivers/index.html: https://docs.scylladb.com/stable/drivers/index.html
+/stable/using-scylla/drivers/dynamo-drivers/index.html: https://docs.scylladb.com/stable/drivers/dynamo-drivers.html
+/stable/using-scylla/drivers/cql-drivers/index.html: https://docs.scylladb.com/stable/drivers/cql-drivers.html
+/stable/using-scylla/drivers/cql-drivers/scylla-python-driver.html: https://docs.scylladb.com/stable/drivers/cql-drivers.html
+/stable/using-scylla/drivers/cql-drivers/scylla-java-driver.html: https://docs.scylladb.com/stable/drivers/cql-drivers.html
+/stable/using-scylla/drivers/cql-drivers/scylla-go-driver.html: https://docs.scylladb.com/stable/drivers/cql-drivers.html
+/stable/using-scylla/drivers/cql-drivers/scylla-gocqlx-driver.html: https://docs.scylladb.com/stable/drivers/cql-drivers.html
+/stable/using-scylla/drivers/cql-drivers/scylla-cpp-driver.html: https://docs.scylladb.com/stable/drivers/cql-drivers.html
+/stable/using-scylla/drivers/cql-drivers/scylla-rust-driver.html: https://docs.scylladb.com/stable/drivers/cql-drivers.html
+
 # Redirect 2025.1 upgrade guides that are not on master but were indexed by Google (404 reported)

 /master/upgrade/upgrade-guides/upgrade-guide-from-2024.x-to-2025.1/upgrade-guide-from-2024.x-to-2025.1.html: https://docs.scylladb.com/manual/stable/upgrade/index.html
@@ -428,3 +428,7 @@ they should be easy to detect. Here is a list of these unimplemented features:
   that can be used to achieve consistent reads on global (multi-region) tables.
   This table option was added as a preview to DynamoDB in December 2024.
   <https://github.com/scylladb/scylladb/issues/21852>
+
+* Alternator does not support multi-attribute (composite) keys in GSIs.
+  This feature was added to DynamoDB in November 2025.
+  <https://github.com/scylladb/scylladb/issues/27182>
@@ -76,7 +76,7 @@ author = u"ScyllaDB Project Contributors"

 # List of patterns, relative to source directory, that match files and
 # directories to ignore when looking for source files.
-exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store', 'lib', 'lib64','**/_common/*', 'README.md', 'index.md', '.git', '.github', '_utils', 'rst_include', 'venv', 'dev', '_data/**']
+exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store', 'lib', 'lib64','**/_common/*', 'README.md', 'README-metrics.md', 'index.md', '.git', '.github', '_utils', 'rst_include', 'venv', 'dev', '_data/**']

 # The name of the Pygments (syntax highlighting) style to use.
 pygments_style = "sphinx"
@@ -79,35 +79,6 @@ and to the TRUNCATE data definition query.

 In addition, the timeout parameter can be applied to SELECT queries as well.

-After [enabling object storage support](../operating-scylla/admin.rst#admin-keyspace-storage-options), configure your endpoints by
-following these [instructions](../operating-scylla/admin.rst#object-storage-configuration).
-
-Now you can configure your object storage when creating a keyspace:
-
-```cql
-CREATE KEYSPACE with STORAGE = { 'type': 'S3', 'endpoint': '$endpoint_name', 'bucket': '$bucket' }
-```
-
-**Example**
-
-```cql
-CREATE KEYSPACE ks
-    WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'replication_factor' : 3 }
-    AND STORAGE = { 'type' : 'S3', 'bucket' : '/tmp/b1', 'endpoint' : 'localhost' } ;
-```
-
-Storage options can be inspected by checking the new system schema table: `system_schema.scylla_keyspaces`:
-
-```cql
-cassandra@cqlsh> select * from system_schema.scylla_keyspaces;
-
- keyspace_name | storage_options                                | storage_type
----------------+------------------------------------------------+--------------
-           ksx | {'bucket': '/tmp/xx', 'endpoint': 'localhost'} | S3
-```
-
 ## PRUNE MATERIALIZED VIEW statements

 A special statement is dedicated for pruning ghost rows from materialized views.

@@ -135,6 +106,15 @@ which is recommended in order to make the operation less heavyweight
 and allow for running multiple parallel pruning statements for non-overlapping
 token ranges.

+By default, the PRUNE MATERIALIZED VIEW statement is relatively slow, only
+performing one base read or write at a time. This can be changed with the
+USING CONCURRENCY clause. If the clause is used, the concurrency of reads
+and writes from the base table will be allowed to increase up to the specified
+value. For example, to run the PRUNE with 100 parallel reads/writes, you can use:
+```cql
+PRUNE MATERIALIZED VIEW my_view WHERE v = 19 USING CONCURRENCY 100;
+```
+
 ## Synchronous materialized views

 Usually, when a table with materialized views is updated, the update to the
@@ -312,14 +312,6 @@ Please use :ref:`Per-table tablet options <cql-per-table-tablet-options>` instead.

 See :doc:`Data Distribution with Tablets </architecture/tablets>` for more information about tablets.

-Keyspace storage options :label-caution:`Experimental`
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-By default, SStables of a keyspace are stored locally.
-As an alternative, you can configure your keyspace to be stored
-on Amazon S3 or another S3-compatible object store.
-See :ref:`Keyspace storage options <admin-keyspace-storage-options>` for details.
-
 .. _consistency-option:

 Keyspace ``consistency`` options :label-caution:`Experimental`

@@ -637,7 +629,7 @@ Some examples of primary key definition are:
   key), and ``c`` is the clustering column.

-.. note:: A *null* value is not allowed as any partition-key or clustering-key column. A Null value is *not* the same as an empty string.
+.. note:: A *null* value is not allowed as any partition-key or clustering-key column. A *null* value is *not* the same as an empty string.

 .. _partition-key:
@@ -67,9 +67,9 @@ Please refer to the :ref:`update parameters <update-parameters>` section for more information.

 .. code-block:: none

-     movie    | director      | main_actor | year
-    ----------+---------------+------------+------
-     Serenity | Joseph Whedon | Unknown    | null
+     movie    | director      | main_actor
+    ----------+---------------+------------
+     Serenity | Joseph Whedon | Unknown

 ``INSERT`` is not required to assign all columns, so if two

@@ -80,7 +80,7 @@ columns effects of both statements are preserved:

     INSERT INTO NerdMovies (movie, director, main_actor)
    VALUES ('Serenity', 'Joss Whedon', 'Nathan Fillion');
-    INSERT INTO NerdMovies (movie, director, main_actor, year)
+    INSERT INTO NerdMovies (movie, director, year)
     VALUES ('Serenity', 'Josseph Hill Whedon', 2005);
     SELECT * FROM NerdMovies WHERE movie = 'Serenity'
@@ -28,6 +28,7 @@ Scylla uses the following directory structure to store all its SSTables, for example:
 │   │   │   ├── ...
 │   │   │   └── mc-1-big-TOC.txt
 │   │   ├── staging
+│   │   ├── quarantine
 │   │   └── upload
 │   └── cf-7ec943202fc611e9a130000000000000
 │       ├── snapshots

@@ -36,6 +37,7 @@ Scylla uses the following directory structure to store all its SSTables, for example:
 │       │   ├── ks-cf-ka-3-TOC.txt
 │       │   └── manifest.json
 │       ├── staging
+│       ├── quarantine
 │       └── upload
 ├── system
 │   ├── schema_columnfamilies-45f5b36024bc3f83a3631034ea4fa697

@@ -167,6 +169,21 @@ The per-table directory may contain several sub-directories, as listed below:
   Used for ingesting external SSTables into Scylla on startup.

+* Quarantine directory (`quarantine`)
+  A sub-directory holding SSTables that have been quarantined, typically due to
+  validation failures or corruption detected during scrub operations.
+
+  Quarantined SSTables are isolated to prevent them from being read or used by the
+  database. They can be inspected manually for debugging purposes or removed using
+  the `drop_quarantined_sstables` API operation.
+
+  The scrub operation can be configured to handle quarantined SSTables using the
+  `quarantine_mode` parameter with the following options:
+  - `INCLUDE`: Process both regular and quarantined SSTables (default)
+  - `EXCLUDE`: Skip quarantined SSTables during scrub
+  - `ONLY`: Process only quarantined SSTables
+
 * Temporary SSTable directory (`<generation>.sstable`)
   A directory created when writing new SSTables.
@@ -375,6 +375,30 @@ Columns:
 * `tablets_allocated` - Number of tablet replicas on the node. Migrating tablets are accounted as if migration already finished.
 * `tablets_allocated_per_shard` - `tablets_allocated` divided by shard count on the node.

+## system.tablet_sizes
+
+Contains information about the current tablet disk sizes. The table can contain incomplete data, in which case `missing_replicas`
+will contain the host IDs of replicas for which the tablet size is not known.
+It can be queried on any node, but the data comes from the group0 leader.
+Reads wait for the group0 leader to be elected and load balancer stats to become available.
+
+Schema:
+```cql
+CREATE TABLE system.tablet_sizes (
+    table_id uuid,
+    last_token bigint,
+    missing_replicas frozen<set<uuid>>,
+    replicas frozen<map<uuid, bigint>>,
+    PRIMARY KEY (table_id, last_token)
+);
+```
+
+Columns:
+* `table_id` - The table ID of the table for which tablet sizes are reported.
+* `last_token` - The last token owned by the tablet.
+* `missing_replicas` - Set of host IDs for replicas for which a tablet size was not found.
+* `replicas` - A map of replica host IDs and the disk size of the tablet replica, in bytes.
+
 ## system.protocol_servers

 The list of all the client-facing data-plane protocol servers and listen addresses (if running).
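Given the schema documented in the hunk above, the new virtual table can be read with an ordinary CQL SELECT. A sketch restricting the read to a single base table (the UUID below is a placeholder, not a value from this change):

```cql
SELECT last_token, replicas, missing_replicas
FROM system.tablet_sizes
WHERE table_id = 8a9df1c0-1111-2222-3333-444444444444;
```

Because the partition key is `table_id` and the clustering key is `last_token`, this returns one row per tablet of that table, ordered by the tablet's last token.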