Compare commits

..

3 Commits

Author SHA1 Message Date
copilot-swe-agent[bot]
b549a9b8f2 Store active transitions count in load_balancer instance
Move _active_transitions from migration_plan to load_balancer class
as a member variable. This is cleaner since there's no concurrent
make_plan() calls on the same load_balancer instance.

Changes:
- Removed _active_transitions field from migration_plan
- Added _active_transitions member to load_balancer class
- Updated make_plan() to reset counter at the start
- Log messages now reference _active_transitions directly

Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2026-01-08 13:38:27 +00:00
copilot-swe-agent[bot]
965bc9e5d0 Add active tablet transition count to load balancer logs
Track and report the number of active tablet transitions when
making migration plans. This helps operators understand the
current streaming load when the load balancer is preparing
new migrations.

Co-authored-by: tgrabiec <283695+tgrabiec@users.noreply.github.com>
2026-01-07 18:51:18 +00:00
copilot-swe-agent[bot]
179c8ac67f Initial plan 2026-01-07 18:12:46 +00:00
346 changed files with 1866 additions and 7126 deletions

View File

@@ -84,14 +84,3 @@ ninja build/<mode>/scylla
- Strive for simplicity and clarity, add complexity only when clearly justified
- Question requests: don't blindly implement requests - evaluate trade-offs, identify issues, and suggest better alternatives when appropriate
- Consider different approaches, weigh pros and cons, and recommend the best fit for the specific context
## Test Philosophy
- Performance matters. Tests should run as quickly as possible. Sleeps in the code are highly discouraged and should be avoided, to reduce run time and flakiness.
- Stability matters. Tests should be stable. New tests should be executed 100 times at least to ensure they pass 100 out of 100 times. (use --repeat 100 --max-failures 1 when running it)
- Unit tests should ideally test one thing and one thing only.
- Tests for bug fixes should run before the fix - and show the failure and after the fix - and show they now pass.
- Tests for bug fixes should have in their comments which bug fixes (GitHub or JIRA issue) they test.
- Tests in debug are always slower, so if needed, reduce number of iterations, rows, data used, cycles, etc. in debug mode.
- Tests should strive to be repeatable, and not use random input that will make their results unpredictable.
- Tests should consume as little resources as possible. Prefer running tests on a single node if it is sufficient, for example.

View File

@@ -18,8 +18,6 @@ on:
jobs:
release:
permissions:
contents: write
runs-on: ubuntu-latest
steps:
- name: Checkout

View File

@@ -2,9 +2,6 @@ name: "Docs / Build PR"
# For more information,
# see https://sphinx-theme.scylladb.com/stable/deployment/production.html#available-workflows
permissions:
contents: read
env:
FLAG: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'opensource' }}

View File

@@ -1,8 +1,5 @@
name: Docs / Validate metrics
permissions:
contents: read
on:
pull_request:
branches:

View File

@@ -10,8 +10,6 @@ on:
jobs:
read-toolchain:
runs-on: ubuntu-latest
permissions:
contents: read
outputs:
image: ${{ steps.read.outputs.image }}
steps:

View File

@@ -105,23 +105,11 @@ future<> controller::start_server() {
alternator_port = _config.alternator_port();
_listen_addresses.push_back({addr, *alternator_port});
}
std::optional<uint16_t> alternator_port_proxy_protocol;
if (_config.alternator_port_proxy_protocol()) {
alternator_port_proxy_protocol = _config.alternator_port_proxy_protocol();
_listen_addresses.push_back({addr, *alternator_port_proxy_protocol});
}
std::optional<uint16_t> alternator_https_port;
std::optional<uint16_t> alternator_https_port_proxy_protocol;
std::optional<tls::credentials_builder> creds;
if (_config.alternator_https_port() || _config.alternator_https_port_proxy_protocol()) {
if (_config.alternator_https_port()) {
alternator_https_port = _config.alternator_https_port();
_listen_addresses.push_back({addr, *alternator_https_port});
}
if (_config.alternator_https_port_proxy_protocol()) {
alternator_https_port_proxy_protocol = _config.alternator_https_port_proxy_protocol();
_listen_addresses.push_back({addr, *alternator_https_port_proxy_protocol});
}
if (_config.alternator_https_port()) {
alternator_https_port = _config.alternator_https_port();
_listen_addresses.push_back({addr, *alternator_https_port});
creds.emplace();
auto opts = _config.alternator_encryption_options();
if (opts.empty()) {
@@ -147,29 +135,20 @@ future<> controller::start_server() {
}
}
_server.invoke_on_all(
[this, addr, alternator_port, alternator_https_port, alternator_port_proxy_protocol, alternator_https_port_proxy_protocol, creds = std::move(creds)] (server& server) mutable {
return server.init(addr, alternator_port, alternator_https_port, alternator_port_proxy_protocol, alternator_https_port_proxy_protocol, creds,
[this, addr, alternator_port, alternator_https_port, creds = std::move(creds)] (server& server) mutable {
return server.init(addr, alternator_port, alternator_https_port, creds,
_config.alternator_enforce_authorization,
_config.alternator_warn_authorization,
_config.alternator_max_users_query_size_in_trace_output,
&_memory_limiter.local().get_semaphore(),
_config.max_concurrent_requests_per_shard);
}).handle_exception([this, addr, alternator_port, alternator_https_port, alternator_port_proxy_protocol, alternator_https_port_proxy_protocol] (std::exception_ptr ep) {
logger.error("Failed to set up Alternator HTTP server on {} port {}, TLS port {}, proxy-protocol port {}, TLS proxy-protocol port {}: {}",
addr,
alternator_port ? std::to_string(*alternator_port) : "OFF",
alternator_https_port ? std::to_string(*alternator_https_port) : "OFF",
alternator_port_proxy_protocol ? std::to_string(*alternator_port_proxy_protocol) : "OFF",
alternator_https_port_proxy_protocol ? std::to_string(*alternator_https_port_proxy_protocol) : "OFF",
ep);
}).handle_exception([this, addr, alternator_port, alternator_https_port] (std::exception_ptr ep) {
logger.error("Failed to set up Alternator HTTP server on {} port {}, TLS port {}: {}",
addr, alternator_port ? std::to_string(*alternator_port) : "OFF", alternator_https_port ? std::to_string(*alternator_https_port) : "OFF", ep);
return stop_server().then([ep = std::move(ep)] { return make_exception_future<>(ep); });
}).then([addr, alternator_port, alternator_https_port, alternator_port_proxy_protocol, alternator_https_port_proxy_protocol] {
logger.info("Alternator server listening on {}, HTTP port {}, HTTPS port {}, proxy-protocol port {}, TLS proxy-protocol port {}",
addr,
alternator_port ? std::to_string(*alternator_port) : "OFF",
alternator_https_port ? std::to_string(*alternator_https_port) : "OFF",
alternator_port_proxy_protocol ? std::to_string(*alternator_port_proxy_protocol) : "OFF",
alternator_https_port_proxy_protocol ? std::to_string(*alternator_https_port_proxy_protocol) : "OFF");
}).then([addr, alternator_port, alternator_https_port] {
logger.info("Alternator server listening on {}, HTTP port {}, HTTPS port {}",
addr, alternator_port ? std::to_string(*alternator_port) : "OFF", alternator_https_port ? std::to_string(*alternator_https_port) : "OFF");
}).get();
});
}

View File

@@ -5986,11 +5986,6 @@ future<executor::request_return_type> executor::list_tables(client_state& client
_stats.api_operations.list_tables++;
elogger.trace("Listing tables {}", request);
co_await utils::get_local_injector().inject("alternator_list_tables", [] (auto& handler) -> future<> {
handler.set("waiting", true);
co_await handler.wait_for_message(std::chrono::steady_clock::now() + std::chrono::minutes{5});
});
rjson::value* exclusive_start_json = rjson::find(request, "ExclusiveStartTableName");
rjson::value* limit_json = rjson::find(request, "Limit");
std::string exclusive_start = exclusive_start_json ? rjson::to_string(*exclusive_start_json) : "";

View File

@@ -374,40 +374,13 @@ future<std::string> server::verify_signature(const request& req, const chunked_c
for (const auto& header : signed_headers) {
signed_headers_map.emplace(header, std::string_view());
}
std::vector<std::string> modified_values;
for (auto& header : req._headers) {
std::string header_str;
header_str.resize(header.first.size());
std::transform(header.first.begin(), header.first.end(), header_str.begin(), ::tolower);
auto it = signed_headers_map.find(header_str);
if (it != signed_headers_map.end()) {
// replace multiple spaces in the header value header.second with
// a single space, as required by AWS SigV4 header canonization.
// If we modify the value, we need to save it in modified_values
// to keep it alive.
std::string value;
value.reserve(header.second.size());
bool prev_space = false;
bool modified = false;
for (char ch : header.second) {
if (ch == ' ') {
if (!prev_space) {
value += ch;
prev_space = true;
} else {
modified = true; // skip a space
}
} else {
value += ch;
prev_space = false;
}
}
if (modified) {
modified_values.emplace_back(std::move(value));
it->second = std::string_view(modified_values.back());
} else {
it->second = std::string_view(header.second);
}
it->second = std::string_view(header.second);
}
}
@@ -420,7 +393,6 @@ future<std::string> server::verify_signature(const request& req, const chunked_c
datestamp = std::move(datestamp),
signed_headers_str = std::move(signed_headers_str),
signed_headers_map = std::move(signed_headers_map),
modified_values = std::move(modified_values),
region = std::move(region),
service = std::move(service),
user_signature = std::move(user_signature)] (future<key_cache::value_ptr> key_ptr_fut) {
@@ -591,11 +563,11 @@ read_entire_stream(input_stream<char>& inp, size_t length_limit) {
class safe_gzip_zstream {
z_stream _zs;
public:
// If gzip is true, decode a gzip header (for "Content-Encoding: gzip").
// Otherwise, a zlib header (for "Content-Encoding: deflate").
safe_gzip_zstream(bool gzip = true) {
safe_gzip_zstream() {
memset(&_zs, 0, sizeof(_zs));
if (inflateInit2(&_zs, gzip ? 16 + MAX_WBITS : MAX_WBITS) != Z_OK) {
// The strange 16 + WMAX_BITS tells zlib to expect and decode
// a gzip header, not a zlib header.
if (inflateInit2(&_zs, 16 + MAX_WBITS) != Z_OK) {
// Should only happen if memory allocation fails
throw std::bad_alloc();
}
@@ -614,21 +586,19 @@ public:
}
};
// ungzip() takes a chunked_content of a compressed request body, and returns
// the uncompressed content as a chunked_content. If gzip is true, we expect
// gzip header (for "Content-Encoding: gzip"), if gzip is false, we expect a
// zlib header (for "Content-Encoding: deflate").
// ungzip() takes a chunked_content with a gzip-compressed request body,
// uncompresses it, and returns the uncompressed content as a chunked_content.
// If the uncompressed content exceeds length_limit, an error is thrown.
static future<chunked_content>
ungzip(chunked_content&& compressed_body, size_t length_limit, bool gzip = true) {
ungzip(chunked_content&& compressed_body, size_t length_limit) {
chunked_content ret;
// output_buf can be any size - when uncompressing input_buf, it doesn't
// need to fit in a single output_buf, we'll use multiple output_buf for
// a single input_buf if needed.
constexpr size_t OUTPUT_BUF_SIZE = 4096;
temporary_buffer<char> output_buf;
safe_gzip_zstream strm(gzip);
bool complete_stream = false; // empty input is not a valid gzip/deflate
safe_gzip_zstream strm;
bool complete_stream = false; // empty input is not a valid gzip
size_t total_out_bytes = 0;
for (const temporary_buffer<char>& input_buf : compressed_body) {
if (input_buf.empty()) {
@@ -731,8 +701,6 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr
sstring content_encoding = req->get_header("Content-Encoding");
if (content_encoding == "gzip") {
content = co_await ungzip(std::move(content), request_content_length_limit);
} else if (content_encoding == "deflate") {
content = co_await ungzip(std::move(content), request_content_length_limit, false);
} else if (!content_encoding.empty()) {
// DynamoDB returns a 500 error for unsupported Content-Encoding.
// I'm not sure if this is the best error code, but let's do it too.
@@ -904,9 +872,7 @@ server::server(executor& exec, service::storage_proxy& proxy, gms::gossiper& gos
} {
}
future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port,
std::optional<uint16_t> port_proxy_protocol, std::optional<uint16_t> https_port_proxy_protocol,
std::optional<tls::credentials_builder> creds,
future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,
utils::updateable_value<bool> enforce_authorization, utils::updateable_value<bool> warn_authorization, utils::updateable_value<uint64_t> max_users_query_size_in_trace_output,
semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests) {
_memory_limiter = memory_limiter;
@@ -914,28 +880,20 @@ future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std:
_warn_authorization = std::move(warn_authorization);
_max_concurrent_requests = std::move(max_concurrent_requests);
_max_users_query_size_in_trace_output = std::move(max_users_query_size_in_trace_output);
if (!port && !https_port && !port_proxy_protocol && !https_port_proxy_protocol) {
if (!port && !https_port) {
return make_exception_future<>(std::runtime_error("Either regular port or TLS port"
" must be specified in order to init an alternator HTTP server instance"));
}
return seastar::async([this, addr, port, https_port, port_proxy_protocol, https_port_proxy_protocol, creds] {
return seastar::async([this, addr, port, https_port, creds] {
_executor.start().get();
if (port || port_proxy_protocol) {
if (port) {
set_routes(_http_server._routes);
_http_server.set_content_streaming(true);
if (port) {
_http_server.listen(socket_address{addr, *port}).get();
}
if (port_proxy_protocol) {
listen_options lo;
lo.reuse_address = true;
lo.proxy_protocol = true;
_http_server.listen(socket_address{addr, *port_proxy_protocol}, lo).get();
}
_http_server.listen(socket_address{addr, *port}).get();
_enabled_servers.push_back(std::ref(_http_server));
}
if (https_port || https_port_proxy_protocol) {
if (https_port) {
set_routes(_https_server._routes);
_https_server.set_content_streaming(true);
@@ -955,15 +913,7 @@ future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std:
} else {
_credentials = creds->build_server_credentials();
}
if (https_port) {
_https_server.listen(socket_address{addr, *https_port}, _credentials).get();
}
if (https_port_proxy_protocol) {
listen_options lo;
lo.reuse_address = true;
lo.proxy_protocol = true;
_https_server.listen(socket_address{addr, *https_port_proxy_protocol}, lo, _credentials).get();
}
_https_server.listen(socket_address{addr, *https_port}, _credentials).get();
_enabled_servers.push_back(std::ref(_https_server));
}
});
@@ -1036,8 +986,9 @@ client_data server::ongoing_request::make_client_data() const {
// and keep "driver_version" unset.
cd.driver_name = _user_agent;
// Leave "protocol_version" unset, it has no meaning in Alternator.
// Leave "hostname", "ssl_protocol" and "ssl_cipher_suite" unset for Alternator.
// Note: CQL sets ssl_protocol and ssl_cipher_suite via generic_server::connection base class.
// Leave "hostname", "ssl_protocol" and "ssl_cipher_suite" unset.
// As reported in issue #9216, we never set these fields in CQL
// either (see cql_server::connection::make_client_data()).
return cd;
}

View File

@@ -100,9 +100,7 @@ class server : public peering_sharded_service<server> {
public:
server(executor& executor, service::storage_proxy& proxy, gms::gossiper& gossiper, auth::service& service, qos::service_level_controller& sl_controller);
future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port,
std::optional<uint16_t> port_proxy_protocol, std::optional<uint16_t> https_port_proxy_protocol,
std::optional<tls::credentials_builder> creds,
future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,
utils::updateable_value<bool> enforce_authorization, utils::updateable_value<bool> warn_authorization, utils::updateable_value<uint64_t> max_users_query_size_in_trace_output,
semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests);
future<> stop();

View File

@@ -9,7 +9,6 @@
#include <seastar/core/chunked_fifo.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/exception.hh>
#include <seastar/coroutine/maybe_yield.hh>
#include <seastar/http/exception.hh>
#include "task_manager.hh"
@@ -265,7 +264,7 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
if (id) {
module->unregister_task(id);
}
co_await coroutine::maybe_yield();
co_await maybe_yield();
}
});
co_return json_void();

View File

@@ -76,14 +76,11 @@ sstring generate_salt(RandomNumberEngine& g, scheme scheme) {
///
/// Hash a password combined with an implementation-specific salt string.
/// Deprecated in favor of `hash_with_salt_async`. This function is still used
/// when generating password hashes for storage to ensure that
/// `hash_with_salt` and `hash_with_salt_async` produce identical results,
/// preserving backward compatibility.
/// Deprecated in favor of `hash_with_salt_async`.
///
/// \throws \ref std::system_error when an unexpected implementation-specific error occurs.
///
sstring hash_with_salt(const sstring& pass, const sstring& salt);
[[deprecated("Use hash_with_salt_async instead")]] sstring hash_with_salt(const sstring& pass, const sstring& salt);
///
/// Async version of `hash_with_salt` that returns a future.

View File

@@ -204,7 +204,7 @@ future<topology_description> topology_description::clone_async() const {
for (const auto& entry : _entries) {
vec.push_back(entry);
co_await coroutine::maybe_yield();
co_await seastar::maybe_yield();
}
co_return topology_description{std::move(vec)};

View File

@@ -15,7 +15,7 @@
#include "mutation/tombstone.hh"
#include "schema/schema.hh"
#include <seastar/core/sstring.hh>
#include "seastar/core/sstring.hh"
#include "types/concrete_types.hh"
#include "types/types.hh"
#include "types/user.hh"

View File

@@ -1092,7 +1092,6 @@ scylla_core = (['message/messaging_service.cc',
'cql3/statements/list_service_level_attachments_statement.cc',
'cql3/statements/list_effective_service_level_statement.cc',
'cql3/statements/describe_statement.cc',
'cql3/statements/view_prop_defs.cc',
'cql3/update_parameters.cc',
'cql3/util.cc',
'cql3/ut_name.cc',
@@ -2806,35 +2805,38 @@ def write_build_file(f,
seastar_dep = f'$builddir/{mode}/seastar/libseastar.{seastar_lib_ext}'
seastar_testing_dep = f'$builddir/{mode}/seastar/libseastar_testing.{seastar_lib_ext}'
f.write(f'build {seastar_dep}: ninja $builddir/{mode}/seastar/build.ninja | always {profile_dep}\n')
f.write('build {seastar_dep}: ninja $builddir/{mode}/seastar/build.ninja | always {profile_dep}\n'
.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(f' subdir = $builddir/{mode}/seastar\n')
f.write(' target = seastar\n')
f.write(f'build {seastar_testing_dep}: ninja $builddir/{mode}/seastar/build.ninja | always {profile_dep}\n')
f.write(' subdir = $builddir/{mode}/seastar\n'.format(**locals()))
f.write(' target = seastar\n'.format(**locals()))
f.write('build {seastar_testing_dep}: ninja $builddir/{mode}/seastar/build.ninja | always {profile_dep}\n'
.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(f' subdir = $builddir/{mode}/seastar\n')
f.write(' target = seastar_testing\n')
f.write(f' profile_dep = {profile_dep}\n')
f.write(' subdir = $builddir/{mode}/seastar\n'.format(**locals()))
f.write(' target = seastar_testing\n'.format(**locals()))
f.write(' profile_dep = {profile_dep}\n'.format(**locals()))
for lib in abseil_libs:
f.write(f'build $builddir/{mode}/abseil/{lib}: ninja $builddir/{mode}/abseil/build.ninja | always {profile_dep}\n')
f.write(f' pool = submodule_pool\n')
f.write(f' subdir = $builddir/{mode}/abseil\n')
f.write(f' target = {lib}\n')
f.write(f' profile_dep = {profile_dep}\n')
f.write('build $builddir/{mode}/abseil/{lib}: ninja $builddir/{mode}/abseil/build.ninja | always {profile_dep}\n'.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(' subdir = $builddir/{mode}/abseil\n'.format(**locals()))
f.write(' target = {lib}\n'.format(**locals()))
f.write(' profile_dep = {profile_dep}\n'.format(**locals()))
f.write(f'build $builddir/{mode}/stdafx.hh.pch: cxx_build_precompiled_header.{mode} stdafx.hh | {profile_dep} {seastar_dep} {abseil_dep} {gen_headers_dep} {pch_dep}\n')
f.write(f'build $builddir/{mode}/seastar/apps/iotune/iotune: ninja $builddir/{mode}/seastar/build.ninja | $builddir/{mode}/seastar/libseastar.{seastar_lib_ext}\n')
f.write('build $builddir/{mode}/seastar/apps/iotune/iotune: ninja $builddir/{mode}/seastar/build.ninja | $builddir/{mode}/seastar/libseastar.{seastar_lib_ext}\n'
.format(**locals()))
f.write(' pool = submodule_pool\n')
f.write(f' subdir = $builddir/{mode}/seastar\n')
f.write(' target = iotune\n')
f.write(f' profile_dep = {profile_dep}\n')
f.write(textwrap.dedent(f'''\
f.write(' subdir = $builddir/{mode}/seastar\n'.format(**locals()))
f.write(' target = iotune\n'.format(**locals()))
f.write(' profile_dep = {profile_dep}\n'.format(**locals()))
f.write(textwrap.dedent('''\
build $builddir/{mode}/iotune: copy $builddir/{mode}/seastar/apps/iotune/iotune
build $builddir/{mode}/iotune.stripped: strip $builddir/{mode}/iotune
build $builddir/{mode}/iotune.debug: phony $builddir/{mode}/iotune.stripped
'''))
''').format(**locals()))
if args.dist_only:
include_scylla_and_iotune = ''
include_scylla_and_iotune_stripped = ''
@@ -2843,16 +2845,16 @@ def write_build_file(f,
include_scylla_and_iotune = f'$builddir/{mode}/scylla $builddir/{mode}/iotune $builddir/{mode}/patchelf'
include_scylla_and_iotune_stripped = f'$builddir/{mode}/scylla.stripped $builddir/{mode}/iotune.stripped $builddir/{mode}/patchelf.stripped'
include_scylla_and_iotune_debug = f'$builddir/{mode}/scylla.debug $builddir/{mode}/iotune.debug'
f.write(f'build $builddir/{mode}/dist/tar/{scylla_product}-unstripped-{scylla_version}-{scylla_release}.{arch}.tar.gz: package {include_scylla_and_iotune} $builddir/SCYLLA-RELEASE-FILE $builddir/SCYLLA-VERSION-FILE $builddir/debian/debian $builddir/node_exporter/node_exporter | always\n')
f.write(f' mode = {mode}\n')
f.write(f'build $builddir/{mode}/dist/tar/{scylla_product}-{scylla_version}-{scylla_release}.{arch}.tar.gz: stripped_package {include_scylla_and_iotune_stripped} $builddir/SCYLLA-RELEASE-FILE $builddir/SCYLLA-VERSION-FILE $builddir/debian/debian $builddir/node_exporter/node_exporter.stripped | always\n')
f.write(f' mode = {mode}\n')
f.write(f'build $builddir/{mode}/dist/tar/{scylla_product}-debuginfo-{scylla_version}-{scylla_release}.{arch}.tar.gz: debuginfo_package {include_scylla_and_iotune_debug} $builddir/SCYLLA-RELEASE-FILE $builddir/SCYLLA-VERSION-FILE $builddir/debian/debian $builddir/node_exporter/node_exporter.debug | always\n')
f.write(f' mode = {mode}\n')
f.write(f'build $builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz: copy $builddir/{mode}/dist/tar/{scylla_product}-{scylla_version}-{scylla_release}.{arch}.tar.gz\n')
f.write(f' mode = {mode}\n')
f.write(f'build $builddir/{mode}/dist/tar/{scylla_product}-{arch}-package.tar.gz: copy $builddir/{mode}/dist/tar/{scylla_product}-{scylla_version}-{scylla_release}.{arch}.tar.gz\n')
f.write(f' mode = {mode}\n')
f.write('build $builddir/{mode}/dist/tar/{scylla_product}-unstripped-{scylla_version}-{scylla_release}.{arch}.tar.gz: package {include_scylla_and_iotune} $builddir/SCYLLA-RELEASE-FILE $builddir/SCYLLA-VERSION-FILE $builddir/debian/debian $builddir/node_exporter/node_exporter | always\n'.format(**locals()))
f.write(' mode = {mode}\n'.format(**locals()))
f.write('build $builddir/{mode}/dist/tar/{scylla_product}-{scylla_version}-{scylla_release}.{arch}.tar.gz: stripped_package {include_scylla_and_iotune_stripped} $builddir/SCYLLA-RELEASE-FILE $builddir/SCYLLA-VERSION-FILE $builddir/debian/debian $builddir/node_exporter/node_exporter.stripped | always\n'.format(**locals()))
f.write(' mode = {mode}\n'.format(**locals()))
f.write('build $builddir/{mode}/dist/tar/{scylla_product}-debuginfo-{scylla_version}-{scylla_release}.{arch}.tar.gz: debuginfo_package {include_scylla_and_iotune_debug} $builddir/SCYLLA-RELEASE-FILE $builddir/SCYLLA-VERSION-FILE $builddir/debian/debian $builddir/node_exporter/node_exporter.debug | always\n'.format(**locals()))
f.write(' mode = {mode}\n'.format(**locals()))
f.write('build $builddir/{mode}/dist/tar/{scylla_product}-package.tar.gz: copy $builddir/{mode}/dist/tar/{scylla_product}-{scylla_version}-{scylla_release}.{arch}.tar.gz\n'.format(**locals()))
f.write(' mode = {mode}\n'.format(**locals()))
f.write('build $builddir/{mode}/dist/tar/{scylla_product}-{arch}-package.tar.gz: copy $builddir/{mode}/dist/tar/{scylla_product}-{scylla_version}-{scylla_release}.{arch}.tar.gz\n'.format(**locals()))
f.write(' mode = {mode}\n'.format(**locals()))
f.write(f'build $builddir/dist/{mode}/redhat: rpmbuild $builddir/{mode}/dist/tar/{scylla_product}-unstripped-{scylla_version}-{scylla_release}.{arch}.tar.gz\n')
f.write(f' mode = {mode}\n')

View File

@@ -105,7 +105,6 @@ target_sources(cql3
statements/list_service_level_attachments_statement.cc
statements/list_effective_service_level_statement.cc
statements/describe_statement.cc
statements/view_prop_defs.cc
update_parameters.cc
util.cc
ut_name.cc

View File

@@ -898,10 +898,6 @@ pkDef[cql3::statements::create_table_statement::raw_statement& expr]
| '(' k1=ident { l.push_back(k1); } ( ',' kn=ident { l.push_back(kn); } )* ')' { $expr.add_key_aliases(l); }
;
cfamProperties[cql3::statements::cf_properties& expr]
: cfamProperty[expr] (K_AND cfamProperty[expr])*
;
cfamProperty[cql3::statements::cf_properties& expr]
: property[*$expr.properties()]
| K_COMPACT K_STORAGE { $expr.set_compact_storage(); }
@@ -939,22 +935,16 @@ typeColumns[create_type_statement& expr]
*/
createIndexStatement returns [std::unique_ptr<create_index_statement> expr]
@init {
auto idx_props = make_shared<index_specific_prop_defs>();
auto props = index_prop_defs();
auto props = make_shared<index_prop_defs>();
bool if_not_exists = false;
auto name = ::make_shared<cql3::index_name>();
std::vector<::shared_ptr<index_target::raw>> targets;
}
: K_CREATE (K_CUSTOM { idx_props->is_custom = true; })? K_INDEX (K_IF K_NOT K_EXISTS { if_not_exists = true; } )?
: K_CREATE (K_CUSTOM { props->is_custom = true; })? K_INDEX (K_IF K_NOT K_EXISTS { if_not_exists = true; } )?
(idxName[*name])? K_ON cf=columnFamilyName '(' (target1=indexIdent { targets.emplace_back(target1); } (',' target2=indexIdent { targets.emplace_back(target2); } )*)? ')'
(K_USING cls=STRING_LITERAL { idx_props->custom_class = sstring{$cls.text}; })?
(K_WITH cfamProperties[props])?
{
props.extract_index_specific_properties_to(*idx_props);
view_prop_defs view_props = std::move(props).into_view_prop_defs();
$expr = std::make_unique<create_index_statement>(cf, name, targets, std::move(idx_props), std::move(view_props), if_not_exists);
}
(K_USING cls=STRING_LITERAL { props->custom_class = sstring{$cls.text}; })?
(K_WITH properties[*props])?
{ $expr = std::make_unique<create_index_statement>(cf, name, targets, props, if_not_exists); }
;
indexIdent returns [::shared_ptr<index_target::raw> id]
@@ -1102,9 +1092,9 @@ alterTypeStatement returns [std::unique_ptr<alter_type_statement> expr]
*/
alterViewStatement returns [std::unique_ptr<alter_view_statement> expr]
@init {
auto props = cql3::statements::view_prop_defs();
auto props = cql3::statements::cf_prop_defs();
}
: K_ALTER K_MATERIALIZED K_VIEW cf=columnFamilyName K_WITH properties[*props.properties()]
: K_ALTER K_MATERIALIZED K_VIEW cf=columnFamilyName K_WITH properties[props]
{
$expr = std::make_unique<alter_view_statement>(std::move(cf), std::move(props));
}

View File

@@ -14,7 +14,6 @@
#include <seastar/core/shared_ptr.hh>
#include <seastar/coroutine/parallel_for_each.hh>
#include <seastar/coroutine/as_future.hh>
#include <seastar/coroutine/try_future.hh>
#include "service/storage_proxy.hh"
#include "service/migration_manager.hh"
@@ -994,7 +993,7 @@ query_processor::execute_with_params(
auto opts = make_internal_options(p, values, cl);
auto statement = p->statement;
auto msg = co_await coroutine::try_future(execute_maybe_with_guard(query_state, std::move(statement), opts, &query_processor::do_execute_with_params));
auto msg = co_await execute_maybe_with_guard(query_state, std::move(statement), opts, &query_processor::do_execute_with_params);
co_return ::make_shared<untyped_result_set>(msg);
}
@@ -1004,7 +1003,7 @@ query_processor::do_execute_with_params(
shared_ptr<cql_statement> statement,
const query_options& options, std::optional<service::group0_guard> guard) {
statement->validate(*this, service::client_state::for_internal_calls());
co_return co_await coroutine::try_future(statement->execute(*this, query_state, options, std::move(guard)));
co_return co_await statement->execute(*this, query_state, options, std::move(guard));
}

View File

@@ -19,7 +19,7 @@
#include "locator/abstract_replication_strategy.hh"
#include "mutation/canonical_mutation.hh"
#include "prepared_statement.hh"
#include <seastar/coroutine/exception.hh>
#include "seastar/coroutine/exception.hh"
#include "service/migration_manager.hh"
#include "service/storage_proxy.hh"
#include "service/topology_mutation.hh"
@@ -206,9 +206,8 @@ cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_proce
locator::replication_strategy_params(ks_md_update->strategy_options(), ks_md_update->initial_tablets(), ks_md_update->consistency_option()),
topo);
// If RF-rack-validity must be enforced for the keyspace according to `enforce_rf_rack_validity_for_keyspace`,
// it's forbidden to perform a schema change that would lead to an RF-rack-invalid keyspace.
// Verify that this change does not.
// If `rf_rack_valid_keyspaces` is enabled, it's forbidden to perform a schema change that
// would lead to an RF-rack-valid keyspace. Verify that this change does not.
// For more context, see: scylladb/scylladb#23071.
try {
// There are two things to note here:
@@ -226,13 +225,13 @@ cql3::statements::alter_keyspace_statement::prepare_schema_mutations(query_proce
// disturb it (see scylladb/scylladb#23345), but we ignore that.
locator::assert_rf_rack_valid_keyspace(_name, tmptr, *rs);
} catch (const std::exception& e) {
if (replica::database::enforce_rf_rack_validity_for_keyspace(qp.db().get_config(), *ks_md)) {
if (qp.db().get_config().rf_rack_valid_keyspaces()) {
// There's no guarantee what the type of the exception will be, so we need to
// wrap it manually here in a type that can be passed to the user.
throw exceptions::invalid_request_exception(e.what());
} else {
// Even when RF-rack-validity is not enforced for the keyspace, we'd
// like to inform the user that the keyspace they're altering will not
// Even when the configuration option `rf_rack_valid_keyspaces` is set to false,
// we'd like to inform the user that the keyspace they're altering will not
// satisfy the restriction after the change--but just as a warning.
// For more context, see issue: scylladb/scylladb#23330.
warnings.push_back(seastar::format(

View File

@@ -11,7 +11,6 @@
#include <seastar/core/coroutine.hh>
#include "cql3/statements/alter_view_statement.hh"
#include "cql3/statements/prepared_statement.hh"
#include "cql3/statements/view_prop_defs.hh"
#include "service/migration_manager.hh"
#include "service/storage_proxy.hh"
#include "validation.hh"
@@ -23,7 +22,7 @@ namespace cql3 {
namespace statements {
alter_view_statement::alter_view_statement(cf_name view_name, std::optional<view_prop_defs> properties)
alter_view_statement::alter_view_statement(cf_name view_name, std::optional<cf_prop_defs> properties)
: schema_altering_statement{std::move(view_name)}
, _properties{std::move(properties)}
{
@@ -53,8 +52,8 @@ view_ptr alter_view_statement::prepare_view(data_dictionary::database db) const
throw exceptions::invalid_request_exception("ALTER MATERIALIZED VIEW WITH invoked, but no parameters found");
}
auto schema_extensions = _properties->properties()->make_schema_extensions(db.extensions());
_properties->validate_raw(view_prop_defs::op_type::alter, db, keyspace(), schema_extensions);
auto schema_extensions = _properties->make_schema_extensions(db.extensions());
_properties->validate(db, keyspace(), schema_extensions);
bool is_colocated = [&] {
if (!db.find_keyspace(keyspace()).get_replication_strategy().uses_tablets()) {
@@ -71,15 +70,28 @@ view_ptr alter_view_statement::prepare_view(data_dictionary::database db) const
}();
if (is_colocated) {
auto gc_opts = _properties->properties()->get_tombstone_gc_options(schema_extensions);
auto gc_opts = _properties->get_tombstone_gc_options(schema_extensions);
if (gc_opts && gc_opts->mode() == tombstone_gc_mode::repair) {
throw exceptions::invalid_request_exception("The 'repair' mode for tombstone_gc is not allowed on co-located materialized view tables.");
}
}
auto builder = schema_builder(schema);
_properties->apply_to_builder(view_prop_defs::op_type::alter, builder, std::move(schema_extensions),
db, keyspace(), is_colocated);
_properties->apply_to_builder(builder, std::move(schema_extensions), db, keyspace(), !is_colocated);
if (builder.get_gc_grace_seconds() == 0) {
throw exceptions::invalid_request_exception(
"Cannot alter gc_grace_seconds of a materialized view to 0, since this "
"value is used to TTL undelivered updates. Setting gc_grace_seconds too "
"low might cause undelivered updates to expire before being replayed.");
}
if (builder.default_time_to_live().count() > 0) {
throw exceptions::invalid_request_exception(
"Cannot set or alter default_time_to_live for a materialized view. "
"Data in a materialized view always expire at the same time than "
"the corresponding data in the parent table.");
}
return view_ptr(builder.build());
}

View File

@@ -12,8 +12,8 @@
#include <seastar/core/shared_ptr.hh>
#include "cql3/statements/view_prop_defs.hh"
#include "data_dictionary/data_dictionary.hh"
#include "cql3/statements/cf_prop_defs.hh"
#include "cql3/statements/schema_altering_statement.hh"
namespace cql3 {
@@ -26,10 +26,10 @@ namespace statements {
/** An <code>ALTER MATERIALIZED VIEW</code> parsed from a CQL query statement. */
class alter_view_statement : public schema_altering_statement {
private:
std::optional<view_prop_defs> _properties;
std::optional<cf_prop_defs> _properties;
view_ptr prepare_view(data_dictionary::database db) const;
public:
alter_view_statement(cf_name view_name, std::optional<view_prop_defs> properties);
alter_view_statement(cf_name view_name, std::optional<cf_prop_defs> properties);
virtual future<> check_access(query_processor& qp, const service::client_state& state) const override;

View File

@@ -19,8 +19,7 @@ namespace statements {
/**
* Class for common statement properties.
*/
class cf_properties {
protected:
class cf_properties final {
const ::shared_ptr<cf_prop_defs> _properties = ::make_shared<cf_prop_defs>();
bool _use_compact_storage = false;
std::vector<std::pair<::shared_ptr<column_identifier>, bool>> _defined_ordering; // Insertion ordering is important

View File

@@ -14,7 +14,6 @@
#include "db/view/view.hh"
#include "exceptions/exceptions.hh"
#include "index/vector_index.hh"
#include "locator/token_metadata_fwd.hh"
#include "prepared_statement.hh"
#include "replica/database.hh"
#include "types/types.hh"
@@ -219,24 +218,18 @@ view_ptr create_index_statement::create_view_for_index(const schema_ptr schema,
std::map<sstring, sstring> tags_map = {{db::SYNCHRONOUS_VIEW_UPDATES_TAG_KEY, "true"}};
builder.add_extension(db::tags_extension::NAME, ::make_shared<db::tags_extension>(tags_map));
}
const schema::extensions_map exts = _view_properties.properties()->make_schema_extensions(db.extensions());
_view_properties.apply_to_builder(view_prop_defs::op_type::create, builder, exts, db, keyspace(), is_colocated);
return view_ptr{builder.build()};
}
create_index_statement::create_index_statement(cf_name name,
::shared_ptr<index_name> index_name,
std::vector<::shared_ptr<index_target::raw>> raw_targets,
::shared_ptr<index_specific_prop_defs> idx_properties,
view_prop_defs view_properties,
::shared_ptr<index_prop_defs> properties,
bool if_not_exists)
: schema_altering_statement(name)
, _index_name(index_name->get_idx())
, _raw_targets(raw_targets)
, _idx_properties(std::move(idx_properties))
, _view_properties(std::move(view_properties))
, _properties(properties)
, _if_not_exists(if_not_exists)
{
}
@@ -259,53 +252,14 @@ static sstring target_type_name(index_target::target_type type) {
void
create_index_statement::validate(query_processor& qp, const service::client_state& state) const
{
if (_raw_targets.empty() && !_idx_properties->is_custom) {
if (_raw_targets.empty() && !_properties->is_custom) {
throw exceptions::invalid_request_exception("Only CUSTOM indexes can be created without specifying a target column");
}
_idx_properties->validate();
// FIXME: This is ugly and can be improved.
const bool is_vector_index = _idx_properties->custom_class && *_idx_properties->custom_class == "vector_index";
const bool uses_view_properties = _view_properties.properties()->count() > 0
|| _view_properties.use_compact_storage()
|| _view_properties.defined_ordering().size() > 0;
if (is_vector_index && uses_view_properties) {
throw exceptions::invalid_request_exception("You cannot use view properties with a vector index");
}
const schema::extensions_map exts = _view_properties.properties()->make_schema_extensions(qp.db().extensions());
_view_properties.validate_raw(view_prop_defs::op_type::create, qp.db(), keyspace(), exts);
// These keywords are still accepted by other schema entities, but they don't have effect on them.
// Since indexes are not bound by any backward compatibility contract in this regard, let's forbid these.
static sstring obsolete_keywords[] = {
"index_interval",
"replicate_on_write",
"populate_io_cache_on_flush",
"read_repair_chance",
"dclocal_read_repair_chance",
};
for (const sstring& keyword : obsolete_keywords) {
if (_view_properties.properties()->has_property(keyword)) {
// We use the same type of exception and the same error message as would be thrown for
// an invalid property via `_view_properties.validate_raw`.
throw exceptions::syntax_exception(seastar::format("Unknown property '{}'", keyword));
}
}
// FIXME: This is a temporary limitation as it might deserve more attention.
if (!_view_properties.defined_ordering().empty()) {
throw exceptions::invalid_request_exception("Indexes do not allow for specifying the clustering order");
}
_properties->validate();
}
std::pair<std::vector<::shared_ptr<index_target>>, cql3::cql_warnings_vec>
create_index_statement::validate_while_executing(data_dictionary::database db, locator::token_metadata_ptr tmptr) const {
cql3::cql_warnings_vec warnings;
std::vector<::shared_ptr<index_target>> create_index_statement::validate_while_executing(data_dictionary::database db) const {
auto schema = validation::validate_column_family(db, keyspace(), column_family());
if (schema->is_counter()) {
@@ -327,22 +281,13 @@ create_index_statement::validate_while_executing(data_dictionary::database db, l
// Regular secondary indexes require rf-rack-validity.
// Custom indexes need to validate this property themselves, if they need it.
if (!_idx_properties || !_idx_properties->custom_class) {
if (!_properties || !_properties->custom_class) {
try {
db::view::validate_view_keyspace(db, keyspace(), tmptr);
db::view::validate_view_keyspace(db, keyspace());
} catch (const std::exception& e) {
// The type of the thrown exception is not specified, so we need to wrap it here.
throw exceptions::invalid_request_exception(e.what());
}
if (db.find_keyspace(keyspace()).uses_tablets()) {
warnings.emplace_back(
"Creating an index in a keyspace that uses tablets requires "
"the keyspace to remain RF-rack-valid while the index exists. "
"Some operations will be restricted to enforce this: altering the keyspace's replication "
"factor, adding a node in a new rack, and removing or decommissioning a node that would "
"eliminate a rack.");
}
}
validate_for_local_index(*schema);
@@ -352,14 +297,14 @@ create_index_statement::validate_while_executing(data_dictionary::database db, l
targets.emplace_back(raw_target->prepare(*schema));
}
if (_idx_properties && _idx_properties->custom_class) {
auto custom_index_factory = secondary_index::secondary_index_manager::get_custom_class_factory(*_idx_properties->custom_class);
if (_properties && _properties->custom_class) {
auto custom_index_factory = secondary_index::secondary_index_manager::get_custom_class_factory(*_properties->custom_class);
if (!custom_index_factory) {
throw exceptions::invalid_request_exception(format("Non-supported custom class \'{}\' provided", *_idx_properties->custom_class));
throw exceptions::invalid_request_exception(format("Non-supported custom class \'{}\' provided", *(_properties->custom_class)));
}
auto custom_index = (*custom_index_factory)();
custom_index->validate(*schema, *_idx_properties, targets, db.features(), db);
_idx_properties->index_version = custom_index->index_version(*schema);
custom_index->validate(*schema, *_properties, targets, db.features(), db);
_properties->index_version = custom_index->index_version(*schema);
}
if (targets.size() > 1) {
@@ -439,7 +384,7 @@ create_index_statement::validate_while_executing(data_dictionary::database db, l
}
}
return std::make_pair(std::move(targets), std::move(warnings));
return targets;
}
void create_index_statement::validate_for_local_index(const schema& schema) const {
@@ -578,7 +523,7 @@ void create_index_statement::validate_target_column_is_map_if_index_involves_key
void create_index_statement::validate_targets_for_multi_column_index(std::vector<::shared_ptr<index_target>> targets) const
{
if (!_idx_properties->is_custom) {
if (!_properties->is_custom) {
if (targets.size() > 2 || (targets.size() == 2 && std::holds_alternative<index_target::single_column>(targets.front()->value))) {
throw exceptions::invalid_request_exception("Only CUSTOM indexes support multiple columns");
}
@@ -592,9 +537,8 @@ void create_index_statement::validate_targets_for_multi_column_index(std::vector
}
}
std::pair<std::optional<create_index_statement::base_schema_with_new_index>, cql3::cql_warnings_vec>
create_index_statement::build_index_schema(data_dictionary::database db, locator::token_metadata_ptr tmptr) const {
auto [targets, warnings] = validate_while_executing(db, tmptr);
std::optional<create_index_statement::base_schema_with_new_index> create_index_statement::build_index_schema(data_dictionary::database db) const {
auto targets = validate_while_executing(db);
auto schema = db.find_schema(keyspace(), column_family());
@@ -610,8 +554,8 @@ create_index_statement::build_index_schema(data_dictionary::database db, locator
}
index_metadata_kind kind;
index_options_map index_options;
if (_idx_properties->custom_class) {
index_options = _idx_properties->get_options();
if (_properties->custom_class) {
index_options = _properties->get_options();
kind = index_metadata_kind::custom;
} else {
kind = schema->is_compound() ? index_metadata_kind::composites : index_metadata_kind::keys;
@@ -620,17 +564,17 @@ create_index_statement::build_index_schema(data_dictionary::database db, locator
auto existing_index = schema->find_index_noname(index);
if (existing_index) {
if (_if_not_exists) {
return std::make_pair(std::nullopt, std::move(warnings));
return {};
} else {
throw exceptions::invalid_request_exception(
format("Index {} is a duplicate of existing index {}", index.name(), existing_index.value().name()));
}
}
bool existing_vector_index = _idx_properties->custom_class && _idx_properties->custom_class == "vector_index" && secondary_index::vector_index::has_vector_index_on_column(*schema, targets[0]->column_name());
bool custom_index_with_same_name = _idx_properties->custom_class && db.existing_index_names(keyspace()).contains(_index_name);
bool existing_vector_index = _properties->custom_class && _properties->custom_class == "vector_index" && secondary_index::vector_index::has_vector_index_on_column(*schema, targets[0]->column_name());
bool custom_index_with_same_name = _properties->custom_class && db.existing_index_names(keyspace()).contains(_index_name);
if (existing_vector_index || custom_index_with_same_name) {
if (_if_not_exists) {
return std::make_pair(std::nullopt, std::move(warnings));
return {};
} else {
throw exceptions::invalid_request_exception("There exists a duplicate custom index");
}
@@ -646,13 +590,13 @@ create_index_statement::build_index_schema(data_dictionary::database db, locator
schema_builder builder{schema};
builder.with_index(index);
return std::make_pair(base_schema_with_new_index{builder.build(), index}, std::move(warnings));
return base_schema_with_new_index{builder.build(), index};
}
future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, utils::chunked_vector<mutation>, cql3::cql_warnings_vec>>
create_index_statement::prepare_schema_mutations(query_processor& qp, const query_options&, api::timestamp_type ts) const {
using namespace cql_transport;
auto [res, warnings] = build_index_schema(qp.db(), qp.proxy().get_token_metadata_ptr());
auto res = build_index_schema(qp.db());
::shared_ptr<event::schema_change> ret;
utils::chunked_vector<mutation> muts;
@@ -682,7 +626,7 @@ create_index_statement::prepare_schema_mutations(query_processor& qp, const quer
column_family());
}
co_return std::make_tuple(std::move(ret), std::move(muts), std::move(warnings));
co_return std::make_tuple(std::move(ret), std::move(muts), std::vector<sstring>());
}
std::unique_ptr<cql3::statements::prepared_statement>

View File

@@ -10,8 +10,6 @@
#pragma once
#include "cql3/statements/index_prop_defs.hh"
#include "cql3/statements/view_prop_defs.hh"
#include "schema_altering_statement.hh"
#include "index_target.hh"
@@ -29,25 +27,20 @@ class index_name;
namespace statements {
class index_specific_prop_defs;
class index_prop_defs;
/** A <code>CREATE INDEX</code> statement parsed from a CQL query. */
class create_index_statement : public schema_altering_statement {
const sstring _index_name;
const std::vector<::shared_ptr<index_target::raw>> _raw_targets;
// Options specific to this index.
const ::shared_ptr<index_specific_prop_defs> _idx_properties;
// Options corresponding to the underlying materialized view.
const view_prop_defs _view_properties;
const ::shared_ptr<index_prop_defs> _properties;
const bool _if_not_exists;
cql_stats* _cql_stats = nullptr;
public:
create_index_statement(cf_name name, ::shared_ptr<index_name> index_name,
std::vector<::shared_ptr<index_target::raw>> raw_targets,
::shared_ptr<index_specific_prop_defs> idx_properties, view_prop_defs view_properties, bool if_not_exists);
::shared_ptr<index_prop_defs> properties, bool if_not_exists);
future<> check_access(query_processor& qp, const service::client_state& state) const override;
void validate(query_processor&, const service::client_state& state) const override;
@@ -60,7 +53,7 @@ public:
schema_ptr schema;
index_metadata index;
};
std::pair<std::optional<base_schema_with_new_index>, cql3::cql_warnings_vec> build_index_schema(data_dictionary::database db, locator::token_metadata_ptr tmptr) const;
std::optional<base_schema_with_new_index> build_index_schema(data_dictionary::database db) const;
view_ptr create_view_for_index(const schema_ptr, const index_metadata& im, const data_dictionary::database&) const;
private:
void validate_for_local_index(const schema& schema) const;
@@ -76,7 +69,7 @@ private:
const sstring& name,
index_metadata_kind kind,
const index_options_map& options);
std::pair<std::vector<::shared_ptr<index_target>>, cql3::cql_warnings_vec> validate_while_executing(data_dictionary::database db, locator::token_metadata_ptr tmptr) const;
std::vector<::shared_ptr<index_target>> validate_while_executing(data_dictionary::database db) const;
};
}

View File

@@ -116,21 +116,21 @@ future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, utils::chun
warnings.push_back("Keyspace `initial` tablets option is deprecated. Use per-table tablet options instead.");
}
// If RF-rack-validity must be enforced for the keyspace according to `enforce_rf_rack_validity_for_keyspace`,
// it's forbidden to create an RF-rack-invalid keyspace. Verify that it's RF-rack-valid.
// If `rf_rack_valid_keyspaces` is enabled, it's forbidden to create an RF-rack-invalid keyspace.
// Verify that it's RF-rack-valid.
// For more context, see: scylladb/scylladb#23071.
try {
// We hold a group0_guard, so it's correct to check this here.
// The topology or schema cannot change while we're performing this query.
locator::assert_rf_rack_valid_keyspace(_name, tmptr, *rs);
} catch (const std::exception& e) {
if (replica::database::enforce_rf_rack_validity_for_keyspace(cfg, *ksm)) {
if (cfg.rf_rack_valid_keyspaces()) {
// There's no guarantee what the type of the exception will be, so we need to
// wrap it manually here in a type that can be passed to the user.
throw exceptions::invalid_request_exception(e.what());
} else {
// Even when RF-rack-validity is not enforced for the keyspace, we'd
// like to inform the user that the keyspace they're creating does not
// Even when the configuration option `rf_rack_valid_keyspaces` is set to false,
// we'd like to inform the user that the keyspace they're creating does not
// satisfy the restriction--but just as a warning.
// For more context, see issue: scylladb/scylladb#23330.
warnings.push_back(seastar::format(

View File

@@ -8,7 +8,6 @@
* SPDX-License-Identifier: (LicenseRef-ScyllaDB-Source-Available-1.0 and Apache-2.0)
*/
#include "cql3/statements/view_prop_defs.hh"
#include "exceptions/exceptions.hh"
#include "utils/assert.hh"
#include <unordered_set>
@@ -106,7 +105,7 @@ static bool validate_primary_key(
return new_non_pk_column;
}
std::pair<view_ptr, cql3::cql_warnings_vec> create_view_statement::prepare_view(data_dictionary::database db, locator::token_metadata_ptr tmptr) const {
std::pair<view_ptr, cql3::cql_warnings_vec> create_view_statement::prepare_view(data_dictionary::database db) const {
// We need to make sure that:
// - materialized view name is valid
// - primary key includes all columns in base table's primary key
@@ -120,7 +119,15 @@ std::pair<view_ptr, cql3::cql_warnings_vec> create_view_statement::prepare_view(
cql3::cql_warnings_vec warnings;
auto schema_extensions = _properties.properties()->make_schema_extensions(db.extensions());
_properties.validate_raw(view_prop_defs::op_type::create, db, keyspace(), schema_extensions);
_properties.validate(db, keyspace(), schema_extensions);
if (_properties.use_compact_storage()) {
throw exceptions::invalid_request_exception(format("Cannot use 'COMPACT STORAGE' when defining a materialized view"));
}
if (_properties.properties()->get_cdc_options(schema_extensions)) {
throw exceptions::invalid_request_exception("Cannot enable CDC for a materialized view");
}
// View and base tables must be in the same keyspace, to ensure that RF
// is the same (because we assign a view replica to each base replica).
@@ -146,21 +153,12 @@ std::pair<view_ptr, cql3::cql_warnings_vec> create_view_statement::prepare_view(
schema_ptr schema = validation::validate_column_family(db, _base_name.get_keyspace(), _base_name.get_column_family());
try {
db::view::validate_view_keyspace(db, keyspace(), tmptr);
db::view::validate_view_keyspace(db, keyspace());
} catch (const std::exception& e) {
// The type of the thrown exception is not specified, so we need to wrap it here.
throw exceptions::invalid_request_exception(e.what());
}
if (db.find_keyspace(keyspace()).uses_tablets()) {
warnings.emplace_back(
"Creating a materialized view in a keyspaces that uses tablets requires "
"the keyspace to remain RF-rack-valid while the materialized view exists. "
"Some operations will be restricted to enforce this: altering the keyspace's replication "
"factor, adding a node in a new rack, and removing or decommissioning a node that would "
"eliminate a rack.");
}
if (schema->is_counter()) {
throw exceptions::invalid_request_exception(format("Materialized views are not supported on counter tables"));
}
@@ -343,7 +341,16 @@ std::pair<view_ptr, cql3::cql_warnings_vec> create_view_statement::prepare_view(
warnings.emplace_back(std::move(warning_text));
}
schema_builder builder{keyspace(), column_family()};
const auto maybe_id = _properties.properties()->get_id();
if (maybe_id && db.try_find_table(*maybe_id)) {
const auto schema_ptr = db.find_schema(*maybe_id);
const auto& ks_name = schema_ptr->ks_name();
const auto& cf_name = schema_ptr->cf_name();
throw exceptions::invalid_request_exception(seastar::format("Table with ID {} already exists: {}.{}", *maybe_id, ks_name, cf_name));
}
schema_builder builder{keyspace(), column_family(), maybe_id};
auto add_columns = [this, &builder] (std::vector<const column_definition*>& defs, column_kind kind) mutable {
for (auto* def : defs) {
auto&& type = _properties.get_reversable_type(*def->column_specification->name, def->type);
@@ -389,8 +396,14 @@ std::pair<view_ptr, cql3::cql_warnings_vec> create_view_statement::prepare_view(
}
}
_properties.apply_to_builder(view_prop_defs::op_type::create, builder, std::move(schema_extensions),
db, keyspace(), is_colocated);
_properties.properties()->apply_to_builder(builder, std::move(schema_extensions), db, keyspace(), !is_colocated);
if (builder.default_time_to_live().count() > 0) {
throw exceptions::invalid_request_exception(
"Cannot set or alter default_time_to_live for a materialized view. "
"Data in a materialized view always expire at the same time than "
"the corresponding data in the parent table.");
}
auto where_clause_text = util::relations_to_where_clause(_where_clause);
builder.with_view_info(schema, included.empty(), std::move(where_clause_text));
@@ -401,7 +414,7 @@ std::pair<view_ptr, cql3::cql_warnings_vec> create_view_statement::prepare_view(
future<std::tuple<::shared_ptr<cql_transport::event::schema_change>, utils::chunked_vector<mutation>, cql3::cql_warnings_vec>>
create_view_statement::prepare_schema_mutations(query_processor& qp, const query_options&, api::timestamp_type ts) const {
utils::chunked_vector<mutation> m;
auto [definition, warnings] = prepare_view(qp.db(), qp.proxy().get_token_metadata_ptr());
auto [definition, warnings] = prepare_view(qp.db());
try {
m = co_await service::prepare_new_view_announcement(qp.proxy(), std::move(definition), ts);
} catch (const exceptions::already_exists_exception& e) {

View File

@@ -7,9 +7,9 @@
#pragma once
#include "cql3/statements/schema_altering_statement.hh"
#include "cql3/statements/cf_properties.hh"
#include "cql3/cf_name.hh"
#include "cql3/expr/expression.hh"
#include "cql3/statements/view_prop_defs.hh"
#include <seastar/core/shared_ptr.hh>
@@ -35,7 +35,7 @@ private:
expr::expression _where_clause;
std::vector<::shared_ptr<cql3::column_identifier::raw>> _partition_keys;
std::vector<::shared_ptr<cql3::column_identifier::raw>> _clustering_keys;
view_prop_defs _properties;
cf_properties _properties;
bool _if_not_exists;
public:
@@ -48,7 +48,7 @@ public:
std::vector<::shared_ptr<cql3::column_identifier::raw>> clustering_keys,
bool if_not_exists);
std::pair<view_ptr, cql3::cql_warnings_vec> prepare_view(data_dictionary::database db, locator::token_metadata_ptr tmptr) const;
std::pair<view_ptr, cql3::cql_warnings_vec> prepare_view(data_dictionary::database db) const;
auto& properties() {
return _properties;

View File

@@ -11,7 +11,6 @@
#include <set>
#include <seastar/core/format.hh>
#include "index_prop_defs.hh"
#include "cql3/statements/view_prop_defs.hh"
#include "index/secondary_index.hh"
#include "exceptions/exceptions.hh"
@@ -22,9 +21,7 @@ static void check_system_option_specified(const index_options_map& options, cons
}
}
namespace cql3::statements {
void index_specific_prop_defs::validate() const {
void cql3::statements::index_prop_defs::validate() const {
static std::set<sstring> keywords({ sstring(KW_OPTIONS) });
property_definitions::validate(keywords);
@@ -43,13 +40,13 @@ void index_specific_prop_defs::validate() const {
}
index_options_map
index_specific_prop_defs::get_raw_options() const {
cql3::statements::index_prop_defs::get_raw_options() const {
auto options = get_map(KW_OPTIONS);
return !options ? std::unordered_map<sstring, sstring>() : std::unordered_map<sstring, sstring>(options->begin(), options->end());
}
index_options_map
index_specific_prop_defs::get_options() const {
cql3::statements::index_prop_defs::get_options() const {
auto options = get_raw_options();
options.emplace(db::index::secondary_index::custom_class_option_name, *custom_class);
if (index_version.has_value()) {
@@ -57,25 +54,3 @@ index_specific_prop_defs::get_options() const {
}
return options;
}
void index_prop_defs::extract_index_specific_properties_to(index_specific_prop_defs& target) {
if (properties()->has_property(index_specific_prop_defs::KW_OPTIONS)) {
auto value = properties()->extract_property(index_specific_prop_defs::KW_OPTIONS);
std::visit([&target] <typename T> (T&& val) {
target.add_property(index_specific_prop_defs::KW_OPTIONS, std::forward<T>(val));
}, std::move(value));
}
}
view_prop_defs index_prop_defs::into_view_prop_defs() && {
if (properties()->has_property(index_specific_prop_defs::KW_OPTIONS)) {
utils::on_internal_error(seastar::format(
"Precondition has been violated. The property '{}' is still present", index_specific_prop_defs::KW_OPTIONS));
}
view_prop_defs result = std::move(static_cast<view_prop_defs&>(*this));
return result;
}
} // namespace cql3::statements

View File

@@ -10,7 +10,6 @@
#pragma once
#include "cql3/statements/view_prop_defs.hh"
#include "property_definitions.hh"
#include <seastar/core/sstring.hh>
#include "schema/schema_fwd.hh"
@@ -24,7 +23,7 @@ namespace cql3 {
namespace statements {
class index_specific_prop_defs : public property_definitions {
class index_prop_defs : public property_definitions {
public:
static constexpr auto KW_OPTIONS = "options";
@@ -38,19 +37,6 @@ public:
index_options_map get_options() const;
};
struct index_prop_defs : public view_prop_defs {
/// Extract all of the index-specific properties to `target`.
///
/// If there's a property at an index-specific key, and if `target` already has
/// a value at that key, that value will be replaced.
void extract_index_specific_properties_to(index_specific_prop_defs& target);
/// Turns this object into an object of type `view_prop_defs`, as if moved.
///
/// Precondition: the object MUST NOT contain any index-specific property.
view_prop_defs into_view_prop_defs() &&;
};
}
}

View File

@@ -8,8 +8,8 @@
* SPDX-License-Identifier: (LicenseRef-ScyllaDB-Source-Available-1.0 and Apache-2.0)
*/
#include <seastar/core/format.hh>
#include <seastar/core/sstring.hh>
#include "seastar/core/format.hh"
#include "seastar/core/sstring.hh"
#include "utils/assert.hh"
#include "cql3/statements/ks_prop_defs.hh"
#include "cql3/statements/request_validations.hh"

View File

@@ -11,7 +11,6 @@
#include <ranges>
#include <seastar/core/format.hh>
#include <stdexcept>
#include "cql3/statements/property_definitions.hh"
#include "exceptions/exceptions.hh"
#include "utils/overloaded_functor.hh"
@@ -103,18 +102,6 @@ bool property_definitions::has_property(const sstring& name) const {
return _properties.contains(name);
}
property_definitions::value_type property_definitions::extract_property(const sstring& name) {
auto it = _properties.find(name);
if (it == _properties.end()) {
throw std::out_of_range{std::format("No property of name '{}'", std::string_view(name))};
}
value_type result = std::move(it->second);
_properties.erase(it);
return result;
}
std::optional<property_definitions::value_type> property_definitions::get(const sstring& name) const {
if (auto it = _properties.find(name); it != _properties.end()) {
return it->second;

View File

@@ -59,8 +59,6 @@ protected:
public:
bool has_property(const sstring& name) const;
value_type extract_property(const sstring& name);
std::optional<value_type> get(const sstring& name) const;
std::optional<extended_map_type> get_extended_map(const sstring& name) const;

View File

@@ -261,8 +261,7 @@ future<> select_statement::check_access(query_processor& qp, const service::clie
auto& cf_name = s->is_view()
? s->view_info()->base_name()
: (cdc ? cdc->cf_name() : column_family());
const schema_ptr& base_schema = cdc ? cdc : _schema;
bool is_vector_indexed = secondary_index::vector_index::has_vector_index(*base_schema);
bool is_vector_indexed = secondary_index::vector_index::has_vector_index(*_schema);
co_await state.has_column_family_access(keyspace(), cf_name, auth::permission::SELECT, auth::command_desc::type::OTHER, is_vector_indexed);
} catch (const data_dictionary::no_such_column_family& e) {
// Will be validated afterwards.
@@ -1977,7 +1976,9 @@ mutation_fragments_select_statement::do_execute(query_processor& qp, service::qu
if (it == indexes.end()) {
throw exceptions::invalid_request_exception("ANN ordering by vector requires the column to be indexed using 'vector_index'");
}
if (index_opt || parameters->allow_filtering() || !(restrictions->is_empty()) || check_needs_allow_filtering_anyway(*restrictions)) {
throw exceptions::invalid_request_exception("ANN ordering by vector does not support filtering");
}
index_opt = *it;
if (!index_opt) {
@@ -2273,9 +2274,7 @@ std::unique_ptr<prepared_statement> select_statement::prepare(data_dictionary::d
throw exceptions::invalid_request_exception("PER PARTITION LIMIT is not allowed with aggregate queries.");
}
bool is_ann_query = !_parameters->orderings().empty() && std::holds_alternative<select_statement::ann_vector>(_parameters->orderings().front().second);
auto restrictions = prepare_restrictions(db, schema, ctx, selection, for_view, _parameters->allow_filtering() || is_ann_query,
auto restrictions = prepare_restrictions(db, schema, ctx, selection, for_view, _parameters->allow_filtering(),
restrictions::check_indexes(!_parameters->is_mutation_fragments()));
if (_parameters->is_distinct()) {

View File

@@ -1,67 +0,0 @@
/*
* Copyright (C) 2025-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "cql3/statements/view_prop_defs.hh"
namespace cql3::statements {
void view_prop_defs::validate_raw(op_type op, const data_dictionary::database db, sstring ks_name,
const schema::extensions_map& exts) const
{
cf_properties::validate(db, std::move(ks_name), exts);
if (use_compact_storage()) {
throw exceptions::invalid_request_exception(format("Cannot use 'COMPACT STORAGE' when defining a materialized view"));
}
if (properties()->get_cdc_options(exts)) {
throw exceptions::invalid_request_exception("Cannot enable CDC for a materialized view");
}
if (op == op_type::create) {
const auto maybe_id = properties()->get_id();
if (maybe_id && db.try_find_table(*maybe_id)) {
const auto schema_ptr = db.find_schema(*maybe_id);
const auto& ks_name = schema_ptr->ks_name();
const auto& cf_name = schema_ptr->cf_name();
throw exceptions::invalid_request_exception(seastar::format("Table with ID {} already exists: {}.{}", *maybe_id, ks_name, cf_name));
}
}
}
void view_prop_defs::apply_to_builder(op_type op, schema_builder& builder, schema::extensions_map exts,
const data_dictionary::database db, sstring ks_name, bool is_colocated) const
{
_properties->apply_to_builder(builder, exts, db, std::move(ks_name), !is_colocated);
if (op == op_type::create) {
const auto maybe_id = properties()->get_id();
if (maybe_id) {
builder.set_uuid(*maybe_id);
}
}
if (op == op_type::alter) {
if (builder.get_gc_grace_seconds() == 0) {
throw exceptions::invalid_request_exception(
"Cannot alter gc_grace_seconds of a materialized view to 0, since this "
"value is used to TTL undelivered updates. Setting gc_grace_seconds too "
"low might cause undelivered updates to expire before being replayed.");
}
}
if (builder.default_time_to_live().count() > 0) {
throw exceptions::invalid_request_exception(
"Cannot set or alter default_time_to_live for a materialized view. "
"Data in a materialized view always expire at the same time than "
"the corresponding data in the parent table.");
}
}
} // namespace cql3::statements

View File

@@ -1,62 +0,0 @@
/*
* Copyright (C) 2025-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
#include "cql3/statements/cf_properties.hh"
namespace cql3::statements {
/// This type represents the possible properties of the following CQL statements:
///
/// * CREATE MATERIALIZED VIEW,
/// * ALTER MATERIALIZED VIEW.
///
/// Since the sets of the valid properties may differ between those statements, this type
/// is supposed to represent a superset of them.
///
/// This type does NOT guarantee that all of the necessary validation logic will be performed
/// by it. It strives to do that, but you should keep this in mind. What does that mean?
/// Some parts of validation may require more context that's not accessible from here.
///
/// As of yet, this type does not cover all of the validation logic that could be here either.
class view_prop_defs : public cf_properties {
public:
/// The type of a schema operation on a materialized view.
/// These values will be used to guide the validation logic.
enum class op_type {
create,
alter
};
public:
template <typename... Args>
view_prop_defs(Args&&... args) : cf_properties(std::forward<Args>(args)...) {}
// Explicitly delete this method. It's declared in the inherited types.
// The user of this interface should use `validate_raw` instead.
void validate(const data_dictionary::database, sstring ks_name, const schema::extensions_map&) const = delete;
/// Validate the properties for the specified schema operation.
///
/// The validation is *raw* because we mostly validate the properties in their string form (checking if
/// a property exists or not for instance) and only focus on the properties on their own, without
/// having access to any other information.
void validate_raw(op_type, const data_dictionary::database, sstring ks_name, const schema::extensions_map&) const;
/// Apply the properties to the provided schema_builder and validate them.
///
/// NOTE: If the validation fails, this function will throw an exception. What's more important,
/// however, is that the provided schema_builder might have already been modified by that
/// point. Because of that, in presence of an exception, the schema builder should NOT be
/// used anymore.
void apply_to_builder(op_type, schema_builder&, schema::extensions_map, const data_dictionary::database,
sstring ks_name, bool is_colocated) const;
};
} // namespace cql3::statements

View File

@@ -16,7 +16,6 @@
#include <seastar/core/semaphore.hh>
#include <seastar/core/metrics.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/maybe_yield.hh>
#include <seastar/core/sleep.hh>
#include <seastar/coroutine/parallel_for_each.hh>
@@ -378,7 +377,7 @@ future<db::all_batches_replayed> db::batchlog_manager::replay_all_failed_batches
for (const auto& [fm, s] : fms) {
mutations.emplace_back(fm.to_mutation(s));
co_await coroutine::maybe_yield();
co_await maybe_yield();
}
if (!mutations.empty()) {

View File

@@ -502,9 +502,6 @@ public:
void flush_segments(uint64_t size_to_remove);
void check_no_data_older_than_allowed();
// whitebox testing
std::function<future<>()> _oversized_pre_wait_memory_func;
private:
class shutdown_marker{};
@@ -1600,15 +1597,8 @@ future<> db::commitlog::segment_manager::oversized_allocation(entry_writer& writ
scope_increment_counter allocating(totals.active_allocations);
// #27992 - whitebox testing. signal we are trying to lock out
// all allocators
if (_oversized_pre_wait_memory_func) {
co_await _oversized_pre_wait_memory_func();
}
auto permit = co_await std::move(fut);
// #27992 - task reordering _can_ force the available units to negative. this is ok.
SCYLLA_ASSERT(_request_controller.available_units() <= 0);
SCYLLA_ASSERT(_request_controller.available_units() == 0);
decltype(permit) fake_permit; // can't have allocate+sync release semaphore.
bool failed = false;
@@ -1869,15 +1859,13 @@ future<> db::commitlog::segment_manager::oversized_allocation(entry_writer& writ
}
}
}
auto avail = _request_controller.available_units();
SCYLLA_ASSERT(avail <= 0);
SCYLLA_ASSERT(_request_controller.available_units() == 0);
SCYLLA_ASSERT(permit.count() == max_request_controller_units());
auto nw = _request_controller.waiters();
permit.return_all();
// #20633 cannot guarantee controller avail is now full, since we could have had waiters when doing
// return all -> now will be less avail
SCYLLA_ASSERT(nw > 0 || _request_controller.available_units() == (avail + ssize_t(max_request_controller_units())));
SCYLLA_ASSERT(nw > 0 || _request_controller.available_units() == ssize_t(max_request_controller_units()));
if (!failed) {
clogger.trace("Oversized allocation succeeded.");
@@ -3961,9 +3949,6 @@ void db::commitlog::update_max_data_lifetime(std::optional<uint64_t> commitlog_d
_segment_manager->cfg.commitlog_data_max_lifetime_in_seconds = commitlog_data_max_lifetime_in_seconds;
}
void db::commitlog::set_oversized_pre_wait_memory_func(std::function<future<>()> f) {
_segment_manager->_oversized_pre_wait_memory_func = std::move(f);
}
future<std::vector<sstring>> db::commitlog::get_segments_to_replay() const {
return _segment_manager->get_segments_to_replay();

View File

@@ -385,9 +385,6 @@ public:
// (Re-)set data mix lifetime.
void update_max_data_lifetime(std::optional<uint64_t> commitlog_data_max_lifetime_in_seconds);
// Whitebox testing. Do not use for production
void set_oversized_pre_wait_memory_func(std::function<future<>()>);
using commit_load_reader_func = std::function<future<>(buffer_and_replay_position)>;
class segment_error : public std::exception {};

View File

@@ -1447,10 +1447,6 @@ db::config::config(std::shared_ptr<db::extensions> exts)
"SELECT statements with aggregation or GROUP BYs or a secondary index may use this page size for their internal reading data, not the page size specified in the query options.")
, alternator_port(this, "alternator_port", value_status::Used, 0, "Alternator API port.")
, alternator_https_port(this, "alternator_https_port", value_status::Used, 0, "Alternator API HTTPS port.")
, alternator_port_proxy_protocol(this, "alternator_port_proxy_protocol", value_status::Used, 0,
"Port on which the Alternator API listens for clients using proxy protocol v2. Disabled (0) by default.")
, alternator_https_port_proxy_protocol(this, "alternator_https_port_proxy_protocol", value_status::Used, 0,
"Port on which the Alternator HTTPS API listens for clients using proxy protocol v2. Disabled (0) by default.")
, alternator_address(this, "alternator_address", value_status::Used, "0.0.0.0", "Alternator API listening address.")
, alternator_enforce_authorization(this, "alternator_enforce_authorization", liveness::LiveUpdate, value_status::Used, false, "Enforce checking the authorization header for every request in Alternator.")
, alternator_warn_authorization(this, "alternator_warn_authorization", liveness::LiveUpdate, value_status::Used, false, "Count and log warnings about failed authentication or authorization")
@@ -1570,8 +1566,6 @@ db::config::config(std::shared_ptr<db::extensions> exts)
"\tdisabled: New keyspaces use vnodes by default, unless enabled by the tablets={'enabled':true} option\n"
"\tenabled: New keyspaces use tablets by default, unless disabled by the tablets={'enabled':false} option\n"
"\tenforced: New keyspaces must use tablets. Tablets cannot be disabled using the CREATE KEYSPACE option")
, auto_repair_enabled_default(this, "auto_repair_enabled_default", liveness::LiveUpdate, value_status::Used, false, "Set true to enable auto repair for tablet tables by default. The value will be overridden by the per keyspace or per table configuration which is not implemented yet.")
, auto_repair_threshold_default_in_seconds(this, "auto_repair_threshold_default_in_seconds", liveness::LiveUpdate, value_status::Used, 24 * 3600 , "Set the default time in seconds for the auto repair threshold for tablet tables. If the time since last repair is bigger than the configured time, the tablet is eligible for auto repair. The value will be overridden by the per keyspace or per table configuration which is not implemented yet.")
, view_flow_control_delay_limit_in_ms(this, "view_flow_control_delay_limit_in_ms", liveness::LiveUpdate, value_status::Used, 1000,
"The maximal amount of time that materialized-view update flow control may delay responses "
"to try to slow down the client and prevent buildup of unfinished view updates. "

View File

@@ -464,8 +464,6 @@ public:
named_value<uint16_t> alternator_port;
named_value<uint16_t> alternator_https_port;
named_value<uint16_t> alternator_port_proxy_protocol;
named_value<uint16_t> alternator_https_port_proxy_protocol;
named_value<sstring> alternator_address;
named_value<bool> alternator_enforce_authorization;
named_value<bool> alternator_warn_authorization;
@@ -571,8 +569,6 @@ public:
named_value<double> topology_barrier_stall_detector_threshold_seconds;
named_value<bool> enable_tablets;
named_value<enum_option<tablets_mode_t>> tablets_mode_for_new_keyspaces;
named_value<bool> auto_repair_enabled_default;
named_value<int32_t> auto_repair_threshold_default_in_seconds;
bool enable_tablets_by_default() const noexcept {
switch (tablets_mode_for_new_keyspaces()) {

View File

@@ -31,23 +31,19 @@ size_t quorum_for(const locator::effective_replication_map& erm) {
return replication_factor ? (replication_factor / 2) + 1 : 0;
}
static size_t get_replication_factor_for_dc(const locator::effective_replication_map& erm, const sstring& dc) {
size_t local_quorum_for(const locator::effective_replication_map& erm, const sstring& dc) {
using namespace locator;
const auto& rs = erm.get_replication_strategy();
if (rs.get_type() == replication_strategy_type::network_topology) {
const network_topology_strategy* nts =
const network_topology_strategy* nrs =
static_cast<const network_topology_strategy*>(&rs);
return nts->get_replication_factor(dc);
size_t replication_factor = nrs->get_replication_factor(dc);
return replication_factor ? (replication_factor / 2) + 1 : 0;
}
return erm.get_replication_factor();
}
size_t local_quorum_for(const locator::effective_replication_map& erm, const sstring& dc) {
auto rf = get_replication_factor_for_dc(erm, dc);
return rf ? (rf / 2) + 1 : 0;
return quorum_for(erm);
}
size_t block_for_local_serial(const locator::effective_replication_map& erm) {
@@ -192,30 +188,18 @@ void assure_sufficient_live_nodes(
return pending <= live ? live - pending : 0;
};
auto make_rf_zero_error_msg = [cl] (const sstring& local_dc) {
return format("Cannot achieve consistency level {} in datacenter '{}' with replication factor 0. "
"Ensure the keyspace is replicated to this datacenter or use a non-local consistency level.", cl, local_dc);
};
const auto& topo = erm.get_topology();
const sstring& local_dc = topo.get_datacenter();
switch (cl) {
case consistency_level::ANY:
// local hint is acceptable, and local node is always live
break;
case consistency_level::LOCAL_ONE:
if (size_t local_rf = get_replication_factor_for_dc(erm, local_dc); local_rf == 0) {
throw exceptions::unavailable_exception(make_rf_zero_error_msg(local_dc), cl, 1, 0);
}
if (topo.count_local_endpoints(live_endpoints) < topo.count_local_endpoints(pending_endpoints) + 1) {
throw exceptions::unavailable_exception(cl, 1, 0);
}
break;
case consistency_level::LOCAL_QUORUM: {
if (size_t local_rf = get_replication_factor_for_dc(erm, local_dc); local_rf == 0) {
throw exceptions::unavailable_exception(make_rf_zero_error_msg(local_dc), cl, need, 0);
}
size_t local_live = topo.count_local_endpoints(live_endpoints);
size_t pending = topo.count_local_endpoints(pending_endpoints);
if (local_live < need + pending) {

View File

@@ -14,20 +14,15 @@
#include <boost/lexical_cast.hpp>
#include "utils/s3/creds.hh"
#include "utils/http.hh"
#include "object_storage_endpoint_param.hh"
using namespace std::string_literals;
static auto format_url(std::string_view host, unsigned port, bool use_https) {
return fmt::format("{}://{}:{}", use_https ? "https" : "http", host, port);
}
db::object_storage_endpoint_param::object_storage_endpoint_param(s3_storage s)
: _data(std::move(s))
{}
db::object_storage_endpoint_param::object_storage_endpoint_param(std::string endpoint, s3::endpoint_config config)
: object_storage_endpoint_param(s3_storage{format_url(endpoint, config.port, config.use_https), std::move(config.region), std::move(config.role_arn), true /* legacy_format */})
: object_storage_endpoint_param(s3_storage{std::move(endpoint), std::move(config)})
{}
db::object_storage_endpoint_param::object_storage_endpoint_param(gs_storage s)
: _data(std::move(s))
@@ -37,29 +32,13 @@ db::object_storage_endpoint_param::object_storage_endpoint_param() = default;
db::object_storage_endpoint_param::object_storage_endpoint_param(const object_storage_endpoint_param&) = default;
std::string db::object_storage_endpoint_param::s3_storage::to_json_string() const {
if (!legacy_format) {
return fmt::format("{{ \"type\": \"s3\", \"aws_region\": \"{}\", \"iam_role_arn\": \"{}\" }}",
region, iam_role_arn
);
}
auto url = utils::http::parse_simple_url(endpoint);
return fmt::format("{{ \"port\": {}, \"use_https\": {}, \"aws_region\": \"{}\", \"iam_role_arn\": \"{}\" }}",
url.port, url.is_https(), region, iam_role_arn
config.port, config.use_https, config.region, config.role_arn
);
}
std::string db::object_storage_endpoint_param::s3_storage::key() const {
// The `endpoint` is full URL all the time, so only return it as a key
// if it wasn't configured "the old way". In the latter case, split the
// URL and return its host part to mimic the old behavior.
if (!legacy_format) {
return endpoint;
}
auto url = utils::http::parse_simple_url(endpoint);
return url.host;
return endpoint;
}
std::string db::object_storage_endpoint_param::gs_storage::to_json_string() const {
@@ -120,6 +99,8 @@ const std::string& db::object_storage_endpoint_param::type() const {
db::object_storage_endpoint_param db::object_storage_endpoint_param::decode(const YAML::Node& node) {
auto name = node["name"];
auto aws_region = node["aws_region"];
auto iam_role_arn = node["iam_role_arn"];
auto type = node["type"];
auto get_opt = [](auto& node, const std::string& key, auto def) {
@@ -127,20 +108,13 @@ db::object_storage_endpoint_param db::object_storage_endpoint_param::decode(cons
return tmp ? tmp.template as<std::decay_t<decltype(def)>>() : def;
};
// aws s3 endpoint.
if (!type || type.as<std::string>() == s3_type) {
if (!type || type.as<std::string>() == s3_type || aws_region || iam_role_arn) {
s3_storage ep;
auto endpoint = name.as<std::string>();
ep.legacy_format = (!endpoint.starts_with("http://") && !endpoint.starts_with("https://"));
if (!ep.legacy_format) {
ep.endpoint = std::move(endpoint);
} else {
ep.endpoint = format_url(endpoint, node["port"].as<unsigned>(), node["https"].as<bool>(false));
}
auto aws_region = node["aws_region"];
ep.region = aws_region ? aws_region.as<std::string>() : std::getenv("AWS_DEFAULT_REGION");
ep.iam_role_arn = get_opt(node, "iam_role_arn", ""s);
ep.endpoint = name.as<std::string>();
ep.config.port = node["port"].as<unsigned>();
ep.config.use_https = node["https"].as<bool>(false);
ep.config.region = aws_region ? aws_region.as<std::string>() : std::getenv("AWS_DEFAULT_REGION");
ep.config.role_arn = iam_role_arn ? iam_role_arn.as<std::string>() : "";
return object_storage_endpoint_param{std::move(ep)};
}

View File

@@ -25,9 +25,7 @@ class object_storage_endpoint_param {
public:
struct s3_storage {
std::string endpoint;
std::string region;
std::string iam_role_arn;
bool legacy_format; // FIXME convert it to bool_class after seastar#3198
s3::endpoint_config config;
std::strong_ordering operator<=>(const s3_storage&) const = default;
std::string to_json_string() const;

View File

@@ -961,15 +961,15 @@ public:
auto include_pending_changes = [&table_schemas](schema_diff_per_shard d) -> future<> {
for (auto& schema : d.dropped) {
co_await coroutine::maybe_yield();
co_await maybe_yield();
table_schemas.erase(schema->id());
}
for (auto& change : d.altered) {
co_await coroutine::maybe_yield();
co_await maybe_yield();
table_schemas.insert_or_assign(change.new_schema->id(), change.new_schema);
}
for (auto& schema : d.created) {
co_await coroutine::maybe_yield();
co_await maybe_yield();
table_schemas.insert_or_assign(schema->id(), schema);
}
};

View File

@@ -140,7 +140,6 @@ namespace {
system_keyspace::DICTS,
system_keyspace::VIEW_BUILDING_TASKS,
system_keyspace::CLIENT_ROUTES,
system_keyspace::REPAIR_TASKS,
};
if (ks_name == system_keyspace::NAME && tables.contains(cf_name)) {
props.is_group0_table = true;
@@ -491,24 +490,6 @@ schema_ptr system_keyspace::repair_history() {
return schema;
}
schema_ptr system_keyspace::repair_tasks() {
static thread_local auto schema = [] {
auto id = generate_legacy_id(NAME, REPAIR_TASKS);
return schema_builder(NAME, REPAIR_TASKS, std::optional(id))
.with_column("task_uuid", uuid_type, column_kind::partition_key)
.with_column("operation", utf8_type, column_kind::clustering_key)
// First and last token for of the tablet
.with_column("first_token", long_type, column_kind::clustering_key)
.with_column("last_token", long_type, column_kind::clustering_key)
.with_column("timestamp", timestamp_type)
.with_column("table_uuid", uuid_type, column_kind::static_column)
.set_comment("Record tablet repair tasks")
.with_hash_version()
.build();
}();
return schema;
}
schema_ptr system_keyspace::built_indexes() {
static thread_local auto built_indexes = [] {
schema_builder builder(generate_legacy_id(NAME, BUILT_INDEXES), NAME, BUILT_INDEXES,
@@ -2375,7 +2356,6 @@ std::vector<schema_ptr> system_keyspace::all_tables(const db::config& cfg) {
corrupt_data(),
scylla_local(), db::schema_tables::scylla_table_schema_history(),
repair_history(),
repair_tasks(),
v3::views_builds_in_progress(), v3::built_views(),
v3::scylla_views_builds_in_progress(),
v3::truncated(),
@@ -2619,32 +2599,6 @@ future<> system_keyspace::get_repair_history(::table_id table_id, repair_history
});
}
future<utils::chunked_vector<canonical_mutation>> system_keyspace::get_update_repair_task_mutations(const repair_task_entry& entry, api::timestamp_type ts) {
// Default to timeout the repair task entries in 10 days, this should be enough time for the management tools to query
constexpr int ttl = 10 * 24 * 3600;
sstring req = format("INSERT INTO system.{} (task_uuid, operation, first_token, last_token, timestamp, table_uuid) VALUES (?, ?, ?, ?, ?, ?) USING TTL {}", REPAIR_TASKS, ttl);
auto muts = co_await _qp.get_mutations_internal(req, internal_system_query_state(), ts,
{entry.task_uuid.uuid(), repair_task_operation_to_string(entry.operation),
entry.first_token, entry.last_token, entry.timestamp, entry.table_uuid.uuid()});
utils::chunked_vector<canonical_mutation> cmuts(muts.begin(), muts.end());
co_return cmuts;
}
future<> system_keyspace::get_repair_task(tasks::task_id task_uuid, repair_task_consumer f) {
sstring req = format("SELECT * from system.{} WHERE task_uuid = {}", REPAIR_TASKS, task_uuid);
co_await _qp.query_internal(req, [&f] (const cql3::untyped_result_set::row& row) mutable -> future<stop_iteration> {
repair_task_entry ent;
ent.task_uuid = tasks::task_id(row.get_as<utils::UUID>("task_uuid"));
ent.operation = repair_task_operation_from_string(row.get_as<sstring>("operation"));
ent.first_token = row.get_as<int64_t>("first_token");
ent.last_token = row.get_as<int64_t>("last_token");
ent.timestamp = row.get_as<db_clock::time_point>("timestamp");
ent.table_uuid = ::table_id(row.get_as<utils::UUID>("table_uuid"));
co_await f(std::move(ent));
co_return stop_iteration::no;
});
}
future<gms::generation_type> system_keyspace::increment_and_get_generation() {
auto req = format("SELECT gossip_generation FROM system.{} WHERE key='{}'", LOCAL, LOCAL);
auto rs = co_await _qp.execute_internal(req, cql3::query_processor::cache_internal::yes);
@@ -3295,7 +3249,7 @@ future<service::topology> system_keyspace::load_topology_state(const std::unorde
supported_features = decode_features(deserialize_set_column(*topology(), row, "supported_features"));
}
if (row.has("topology_request") && nstate != service::node_state::left) {
if (row.has("topology_request")) {
auto req = service::topology_request_from_string(row.get_as<sstring>("topology_request"));
ret.requests.emplace(host_id, req);
switch(req) {
@@ -3845,35 +3799,4 @@ future<> system_keyspace::apply_mutation(mutation m) {
return _qp.proxy().mutate_locally(m, {}, db::commitlog::force_sync(m.schema()->static_props().wait_for_sync_to_commitlog), db::no_timeout);
}
// The names are persisted in system tables so should not be changed.
static const std::unordered_map<system_keyspace::repair_task_operation, sstring> repair_task_operation_to_name = {
{system_keyspace::repair_task_operation::requested, "requested"},
{system_keyspace::repair_task_operation::finished, "finished"},
};
static const std::unordered_map<sstring, system_keyspace::repair_task_operation> repair_task_operation_from_name = std::invoke([] {
std::unordered_map<sstring, system_keyspace::repair_task_operation> result;
for (auto&& [v, s] : repair_task_operation_to_name) {
result.emplace(s, v);
}
return result;
});
sstring system_keyspace::repair_task_operation_to_string(system_keyspace::repair_task_operation op) {
auto i = repair_task_operation_to_name.find(op);
if (i == repair_task_operation_to_name.end()) {
on_internal_error(slogger, format("Invalid repair task operation: {}", static_cast<int>(op)));
}
return i->second;
}
system_keyspace::repair_task_operation system_keyspace::repair_task_operation_from_string(const sstring& name) {
return repair_task_operation_from_name.at(name);
}
} // namespace db
auto fmt::formatter<db::system_keyspace::repair_task_operation>::format(const db::system_keyspace::repair_task_operation& op, fmt::format_context& ctx) const
-> decltype(ctx.out()) {
return fmt::format_to(ctx.out(), "{}", db::system_keyspace::repair_task_operation_to_string(op));
}

View File

@@ -57,8 +57,6 @@ namespace paxos {
struct topology_request_state;
class group0_guard;
class raft_group0_client;
}
namespace netw {
@@ -187,7 +185,6 @@ public:
static constexpr auto RAFT_SNAPSHOTS = "raft_snapshots";
static constexpr auto RAFT_SNAPSHOT_CONFIG = "raft_snapshot_config";
static constexpr auto REPAIR_HISTORY = "repair_history";
static constexpr auto REPAIR_TASKS = "repair_tasks";
static constexpr auto GROUP0_HISTORY = "group0_history";
static constexpr auto DISCOVERY = "discovery";
static constexpr auto BROADCAST_KV_STORE = "broadcast_kv_store";
@@ -267,7 +264,6 @@ public:
static schema_ptr raft();
static schema_ptr raft_snapshots();
static schema_ptr repair_history();
static schema_ptr repair_tasks();
static schema_ptr group0_history();
static schema_ptr discovery();
static schema_ptr broadcast_kv_store();
@@ -407,22 +403,6 @@ public:
int64_t range_end;
};
enum class repair_task_operation {
requested,
finished,
};
static sstring repair_task_operation_to_string(repair_task_operation op);
static repair_task_operation repair_task_operation_from_string(const sstring& name);
struct repair_task_entry {
tasks::task_id task_uuid;
repair_task_operation operation;
int64_t first_token;
int64_t last_token;
db_clock::time_point timestamp;
table_id table_uuid;
};
struct topology_requests_entry {
utils::UUID id;
utils::UUID initiating_host;
@@ -444,10 +424,6 @@ public:
using repair_history_consumer = noncopyable_function<future<>(const repair_history_entry&)>;
future<> get_repair_history(table_id, repair_history_consumer f);
future<utils::chunked_vector<canonical_mutation>> get_update_repair_task_mutations(const repair_task_entry& entry, api::timestamp_type ts);
using repair_task_consumer = noncopyable_function<future<>(const repair_task_entry&)>;
future<> get_repair_task(tasks::task_id task_uuid, repair_task_consumer f);
future<> save_truncation_record(const replica::column_family&, db_clock::time_point truncated_at, db::replay_position);
future<replay_positions> get_truncated_positions(table_id);
future<> drop_truncation_rp_records();
@@ -757,8 +733,3 @@ public:
}; // class system_keyspace
} // namespace db
template <>
struct fmt::formatter<db::system_keyspace::repair_task_operation> : fmt::formatter<string_view> {
auto format(const db::system_keyspace::repair_task_operation&, fmt::format_context& ctx) const -> decltype(ctx.out());
};

View File

@@ -57,7 +57,7 @@
#include "locator/network_topology_strategy.hh"
#include "mutation/mutation.hh"
#include "mutation/mutation_partition.hh"
#include <seastar/core/on_internal_error.hh>
#include "seastar/core/on_internal_error.hh"
#include "service/migration_manager.hh"
#include "service/raft/raft_group0_client.hh"
#include "service/storage_proxy.hh"
@@ -2244,7 +2244,7 @@ future<> view_builder::start_in_background(service::migration_manager& mm, utils
// Guard the whole startup routine with a semaphore,
// so that it's not intercepted by `on_drop_view`, `on_create_view`
// or `on_update_view` events.
auto units = co_await get_units(_sem, view_builder_semaphore_units);
auto units = co_await get_units(_sem, 1);
// Wait for schema agreement even if we're a seed node.
co_await mm.wait_for_schema_agreement(_db, db::timeout_clock::time_point::max(), &_as);
@@ -2659,7 +2659,7 @@ future<> view_builder::add_new_view(view_ptr view, build_step& step) {
co_await utils::get_local_injector().inject("add_new_view_pause_last_shard", utils::wait_for_message(5min));
}
co_await _sys_ks.register_view_for_building(view->ks_name(), view->cf_name(), step.current_token());
co_await _sys_ks.register_view_for_building_for_all_shards(view->ks_name(), view->cf_name(), step.current_token());
step.build_status.emplace(step.build_status.begin(), view_build_status{view, step.current_token(), std::nullopt});
}
@@ -2667,74 +2667,40 @@ static bool should_ignore_tablet_keyspace(const replica::database& db, const sst
return db.features().view_building_coordinator && db.has_keyspace(ks_name) && db.find_keyspace(ks_name).uses_tablets();
}
future<> view_builder::dispatch_create_view(sstring ks_name, sstring view_name) {
if (should_ignore_tablet_keyspace(_db, ks_name)) {
return make_ready_future<>();
}
return with_semaphore(_sem, view_builder_semaphore_units, [this, ks_name = std::move(ks_name), view_name = std::move(view_name)] () mutable {
// This runs on shard 0 only; seed the global rows before broadcasting.
return handle_seed_view_build_progress(ks_name, view_name).then([this, ks_name = std::move(ks_name), view_name = std::move(view_name)] () mutable {
return container().invoke_on_all([ks_name = std::move(ks_name), view_name = std::move(view_name)] (view_builder& vb) mutable {
return vb.handle_create_view_local(std::move(ks_name), std::move(view_name));
});
});
});
}
future<> view_builder::handle_seed_view_build_progress(sstring ks_name, sstring view_name) {
auto view = view_ptr(_db.find_schema(ks_name, view_name));
auto& step = get_or_create_build_step(view->view_info()->base_id());
return _sys_ks.register_view_for_building_for_all_shards(view->ks_name(), view->cf_name(), step.current_token());
}
future<> view_builder::handle_create_view_local(sstring ks_name, sstring view_name){
if (this_shard_id() == 0) {
return handle_create_view_local_impl(std::move(ks_name), std::move(view_name));
} else {
return with_semaphore(_sem, view_builder_semaphore_units, [this, ks_name = std::move(ks_name), view_name = std::move(view_name)] () mutable {
return handle_create_view_local_impl(std::move(ks_name), std::move(view_name));
});
}
}
future<> view_builder::handle_create_view_local_impl(sstring ks_name, sstring view_name) {
auto view = view_ptr(_db.find_schema(ks_name, view_name));
auto& step = get_or_create_build_step(view->view_info()->base_id());
return when_all(step.base->await_pending_writes(), step.base->await_pending_streams()).discard_result().then([this, &step] {
return flush_base(step.base, _as);
}).then([this, view, &step] () {
// This resets the build step to the current token. It may result in views currently
// being built to receive duplicate updates, but it simplifies things as we don't have
// to keep around a list of new views to build the next time the reader crosses a token
// threshold.
return initialize_reader_at_current_token(step).then([this, view, &step] () mutable {
return add_new_view(view, step);
}).then_wrapped([this, view] (future<>&& f) {
try {
f.get();
} catch (abort_requested_exception&) {
vlogger.debug("Aborted while setting up view for building {}.{}", view->ks_name(), view->cf_name());
} catch (raft::request_aborted&) {
vlogger.debug("Aborted while setting up view for building {}.{}", view->ks_name(), view->cf_name());
} catch (...) {
vlogger.error("Error setting up view for building {}.{}: {}", view->ks_name(), view->cf_name(), std::current_exception());
}
// Waited on indirectly in stop().
static_cast<void>(_build_step.trigger());
});
});
}
void view_builder::on_create_view(const sstring& ks_name, const sstring& view_name) {
if (this_shard_id() != 0) {
if (should_ignore_tablet_keyspace(_db, ks_name)) {
return;
}
// Do it in the background, serialized and broadcast from shard 0.
static_cast<void>(dispatch_create_view(ks_name, view_name).handle_exception([ks_name, view_name] (std::exception_ptr ep) {
vlogger.warn("Failed to dispatch view creation {}.{}: {}", ks_name, view_name, ep);
}));
// Do it in the background, serialized.
(void)with_semaphore(_sem, 1, [ks_name, view_name, this] {
auto view = view_ptr(_db.find_schema(ks_name, view_name));
auto& step = get_or_create_build_step(view->view_info()->base_id());
return when_all(step.base->await_pending_writes(), step.base->await_pending_streams()).discard_result().then([this, &step] {
return flush_base(step.base, _as);
}).then([this, view, &step] () mutable {
// This resets the build step to the current token. It may result in views currently
// being built to receive duplicate updates, but it simplifies things as we don't have
// to keep around a list of new views to build the next time the reader crosses a token
// threshold.
return initialize_reader_at_current_token(step).then([this, view, &step] () mutable {
return add_new_view(view, step).then_wrapped([this, view] (future<>&& f) {
try {
f.get();
} catch (abort_requested_exception&) {
vlogger.debug("Aborted while setting up view for building {}.{}", view->ks_name(), view->cf_name());
} catch (raft::request_aborted&) {
vlogger.debug("Aborted while setting up view for building {}.{}", view->ks_name(), view->cf_name());
} catch (...) {
vlogger.error("Error setting up view for building {}.{}: {}", view->ks_name(), view->cf_name(), std::current_exception());
}
// Waited on indirectly in stop().
(void)_build_step.trigger();
});
});
});
}).handle_exception_type([] (replica::no_such_column_family&) { });
}
void view_builder::on_update_view(const sstring& ks_name, const sstring& view_name, bool) {
@@ -2743,7 +2709,7 @@ void view_builder::on_update_view(const sstring& ks_name, const sstring& view_na
}
// Do it in the background, serialized.
(void)with_semaphore(_sem, view_builder_semaphore_units, [ks_name, view_name, this] {
(void)with_semaphore(_sem, 1, [ks_name, view_name, this] {
auto view = view_ptr(_db.find_schema(ks_name, view_name));
auto step_it = _base_to_build_step.find(view->view_info()->base_id());
if (step_it == _base_to_build_step.end()) {
@@ -2758,75 +2724,45 @@ void view_builder::on_update_view(const sstring& ks_name, const sstring& view_na
}).handle_exception_type([] (replica::no_such_column_family&) { });
}
future<> view_builder::dispatch_drop_view(sstring ks_name, sstring view_name) {
if (should_ignore_tablet_keyspace(_db, ks_name)) {
return make_ready_future<>();
}
return with_semaphore(_sem, view_builder_semaphore_units, [this, ks_name = std::move(ks_name), view_name = std::move(view_name)] () mutable {
// This runs on shard 0 only; broadcast local cleanup before global cleanup.
return container().invoke_on_all([ks_name, view_name] (view_builder& vb) mutable {
return vb.handle_drop_view_local(std::move(ks_name), std::move(view_name));
}).then([this, ks_name = std::move(ks_name), view_name = std::move(view_name)] () mutable {
return handle_drop_view_global_cleanup(std::move(ks_name), std::move(view_name));
});
});
}
future<> view_builder::handle_drop_view_local(sstring ks_name, sstring view_name) {
if (this_shard_id() == 0) {
return handle_drop_view_local_impl(std::move(ks_name), std::move(view_name));
} else {
return with_semaphore(_sem, view_builder_semaphore_units, [this, ks_name = std::move(ks_name), view_name = std::move(view_name)] () mutable {
return handle_drop_view_local_impl(std::move(ks_name), std::move(view_name));
});
}
}
future<> view_builder::handle_drop_view_local_impl(sstring ks_name, sstring view_name) {
vlogger.info0("Stopping to build view {}.{}", ks_name, view_name);
// The view is absent from the database at this point, so find it by brute force.
([&, this] {
for (auto& [_, step] : _base_to_build_step) {
if (step.build_status.empty() || step.build_status.front().view->ks_name() != ks_name) {
continue;
}
for (auto it = step.build_status.begin(); it != step.build_status.end(); ++it) {
if (it->view->cf_name() == view_name) {
_built_views.erase(it->view->id());
step.build_status.erase(it);
return;
}
}
}
})();
return make_ready_future<>();
}
future<> view_builder::handle_drop_view_global_cleanup(sstring ks_name, sstring view_name) {
if (this_shard_id() != 0) {
return make_ready_future<>();
}
vlogger.info0("Starting view global cleanup {}.{}", ks_name, view_name);
return when_all_succeed(
_sys_ks.remove_view_build_progress_across_all_shards(ks_name, view_name),
_sys_ks.remove_built_view(ks_name, view_name),
remove_view_build_status(ks_name, view_name))
.discard_result()
.handle_exception([ks_name, view_name] (std::exception_ptr ep) {
vlogger.warn("Failed to cleanup view {}.{}: {}", ks_name, view_name, ep);
});
}
void view_builder::on_drop_view(const sstring& ks_name, const sstring& view_name) {
if (this_shard_id() != 0) {
if (should_ignore_tablet_keyspace(_db, ks_name)) {
return;
}
// Do it in the background, serialized and broadcast from shard 0.
static_cast<void>(dispatch_drop_view(ks_name, view_name).handle_exception([ks_name, view_name] (std::exception_ptr ep) {
vlogger.warn("Failed to dispatch view drop {}.{}: {}", ks_name, view_name, ep);
}));
vlogger.info0("Stopping to build view {}.{}", ks_name, view_name);
// Do it in the background, serialized.
(void)with_semaphore(_sem, 1, [ks_name, view_name, this] {
// The view is absent from the database at this point, so find it by brute force.
([&, this] {
for (auto& [_, step] : _base_to_build_step) {
if (step.build_status.empty() || step.build_status.front().view->ks_name() != ks_name) {
continue;
}
for (auto it = step.build_status.begin(); it != step.build_status.end(); ++it) {
if (it->view->cf_name() == view_name) {
_built_views.erase(it->view->id());
step.build_status.erase(it);
return;
}
}
}
})();
if (this_shard_id() != 0) {
// Shard 0 can't remove the entry in the build progress system table on behalf of the
// current shard, since shard 0 may have already processed the notification, and this
// shard may since have updated the system table if the drop happened concurrently
// with the build.
return _sys_ks.remove_view_build_progress(ks_name, view_name);
}
return when_all_succeed(
_sys_ks.remove_view_build_progress(ks_name, view_name),
_sys_ks.remove_built_view(ks_name, view_name),
remove_view_build_status(ks_name, view_name))
.discard_result()
.handle_exception([ks_name, view_name] (std::exception_ptr ep) {
vlogger.warn("Failed to cleanup view {}.{}: {}", ks_name, view_name, ep);
});
}).handle_exception_type([] (replica::no_such_keyspace&) {});
}
future<> view_builder::do_build_step() {
@@ -2837,7 +2773,7 @@ future<> view_builder::do_build_step() {
return seastar::async(std::move(attr), [this] {
exponential_backoff_retry r(1s, 1min);
while (!_base_to_build_step.empty() && !_as.abort_requested()) {
auto units = get_units(_sem, view_builder_semaphore_units).get();
auto units = get_units(_sem, 1).get();
++_stats.steps_performed;
try {
execute(_current_step->second, exponential_backoff_retry(1s, 1min));
@@ -3697,20 +3633,20 @@ sstring build_status_to_sstring(build_status status) {
on_internal_error(vlogger, fmt::format("Unknown view build status: {}", (int)status));
}
void validate_view_keyspace(const data_dictionary::database& db, std::string_view keyspace_name, locator::token_metadata_ptr tmptr) {
const auto& rs = db.find_keyspace(keyspace_name).get_replication_strategy();
void validate_view_keyspace(const data_dictionary::database& db, std::string_view keyspace_name) {
const bool tablet_views_enabled = db.features().views_with_tablets;
// Note: if the configuration option `rf_rack_valid_keyspaces` is enabled, we can be
// sure that all tablet-based keyspaces are RF-rack-valid. We check that
// at start-up and then we don't allow for creating RF-rack-invalid keyspaces.
const bool rf_rack_valid_keyspaces = db.get_config().rf_rack_valid_keyspaces();
const bool required_config = tablet_views_enabled && rf_rack_valid_keyspaces;
if (rs.uses_tablets() && !db.features().views_with_tablets) {
const bool uses_tablets = db.find_keyspace(keyspace_name).get_replication_strategy().uses_tablets();
if (!required_config && uses_tablets) {
throw std::logic_error("Materialized views and secondary indexes are not supported on base tables with tablets. "
"To be able to use them, make sure all nodes in the cluster are upgraded.");
}
try {
locator::assert_rf_rack_valid_keyspace(keyspace_name, tmptr, rs);
} catch (const std::exception& e) {
throw std::logic_error(fmt::format(
"Materialized views and secondary indexes are not supported on the keyspace '{}': {}",
keyspace_name, e.what()));
"To be able to use them, enable the configuration option `rf_rack_valid_keyspaces` and make sure "
"that the cluster feature `VIEWS_WITH_TABLETS` is enabled.");
}
}

View File

@@ -9,7 +9,6 @@
#pragma once
#include "gc_clock.hh"
#include "locator/token_metadata_fwd.hh"
#include "query/query-request.hh"
#include "schema/schema_fwd.hh"
#include "readers/mutation_reader.hh"
@@ -319,7 +318,7 @@ endpoints_to_update get_view_natural_endpoint(
///
/// Preconditions:
/// * The provided `keyspace_name` must correspond to an existing keyspace.
void validate_view_keyspace(const data_dictionary::database&, std::string_view keyspace_name, locator::token_metadata_ptr tmptr);
void validate_view_keyspace(const data_dictionary::database&, std::string_view keyspace_name);
}

View File

@@ -169,11 +169,10 @@ class view_builder final : public service::migration_listener::only_view_notific
base_to_build_step_type _base_to_build_step;
base_to_build_step_type::iterator _current_step = _base_to_build_step.end();
serialized_action _build_step{std::bind(&view_builder::do_build_step, this)};
static constexpr size_t view_builder_semaphore_units = 1;
// Ensures bookkeeping operations are serialized, meaning that while we execute
// a build step we don't consider newly added or removed views. This simplifies
// the algorithms. Also synchronizes an operation wrt. a call to stop().
seastar::named_semaphore _sem{view_builder_semaphore_units, named_semaphore_exception_factory{"view builder"}};
seastar::named_semaphore _sem{1, named_semaphore_exception_factory{"view builder"}};
seastar::abort_source _as;
future<> _started = make_ready_future<>();
// Used to coordinate between shards the conclusion of the build process for a particular view.
@@ -267,14 +266,6 @@ private:
future<> maybe_mark_view_as_built(view_ptr, dht::token);
future<> mark_as_built(view_ptr);
void setup_metrics();
future<> dispatch_create_view(sstring ks_name, sstring view_name);
future<> dispatch_drop_view(sstring ks_name, sstring view_name);
future<> handle_seed_view_build_progress(sstring ks_name, sstring view_name);
future<> handle_create_view_local(sstring ks_name, sstring view_name);
future<> handle_drop_view_local(sstring ks_name, sstring view_name);
future<> handle_create_view_local_impl(sstring ks_name, sstring view_name);
future<> handle_drop_view_local_impl(sstring ks_name, sstring view_name);
future<> handle_drop_view_global_cleanup(sstring ks_name, sstring view_name);
template <typename Func1, typename Func2>
future<> write_view_build_status(Func1&& fn_group0, Func2&& fn_sys_dist) {

View File

@@ -17,7 +17,7 @@
#include "locator/abstract_replication_strategy.hh"
#include "locator/tablets.hh"
#include "raft/raft.hh"
#include <seastar/core/gate.hh>
#include "seastar/core/gate.hh"
#include "db/view/view_building_state.hh"
#include "sstables/shared_sstable.hh"
#include "utils/UUID.hh"

View File

@@ -102,13 +102,13 @@ view_update_generator::view_update_generator(replica::database& db, sharded<serv
, _early_abort_subscription(as.subscribe([this] () noexcept { do_abort(); }))
{
setup_metrics();
discover_staging_sstables();
_db.plug_view_update_generator(*this);
}
view_update_generator::~view_update_generator() {}
future<> view_update_generator::start() {
discover_staging_sstables();
_started = seastar::async([this]() mutable {
auto drop_sstable_references = defer([&] () noexcept {
// Clear sstable references so sstables_manager::stop() doesn't hang.

View File

@@ -7,6 +7,7 @@
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
import os
import sys
import errno
import logging

View File

@@ -8,7 +8,7 @@
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
import argparse
from scylla_blocktune import tune_yaml, tune_fs, tune_dev
from scylla_blocktune import *
if __name__ == "__main__":
ap = argparse.ArgumentParser('Tune filesystems for ScyllaDB')

View File

@@ -14,9 +14,7 @@ import subprocess
import time
import tempfile
import shutil
import re
import distro
from scylla_util import out, is_debian_variant, is_suse_variant, pkg_install, is_redhat_variant, systemd_unit, is_gentoo
from scylla_util import *
from subprocess import run

View File

@@ -10,8 +10,9 @@
import os
import sys
import argparse
import shutil
from scylla_util import is_debian_variant, pkg_install, systemd_unit, sysconfig_parser, is_gentoo, is_arch, is_amzn2, is_suse_variant, is_redhat_variant
import shlex
import distro
from scylla_util import *
UNIT_DATA= '''
[Unit]

View File

@@ -10,7 +10,7 @@
import os
import sys
import argparse
from scylla_util import is_container, sysconfig_parser
from scylla_util import *
if __name__ == '__main__':
if not is_container() and os.getuid() > 0:

View File

@@ -10,7 +10,7 @@
import os
import sys
import argparse
from scylla_util import is_nonroot, is_container, etcdir, sysconfig_parser
from scylla_util import *
if __name__ == '__main__':
if not is_nonroot() and not is_container() and os.getuid() > 0:

View File

@@ -8,6 +8,8 @@
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
import os
import sys
import yaml
import argparse
import subprocess

View File

@@ -8,8 +8,9 @@
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
import os
import sys
from scylla_util import systemd_unit
import subprocess
import shutil
from scylla_util import *
if __name__ == '__main__':
if os.getuid() > 0:

View File

@@ -9,7 +9,7 @@
import os
import re
from scylla_util import etcdir, datadir, bindir, scriptsdir, is_nonroot, is_container, is_developer_mode
from scylla_util import *
import resource
import subprocess
import argparse

View File

@@ -10,7 +10,7 @@
import os
import sys
import shutil
from scylla_util import pkg_install
from scylla_util import *
from subprocess import run, DEVNULL
if __name__ == '__main__':

View File

@@ -7,8 +7,9 @@
#
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
from pathlib import Path
from datetime import datetime
from scylla_util import scylladir_p
from scylla_util import *
if __name__ == '__main__':
log = scylladir_p() / 'scylla-server.log'

View File

@@ -10,7 +10,7 @@
import os
import sys
import argparse
from scylla_util import etcdir_p
from scylla_util import *
if __name__ == '__main__':
if os.getuid() > 0:

View File

@@ -11,9 +11,10 @@ import os
import sys
import argparse
import re
import distro
import shutil
from scylla_util import systemd_unit, is_redhat_variant, pkg_install
from scylla_util import *
from subprocess import run
from pathlib import Path

View File

@@ -12,10 +12,8 @@ import sys
import glob
import platform
import distro
import re
import traceback
from scylla_util import sysconfig_parser, out, get_set_nic_and_disks_config_value, check_sysfs_numa_topology_is_valid, sysconfdir_p, scylla_excepthook, perftune_base_command
from scylla_util import *
from subprocess import run
def get_cur_cpuset():

View File

@@ -9,6 +9,7 @@
import os
import argparse
import distutils.util
import pwd
import grp
import sys
@@ -17,22 +18,12 @@ import logging
import pyudev
import psutil
import platform
import shutil
from pathlib import Path
from scylla_util import is_unused_disk, out, pkg_install, systemd_unit, SystemdException, is_debian_variant, is_redhat_variant, is_offline
from subprocess import run, SubprocessError, Popen
from scylla_util import *
from subprocess import run, SubprocessError
LOGGER = logging.getLogger(__name__)
def strtobool(val):
val = val.lower()
if val in ('y', 'yes', 't', 'true', 'on', '1'):
return 1
elif val in ('n', 'no', 'f', 'false', 'off', '0'):
return 0
else:
raise ValueError(f"invalid truth value {val!r}")
class UdevInfo:
def __init__(self, device_file):
self.context = pyudev.Context()
@@ -154,7 +145,7 @@ if __name__ == '__main__':
args = parser.parse_args()
# Allow args.online_discard to be used as a boolean value
args.online_discard = strtobool(args.online_discard)
args.online_discard = distutils.util.strtobool(args.online_discard)
root = args.root.rstrip('/')
if args.volume_role == 'all':
@@ -236,7 +227,7 @@ if __name__ == '__main__':
with open(discard_path) as f:
discard = f.read().strip()
if discard != '0':
proc = Popen(['blkdiscard', disk])
proc = subprocess.Popen(['blkdiscard', disk])
procs.append(proc)
for proc in procs:
proc.wait()

View File

@@ -8,9 +8,8 @@
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
import os
import sys
import argparse
from scylla_util import systemd_unit
from scylla_util import *
def update_rsysconf(rsyslog_server):
if ':' not in rsyslog_server:

View File

@@ -9,7 +9,8 @@
import os
import sys
from scylla_util import is_redhat_variant, out, sysconfig_parser
import subprocess
from scylla_util import *
from subprocess import run
if __name__ == '__main__':

View File

@@ -12,14 +12,12 @@ import sys
import argparse
import glob
import shutil
import io
import stat
import re
from scylla_util import colorprint, is_valid_nic, scylladir_p, is_offline, is_debian_variant, is_redhat_variant, is_suse_variant, is_gentoo, out, is_unused_disk, scriptsdir, is_nonroot, sysconfdir_p, sysconfig_parser, systemd_unit, swap_exists, get_product, etcdir
import distro
from scylla_util import *
from subprocess import run, DEVNULL
PRODUCT = get_product(etcdir())
interactive = False
HOUSEKEEPING_TIMEOUT = 60
def when_interactive_ask_service(interactive, msg1, msg2, default = None):
@@ -296,9 +294,11 @@ if __name__ == '__main__':
sys.exit(1)
disks = args.disks
nic = args.nic
set_nic_and_disks = args.setup_nic_and_disks
swap_directory = args.swap_directory
swap_size = args.swap_size
ec2_check = not args.no_ec2_check
kernel_check = not args.no_kernel_check
verify_package = not args.no_verify_package
enable_service = not args.no_enable_service

View File

@@ -9,7 +9,7 @@
import os
import sys
from scylla_util import sysconfig_parser, sysconfdir_p
from scylla_util import *
from subprocess import run
if __name__ == '__main__':

View File

@@ -12,7 +12,7 @@ import sys
import argparse
import psutil
from pathlib import Path
from scylla_util import swap_exists, out, systemd_unit
from scylla_util import *
from subprocess import run
def GB(n):

View File

@@ -10,8 +10,9 @@
import os
import sys
import argparse
import subprocess
import re
from scylla_util import sysconfig_parser, sysconfdir_p, get_set_nic_and_disks_config_value, is_valid_nic, check_sysfs_numa_topology_is_valid, out, perftune_base_command, hex2list
from scylla_util import *
from subprocess import run
def bool2str(val):

View File

@@ -3,9 +3,12 @@
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
import configparser
import glob
import io
import os
import re
import shlex
import shutil
import subprocess
import yaml
import sys
@@ -16,6 +19,7 @@ from datetime import datetime, timedelta
import distro
from scylla_sysconfdir import SYSCONFDIR
from scylla_product import PRODUCT
from multiprocessing import cpu_count
@@ -334,7 +338,7 @@ def apt_install(pkg, offline_exit=True):
if apt_is_updated():
break
try:
run('apt-get update', shell=True, check=True, stderr=PIPE, encoding='utf-8')
res = run('apt-get update', shell=True, check=True, stderr=PIPE, encoding='utf-8')
break
except CalledProcessError as e:
print(e.stderr, end='')
@@ -515,12 +519,3 @@ class sysconfig_parser:
def commit(self):
with open(self._filename, 'w') as f:
f.write(self._data)
def get_product(dir):
if dir is None:
dir = etcdir()
try:
with open(os.path.join(dir, 'SCYLLA-PRODUCT-FILE')) as f:
return f.read().strip()
except FileNotFoundError:
return 'scylla'

View File

@@ -1,6 +1,7 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import sys
import signal
import subprocess

View File

@@ -54,14 +54,6 @@ these default locations can overridden by specifying
`--alternator-encryption-options keyfile="..."` and
`--alternator-encryption-options certificate="..."`.
In addition to **alternator_port** and **alternator_https_port**, the two
options **alternator_port_proxy_protocol** (for HTTP) and
**alternator_https_port_proxy_protocol** (for HTTPS) allow running Alternator
behind a reverse proxy, such as HAProxy or AWS PrivateLink, and still report
the correct client address. The reverse proxy must be configured to use
[Proxy Protocol v2](https://www.haproxy.org/download/1.8/doc/proxy-protocol.txt)
(the binary header format).
By default, ScyllaDB saves a snapshot of deleted tables. But Alternator does
not offer an API to restore these snapshots, so these snapshots are not useful
and waste disk space - deleting a table does not recover any disk space.
@@ -154,5 +146,4 @@ with Tablets enabled.
getting-started
compatibility
new-apis
network
```

View File

@@ -1,159 +0,0 @@
# Reducing network costs in Alternator
In some deployments, the network between the application and the ScyllaDB
cluster is limited in bandwidth, or network traffic is expensive.
This document surveys some mechanisms that can be used to reduce this network
traffic.
## Compression
An easy way to reduce network traffic between the application and the
ScyllaDB cluster is to compress the requests and responses. The resulting
saving can be substantial if items are long and compress well - e.g., long
strings. But additionally, Alternator's protocol (the DynamoDB API) is
JSON-based - textual and verbose - and compresses well.
Note that enabling compression also has a downside - it requires more
computation by both client and server. So the choice of whether to enable
compression depends on the relative cost of network traffic and CPU.
### Compression of requests
When the application sends a large request - notably a `PutItem` or
`BatchWriteItem` - we can reduce network usage by compressing the content
of this request.
Alternator's request protocol - the DynamoDB API - is based on HTTP or HTTPS.
The standard HTTP 1.1 mechanism for compressing a request is to compress
the request's body using some compression algorithm (e.g., gzip) and
send a header `Content-Encoding: gzip` stating which compression algorithm
was used. Alternator currently supports two compression algorithms, `gzip`
and `deflate`, both standardized in ([RFC 9110](https://www.rfc-editor.org/rfc/rfc9110.html)).
Other standard compression types which are listed in
[IANA's HTTP Content Coding Registry](https://www.iana.org/assignments/http-parameters/http-parameters.xhtml#content-coding),
including `zstd` ([RFC 8878][https://www.rfc-editor.org/rfc/rfc8878.html]),
are not yet supported by Alternator.
Note that HTTP's compression only compresses the request's _body_ - not the
request's _headers_ - so it is beneficial to avoid sending unnecessary headers
in the request, as they will not be compressed. See more on this below.
To use compressed requests, the client library (SDK) used by the application
should be configured to actually compress requests. Amazon's AWS SDKs support
this feature in some languages, but not in others, and may automatically
compress only requests that are longer than a certain size. Check their website
<https://docs.aws.amazon.com/sdkref/latest/guide/feature-compression.html>
for more details for the specific SDK you are using. ScyllaDB also publishes
extensions for these SDKs which may have better support for compressed
requests (and other features mentioned in this document), so again please
consult the documentation of the specific SDK that you are using.
### Compression of responses
Some types of requests, notably `GetItem` and `BatchGetItem`, can have small
requests but large responses, so it can be beneficial to compress these
responses. The standard HTTP mechanism for doing this is that the client
provides a header like `Accept-Encoding: gzip` in the request, signalling
that it is ready to accept a gzip'ed response. Alternator may then compress
the response, or not, depending on its configuration and the response's
size. If Alternator does compress the response body, it sets a header
`Content-Encoding: gzip` in the response.
Currently, Alternator supports response compression with either `gzip`
or `deflate` encodings. If the client requests response compression
(via the `Accept-Encoding` header), then by default Alternator compresses
responses over 4 KB in length (leaving smaller responses uncompressed),
and uses compression level 6 (where 1 is the fastest, 9 is best compression).
These defaults can be changed with the configuration options
`alternator_response_compression_threshold_in_bytes` and
`alternator_response_gzip_compression_level`, respectively.
To use compressed responses, the client library (SDK) used by application
should be configured to send an `Accept-Encoding: gzip` header and to
understand the potentially-compressed response. Although DynamoDB does
support compressing responses, it is not clear if any of Amazon's AWS SDKs can
use this feature. ScyllaDB publishes extensions for these SDKs which may
have better support for compressed responses (and other features mentioned
in this document), so please consult the documentation of the specific SDK that
you are using to check if it can make use of the response compression
feature that Alternator supports.
## Fewer and shorter headers
As we saw above, the HTTP 1.1 protocol allows compressing the _bodies_ of
requests and responses. But HTTP 1.1 does not have a mechanism to compress
_headers_. So both client (SDK) and server (Alternator) should avoid sending
headers that are unnecessary or unnecessarily long.
The Alternator server sends headers like the following in its responses:
```
Content-Length: 2
Content-Type: application/x-amz-json-1.0
Date: Tue, 30 Dec 2025 20:00:01 GMT
Server: Seastar httpd
```
This is a bit over 100 bytes. Most of it is necessary, but the `Date`
and `Server` headers are not strictly necessary and a future version of
Alternator will most likely make them optional (or remove them altogether).
The request headers add significantly larger overhead, and AWS SDKs add
even more than necessary. Here is an example:
```
Content-Length: 300
amz-sdk-request: attempt=1
amz-sdk-invocation-id: caf3eb0e-8138-4cbd-adff-44ac357de04e
X-Amz-Date: 20251230T200829Z
host: 127.15.196.80:8000
User-Agent: Boto3/1.42.12 md/Botocore#1.42.12 md/awscrt#0.27.2 ua/2.1 os/linux#6.17.12-300.fc43.x86_64 md/arch#x86_64 lang/python#3.14.2 md/pyimpl#CPython m/Z,N,e,D,P,b cfg/retry-mode#legacy Botocore/1.42.12 Resource
Content-Type: application/x-amz-json-1.0
Authorization: AWS4-HMAC-SHA256 Credential=cassandra/20251230/us-east-1/dynamodb/aws4_request, SignedHeaders=content-type;host;x-amz-date;x-amz-target, Signature=34100a8cd044ef131c1ae025c91c6fc3468507c28449615cdb2edb4d82298be0"
X-Amz-Target: DynamoDB_20120810.CreateTable
Accept-Encoding: identity
```
There is a lot of waste here: Some headers like `amz-sdk-invocation-id` are
not needed at all. Others like `User-Agent` are useful for debugging but the
200-byte string sent by boto3 (AWS's Python SDK) on every request is clearly
excessive. In AWS's SDKs the user cannot make the request headers shorter,
but we plan to provide such a feature in ScyllaDB's extensions to AWS's SDKs.
Note that the request signing protocol used by Alternator and DynamoDB,
known as AWS Signature Version 4, needs the headers `x-amz-date` and
`Authorization` - which together use (as can be seen in the above example)
more than 230 bytes in each and every request. We could save these 230 bytes
by using SSL - instead of AWS Signature Version 4 - for authentication.
This idea is not yet implemented however - it is not supported by Alternator,
DynamoDB, or any of the client SDKs.
Some middleware or gateways modify HTTP traffic going through them and add
even more headers for their own use. If you suspect this might be happening,
inspect the HTTP traffic of your workload with a packet analyzer to see if
any unnecessary HTTP headers appear in requests arriving to Alternator, or
in responses arriving back at the application.
## Rack-aware request routing
In some deployments, the network between the client and the ScyllaDB server
is only expensive if the communication crosses **rack** boundaries.
A "rack" is a ScyllaDB term roughly equivalent to AWS's "availability zone".
Typically, a ScyllaDB cluster is divided to three racks and each item of
data is replicated once on each rack. Each instance of the client application
also lives in one of these racks. In many deployments, communication inside
the rack is free but communicating to a different rack costs extra. It is
therefore important that the client be aware of which rack it is running
on, and send the request to a ScyllaDB node on the same rack. It should not
choose one of the three replicas as random. This is known as "rack-aware
request routing", and should be enabled in the ScyllaDB extension of the
AWS SDK.
## Networking inside the ScyllaDB cluster
The previous sections were all about reducing the amount of network traffic
between the client application and the ScyllaDB cluster. If the network
between different ScyllaDB nodes is also metered - especially between nodes
in different racks - then this intra-cluster networking should also be reduced.
The best way to do that is to enable compression of internode communication.
Refer to [Advanced Internode (RPC) Compression](../operating-scylla/procedures/config-change/advanced-internode-compression.html)
for instructions on how to enable it.

View File

@@ -8,22 +8,28 @@ SSTable Version Support
------------------------
.. list-table::
:widths: 50 50
:widths: 33 33 33
:header-rows: 1
* - SSTable Version
- ScyllaDB Version
- ScyllaDB Enterprise Version
- ScyllaDB Open Source Version
* - 3.x ('ms')
- 2025.4 and above
- None
* - 3.x ('me')
- 2022.2 and above
- 5.1 and above
* - 3.x ('md')
- 2021.1
* The supported formats are ``me`` and ``ms``.
* The ``md`` format is used only when upgrading from an existing cluster using
``md``. The ``sstable_format`` parameter is ignored if it is set to ``md``.
* Note: The ``sstable_format`` parameter specifies the SSTable format used for
**writes**. The legacy SSTable formats (``ka``, ``la``, ``mc``) remain
supported for reads, which is essential for restoring clusters from existing
backups.
- 4.3, 4.4, 4.5, 4.6, 5.0
* - 3.0 ('mc')
- 2019.1, 2020.1
- 3.x, 4.1, 4.2
* - 2.2 ('la')
- N/A
- 2.3
* - 2.1.8 ('ka')
- 2018.1
- 2.2

View File

@@ -9,6 +9,8 @@ ScyllaDB SSTable Format
.. include:: _common/sstable_what_is.rst
In ScyllaDB 6.0 and above, *me* format is enabled by default.
For more information on each of the SSTable formats, see below:
* :doc:`SSTable 2.x <sstable2/index>`

View File

@@ -12,6 +12,8 @@ ScyllaDB SSTable - 3.x
.. include:: ../_common/sstable_what_is.rst
In ScyllaDB 6.0 and above, the ``me`` format is mandatory, and ``md`` format is used only when upgrading from an existing cluster using ``md``. The ``sstable_format`` parameter is ignored if it is set to ``md``.
Additional Information
-------------------------

View File

@@ -42,10 +42,9 @@ the administrator. The tablet load balancer decides where to migrate
the tablets, either within the same node to balance the shards or across
the nodes to balance the global load in the cluster.
During this process, the balancer will compute disk usage based on the
actual disk sizes of the tablets located on the nodes. It will then issue
tablet migrations which migrate tablets from nodes with higher to nodes
with lower disk utilization, equalizing the load.
The number of tablets the load balancer maintains on a node is directly
proportional to the node's storage capacity. A node with twice
the storage will have twice the number of tablets located on it.
As a table grows, each tablet can split into two, creating a new tablet.
The load balancer can migrate the split halves independently to different nodes
@@ -194,18 +193,12 @@ Limitations and Unsupported Features
.. warning::
If a keyspace has tablets enabled and contains a materialized view or a secondary index, it must
remain :term:`RF-rack-valid <RF-rack-valid keyspace>` throughout its lifetime. Failing to keep that
invariant satisfied may result in data inconsistencies, performance problems, or other issues.
If a keyspace has tablets enabled, it must remain :term:`RF-rack-valid <RF-rack-valid keyspace>`
throughout its lifetime. Failing to keep that invariant satisfied may result in data inconsistencies,
performance problems, or other issues.
The invariant is enforced while the keyspace contains a materialized view or a secondary index, or
if the `rf_rack_valid_keyspaces` option is set, by rejecting operations that would violate the RF-rack-valid property:
Altering a keyspace's replication factor, adding a node in a new rack, or removing the last node
in a rack, will be rejected if they would result in an RF-rack-invalid keyspace.
To enable materialized views and secondary indexes for tablet keyspaces, the keyspace
must be :term:`RF-rack-valid <RF-rack-valid keyspace>`.
See :ref:`Views with tablets <admin-views-with-tablets>` for details.
To enable materialized views and secondary indexes for tablet keyspaces, use
the `--rf-rack-valid-keyspaces` See :ref:`Views with tablets <admin-views-with-tablets>` for details.
Resharding in keyspaces with tablets enabled has the following limitations:

View File

@@ -289,7 +289,7 @@ with tablets enabled. Keyspaces using tablets must also remain :term:`RF-rack-va
throughout their lifetime. See :ref:`Limitations and Unsupported Features <tablets-limitations>`
for details.
**The** ``initial`` **sub-option (deprecated)**
**The ``initial`` sub-option (deprecated)**
A good rule of thumb to calculate initial tablets is to divide the expected total storage used
by tables in this keyspace by (``replication_factor`` * 5GB). For example, if you expect a 30TB
@@ -422,7 +422,7 @@ An empty list is allowed, and it's equivalent to numeric replication factor of 0
Altering from a rack list to a numeric replication factor is not supported.
Keyspaces which use rack lists are :term:`RF-rack-valid <RF-rack-valid keyspace>` if each rack in the rack list contains at least one node (excluding :doc:`zero-token nodes </architecture/zero-token-nodes>`).
Keyspaces which use rack lists are :term:`RF-rack-valid <RF-rack-valid keyspace>`.
.. _drop-keyspace-statement:

View File

@@ -42,55 +42,14 @@ Creating a secondary index on a table uses the ``CREATE INDEX`` statement:
create_index_statement: CREATE [ CUSTOM ] INDEX [ IF NOT EXISTS ] [ `index_name` ]
: ON `table_name` '(' `index_identifier` ')'
: [ USING `string` [ WITH `index_properties` ] ]
: [ USING `string` [ WITH OPTIONS = `map_literal` ] ]
index_identifier: `column_name`
:| ( FULL ) '(' `column_name` ')'
index_properties: index_property (AND index_property)*
index_property: OPTIONS = `map_literal`
:| view_property
where `view_property` is any :ref:`property <mv-options>` that can be used when creating
a :doc:`materialized view </features/materialized-views>`. The only exception is `CLUSTERING ORDER BY`,
which is not supported by secondary indexes.
If the statement is provided with a materialized view property, it will not be applied to the index itself.
Instead, it will be applied to the underlying materialized view of it.
For instance::
CREATE INDEX userIndex ON NerdMovies (user);
CREATE INDEX ON Mutants (abilityId);
-- Create a secondary index called `catsIndex` on the table `Animals`.
-- The indexed column is `cats`. Both properties, `comment` and
-- `synchronous_updates`, are view properties, so the underlying materialized
-- view will be configured with: `comment = 'everyone likes cats'` and
-- `synchronous_updates = true`.
CREATE INDEX catsIndex ON Animals (cats) WITH comment = 'everyone likes cats' AND synchronous_updates = true;
-- Create a secondary index called `dogsIndex` on the same table, `Animals`.
-- This time, the indexed column is `dogs`. The property `gc_grace_seconds` is
-- a view property, so the underlying materialized view will be configured with
-- `gc_grace_seconds = 13`.
CREATE INDEX dogsIndex ON Animals (dogs) WITH gc_grace_seconds = 13;
-- The view property `CLUSTERING ORDER BY` is not supported by secondary indexes,
-- so this statement will be rejected by Scylla.
CREATE INDEX bearsIndex ON Animals (bears) WITH CLUSTERING ORDER BY (bears ASC);
View properties of a secondary index have the same limitations as those imposed by materialized views.
For instance, a materialized view cannot be created specifying ``gc_grace_seconds = 0``, so creating
a secondary index with the same property will not be possible either.
Example::
-- This statement will be rejected by Scylla because creating
-- a materialized view with `gc_grace_seconds = 0` is not possible.
CREATE INDEX names ON clients (name) WITH gc_grace_seconds = 0;
-- This statement will also be rejected by Scylla.
-- It's not possible to use `COMPACT STORAGE` with a materialized view.
CREATE INDEX names ON clients (name) WITH COMPACT STORAGE;
CREATE INDEX userIndex ON NerdMovies (user);
CREATE INDEX ON Mutants (abilityId);
The ``CREATE INDEX`` statement is used to create a new (automatic) secondary index for a given (existing) column in a
given table. A name for the index itself can be specified before the ``ON`` keyword, if desired. If data already exists

View File

@@ -20,7 +20,9 @@ command line option when launchgin scylla.
You can define endpoint details in the `scylla.yaml` file. For example:
```yaml
object_storage_endpoints:
- name: https://s3.us-east-1.amazonaws.com:443
- name: s3.us-east-1.amazonaws.com
port: 443
https: true
aws_region: us-east-1
```
@@ -76,7 +78,9 @@ The examples above are intended for development or local environments. You shoul
For the EC2 Instance Metadata Service to function correctly, no additional configuration is required. However, STS requires the IAM Role ARN to be defined in the `scylla.yaml` file, as shown below:
```yaml
object_storage_endpoints:
- name: https://s3.us-east-1.amazonaws.com:443
- name: s3.us-east-1.amazonaws.com
port: 443
https: true
aws_region: us-east-1
iam_role_arn: arn:aws:iam::123456789012:instance-profile/my-instance-instance-profile
```
@@ -96,7 +100,9 @@ in `scylla.yaml`:
```yaml
object_storage_endpoints:
- name: https://s3.us-east-2.amazonaws.com:443
- name: s3.us-east-2.amazonaws.com
port: 443
https: true
aws_region: us-east-2
```

View File

@@ -30,7 +30,7 @@ SELECT * FROM system.role_attributes WHERE role='r' and attribute_name='service
### Service Level Configuration Table
```
CREATE TABLE system.service_levels_v2 (
CREATE TABLE system_distributed.service_levels (
service_level text PRIMARY KEY,
timeout duration,
workload_type text,
@@ -45,19 +45,14 @@ The table column names meanings are:
*shares* - a number that represents this service level priority in relation to other service levels.
```
select * from system.service_levels_v2 ;
select * from system_distributed.service_levels ;
service_level | shares | timeout | workload_type
---------------+--------+---------+---------------
sl | 100 | 500ms | interactive
service_level | timeout | workload_type
---------------+---------+---------------
sl | 500ms | interactive
```
* Please note that `system.service_levels_v2` is used as service levels configuration table only with consistent topology and service levels on Raft,
which is enabled by default but not mandatory yet.
Otherwise, the old `system_distributed.service_levels` is used as configuration table.
For more information see [service levels on Raft](#service-levels-on-raft) section. *
### Service Level Timeout
Service level timeout can be used to assign a default timeout value for all operations for a particular service level.
@@ -195,20 +190,6 @@ The command displays a table with: option name, effective service level the valu
timeout | sl1 | 2s
```
## Service levels on Raft
Since ScyllaDB 6.0 and ScyllaDB Enterprise 2024.2, service levels metadata is managed by Raft group0 and stored in `system.service_levels_v2`.
In existing clusters, service levels metadata is automatically copied from `system_distributed.service_levels` to `system.service_levels_v2`
during the topology upgrade (see [upgrade to raft topology section](topology-over-raft.md#upgrade-from-legacy-topology-to-raft-based-topology)). Because of this, no service levels operations should be done during the topology upgrade.
Before the service levels on Raft, service levels cache was updated in 10s intervals by polling `system_distributed.service_levels` table.
Once the service levels are migrated to raft, the cache is updated on every group0 state apply fiber, if there are any mutations to
`system.service_levels_v2`, `system.role_attributes` or ` system.role_members`.
With service levels on Raft, the cache also has an additional layer, called `service level effective cache`. This layer combines service levels
and auth information and stores effective service level for each role (see [effective service level section](#effective-service-level)).
## Implementation
### Integration with auth

View File

@@ -1,111 +0,0 @@
# Size based load balancing
Until size based load balancing was introduced, ScyllaDB performed disk based balancing
based on disk capacity and tablet count with the assumption that every tablet uses the
same amount of disk space. This means that the number of tablets located on a node was
proportional to the gross disk capacity of that node. Because the used disk space of
different tablets can vary greatly, this could create imbalance in disk utilization.
Size based load balancing aims to achieve better disk utilization accross nodes in a
cluster. The load balancer will continuously gather information about available disk
space and tablet sizes from all the nodes. It then incrementally computes tablet
migration plans which equalize disk utilization accross the cluster.
# Basic operation
The load balancer runs as a background task on the same shard that the topology
coordinator is running on. The topology coordinator collects information needed
for the load balancer to make decisions about which tablets to migrate. This
information is collected periodically by the coordinator from all the nodes in
the cluster in the form of the data structure ``load_stats``. The information in
this struct relevant for the load balancer is stored in its member tablet_stats,
which contains:
- ``tablet_sizes``: the disk size in bytes of all the tablets on the given node.
- ``effective_capacity``: contains the sum of available disk space and all the tablet sizes on the given node.
The balancer will use this information to compute the disk load (disk utilization)
on every node and shard. It will then migrate tablets from the most to the least
loaded nodes and shards.
# Table balance
The secondary goal of the load balancer is to achieve table balance. This means that
the balancer needs to equalize the following ratios:
- disk used by the table on a shard / total disk used by the table in the rack
- effective capacity of a shard / effective capacity of the rack
Otherwise, we could have imbalance where a shard contains more data from a table,
which can potentially overload the CPU of that shard in case the given table is hot.
# ``load_stats`` reconciliation
Because the ``load_stats`` collection interval is 1 minute, by the time the balancer
starts using the information in ``load_stats``, that information can be stale. This can
be caused by tablet migrations or table resize (split or merge). To get around this,
we need to update the information about tablet sizes in ``load_stats`` after a migration
or resize. Issued tablet migrations update this information in ``load_stats`` by also
migrating the tablet size from one host to another. This tablet size migration
reconciliation is performed during the end_migration stage of the tablet migration.
Tablet size reconciliation for split or merge is performed during tablet resize
finalization. For a split, the reconciliation will divide the tablet size, and create
two tablets in place of the original tablet pre-split. For merge, it will accumulate
the tablet sizes of the tablets pre-merge, create a merged tablet size and remove the
tablet sizes pre-merge from ``load_stats``.
# Force capacity based balancing
The load balancer has the ability to fall back on capacity based balancing. This can be
enforced by a config parameter force_capacity_based_balancing. During capacity based
balancing, the balancer will not look up the actual tablet sizes from ``load_stats``,
and will instead assume each tablet size is equal to default_target_tablet_sizes. It will
also use the gross disk capacity (sent in ``load_stats`` struct in the capacity member),
instead of effective capacity.
# Excluding nodes with incomplete tablet sizes
Even with tablet size reconciliation (during migration and table resize), it is still
possible for the tablet size information in ``load_stats`` to not match the current tablet
information found in the tablet map. In order to avoid problems with the balancer
making decisions based on incomplete or incorrect data, size based load balancing
will not balance nodes which have incomplete tablet sizes. Instead, it will ignore
these nodes (these nodes will not be selected as sources or destinations for tablet
migrations), and will wait for correct tablet sizes to arrive after the next ``load_stats``
refresh by the topology coordinator.
One exception to this are nodes which have been excluded from the cluster. These nodes
are down and therefor are not able to send fresh ``load_stats``. But they have to be drained
of their tablets (via tablet rebuild), and the balancer must do this even with incomplete
tablet data. So, only excluded nodes are allowed to have missing tablet sizes.
# Size based balancing cluster feature
During rolling upgrades, it can take hours or even days until all the nodes in a cluster
have been upgraded. This means that the non-upgraded nodes will send ``load_stats`` without
tablet sizes. Considering the balancer ignores these nodes during balancing, we would
have a problem with some of the nodes not being load balanced for extended periods of
time. To avoid this, size based load balancing is only enabled after the cluster feature
is enabled. Until that time, the balancer will fall back on capacity based balancing.
# Minimal tablet size
After a table is created and it begins to receive data, there can be a period of time
where the data in some of the tablets has been flushed, while others have not. During
this time, the load balancer will migrate seemingly empty (but actually not yet flushed)
tablet away from the nodes where they have been created. Later, when all the tablets
have been flushed, the balancer will migrate the tablets again. In order to avoid these
unnecessary early migrations, we introduce the idea of minimal tablet size. The balancer
will treat any tablet smaller than minimal tablet size as having a size of minimal
tablet size. This reduces early migrations.
# Balance threshold percentage
The balancer considers if a set nodes are balanced by computing the delta of disk load
between the most loaded and least loaded node, and dividing it with the load of the most
loaded node:
delta = (most_loaded - least_loaded) / most_loaded
If this computed value is below a threshold, the nodes are considered balanced. This threshold
can be configured with the ``size_based_balance_threshold_percentage`` config option.

View File

@@ -114,16 +114,9 @@ in `./testlog`. Scylla data files are stored in `/tmp`.
There are several test directories that are excluded from orchestration by `test.py`:
- test/boost
- test/ldap
- test/raft
- test/ldap
- test/unit
- test/vector_search
- test/vector_search_validator
- test/alternator
- test/broadcast_tables
- test/cql
- test/cqlpy
- test/rest_api
This means that `test.py` will not run tests directly, but will delegate all work to `pytest`.
That's why all these directories do not have `suite.yaml` files.

View File

@@ -862,9 +862,6 @@ From the admin's point of view, the steps are as follows:
or via observing the logs
- After all nodes report `done` via the GET endpoint, the upgrade has fully finished
Note that during the upgrade no service levels or auth operations should be done,
as those services are performing migrations to raft metadata.
The `upgrade_state` static column in `system.topology` serves the key role
in coordinating the upgrade. It goes through the following states in the following
order:

View File

@@ -1,5 +1,5 @@
Install ScyllaDB |CURRENT_VERSION|
=====================================
Install ScyllaDB
=================
.. toctree::
:maxdepth: 2

View File

@@ -44,10 +44,9 @@ you want to install.
**Example**
.. code-block:: console
:substitutions:
.. code:: console
curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-version |CURRENT_VERSION|
curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-version 2025.1.1
Versions Earlier than 2025.1
@@ -65,5 +64,13 @@ For example:
curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-product scylla-enterprise --scylla-version 2024.1
To install a supported version of *ScyllaDB Open Source*, run the command with
the ``--scylla-version`` option to specify the version you want to install.
For example:
.. code:: console
curl -sSf get.scylladb.com/server | sudo bash -s -- --scylla-version 6.2.1
.. include:: /getting-started/_common/setup-after-install.rst

View File

@@ -15,7 +15,7 @@ All releases are available as a Docker container, EC2 AMI, GCP, and Azure images
By *supported*, it is meant that:
- A binary installation package is available.
- A binary installation package is available to `download <https://www.scylladb.com/download/>`_.
- The download and install procedures are tested as part of the ScyllaDB release process for each version.
- An automated install is included from :doc:`ScyllaDB Web Installer for Linux tool </getting-started/installation-common/scylla-web-installer>` (for the latest versions).

View File

@@ -6,6 +6,8 @@
* :doc:`ScyllaDB SStable </operating-scylla/admin-tools/scylla-sstable>` - Validates and dumps the content of SStables, generates a histogram, dumps the content of the SStable index.
* :doc:`ScyllaDB Types </operating-scylla/admin-tools/scylla-types/>` - Examines raw values obtained from SStables, logs, coredumps, etc.
* :doc:`cassandra-stress </operating-scylla/admin-tools/cassandra-stress/>` A tool for benchmarking and load testing a ScyllaDB and Cassandra clusters.
* :doc:`SSTabledump </operating-scylla/admin-tools/sstabledump>`
* :doc:`SSTableMetadata </operating-scylla/admin-tools/sstablemetadata>`
* scylla local-file-key-generator - Generate a local file (system) key for :doc:`encryption at rest </operating-scylla/security/encryption-at-rest>`, with the provided length, Key algorithm, Algorithm block mode and Algorithm padding method.
* `scyllatop <https://www.scylladb.com/2016/03/22/scyllatop/>`_ - A terminal base top-like tool for scylladb collectd/prometheus metrics.
* :doc:`scylla_dev_mode_setup</getting-started/installation-common/dev-mod>` - run ScyllaDB in Developer Mode.

View File

@@ -14,6 +14,8 @@ Admin Tools
ScyllaDB Types </operating-scylla/admin-tools/scylla-types/>
sstableloader
cassandra-stress </operating-scylla/admin-tools/cassandra-stress/>
sstabledump
sstablemetadata
ScyllaDB Logs </getting-started/logging/>
perftune
Virtual Tables </operating-scylla/admin-tools/virtual-tables/>

View File

@@ -4,14 +4,21 @@ ScyllaDB SStable
Introduction
-------------
ScyllaDB SStable is a ScyllaDB-native tool for examining and manipulating SStables, written in C++ and built on top of the ScyllaDB codebase.
The tool is built-into the ScyllaDB executable and can be invoked via ``scylla sstable``.
This tool allows you to examine the content of SStables by performing operations such as dumping the content of SStables,
validating the content of SStables, and more. See `Supported Operations`_ for the list of available operations.
Run ``scylla sstable --help`` for additional information about the tool and the operations.
This tool is similar to SStableDump_, with notable differences:
* Built on the ScyllaDB C++ codebase, it supports all SStable formats and components that ScyllaDB supports.
* Expanded scope: this tool supports much more than dumping SStable data components (see `Supported Operations`_).
* More flexible on how schema is obtained and where SStables are located: SStableDump_ only supports dumping SStables located in their native data directory. To dump an SStable, one has to clone the entire ScyllaDB data directory tree, including system table directories and even config files. ``scylla sstable`` can dump sstables from any path with multiple choices on how to obtain the schema, see Schema_.
``scylla sstable`` was developed to supplant SStableDump_ as ScyllaDB-native tool, better tailored for the needs of ScyllaDB.
.. _SStableDump: /operating-scylla/admin-tools/sstabledump
Usage
------

View File

@@ -0,0 +1,53 @@
SSTabledump
============
.. warning:: SSTabledump is deprecated since ScyllaDB 5.4, and will be removed in the next release.
Please consider switching to :doc:`ScyllaDB SSTable </operating-scylla/admin-tools/scylla-sstable>`.
This tool allows you to converts SSTable into a JSON format file.
If you need more flexibility or want to dump more than just the data-component, see :doc:`scylla-sstable </operating-scylla/admin-tools/scylla-sstable>`.
Use the full path to the data file when executing the command.
For example:
.. code-block:: shell
sstabledump /var/lib/scylla/data/keyspace1/standard1-7119946056b211e98e85000000000001/mc-12-big-Data.db
Example output:
.. code-block:: shell
[
{
"partition" : {
"key" : [ "3137343334334f4f4d30" ],
"position" : 0
},
"rows" : [
{
"type" : "static_block",
"position" : 281,
"cells" : [
{ "name" : "\"C0\"", "value" : "0xb7789e7cdf2af541061f207714a3b8e14c72f74e663bd5c2577ac329bcb3161cf10c", "tstamp" : "2019-04-04T08:22:24.336001Z" },
{ "name" : "\"C1\"", "value" : "0xe8ed77f078a23e37f8a7246ccd8cd4099585c7031e242529e5070246860d7a1b1e85", "tstamp" : "2019-04-04T08:22:24.336001Z" },
{ "name" : "\"C2\"", "value" : "0x3b836d4333d2d5a02a63ced47596bfb5f80ecb8e80686061c3daaba87380994b7b61", "tstamp" : "2019-04-04T08:22:24.336001Z" },
{ "name" : "\"C3\"", "value" : "0x9220219581df87ff131306b8bf793c14ae8ebf8c8af1b638827ebfcab85660a378b8", "tstamp" : "2019-04-04T08:22:24.336001Z" },
{ "name" : "\"C4\"", "value" : "0x5b4c972cdeb330035b82dc0b1daa9051fff7956d45e3c6c2b21dfb1fd2bb43fb1146", "tstamp" : "2019-04-04T08:22:24.336001Z" }
]
}
]
**NOTE:**
When running as a user that is not ``root`` or ``scylla`` an error (java traceback) might be observed.
To work around the error, please use the following syntax:
.. code-block:: sh
cassandra_storagedir="/tmp" sstabledump [filename]

View File

@@ -0,0 +1,55 @@
SSTableMetadata
===============
.. warning:: SSTableMetadata is deprecated since ScyllaDB 5.4, and will be removed in the next release.
Please consider switching to :ref:`scylla sstable dump-statistics` and :ref:`scylla sstable dump-summary`.
SSTableMetadata prints metadata in ``Statistics.db`` and ``Summary.db`` about the specified SSTables to the console.
Use the full path to the data file when executing the command.
For example:
.. code-block:: console
$ sstablemetadata /var/lib/scylla/data/keyspace1/standard1-e6a565803a5e11ee83de7d84e184e393/me-9-big-Data.db
SSTable: /var/lib/scylla/data/keyspace1/standard1-e6a565803a5e11ee83de7d84e184e393/me-9-big-Data.db
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Bloom Filter FP chance: 0.010000
Minimum timestamp: 1691989005063001
Maximum timestamp: 1691989010554004
SSTable min local deletion time: 2147483647
SSTable max local deletion time: 2147483647
Compressor: -
TTL min: 0
TTL max: 0
First token: -9220288174854979040 (key=4e374b384d4b39313930)
Last token: -5881915680038312597 (key=324b4e31343339373231)
minClustringValues: []
maxClustringValues: []
Estimated droppable tombstones: 0.0
SSTable Level: 0
Repaired at: 0
Replay positions covered: {}
totalColumnsSet: 22895
totalRows: 4579
originatingHostId: 6a684ae1-7b76-44cc-9e07-32975af0e60e
Estimated tombstone drop times:Count Row Size Cell Count
1 0 0
2 0 0
3 0 0
4 0 0
5 0 4579
6 0 0
7 0 0
8 0 0
...
1414838745986 0
Estimated cardinality: 42
EncodingStats minTTL: 0
EncodingStats minLocalDeletionTime: 1442880000
EncodingStats minTimestamp: 1691988990337000
KeyType: org.apache.cassandra.db.marshal.BytesType
ClusteringTypes: []
StaticColumns: {}
RegularColumns: {C3:org.apache.cassandra.db.marshal.BytesType, C4:org.apache.cassandra.db.marshal.BytesType, C0:org.apache.cassandra.db.marshal.BytesType, C1:org.apache.cassandra.db.marshal.BytesType, C2:org.apache.cassandra.db.marshal.BytesType}

View File

@@ -111,7 +111,9 @@ should follow this format:
.. code-block:: yaml
object_storage_endpoints:
- name: http[s]://<endpoint_address_or_domain_name>[:<port_number>]
- name: <endpoint_address_or_domain_name>
port: <port_number>
https: <true_or_false> # optional
aws_region: <region_name> # optional, e.g. us-east-1
iam_role_arn: <iam_role> # optional
@@ -121,7 +123,9 @@ Example:
.. code:: yaml
object_storage_endpoints:
- name: https://s3.us-east-1.amazonaws.com
- name: s3.us-east-1.amazonaws.com
port: 443
https: true
aws_region: us-east-1
iam_role_arn: arn:aws:iam::123456789012:instance-profile/my-instance-instance-profile
@@ -263,15 +267,13 @@ Use ``scylla --help`` to get the list of experimental features.
Views with Tablets
------------------
Materialized Views (MV) and Secondary Indexes (SI) are supported in keyspaces that use tablets
only when the keyspaces are :term:`RF-rack-valid <RF-rack-valid keyspace>`.
Materialized Views (MV) and Secondary Indexes (SI) are enabled in keyspaces that use tablets
only when :term:`RF-rack-valid keyspaces <RF-rack-valid keyspace>` are enforced. That can be
done in the ``scylla.yaml`` configuration file by specifying
When a keyspace contains a Materialized View or Secondary Index, some operations are restricted to maintain
the RF-rack condition. The following actions are not allowed while the view or index is present:
.. code-block:: yaml
* Altering the keyspace's replication factor to a value that would violate the RF-rack-valid property
* Adding a node in a new rack in an existing datacenter
* Removing the last node in a rack
rf_rack_valid_keyspaces: true
Monitoring

View File

@@ -102,6 +102,7 @@ Other Tools
ScyllaDB has various other tools, mainly to work with sstables.
If you are diagnosing a problem that is related to sstables misbehaving or being corrupt, you may find these useful:
* :doc:`sstabledump </operating-scylla/admin-tools/sstabledump/>`
* :doc:`ScyllaDB SStable </operating-scylla/admin-tools/scylla-sstable/>`
* :doc:`ScyllaDB Types </operating-scylla/admin-tools/scylla-types/>`

View File

@@ -49,13 +49,6 @@ Procedure
* **seeds** - Specifies the IP address of an existing node in the cluster. The new node will use this IP to connect to the cluster and learn the cluster topology and state.
.. warning::
Adding a node in a new rack in an existing datacenter may violate the :term:`RF-rack-valid <RF-rack-valid keyspace>` constraints of some keyspace.
If a keyspace uses tablets and contains a Materialized View or Secondary Index, or if the `rf_rack_valid_keyspaces` option is set,
the invariant is enforced for the keyspace, and the new node will be rejected if adding it would violate the invariant.
#. Start the nodes with the following command:
.. include:: /rst_include/scylla-commands-start-index.rst

View File

@@ -30,13 +30,6 @@ Removing a Running Node
.. include:: /operating-scylla/_common/decommission_warning.rst
.. warning::
Removing a node in a rack may violate the :term:`RF-rack-valid <RF-rack-valid keyspace>` constraints of some keyspace.
If a keyspace uses tablets and contains a Materialized View or Secondary Index, or if the `rf_rack_valid_keyspaces` option is set,
the invariant is enforced for the keyspace, and the node removal will be rejected if it would violate the invariant.
#. Run the ``nodetool netstats`` command to monitor the progress of the token reallocation.
#. Run the ``nodetool status`` command to verify that the node has been removed.

View File

@@ -89,7 +89,7 @@ Enabling Experimental Features
There are two ways to enable experimental features:
* Enable all experimental features (which may be risky)
* Enable only the features you want to try
* Enable only the features you want to try (available in Scylla Open Source from version 3.2)
Enable All Experimental Features
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

View File

@@ -154,7 +154,7 @@ is holding your keys. You can use the following options:
* - Local Key Provider
- ``LocalFileSystemKeyProviderFactory`` (**default**)
- Stores the key on the same machine as the data.
* - Replicated Key Provider (**deprecated**)
* - Replicated Key Provider
- ``ReplicatedKeyProviderFactory``
- Stores table keys in a ScyllaDB table where the table itself is encrypted
using the system key.
@@ -183,14 +183,13 @@ Local Key Provider
The Local Key Provider is less safe than other options and as such it is not
recommended for production use. It is the default key provider for the
node-local encryption configuration in ``scylla.yaml`` and table encryption
because it does not require any external resources.
In production environments, it is recommended to use an external KMS instead.
node-local encryption configuration in ``scylla.yaml`` because it does not
require any external resources. In production environments, it is recommended
to use an external KMS instead.
The Local Key Provider is the default key provider for the node-local encryption
configuration in ScyllaDB (``user_info_encryption`` and ``system_info_encryption``
in ``scylla.yaml``) as well as table encryption.
It stores the encryption keys locally on disk in a text file.
in ``scylla.yaml``). It stores the encryption keys locally on disk in a text file.
The location of this file is specified in ``scylla.yaml``, or in the table schema.
The user has the option to generate the key(s) themselves, or let ScyllaDB
generate the key(s) for them.
@@ -211,18 +210,16 @@ Replicated Key Provider
.. note::
**Warning**: The replicated key provider is deprecated and will be removed
in a future ScyllaDB release.
The Replicated Key Provider is not recommended for production use because it
does not support key rotation. For compatibility with DataStax Cassandra, it
is the default key provider for per-table encryption setup. In production
environments, an external KMS should be used instead.
The Replicated Key Provider stores and distributes the encryption keys across
every node in the cluster through a special ScyllaDB system table
(``system_replicated_keys.encrypted_keys``). The Replicated Key Provider
requires two additional keys to operate:
The Replicated Key Provider is the default key provider for per-table encryption
setup in ScyllaDB (``scylla_encryption_options`` in table schema). It stores and
distributes the encryption keys across every node in the cluster through a
special ScyllaDB system table (``system_replicated_keys.encrypted_keys``). The
Replicated Key Provider requires two additional keys to operate:
* A system key - used to encrypt the data in the system table. The system key
can be either a local key, or a KMIP key.
@@ -305,7 +302,7 @@ Depending on your key provider, you will either have the option to allow
ScyllaDB to generate an encryption key, or you will have to provide one:
* Local Key Provider - you can provide your own keys, otherwise ScyllaDB will generate them for you
* Replicated Key Provider - you must generate a system key yourself (**deprecated**)
* Replicated Key Provider - you must generate a system key yourself
* KMIP Key Provider - you can provide your own keys, otherwise ScyllaDB will generate them for you
* KMS Key Provider - you must generate a key yourself in AWS
* GCP Key Provider - you must generate a key yourself in GCP
@@ -435,8 +432,6 @@ desired key provider:
You cannot use the same key for both the system key and the local
secret key. They must be different keys.
**Warning**: The replicated key provider is deprecated and will be removed in a future ScyllaDB release.
.. group-tab:: KMIP Key Provider
The KMIP Key Provider will first try to discover existing keys in the KMIP
@@ -999,8 +994,6 @@ in the ``scylla.yaml`` file.
The Replicated Key Provider cannot be used in ``user_info_encryption``.
You can only use it to :ref:`Encrypt a Single Table <ear-create-table>`.
**Warning**: The replicated key provider is deprecated and will be removed in a future ScyllaDB release.
.. group-tab:: KMIP Key Provider
* Make sure to :ref:`set up a KMIP Host <encryption-at-rest-set-kmip>`.
@@ -1082,8 +1075,6 @@ in the ``scylla.yaml`` file.
The Replicated Key Provider cannot be used in ``user_info_encryption``.
You can only use it to :ref:`Encrypt a Single Table <ear-create-table>`.
**Warning**: The replicated key provider is deprecated and will be removed in a future ScyllaDB release.
.. group-tab:: KMIP Key Provider
.. code-block:: yaml
@@ -1305,8 +1296,6 @@ This procedure demonstrates how to encrypt a new table.
.. group-tab:: Replicated Key Provider
**Warning**: The replicated key provider is deprecated and will be removed in a future ScyllaDB release.
* Ensure you have a system key. The system key can be either a local key,
or a KMIP key. If you don't have a system key, create one by following
the procedure in :ref:`Create Encryption Keys <ear-create-encryption-key>`.
@@ -1408,7 +1397,6 @@ This procedure demonstrates how to encrypt a new table.
;
.. group-tab:: Replicated Key Provider
**Warning**: The replicated key provider is deprecated and will be removed in a future ScyllaDB release.
.. code-block:: cql
@@ -1833,8 +1821,6 @@ Once this encryption is enabled, it is used for all system data.
The Replicated Key Provider cannot be used for system encryption. You can
only use it to :ref:`Encrypt a Single Table <ear-create-table>`.
**Warning**: The replicated key provider is deprecated and will be removed in a future ScyllaDB release.
.. group-tab:: KMIP Key Provider
* Make sure to :ref:`set up a KMIP Host <encryption-at-rest-set-kmip>`.
@@ -1916,8 +1902,6 @@ Once this encryption is enabled, it is used for all system data.
The Replicated Key Provider cannot be used for system encryption. You
can only use it to :ref:`Encrypt a Single Table <ear-create-table>`.
**Warning**: The replicated key provider is deprecated and will be removed in a future ScyllaDB release.
.. group-tab:: KMIP Key Provider
.. code-block:: yaml
@@ -2143,8 +2127,6 @@ varies depending on the key provider you are using.
The Replicated Key Provider does not support key rotation. If you need to
rotate keys, you must migrate to a different key provider.
**Warning**: The replicated key provider is deprecated and will be removed in a future ScyllaDB release.
.. group-tab:: KMIP Key Provider
.. DSE docs: https://docs.datastax.com/en/dse/6.9/securing/configure-kmip-encryption.html?#secRekeyKMIP

Some files were not shown because too many files have changed in this diff Show More