scylladb/test/perf/perf_alternator.cc
Avi Kivity 6df04c9e5b Update seastar submodule
Changed seastar::http::experimental to seastar::http to reflect
graduation of the seastar http API.

Changed calls to seastar::rename_file() (in sstables/storage.cc,
sstables/sstable_directory.cc, sstables/sstables.cc and
db/hints/internal/hint_storage.cc) to reflect the new default parameter.

Updated scylla_gdb test helper get_task() to work with the updated
accept loop in Seastar. This is just test code (it attempts to find
a task to operate on), not used in real scylla-gdb.py work, but
the adjustment nevertheless keeps backward compatibility.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1798
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-2043

* seastar 485a62b2...510f3148 (43):
  > reactor_backend: fix iocb double-free and shutdown hang during AIO teardown
  > file: fix default DMA alignment
  > http: add to_reply() to redirect_exception with extra-header support
  > core: propagate syscall errors via `coroutine::exception`
  > file: assert dma alignments are powers of two
  > doc: Document undocumented io_tester features and fix output example
  > backtrace: print the build_id along with the backtrace
  > reactor: default to oneline backtraces
  > Merge 'json: formatter: support types with user-defined conversion to sstring' from Benny Halevy
    tests: json_formatter: test formatter::write with string types
    json: formatter: support types with user-defined conversion to sstring
  > httpd_test: fix build failure with Seastar_SSTRING=OFF
  > net/tls: introduce ssl_call wrapper for SSL I/O
  > build: disable unused command line argument error for C++ module
  > coroutine/generator: fix setup of generator's waiting task
  > tests/tls: set 1000-day validity for self-signed CA cert
  > net: tls: openssl: disable certificate compression
  > reactor: reduce steady_clock::now() calls per scheduling quantum
  > fair_queue: remove notify_request_finished()
  > loop: use small_vector for parallel_for_each_state incomplete futures
  > dodge false sharing in spinlock
  > Merge 'Handle nowait support for reads and writes independently' from Pavel Emelyanov
    file: Change nowait_works mode detection
    file: Introduce read-only nowait_mode
    filesystem: Make nowait_works bit a enum class too
    file: Make nowait_works bit a enum class
  > Merge 'net/tls: improve OpenSSL error queue hygiene' from Gellért Peresztegi-Nagy
    net/tls: assert clean error queue before SSL operations
    net/tls: clear error queue after successful SSL operations
    net/tls: clear error queue after successful SSL_CTX_new
    net/tls: drain error queue on unexpected error codes
    net/tls: use make_openssl_error for BIO creation failure
  > vla.hh: add missing includes
  > Merge 'smp: make smp::count non-static' from Avi Kivity
    smp: convert all smp::count usages to instance-aware alternatives
    smp: add per-instance shard_count and this_smp() infrastructure
    disk_params: document pre-init smp::count access with explicit 0
    reactor_backend: document pre-init smp::count access with explicit 0
    tests: alien_test: pass shard count to alien thread explicitly
  > build: fix cmake missing ninja on Ubuntu 26.04
  > rpc: Fix uint64 wraparound of expired timeout in send_entry()
  > Merge 'Generalize some RPC tests' from Pavel Emelyanov
    tests: Generalize async connection-based scheduling RPC tests
    tests: Generalize sync connection-based scheduling RPC tests
    tests: Remove redundant variadic/nonvariadic RPC tuple tests
    tests: Generalize max timeout RPC tests
  > net: tls: openssl: Share BIO ptrs across shards
  > http: fix compilation on clang 22 with c++26
  > build: openssl tools needed for test cert generation
  > reactor: support rename2
  > future: fix forwarding of reference types
  > Merge 'Zero-copy http chunked data sink' from Pavel Emelyanov
    http: Make chunked data sink zero-copy
    tests/prometheus_http: Rewrite on top of http::client
    tests/httpd: Rewrite content_length_limit on top of http::client
  > tests: Replace ad-hoc http_consumer with production HTTP parser
  > Merge 'co_return to accept same expressions and types as return' from Alexey Bashtanov
    tests/unit/{coroutines,futures}: strict types on co_return and set_value
    api: introduce version 10:
    core/{coroutine,future}: make `co_return` more strict with types
    core/{coroutine,future}: preparations to fix `co_return` type semantics
  > Merge 'Perftune.py: add special handling for mlx5 rss queues number calculation' from Vladislav Zolotarov
    perftune.py: NetPerfTuner: enhance RSS (a.k.a. "Rx") queues accounting for mlx5 devices
    perftune.py: update docstring of NetPerfTuner.__get_rps_cpus() method
    perftune.py: add a method that parses and models the output of the 'ethtool -l' command for a given interface
  > httpd: rewrite do_accepts/do_accept_one as coroutines
  > file: add mmap support to file
  > http: Move client code out of experimental namespace
  > file: add hugetlbfs support to file system detection
  > tests: Replace test_source_impl with util::as_input_stream
  > tests: Replace buf_source_impl with util::as_input_stream
  > Merge 'rpc_tester: expose throuput for rpc tester' from Marcin Szopa
    rpc_tester: remove unused payload size variable from job_rpc_streaming class
    rpc_tester: add start time tracking for throughput calculation, print throughput and msg/s for job_rpc
    rpc_tester: refactor result emission to use dedicated functions for messages and throughput
  > iostream: cast first argument of `std::min` to `size_t`

Closes scylladb/scylladb#29952
2026-05-20 13:47:12 +03:00


/*
 * Copyright (C) 2023-present ScyllaDB
 */

/*
 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
 */
#include <functional>
#include <seastar/core/abort_source.hh>
#include <signal.h>
#include <seastar/core/future.hh>
#include <seastar/core/sleep.hh>
#include <seastar/core/thread.hh>
#include <seastar/core/app-template.hh>
#include <seastar/http/client.hh>
#include <seastar/http/request.hh>
#include <seastar/http/reply.hh>
#include <seastar/util/defer.hh>
#include <seastar/util/short_streams.hh>
#include <seastar/core/smp.hh>
#include <iostream>
#include <map>
#include <string>
#include <tuple>
#include <vector>
#include <boost/program_options.hpp>
#include "db/config.hh"
#include "test/perf/perf.hh"
#include "test/lib/random_utils.hh"
namespace perf {

using namespace seastar;
namespace bpo = boost::program_options;

struct test_config {
    std::string workload;
    int port;
    unsigned partitions;
    bool prepopulate_partitions;
    unsigned duration_in_seconds;
    unsigned operations_per_shard;
    unsigned concurrency;
    unsigned scan_total_segments;
    bool flush;
    std::string remote_host;
    bool continue_after_error;
    std::string json_result_file;
};

std::ostream& operator<<(std::ostream& os, const test_config& cfg) {
    return os << "{workload=" << cfg.workload
           << ", partitions=" << cfg.partitions
           << ", concurrency=" << cfg.concurrency
           << ", duration_in_seconds=" << cfg.duration_in_seconds
           << ", operations-per-shard=" << cfg.operations_per_shard
           << ", flush=" << cfg.flush
           << "}";
}

static http::client get_client(const test_config& c, int port = 0) {
    if (port == 0) {
        port = c.port;
    }
    return http::client(socket_address(net::inet_address(c.remote_host), port));
}
static future<> make_request(http::client& cli, sstring operation, sstring body) {
    auto req = http::request::make("POST", "localhost", "/");
    // The header name is given without a trailing colon; the separator is
    // added when the request is serialized.
    req._headers["X-Amz-Target"] = "DynamoDB_20120810." + operation;
    req.write_body("application/x-amz-json-1.0", std::move(body));
    return cli.make_request(std::move(req), [] (const http::reply& rep, input_stream<char>&& in_) {
        // Drain the reply body, then close the stream.
        return do_with(std::move(in_), [] (auto& in) {
            return util::skip_entire_stream(in).then([&in] () {
                return in.close();
            });
        });
    });
}
static void wait_for_alternator(const test_config& c, abort_source& as) {
    // Poll with a 100 ms pause between attempts, up to 3000 attempts.
    for (int attempt = 0; attempt < 3000; ++attempt) {
        as.check();
        try {
            auto cli = get_client(c);
            auto close = defer([&] { cli.close().get(); });
            make_request(cli, "ListTables", "{}").get();
            return;
        } catch (...) {
            // not ready yet, retry below
        }
        sleep_abortable(std::chrono::milliseconds(100), as).get();
        if (attempt >= 100 && attempt % 10 == 0) {
            std::cout << fmt::format("Retrying connect to alternator port (attempt {})", attempt + 1) << std::endl;
        }
    }
    throw std::runtime_error("Timed out waiting for alternator port to become ready");
}
static void delete_alternator_table(http::client& cli) {
    try {
        make_request(cli, "DeleteTable", R"({"TableName": "workloads_test"})").get();
    } catch(...) {
        // the table may or may not exist; ignore the error
    }
}

static void create_alternator_table(http::client& cli) {
    delete_alternator_table(cli); // cleanup in case of leftovers
    make_request(cli, "CreateTable", R"(
{
"AttributeDefinitions": [{
"AttributeName": "p",
"AttributeType": "S"
},
{
"AttributeName": "c",
"AttributeType": "S"
}
],
"TableName": "workloads_test",
"BillingMode": "PAY_PER_REQUEST",
"KeySchema": [{
"AttributeName": "p",
"KeyType": "HASH"
},
{
"AttributeName": "c",
"KeyType": "RANGE"
}
]
}
)").get();
}

static void create_alternator_table_with_gsi(http::client& cli) {
    delete_alternator_table(cli); // cleanup in case of leftovers
    make_request(cli, "CreateTable", R"(
{
"AttributeDefinitions": [{
"AttributeName": "p",
"AttributeType": "S"
},
{
"AttributeName": "c",
"AttributeType": "S"
},
{
"AttributeName": "C0",
"AttributeType": "S"
},
{
"AttributeName": "C1",
"AttributeType": "S"
},
{
"AttributeName": "C2",
"AttributeType": "S"
},
{
"AttributeName": "C3",
"AttributeType": "S"
},
{
"AttributeName": "C4",
"AttributeType": "S"
}
],
"TableName": "workloads_test",
"BillingMode": "PAY_PER_REQUEST",
"KeySchema": [{
"AttributeName": "p",
"KeyType": "HASH"
},
{
"AttributeName": "c",
"KeyType": "RANGE"
}
],
"GlobalSecondaryIndexes": [
{ "IndexName": "idx_C0",
"KeySchema": [
{ "AttributeName": "C0", "KeyType": "HASH" }
],
"Projection": { "ProjectionType": "ALL" }
},
{ "IndexName": "idx_C1",
"KeySchema": [
{ "AttributeName": "C1", "KeyType": "HASH" }
],
"Projection": { "ProjectionType": "ALL" }
},
{ "IndexName": "idx_C2",
"KeySchema": [
{ "AttributeName": "C2", "KeyType": "HASH" }
],
"Projection": { "ProjectionType": "ALL" }
},
{ "IndexName": "idx_C3",
"KeySchema": [
{ "AttributeName": "C3", "KeyType": "HASH" }
],
"Projection": { "ProjectionType": "ALL" }
},
{ "IndexName": "idx_C4",
"KeySchema": [
{ "AttributeName": "C4", "KeyType": "HASH" }
],
"Projection": { "ProjectionType": "ALL" }
}
]
}
)").get();
}
// Exercise various types documented here: https://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_AttributeValue.html
static constexpr auto update_item_suffix = R"(
"UpdateExpression": "set C0 = :C0, C1 = :C1, C2 = :C2, C3 = :C3, C4 = :C4, C5 = :C5, C6 = :C6, C7 = :C7, C8 = :C8, C9 = :C9",
"ExpressionAttributeValues": {{
{}
":C0": {{
"B": "dGhpcyB0ZXh0IGlzIGJhc2U2NC1lbmNvZGVk"
}},
":C1": {{
"BOOL": true
}},
":C2": {{
"BS": ["U3Vubnk=", "UmFpbnk=", "U25vd3k="]
}},
":C3": {{
"L": [ {{"S": "Cookies"}} , {{"S": "Coffee"}}, {{"N": "3.14159"}}]
}},
":C4": {{
"M": {{"Name": {{"S": "Joe"}}, "Age": {{"N": "35"}}}}
}},
":C5": {{
"N": "123.45"
}},
":C6": {{
"NS": ["42.2", "-19", "7.5", "3.14"]
}},
":C7": {{
"NULL": true
}},
":C8": {{
"S": "Hello"
}},
":C9": {{
"SS": ["Giraffe", "Hippo" ,"Zebra"]
}}
}},
"ReturnValues": "NONE"
}}
)";
static future<> update_item(const test_config& _, http::client& cli, uint64_t seq) {
    auto prefix = format(R"({{
"TableName": "workloads_test",
"Key": {{
"p": {{
"S": "{}"
}},
"c": {{
"S": "{}"
}}
}},)", seq, seq);
    return make_request(cli, "UpdateItem", prefix + seastar::format(update_item_suffix, ""));
}

static future<> update_item_gsi(const test_config& _, http::client& cli, uint64_t seq) {
    auto prefix = format(R"({{
"TableName": "workloads_test",
"Key": {{
"p": {{
"S": "{}"
}},
"c": {{
"S": "{}"
}}
}},
"UpdateExpression": "set C0 = :C0, C1 = :C1, C2 = :C2, C3 = :C3, C4 = :C4",
"ExpressionAttributeValues": {{
":C0": {{
"S": "{}"
}},
":C1": {{
"S": "{}"
}},
":C2": {{
"S": "{}"
}},
":C3": {{
"S": "{}"
}},
":C4": {{
"S": "{}"
}}
}},
"ReturnValues": "NONE"
}})", seq, seq, seq>>1, seq>>2, seq>>3, seq>>4, seq>>5); // different values so that some gsi (mv) updates will land on different shards
    return make_request(cli, "UpdateItem", prefix);
}

static future<> update_item_rmw(const test_config& _, http::client& cli, uint64_t seq) {
    auto prefix = format(R"({{
"TableName": "workloads_test",
"Key": {{
"p": {{
"S": "{}"
}},
"c": {{
"S": "{}"
}}
}},)", seq, seq);
    // A conditional write is one way of making sure Scylla does a read before the write (RMW).
    // For our static data the condition is always true, which keeps things simple.
    auto condition_exp = R"(
"ConditionExpression": "((NOT attribute_exists(C2)) OR size(C6) <= :val1) AND (C8 <> :val2 OR C6 IN (:val1)) ",
)";
    auto condition_attribute_values = R"(
":val1": {
"N": "10"
},
":val2": {
"S": "some_value"
},
)";
    return make_request(cli, "UpdateItem", prefix + condition_exp +
            format(update_item_suffix, condition_attribute_values));
}

static future<> get_item(const test_config& _, http::client& cli, uint64_t seq) {
    auto body = format(R"({{
"TableName": "workloads_test",
"Key": {{
"p": {{
"S": "{}"
}},
"c": {{
"S": "{}"
}}
}},
"ProjectionExpression": "C0, C1, C2, C3, C4, C5, C6, C7, C8, C9",
"ConsistentRead": false,
"ReturnConsumedCapacity": "TOTAL"
}})",seq, seq);
    co_await make_request(cli, "GetItem", std::move(body));
}

static future<> scan(const test_config& c, http::client& cli, uint64_t seq) {
    // This uses the "parallel scan" feature, see https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Scan.html#Scan.ParallelScan
    auto body = format(R"({{
"TableName": "workloads_test",
"Select": "ALL_ATTRIBUTES",
"Segment": {},
"TotalSegments": {},
"ConsistentRead": false
}})", seq % c.scan_total_segments, c.scan_total_segments);
    co_await make_request(cli, "Scan", std::move(body));
}

static void flush_table(const test_config& c) {
    // Port 10000 is assumed to be the REST API port (Scylla's default api_port).
    auto cli = get_client(c, 10000);
    auto req = http::request::make("POST", "localhost", "/storage_service/keyspace_flush/alternator_workloads_test");
    cli.make_request(std::move(req), [] (const http::reply& rep, input_stream<char>&& in) {
        return in.close();
    }).get();
    cli.close().get();
}

static void create_partitions(const test_config& c, http::client& cli) {
    std::cout << "Creating " << c.partitions << " partitions..." << std::endl;
    for (unsigned seq = 0; seq < c.partitions; ++seq) {
        update_item(c, cli, seq).get();
    }
    if (c.flush) {
        std::cout << "Flushing partitions..." << std::endl;
        flush_table(c);
    }
}

auto make_client_pool(const test_config& c) {
    std::vector<http::client> res;
    res.reserve(c.concurrency);
    for (unsigned i = 0; i < c.concurrency; i++) {
        res.push_back(get_client(c));
    }
    return res;
}
void workload_main(const test_config& c, sharded<abort_source>* as) {
    std::cout << "Running test with config: " << c << std::endl;
    wait_for_alternator(c, as->local());

    auto cli = get_client(c);
    auto finally = defer([&] {
        delete_alternator_table(cli);
        cli.close().get();
    });

    if (c.workload != "write_gsi") {
        create_alternator_table(cli);
    } else {
        create_alternator_table_with_gsi(cli);
    }

    using fun_t = std::function<future<>(const test_config&, http::client&, uint64_t)>;
    std::map<std::string, fun_t> workloads = {
        {"read", get_item},
        {"scan", scan},
        {"write", update_item},
        {"write_gsi", update_item_gsi},
        // For a realistic scenario, run this workload together with
        // --alternator-write-isolation only_rmw_uses_lwt
        {"write_rmw", update_item_rmw},
    };

    if (c.prepopulate_partitions && (c.workload == "read" || c.workload == "scan")) {
        create_partitions(c, cli);
    }

    auto it = workloads.find(c.workload);
    if (it == workloads.end()) {
        throw std::runtime_error(fmt::format("unknown workload '{}'", c.workload));
    }
    fun_t fun = it->second;

    static thread_local std::vector<http::client> cli_pool;
    smp::invoke_on_all([&c] {
        cli_pool = make_client_pool(c);
    }).get();
    // Clean up thread-local connections to avoid destruction issues at exit
    auto close_pools = defer([] {
        smp::invoke_on_all([] {
            return parallel_for_each(cli_pool, [] (auto& cli) {
                return cli.close();
            }).then([] {
                cli_pool.clear();
            });
        }).get();
    });

    auto results = time_parallel([&] {
        as->local().check();
        // Pick the next client from this shard's pool in round-robin order.
        static thread_local auto cli_iter = -1;
        auto seq = tests::random::get_int<uint64_t>(c.partitions - 1);
        return fun(c, cli_pool[++cli_iter % c.concurrency], seq);
    }, c.concurrency, c.duration_in_seconds, c.operations_per_shard, !c.continue_after_error);

    aggregated_perf_results agg(results);
    std::cout << agg << std::endl;

    if (!c.json_result_file.empty()) {
        Json::Value params;
        params["workload"] = c.workload;
        params["partitions"] = c.partitions;
        params["concurrency"] = c.concurrency;
        params["duration"] = c.duration_in_seconds;
        params["operations_per_shard"] = c.operations_per_shard;
        params["remote_host"] = c.remote_host;
        params["flush"] = c.flush;
        params["scan_total_segments"] = c.scan_total_segments;
        params["cpus"] = smp::count;
        perf::write_json_result(c.json_result_file, agg, params, c.workload);
    }
}
// This benchmark runs the whole of Scylla, so it needs the Scylla config and
// command line. Example usage:
// ./build/release/scylla perf-alternator --workdir /tmp/scylla-workdir --smp 1 --cpus 0 --developer-mode 1 --alternator-port 8000 --alternator-write-isolation only_rmw_uses_lwt --workload read 2> /dev/null
std::function<int(int, char**)> alternator(std::function<int(int, char**)> scylla_main, std::function<future<>(lw_shared_ptr<db::config> cfg, sharded<abort_source>& as)>* after_init_func) {
    return [=](int ac, char** av) -> int {
        test_config c;

        bpo::options_description opts_desc;
        opts_desc.add_options()
            ("workload", bpo::value<std::string>()->default_value(""), "which workload type to run")
            ("partitions", bpo::value<unsigned>()->default_value(10000), "number of partitions")
            ("prepopulate-partitions", bpo::value<bool>()->default_value(true), "relevant for read workloads, can be disabled when data is prepopulated externally")
            ("duration", bpo::value<unsigned>()->default_value(5), "test duration in seconds")
            ("operations-per-shard", bpo::value<unsigned>()->default_value(0), "run this many operations per shard (overrides duration)")
            ("concurrency", bpo::value<unsigned>()->default_value(100), "workers per core")
            ("flush", bpo::value<bool>()->default_value(true), "flush memtables before test")
            ("remote-host", bpo::value<std::string>()->default_value(""), "address of remote alternator service, use localhost by default")
            ("remote-port", bpo::value<unsigned>()->default_value(8000), "port of remote alternator service")
            ("scan-total-segments", bpo::value<unsigned>()->default_value(10), "single scan operation will retrieve 1/scan-total-segments portion of a table")
            ("continue-after-error", bpo::value<bool>()->default_value(false), "continue test after failed request")
            ("json-result", bpo::value<std::string>()->default_value(""), "file to write json results to")
        ;
        bpo::variables_map opts;
        bpo::store(bpo::command_line_parser(ac, av).options(opts_desc).allow_unregistered().run(), opts);

        c.workload = opts["workload"].as<std::string>();
        c.partitions = opts["partitions"].as<unsigned>();
        c.prepopulate_partitions = opts["prepopulate-partitions"].as<bool>();
        c.duration_in_seconds = opts["duration"].as<unsigned>();
        c.operations_per_shard = opts["operations-per-shard"].as<unsigned>();
        c.concurrency = opts["concurrency"].as<unsigned>();
        c.flush = opts["flush"].as<bool>();
        c.remote_host = opts["remote-host"].as<std::string>();
        c.scan_total_segments = opts["scan-total-segments"].as<unsigned>();
        c.continue_after_error = opts["continue-after-error"].as<bool>();
        c.json_result_file = opts["json-result"].as<std::string>();

        if (c.scan_total_segments < 1 || c.scan_total_segments > 1'000'000) {
            throw std::invalid_argument("scan-total-segments must be between 1 and 1'000'000");
        }

        // Remove the test options so that they do not disturb the Scylla main app
        for (auto& opt : opts_desc.options()) {
            auto name = opt->canonical_display_name(bpo::command_line_style::allow_long);
            std::tie(ac, av) = cut_arg(ac, av, name);
        }

        if (c.workload.empty()) {
            std::cerr << "Missing --workload command-line value!" << std::endl;
            return 1;
        }

        // With a remote host, run only the client side against that host.
        if (!c.remote_host.empty()) {
            c.port = opts["remote-port"].as<unsigned>();
            app_template app;
            return app.run(ac, av, [c = std::move(c)] () -> future<> {
                return run_standalone([c = std::move(c)] (sharded<abort_source>* as) {
                    workload_main(c, as);
                });
            });
        }

        // Otherwise start Scylla in-process and run the workload once it is initialized.
        *after_init_func = [c = std::move(c)] (lw_shared_ptr<db::config> cfg, sharded<abort_source>& as) mutable {
            c.port = cfg->alternator_port();
            c.remote_host = cfg->api_address();
            return seastar::async([c = std::move(c), &as] {
                try {
                    workload_main(c, &as);
                } catch(...) {
                    std::cerr << "Test failed: " << std::current_exception() << std::endl;
                    raise(SIGKILL); // request abnormal shutdown
                }
                raise(SIGINT); // request shutdown
            });
        };
        return scylla_main(ac, av);
    };
}
} // namespace perf
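
As a rough illustration of how the scan workload above spreads requests over DynamoDB parallel-scan segments, the standalone sketch below rebuilds an equivalent request body without Seastar or fmt. The helpers `segment_for` and `make_scan_body` are hypothetical names introduced only for this illustration; the benchmark itself formats the body inline with `format()` and picks the segment as `seq % c.scan_total_segments`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of perf_alternator.cc): map a sequence
// number onto one of `total_segments` parallel-scan segments, mirroring
// `seq % c.scan_total_segments` in scan().
static unsigned segment_for(uint64_t seq, unsigned total_segments) {
    assert(total_segments >= 1); // the benchmark rejects 0 at option parsing
    return static_cast<unsigned>(seq % total_segments);
}

// Hypothetical helper: build a Scan request body with the same fields as
// the one in scan(), using plain string concatenation instead of format().
static std::string make_scan_body(uint64_t seq, unsigned total_segments) {
    return std::string("{\"TableName\": \"workloads_test\", ")
        + "\"Select\": \"ALL_ATTRIBUTES\", "
        + "\"Segment\": " + std::to_string(segment_for(seq, total_segments)) + ", "
        + "\"TotalSegments\": " + std::to_string(total_segments) + ", "
        + "\"ConsistentRead\": false}";
}
```

With `total_segments` set to 10 (the default of --scan-total-segments), each request in this sketch scans one of ten segments, so a single operation covers roughly a tenth of the table, as the option's help text describes.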