Check abort_source on each retry iteration in
wait_for_alternator and wait_for_cql so the
wait can be interrupted on shutdown.
Didn't use sleep_abortable as the sleep is very short
anyway.
This patch is mostly for the purpose of running pgo CI job.
We may receive connection error if asyncio.sleep(5) in
pgo.py is not sufficient waiting time.
In pgo.py we do wait for port but only for cql,
anyway it's better to have high level check than
trying to wait for alternator port there.
The patchset fixes abort_source implementation for perf-alternator and perf-cql-raw. It moves
run_standalone function to common code in perf.hh with necessary templating.
We also add extensive testing so that it's more difficult to break the tooling in the future.
Fixes SCYLLADB-560
Backport: no, internal tooling improvement
Closesscylladb/scylladb#28541
* github.com:scylladb/scylladb:
test: cluster: add tests for perf tools
test: perf: fix port race condition on startup in connect workload
test: perf: prepare benchmarks to bind to custom host
test: perf: make perf-alterantor remote port configurable
test: perf: fix ASAN leak warnings in perf-alternator
Reapply "main: test: add future and abort_source to after_init_func"
Correct the upload operation logic. The previous flow incorrectly
checked for the test file on S3 even when performing operations that do
not download the file, such as uploads.
Other workloads at startup call prepopulate() which connects
with retry loop therefore it waits until cql port is open.
This commit adds a single place where we will wait for port
for all workloads.
Timeout is set to 5 minutes so that even slowest machines
are able to start.
This patch changes the load balancing simulator so that it computes
table load based on tablet sizes instead of tablet count.
best_shard_overcommit measured minimal allowed overcommit in cases
where the number of tablets can not be evenly distributed across
all the available shards. This is still the case, but instead of
computing it as an integer div_ceil() of the average shard load,
it is now computed by allocating the tablet sizes using the
largest-tablet-first method. From these, we can get the lowest
overcommit for the given set of nodes, shards and tablet sizes.
This change adds a random tablet size generator. The tablet sizes are
created in load_stats.
Further changes to the load balance simulator:
- apply_plan() updates the load_stats after a migration plan is issued by the
load balancer,
- adds the option to set a command line option which controls the tablet size
deviation factor.
With size based load balancing, we will have to move the tablet size in
load_stats after each internode migration issued by balance_tablets().
This will be done in a subsequent commit in apply_plan() which is
called from rebalance_tablets().
Currently, rebalance_tablets() is passed a load_stats_ptr which is
defined as:
using load_stats_ptr = lw_shared_ptr<const load_stats>;
Because this is a pointer to const, apply_plan() can't modify it.
So, we pass a reference to load_stats to rebalance_tablets() and create
a load_stats_ptr from it for each call to balance_tablets().
This reverts commit 7bf7ff785a. The commit
tried to add clean shutdown to `scylla perf` paths, but forgot at least
`scylla perf-alternator --workload wr` which now crashes on uninitialized
`c.as`.
Fixes#28473Closesscylladb/scylladb#28478
Adds --json-result option to perf-cql-raw and perf-alternator, the same as perf-simple-query has.
It is useful for automating test runs.
Related: https://scylladb.atlassian.net/browse/SCYLLADB-434
Bacport: no, original benchmark is not backported
Closesscylladb/scylladb#28451
* github.com:scylladb/scylladb:
test: perf: add example commands to perf-alternator and perf-cql-raw
test: perf: add option to write results to json in perf-cql-raw
test: perf: add option to write results to json in perf-alternator
test: perf: move write_json_result to a common file
When one request is super slow and req/s high
in theory we have a collision on id, this patch
avoids that by reusing id and aborting when there
is no free one (unlikely).
This commit avoids leaking seastar::async future from two benchmark
tools: perf-alternator and perf-cql-raw. Additionally it adds
abort_source for fast and clean shutdown.
Pass a pointer to service::topology and db::system_keyspace to load
balancer. It will be used in the following patches to create
rack_list_colocation plan.
Fills memtable with rows and a tombstone which deletes all rows which
are already in cache.
Similar to raft log workload, but more extreme.
With -c1 -m4G, observed really bad performance:
update: 1711.976196 [ms], preemption: {count: 22603, 99%: 0.943127 [ms], max: 1494.571776 [ms]}, cache: 2148/2906 [MB], alloc/comp: 1334/869 [MB] (amp: 0.651), pr/me/dr 1062186/0/1062187
cache: 2148/2906 [MB], memtable: 738/1024 [MB], alloc/comp: 993/0 [MB] (amp: 0.000)
Which means that max reactor stall during cache update was 1.5 [s]
0.7 GB memtables. 2.1 GB in cache.
This patch adds a metric for pre-compression size of sstable files.
This patch adds a per-table metric
`scylla_column_family_total_disk_space_before_compression`,
which measures the hypothetical total size of sstables on disk,
if Data.db was replaced with an uncompressed equivalent.
As for the implementation:
Before the patch, tables and sstable sets are already tracking their total physical file size.
Whenever sstables are added or removed, the size delta is propagated from the sstable up through sstable sets into table_stats.
To implement the new metric, we turn the size delta that is getting passed around from a one-dimensional to a two-dimensional value, which includes both the physical and the pre-compression size.
New functionality, no backport needed.
Closesscylladb/scylladb#26996
* github.com:scylladb/scylladb:
replica/table: add a metric for hypothetical total file size without compression
replica/table: keep track of total pre-compression file size
Every table and sstable set keeps track of the total file size
of contained sstables.
Due to a feature request, we also want to keep track of the hypothetical
file size if Data files were uncompressed, to add a metric that
shows the compression ratio of sstables.
We achieve this by replacing the relevant `uint_64 bytes_on_disk`
counters everywhere with a struct that contains both the actual
(post-compression) size and the hypothetical pre-compression size.
This patch isn't supposed to change any observable behavior.
In the next patch, we will use these changes to add a new metric.
There is no need to have `create_audit` separate from `start_audit`.
`create_audit` just stores the passed parameters, while `start_audit`
does the actual initialization and startup work.
Refs #26022
So tombstones can be purged correctly based on the tombstone gc mode.
Currently if repair-mode is used, tombstones are not purged at all,
which can lead to purged tombstone being re-replicated to replicas which
already purged them via read-repair.
This is not a correctness problem, tombstones are not included in data
query resutl or digest, these purgable tombstone are only a nuissance
for read repair, where they can create extra differences between
replicas. Note that for the read repair to trigger, some difference
other than in purgable tombstones has to exist, because as mentioned
above, these are not included in digets.
Fixes: scylladb/scylladb#24332Closesscylladb/scylladb#26351
Before this change, new connections were handled in a default
scheduling group (`main`), because before the user is authenticated
we do not know which service level should be used. With the new
`sl:driver` service level, creation of new connections can be moved to
`sl:driver`.
We switch the service level as early as possible, in `do_accepts`.
There is a possibility, that `sl:driver` will not exist yet, for
instance, in specific upgrade cases, or if it was removed. Therefore,
we also switch to `sl:driver` after a connection is accepted.
Refs: scylladb/scylladb#24411
The querier object is a confusing one. Based on its name it should be in the query/ module and it is already in the query namespace. The query namespace is used for symbols which span the coordinator and replica, or that are mostly coordinator side. The querier is mainly in this namespace due to its similar name and because at the time it was introduced, namespace replica didn't exist yet. But this is a mistake which confuses people.
The querier is actually a completely replica-side logic, implementing the caching of the readers on the replica. Move it to the replica module and namespace to make this more clear.
Code cleanup, no backport.
Closesscylladb/scylladb#26280
* github.com:scylladb/scylladb:
replica: move querier code to replica namespace
root,replica: mv querier to replica/
Applying lazy evaluation to the BTI encoding of clustering keys
was probably a bad default.
The possible benefits are dubious (because it's quite likely that the laziness
won't allow us to avoid that much work), but the overhead needed to
implement the laziness is large and immediate.
In this patch we get rid of the laziness.
We rewrite lazy_comparable_bytes_from_clustering_position
and lazy_comparable_bytes_from_ring_position
so that they performs the key translation eagerly,
all components to a single bytes_ostream in one synchronous call.
perf_bti_key_translation (microbenchmark added in this series, 1 iteration is 100 translations of a clustering key with 8 cells of int32_type):
```
Before:
test iterations median mad min max allocs tasks inst cycles
lcb_mismatch_test.lcb_mismatch 9233 109.930us 0.000ns 109.930us 109.930us 4356.000 0.000 2615394.3 614709.6
After:
test iterations median mad min max allocs tasks inst cycles
lcb_mismatch_test.lcb_mismatch 50952 19.487us 0.000ns 19.487us 19.487us 198.000 0.000 603120.1 109042.9
```
Enhancement, backport not required.
Closesscylladb/scylladb#26302
* github.com:scylladb/scylladb:
sstables/trie: BTI-translate the entire partition key at once
sstables/trie: avoid an unnecessary allocation of std::generator in last_block_offset()
sstables/trie: perform the BTI-encoding of position_in_partition eagerly
types/comparable_bytes: add comparable_bytes_from_compound
test/perf: add perf_bti_key_translation
Split the tablets mutations by number of rows, based on
`min_tablets_in_mutation` (currently calibrated to 1024),
similar to the splitting done in
`storage_service::merge_topology_snapshot`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
tablets-per-table must be a power of 2, so round up 10000 to 16K.
also, reduce number of tables to have a total of about 100K
tablets, otherwise we hit the maximum commitlog mutation size
limit in save_tablet_metadata.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Moved files:
- generic_server.hh
- generic_server.cc
- protocol_server.hh
Fixes: #22112
This is a cleanup, no need to backport
Closesscylladb/scylladb#25090
The query namespace is used for symbols which span the coordinator and
replica, or that are mostly coordinator side. The querier is mainly in
this namespace due to its similar name, but this is a mistake which
confuses people. Now that the code was moved to replica/, also fix the
namespace to be namespace replica.
The querier object is a confusing one. Based on its name it should be in
the query/ module and it is already in the query namespace. But this is
actually a completely replica-side logic, implementing the caching of
the readers on the replica. Move it to the replica module to make this
more clear.
Some files in compaction/ have using namespace {compaction,sstables}
clauses, some even in headers. This is considered bad practice and
muddies the namespace use. Remove them.
The namespace usage in this directory is very inconsistent, with files
and classes scattered in:
* global namespace
* namespace compaction
* namespace sstables
With cases, where all three used in the same file. This code used to
live in sstables/ and some of it still retains namespace sstables as a
heritage of that time. The mismatch between the dir (future module) and
the namespace used is confusing, so finish the migration and move all
code in compaction/ to namespace compaction too.
This patch, although large, is mechanic and only the following kind of
changes are made:
* replace namespace sstable {} with namespace compaction {}
* add namespace compaction {}
* drop/add sstables::
* drop/add compaction::
* move around forward-declarations so they are in the correct namespace
context
This refactoring revealed some awkward leftover coupling between
sstables and compaction, in sstables/sstable_set.cc, where the
make_sstable_set() methods of compaction strategies are implemented.
Simulates reblancing on a single scale-out involving simultaneous
addition of multiple nodes per rack.
Default parameters create a cluster with 2 racks, 70 tables, 256
tablets/table, 10 nodes, 88 shards/node.
Adds 6 nodes in parallel (3 per rack).
Current result on my laptop:
testlog - Rebalance took 21.874 [s] after 82 iteration(s)