The idea behind readers/ is that each reader has its minimal header with
just a factory method declaration. The delegating reader is defined in
the factory header because it has a derived class in row_cache_test.cc.
Move the definition to delegating_impl.hh so users not interested in
deriving from it don't pay the price in header include cost.
A recent commit 370707b111 (re)introduced
a timeout for every group0 Raft operation. This timeout was set to 60
seconds, which, paraphrasing Bill Gates, "ought to be enough for anybody".
However, one of the things we do as a group0 operation is schema
changes, and we already noticed a few years ago, see commit
0b2cf21932, that in some extremely
overloaded test machines where tests run hundreds of times (!) slower
than usual, a single big schema operation - such as Alternator's
DeleteTable deleting a table and multiple of its CDC or view tables -
sometimes takes more than 60 seconds. The above fix changed the
client's timeout to wait for 300 seconds instead of 60 seconds,
but now we also need to increase our Raft timeout, or the server can
time out. We've seen this happening recently making some tests flaky
in CI (issue #23543).
So let's make this timeout configurable, as a new configuration option
group0_raft_op_timeout_in_ms. This option defaults to 60000 (i.e,
60 seconds), the same as the existing default. The test framework
overrides this default with a a higher 300 second timeout, matching
the client-side timeout.
Before this patch, this timeout was already configurable in a strange
way, using injections. But this was a misstep: We already have more
than a dozen timeouts configurable through the normal configration,
and this one should have been configured in the same way. There is
nothing "holy" about the default of 60 seconds we chose, and who
knows maybe in the future we might need to tweek it in the field,
just like we made the other timeouts tweakable. Injections cannot
be used in release mode, but configuration options can.
Fixes#23543
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#23717
Name the gates and phased barriers we use
to make it easy to debug gate_closed_exception
Refs https://github.com/scylladb/seastar/pull/2688
* Enhancement only, no backport needed
Closesscylladb/scylladb#23329
* github.com:scylladb/scylladb:
utils: loading_cache: use named_gate
utils: flush_queue: use named_gate
sstables_manager: use named gate
sstables_loader: use named gate
utils: phased_barrier, pluggable: use named gate
utils: s3::client::multipart_upload: use named gate
utils: s3::client: use named_gate
transport: controller: use named gate
tracing: trace_keyspace_helper: use named gate
task_manager: module: use named gate
topology_coordinator: use named gate
storage_service: use named gate
storage_proxy: wait_for_hint_sync_point: use named gate
storage_proxy: remote: use named gate
service: session: use named gate
service: raft: raft_rpc: use named gate
service: raft: raft_group0: use named gate
service: raft: persistent_discovery: use named gate
service: raft: group0_state_machine: use named gate
service: migration_manager: use named gate
replica: table: use named gate
replica: compaction_group, storage_group: use named gate
redis: query_processor: use named gate
repair: repair_meta: use named gate
reader_concurrency_semaphore: use named gate
raft: server_impl: use named gate
querier_cache: use named gate
gms: gossiper: use named gate
generic_server: use named gate
db: sstables_format_listener: use named gate
db: snapshot: backup_task: use named gate
db: snapshot_ctl: use named gate
hints: hints_sender: use named gate
hints: manager: use named gate
hints: hint_endpoint_manager: use named gate
commitlog: segment_manager: use named gate
db: batchlog_manager: use named gate
query_processor: remote: use named gate
compaction: compaction_state: use named gate
alternator/server: use named_gate
The latter class is invented to let tests access private fields of an
sstable (mostly methods). The former is in fact an extended version of
that also does some checks. Howerver, they don't inherit from each
other, and the sstable_assertions partially duplicates some funtionality
of the test one.
Add the inheritance, remove the duplicated methods from the child class,
update the callers (the test class returns future<>s, the assertions one
"knows" it runs in seastar thread) and marm sstable::read_toc() private.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#23697
Currently if raft is enabled all nodes are voters in group0. However it is not necessary to have all nodes to be voters - it only slows down the raft group operation (since the quorum is large) and makes deployments with asymmetrical DCs problematic (2 DCs with 5 nodes along 1 DC with 10 nodes will lose the majority if large DC is isolated).
The topology coordinator will now maintain a state where there are only limited number of voters, evenly distributed across the DCs and racks.
After each node addition or removal the voters are recalculated and rebalanced if necessary. That means:
* When a new node is added, it might become a voter depending on the current distribution of voters - either if there are still some voter "slots" available, or if the new node is a better candidate than some existing voter (in which case the existing node voter status might be revoked).
* When a voter node is removed or stopped (shut down), its voter status is revoked and another node might become a voter instead (this can also depend on other circumstances, like e.g. changing the number of DCs).
* If a node addition or removal causes a change in number of data centers (DCs) or racks, the rebalance action might become wider (as there are some special rules applying to 1 vs 2 vs more DCs, also changing the number of racks might cause similar effects in the voters distribution)
Special conditions for various number of DCs:
* 1 DC: Can have up to the maximum allowed number of voters (5 - see below)
* 2 DCs: The distribution of the voters will be asymmetric (if possible), meaning that we can tolerate a loss of the DC with the smaller number of voters (if both would have the same number of voters we'd lose majority if any of the DCs is lost). For example, if we have 2 DCs with 2 nodes each, one of them will only have 1 voter (despite the limit of 5). Also, if one of the 2 DCs has more racks than the other and the node count allows it, the DC with the more racks will have more voters.
* 3 and more DCs: The distribution of the voters will be so that every DC has strictly less than half of the total voters (so a loss of any of the DCs cannot lead to the majority loss). Again, DCs with more racks are being preferred in the voter distribution.
At the moment we will be handling the zero-token nodes in the same way as the regular nodes (i.e. the zero-token nodes will not take any priority in the voter distribution). Technically it doesn't make much sense to have a zero-token node that is not a voter (when there are regular nodes in the same DC being voters), but currently the intended purpose of zero-token nodes is to form an "arbiter DC" (in case of 2 DCs, creating a third DC with zero-token nodes only), so for that intended purpose no special handling is needed and will work out of the box. If a preference of zero token nodes will eventually be needed/requested, it will be added separately from this PR.
The maximum number of voters of 5 has been chosen as the smallest "safe" value. We can lose majority when multiple nodes (possibly in different dcs and racks) die independently in a short time span. With less than 5 voters, we would lose majority if 2 voters died, which is very unlikely to happen but not entirely impossible. With 5 voters, at least 3 voters must die to lose majority, which can be safely considered impossible in the case of independent failures.
Currently the limit will not be configurable (we might introduce configurable limits later if that would be needed/requested).
Tests added:
* boost/group0_voter_registry_test.cc: run time on CI: ~3.5s
* topology_custom/test_raft_voters.py: parametrized with 1 or 3 nodes per DC, the run time on CI: 1: ~20s. 3: ~40s, approx 1 min total
Fixes: scylladb/scylladb#18793
No backport: This is a new feature that will not be backported.
Closesscylladb/scylladb#21969
* https://github.com/scylladb/scylladb:
raft: distribute voters by rack inside DC
raft/test: fix lint warnings in `test_raft_no_quorum`
raft/test: add the upgrade test for limited voters feature
raft topology: handle on_up/on_down to add/remove node from voters
raft: fix the indentation after the limited voters changes
raft: implement the limited voters feature
raft: drop the voter removal from the decommission
raft/test: disable the `stop_before_becoming_raft_voter` test
raft/test: stop the server less gracefully in the voters test
After load-balancer was made capacity-aware it no longer equalizes tablet count per shard, but rather utilization of shard's storage. This makes the old presentation mode not useful in assessing whether balance was reached, since nodes with less capacity will get fewer tablets when in balanced state. This PR adds a new default presentation mode which scales tablet size by its storage utilization so that tablets which have equal shard utilization take equal space on the graph.
To facilitate that, a new virtual table was added: system.load_per_node, which allows the tool to learn about load balancer's view on per-node capacity. It can also serve as a debugging interface to get a view of current balance according to the load-balancer.
Closesscylladb/scylladb#23584
* github.com:scylladb/scylladb:
tablet-mon.py: Add presentation mode which scales tablet size by its storage utilization
tablet-mon.py: Center tablet id text properly in the vertical axis
tablet-mon.py: Show migration stage tag in table mode only when migrating
virtual-tables: Introduce system.load_per_node
virtual_tables: memtable_filling_virtual_table: Propagate permit to execute()
docs: virtual-tables: Fix instructions
service: tablets: Keep load_stats inside tablet_allocator
This adaptor adapts a mutation reader pausable consumer to the frozen
mutation visitor interface. The pausable consumer protocol allows the
consumer to skip the remaining parts of the partition and resume the
consumption with the next one. To do this, the consumer just has to
return stop_iteration::yes from one of the consume() overloads for
clustering elements, then return stop_iteration::no from
consume_end_of_partition(). Due to a bug in the adaptor, this sequence
leads to terminating the consumption completely -- so any remaining
partitions are also skipped.
This protocol implementation bug has user-visible effects, when the
only user of the adaptor -- read repair -- happens during a query which
has limitations on the amount of content in each partition.
There are two such queries: select distinct ... and select ... with
partition limit. When converting the repaired mutation to to query
result, these queries will trigger the skip sequence in the consumer and
due to the above described bug, will skip the remaining partitions in
the results, omitting these from the final query result.
This patch fixes the protocol bug, the return value of the underlying
consumer's consume_end_of_partition() is now respected.
A unit test is also added which reproduces the problem both with select
distinct ... and select ... per partition limit.
Follow-up work:
* frozen_mutation_consumer_adaptor::on_end_of_partition() calls the
underlying consumer's on_end_of_stream(), so when consuming multiple
frozen mutations, the underlying's on_end_of_stream() is called for
each partition. This is incorrect but benign.
* Improve documentation of mutation_reader::consume_pausable().
Fixes: #20084Closesscylladb/scylladb#23657
Adds new live updatable config: uninitialized_connections_semaphore_cpu_concurrency.
It should help to reduce cpu usage by limiting cpu concurrency for new connections. As a last resort when those connections are waiting for initial processing too long (over 1m) they are shed.
New connections_shed and connections_blocked metrics are added for tracking.
Testing:
- manually via simple program creating high number of connection and constantly re-connecting
- added benchmark
Following are benchmark results:
Before:
```
> build/release/test/perf/perf_generic_server --smp=1
170101.41 tps ( 13.1 allocs/op, 0.0 logallocs/op, 7.0 tasks/op, 4695 insns/op, 3178 cycles/op, 0 errors)
[...]
throughput: mean=173850.06 standard-deviation=1844.48 median=174509.66 median-absolute-deviation=874.23 maximum=175087.49 minimum=170588.54
instructions_per_op: mean=4725.59 standard-deviation=13.35 median=4729.38 median-absolute-deviation=12.49 maximum=4738.61 minimum=4709.96
cpu_cycles_per_op: mean=3135.08 standard-deviation=32.13 median=3122.68 median-absolute-deviation=22.29 maximum=3179.38 minimum=3103.15
```
After:
```
> build/release/test/perf/perf_generic_server --smp=1
167373.19 tps ( 13.1 allocs/op, 0.0 logallocs/op, 7.0 tasks/op, 4821 insns/op, 3371 cycles/op, 0 errors)
[...]
throughput:
mean= 171199.55 standard-deviation=2484.58
median= 171667.06 median-absolute-deviation=2087.63
maximum=173689.11 minimum=167904.76
instructions_per_op:
mean= 4801.90 standard-deviation=16.54
median= 4796.78 median-absolute-deviation=9.32
maximum=4830.71 minimum=4789.81
cpu_cycles_per_op:
mean= 3245.26 standard-deviation=32.28
median= 3230.44 median-absolute-deviation=16.52
maximum=3297.39 minimum=3215.62
```
The patch adds around 67 insns/op so it's effect on performance should be negligible.
Fixes: https://github.com/scylladb/scylladb/issues/22844Closesscylladb/scylladb#22828
* github.com:scylladb/scylladb:
transport: move on_connection_close into connection destructor
test: perf: make aggregated_perf_results formatting more human readable
transport: add blocked and shed connection metrics
generic_server: throttle and shed incoming connections according to semaphore limit
generic_server: add data source and sink wrappers bookkeeping network IO
generic_server: coroutinize part of server::do_accepts
test: add benchmark for generic_server
test: perf: add option to count multiple ops per time_parallel iteration
generic_server: add semaphore for limiting new connections concurrency
generic_server: add config to the constructor
generic_server: add on_connection_ready handler
Similar to test/cluster/test_data_resurrection_in_memtable.py but works
on a single node and uses more low-level mechanism. These tests can also
reproduce more advanced scenarios, like concurrent reads, with some
reading from flushed memtables.
The cache should not garbage-collect tombstone which cover data in the
memtable. Add overlap checks (get_max_purgeable) to garbage collection
to detect tombstones which cover data in the memtable and to prevent
their garbage collection.
Distribute the voters evenly across racks in the datacenters.
When distributing the voters across datacenters, the datacenters with
more racks will be preferred in case of a tie. Also, in case of
asymmetric voter distribution (2 DCs), the DC with more racks will have
more voters (if the node counts allow it).
In case of a single datacenter, the voters will be distributed across
racks evenly (in the similar manner as done for the whole datacenters).
The intention is that similar to losing a datacenter, we want to avoid
losing the majority if a rack goes down - so if there are multiple racks,
we want to distribute the voters across them in such a way that losing
the whole rack will not cause the majority loss (if possible).
Currently if raft is enabled all nodes are voters in group0. However it
is not necessary to have all nodes to be voters - it only slows down
the raft group operation (since the quorum is large) and makes
deployments with asymmetrical DCs problematic (2 DCs with 5 nodes along
1 DC with 10 nodes will lose the majority if large DC is isolated).
The topology coordinator will now maintain a state where there are only
limited number of voters, evenly distributed across the DCs and racks.
After each node addition or removal the voters are recalculated and
rebalanced if necessary. That means:
* When a new node is added, it might become a voter depending on the
current distribution of voters - either if there are still some voter
"slots" available, or if the new node is a better candidate than some
existing voter (in which case the existing node voter status might be
revoked).
* When a voter node is removed or stopped (shut down), its voter status
is revoked and another node might become a voter instead (this can also
depend on other circumstances, like e.g. changing the number of DCs).
* If a node addition or removal causes a change in number of datacenters
(DCs) or racks, the rebalance action might become wider (as there are
some special rules applying to 1 vs 2 vs more DCs, also changing the
number of racks might cause similar effects in the voters distribution)
Special conditions for various number of DCs:
* 1 DC: Can have up to the maximum allowed number of voters (5 - see below)
* 2 DCs: The distribution of the voters will be asymmetric (if possible),
meaning that we can tolerate a loss of the DC with the smaller number
of voters (if both would have the same number of voters we'd lose the
majority if any of the DCs is lost).
For example, if we have 2 DCs with 2 nodes each, one of them will only
have 1 voter (despite the limit of 5). Also, if one of the 2 DCs has
more racks than the other and the node count allows it, the DC with
the more racks will have more voters.
* 3 and more DCs: The distribution of the voters will be so that every
DC has strictly less than half of the total voters (so a loss of any
of the DCs cannot lead to the majority loss). Again, DCs with more
racks are being preferred in the voter distribution.
At the moment we will be handling the zero-token nodes in the same way
as the regular nodes (i.e. the zero-token nodes will not take any
priority in the voter distribution). Technically it doesn't make much
sense to have a zero-token node that is not a voter (when there are
regular nodes in the same DC being voters), but currently the intended
purpose of zero-token nodes is to form an "arbiter DC" (in case of 2 DCs,
creating a third DC with zero-token nodes only), so for that intended
purpose no special handling is needed and will work out of the box.
If a preference of zero token nodes will eventually be needed/requested,
it will be added separately from this PR.
Currently the voter limits will not be configurable (we might introduce
configurable limits later if that would be needed/requested).
The feature is enabled by the `group0_limited_voters` feature flag
to avoid issues with cluster upgrade (the feature will be only enabled
once all nodes in the cluster are upgraded to the version supporting
the feature).
Fixes: scylladb/scylladb#18793
There are two snapshot-on-all-shards methods on the database -- the one
that snapshots a keyspace and the one that snapshots a vector of tables.
The latter snapshots a single table with a neat helper, while the former
has the helper open-coded.
Re-using the helper in keyspace snapshot is worth it, but needs to patch
the helper to work on uuid, rather than ks:cf pair of strings.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#23532
Before, it was equalizing per-node load (tablet count), which is wrong
in heterogeneous clusters. Nodes with fewer shards will end up with
overloaded shards.
Refs #23378Closesscylladb/scylladb#23478
* github.com:scylladb/scylladb:
tablets: Make tablet allocation equalize per-shard load
tablets: load_balancer: Fix reporting of total load per node
This series add a new config option: `tablets_mode_for_new_keyspaces` that replaces the existing
`enable_tablets` option. It can be set to the following values:
disabled: New keyspaces use vnodes by default, unless enabled by the tablets={'enabled':true} option
enabled: New keyspaces use tablets by default, unless disabled by the tablets={'disabled':true} option
enforced: New keyspaces must use tablets. Tablets cannot be disabled using the CREATE KEYSPACE option
`tablets_mode_for_new_keyspaces=disabled` or `tablets_mode_for_new_keyspaces=enabled` control whether
tablets are disabled or enabled by default for new keyspaces, respectively.
In either cases, tablets can be opted-in or out using the `tablets={'enabled':...}`
keyspace option, when the keyspace is created.
`tablets_mode_for_new_keyspaces=enforced` enables tablets by default for new keyspaces,
like `tablets_mode_for_new_keyspaces=enabled`.
However, it does not allow to opt-out when creating
new keyspaces by setting `tablets = {'enabled': false}`
Refs scylladb/scylla-enterprise#4355
* Requires backport to 2025.1
Closesscylladb/scylladb#22273
* github.com:scylladb/scylladb:
boost/tablets_test: verify failure to create keyspace with tablets and non network replication strategy
tablets: enforce tablets using tablets_mode_for_new_keyspaces=enforced config option
db/config: add tablets_mode_for_new_keyspaces option
Following the recent refactoring of removing "flat" and "v2" from reader
names, replacing all the fully qualified names with simply "mutation_reader".
Closesscylladb/scylladb#23346
Move `object_storage.yaml` endpoints to `scylla.yaml`
This change also removes the `object_storage.yaml` file
altogether and adds tests for fetching the endpoints
via the `v2/config/object_storage_endpoints` REST api.
Also, `object_storage_config_file` options is moved to a deprecated state as it's no longer needed.
This PR depends on #22951, the reviewers should review patch 393e1ac0ec066475ca94094265a5f88dbbdb1a1f
Refs https://github.com/scylladb/scylladb/issues/22428Closesscylladb/scylladb#22952
* github.com:scylladb/scylladb:
Remove db::config::object_storage_config
Move `object_storage.yaml` endpoints to `scylla.yaml`
This PR extends Scylla's SSTable compression with the ability to use compression dictionaries shared across compression chunks. This involves several changes:
- We refactor `compression_parameters` and friends (`compressor`, `sstables::local_compression`, `sstables::compression`) to prepare for making the construction of `compressor`s asynchronous, to enable sharing pieces of compressors (the dictionaries) across shards.
- We introduce the notion of "hidden compression options" which are written to `CompressionInfo.db` and used to construct decompressors, like regular options, but don't appear in the schema. (We later stuff the SSTable's dictionary into `CompressionInfo.db` using a sequence of such options).
- We add a cluster feature which guards the creation of dictionary-compressed SSTables.
- We introduce a central "compressor factory" (one instance shared by all shards), which from this point onward is used to construct all `compressor` objects (one per SSTable) used to process the SSTables. When constructing a compressor for writing, it uses the "current"/"recommended" dictionary (which is passed to the factory from the actively-observed contents of the group0-managed `system.dicts`). When constructing a compressor for reading, it uses the dictionary written in the hidden compression options in CompressionInfo.db. And it keeps dictionaries deduplicated, so that each unique live dictionary blob has only one instance in memory, shared across shards.
- We teach the relevant `lz4` and `zstd` compressor wrappers about the dictionaries.
- We add a HTTP API call which samples pieces of the given table (i.e. the Data.db files) from across the cluster, trains a dictionary on it, and publishes it via `system.dicts` as the new current dictionary for that table. (And we add some RPC verbs to support that).
- We add a HTTP API call which estimates the impact of various available compression configurations on the compression ratio.
- We add an autotrainer fiber which periodically retrains dicts for dict-aware tables and publishes them if they seem to be a significant improvement.
Known imperfections:
- The factory currently keeps one dictionary instance on the entire node, but we probably want one copy per NUMA node. I didn't do that because exposing NUMA knowledge to Scylla seems to require some changes in Seastar first.
New feature, no backporting involved.
Closesscylladb/scylladb#23025
* github.com:scylladb/scylladb:
docs: add user-facing documentation for SSTable compression with shared dicts
docs/dev: add sstable-compression-dicts.md
test: add test_sstable_compression_dictionaries_autotrain.py
test: add test_sstable_compression_dictionaries_basic.py
test/pylib/rest_client: add `keyspace_upgrade_sstables` helper
main: run a sstable_dict_autotrainer
api: add the estimate_compression_ratios API call
dict_autotrainer: introduce sstable_dict_autotrainer
db/system_keyspace: add query_dict_timestamp
compress: add ZstdWithDictsCompressor and LZ4WithDictsCompressor
main: clean up sstable compression dicts after table drops
sstables/compress: discard hidden compression options after the decompressor is created
compress: change compressor_ptr from shared_ptr to unique_ptr
api: add the retrain_dict API call
storage_service: add some dict-related routines
main: in compression_dict_updated_callback, recognize and use SSTable compression dicts
storage_service: add do_sample_sstables()
messaging_service: add SAMPLE_SSTABLES and ESTIMATE_SSTABLE_VOLUME verbs
db/system_keyspace: let `system.dicts` helpers be used for dicts other than the RPC compression dict
raft/group0_state_machine: on `system.dicts` mutations, pass the affected partitition keys to the callback
database: add sample_data_files()
database: add take_sstable_set_snapshot()
compress: teach `lz4_processor` about dictionaries
compress: teach `zstd_processor` about dictionaries
sstables: delegate compressor creation to the compressor factory
sstables: plug an `sstable_compressor_factory` into `sstables_manager`
sstables: introduce sstable_compressor_factory
utils/hashers: add get_sha256()
gms/feature_service: add the SSTABLE_COMPRESSION_DICTS cluster feature
compress: add hidden dictionary options
compress: remove `compression_parameters::get_compressor()`
sstables/compress: remove get_sstable_compressor()
sstables/compress: move ownership of `compressor` to `sstable::compression`
compress: remove compressor::option_names()
compress: clean up the constructor of zstd_processor
compress: squash zstd.cc into compress.cc
sstables/compress: break the dependency of `compression_parameters` on `compressor`
compress.hh: switch compressor::name() from an instance member to a virtual call
bytes: adapt fmt_hex to std::span<const std::byte>
This method is only used by the loader code (and tests). Also, There's the
highest_version_seen() peer that sits in the loader code either.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#23324
Cleanup patch. After we moved the ownership of compressors
to sstables, compressor objects never have shared lifetime.
`unique_ptr` is more appropriate for them than `shared_ptr` now.
(And besides expressing the intent better, using `unique_ptr`
prevents an accidental cross-shard `shared_ptr` copy).
Following up on the previous commits, we avoid constructing
compressors where not necessary,
by checking things directly on `compression_parameters` instead.
SSTable readers and writers use `compressor` objects to compress and
decompress chunks of SSTable data files.
`compressor` objects are read-only, so only one of them is needed
for each SSTable. Before this commit, each reader and writer has
its own `compressor` object. This isn't necessary, but it's okay.
But later in this series it will stop being okay, because the creation
of a `compressor` will become an expensive cross-shard
operation (because it might require sharing a compression dictionary
from another shard). So we have to adjust the code so that there is
only once `compressor` per sstable, not one per reader/writer.
We stuff the ownership of this compressor into `sstable::compression`.
To make the ownership clear, we remove `compression_ptr` shared
pointers from readers and writers, and make them access the
compressor via the `sstable::compression` instead.
Note: this commit is meant to be a code refactoring only and is not intended
to change the observable behaviour.
Today `schema` contains a `compression_parameters`.
`compression_parameters` contains an instance of
`compressor`, and SSTable writers just share that instance.
This is fine because `compressor` is a stateless object,
functionally dependent on the schema.
But in later parts of the series, we will break this functional
dependency by adding dictionaries to compressors. Two writers
for the same schema might have different dictionaries, so they won't
be able to just share a single instance contained in the schema.
And when that happens, having a `compressor` instance
in the `schema`/`compression_parameters` will become awkward,
since it won't be actually used. It will be only a container for options.
In addition, for performance reasons, we will want to share some pieces
of compressors across shards, which will require -- in the general case --
a construction of a compressor to be asynchronous, and therefore not
possible inside the constructor of `compression_parameters`.
This commit modifies `compression_parameters` so that it doesn't hold or
construct instances of `compressor`.
Before this patch, the `compressor` instance constructed in
`compression_parameters` has an additional role of validating and
holding compressor-specific options.
(Today the only such option is the zstd compression level).
This means that the pieces of logic responsible for compressor-specific
options have to be rewritten. That ends up being the bulk of this commit.
That map became redundant once we added
object_storage_endpoints in the config, this patch removes
it and switches all the user code to use the new option.
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Before, it was equalizing per-node load (tablet count), which is wrong
in heterogenous clusters. Nodes with fewer shards will end up with
overloaded shards.
Refs #23378
This helper facilitate snapshot creation by various test cases in database_test.cc. This PR generalizes all overloads into one that suits all callers and patches one more test case to use it as well.
Closesscylladb/scylladb#23482
* github.com:scylladb/scylladb:
test/database: Re-use take_snapshot() helper once more
test/database: Remove most of take_snapshot() helper overloads
Rather than lowres_clock, as since
32b7cab917,
loading_cache_for_test uses manual_clock for timing
and relying on lowres_clock to time the test might
run out of memory on fast test machines.
Fixes#23497
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#23498
There's a test case that can call the recently patched take_snapshot()
helper as well. This changes nothing, but makes further patching a bit
simpler (not in this branch).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are 3 of those that help tests (re)shuffle cql_test_env/database,
skip_flush == true/false options and keyspace/table/snapshot names.
There's little sense in having that many of those, just one overload
with default arguments suits most of the callers.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add path constants to `test` module and use them in different test suites
instead of own dups of the same code:
- TOP_SRC_DIR : ScyllaDB's source code root directory
- TEST_DIR : the directory with test.py tests and libs
- BUILD_DIR : directory with ScyllaDB's build artefacts
Add TestSuite.log_dir attribute as a ScyllaDB's build mode subdir of a path
provided using `--tmpdir` CLI argument. Don't use `tmpdir` name because it
mixed up with pytest's built-in fixture and `--tmpdir` option itself.
Change default value for `--tmdir` from `./testlog` to `TOP_SRC_DIR/testlog`
Refactor `ResourceGather*` classes to use path from a `test` object instead of
providing it separately.
Move modes constants to `test` module and remove duplications.
Move `prepare_dirs()` and `start_3rd_party_services()` from `pylib.util` to
`pylib.suite.base` to avoid circular imports (with little refactoring to
use `pathlib.Path` instead of `str` as paths.)
Also, in some places refactor to use f-strings for formatting.
Normally, when a node is shutting down, `gate_closed_exception` and `rpc::closed_error`
in `send_to_live_endpoints` should be ignored. However, if these exceptions are wrapped
in a `nested_exception`, an error message is printed, causing tests to fail.
This commit adds handling for nested exceptions in this case to prevent unnecessary
error messages.
Fixesscylladb/scylladb#23325Fixesscylladb/scylladb#23305Fixesscylladb/scylladb#21815
Backport: looks like this is quite a frequent issue, therefore backport to 2025.1.
Closesscylladb/scylladb#23336
* github.com:scylladb/scylladb:
database: Pass schema_ptr as const ref in `wrap_commitlog_add_error`
database: Unify exception handling in `do_apply` and `apply_with_commitlog`
storage_proxy: Ignore wrapped `gate_closed_exception` and `rpc::closed_error` when node shuts down.
exceptions: Add `try_catch_nested` to universally handle nested exceptions of the same type.
`tablets_mode_for_new_keyspaces=enforced` enables tablets by default for
new keyspaces, like `tablets_mode_for_new_keyspaces=enabled`.
However, it does not allow to opt-out when creating
new keyspaces by setting `tablets = {'enabled': false}`.
Refs scylladb/scylla-enterprise#4355
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The new option deprecates the existing `enable_tablets` option.
It will be extended in the next patch with a 3rd value: "enforced"
while will enable tablets by default for new keyspace but
without the posibility to opt out using the `tablets = {'enabled':
false}` keyspace schema option.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
fmt 11.1 apparently marks to_string() as [[nodiscard]]. Here we aren't
interested in the result, so explicitly ignore it to avoid an error.
Closesscylladb/scylladb#23403
Clang 20 complains when it sees a user-defined literal operator
defined with a space before the underscore. Assume it's adhering
to the standard and comply.
Closesscylladb/scylladb#23401