this issue was identified by clang-20.
---
it's a cleanup, hence no need to backport.
Closesscylladb/scylladb#21835
* github.com:scylladb/scylladb:
locator: remove unused member variable
auth: remove unused member variable
db: remove unused member variable
This pull request is continuation of scylladb/scylladb#20688 - contents of the main commit are the same, the only change is the additional commit with a test.
Until this patch, the materialized view flow-control algorithm (https://www.scylladb.com/2018/12/04/worry-free-ingestion-flow-control/) used a constant delay_limit_us hard-coded to one second, which means that when the size of view-update backlog reached the maximum (10% of memory), we delay every request by an additional second - while smaller amounts of backlog will result in smaller delays.
This hard-coded one maximum second delay was considered *huge* - it will slow down a client with concurrency 1000 to just 1000 requests per second - but we already saw some workloads where it was not enough - such as a test workload running very slow reads at high concurrency on a slow machine, where a latency of over one second was expected for each read, so adding a one second latecy for writes wasn't having any noticable affect on slowing down the client.
So this patch replaces the hard-coded default with a live-updateable configuration parameter, `view_flow_control_delay_limit_in_ms`, which defaults to 1000ms as before.
Another useful way in which the new `view_flow_control_delay_limit_in_ms` can be used is to set it to 0. In that case, the view-update flow control always adds zero delay, and in effect - does absolutely nothing. This setting can be used in emergency situations where it is suspected that the MV flow control is not behaving properly, and the user wants to disable it.
The new parameter's help string mentions both these use cases of the parameter.
Fixes#18187
This is new functionality, no need to backport to any open source release.
Closesscylladb/scylladb#21647
* github.com:scylladb/scylladb:
materialized views: test for the MV delay configuration parameter
service: add injection for skipping view update backlog
materialized view: make flow-control maximum delay configurable
This change goes thru locator:topology to use node&
instead of node* where nullptr is not possible. There are
places where the node object is used in unordered_set, in
those cases the node is wrapped in std::reference_wrapper.
Fixesscylladb/scylladb#20357Closesscylladb/scylladb#21863
Currently truncating a table works by issuing an RPC to all the nodes which call `database::truncate_table_on_all_shards()`, which makes sure that older writes are dropped.
It works with tablets, but is not safe. A concurrent replication process may bring back old data.
This change makes makes TRUNCATE TABLE a topology operation, so that it excludes with other processes in the system which could interfere with it. More specifically, it makes TRUNCATE a global topology request.
Backporting is not needed.
Fixes#16411Closesscylladb/scylladb#19789
* github.com:scylladb/scylladb:
docs: docs: topology-over-raft: Document truncate_table request
storage_proxy: fix indentation and remove empty catch/rethrow
test: add tests for truncate with tablets
storage_proxy: use new TRUNCATE for tablets
truncate: make TRUNCATE a global topology operation
storage_service: move logic of wait_for_topology_request_completion()
RPC: add truncate_with_tablets RPC with frozen_topology_guard
feature_service: added cluster feature for system.topology schema change
system.topology_requests: change schema
storage_proxy: propagate group0 client and TSM dependency
Replace boost::make_iterator_range() with std::ranges::subrange.
This change improves code modernization and reduces external dependencies:
- Replace boost::make_iterator_range() with std::ranges::subrange
- Remove boost/range/iterator_range.hpp include
- Improve iterator type detection in interval.hh using std::ranges::const_iterator_t<Range>
This is part of ongoing efforts to modernize our codebase and minimize
external dependencies.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21787
since fedora 38 is EOL. and fedora 39 comes with fmt v10.0.0, also,
we've switched to the build image based on fedora 40, which ships
fmt-devel v10.2.1, there is no need to support fmt < 10.
in this change, we drop the support fmt < 10.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21847
The function get_service_levels is used to retrieve all service levels
and it is called from multiple different contexts.
Importantly, it is called internally from the context of group0 state reload,
where it should be executed with a long timeout, similarly to other
internal queries, because a failure of this function affects the entire
group0 client, and a longer timeout can be tolerated.
The function is also called in the context of the user command LIST
SERVICE LEVELS, and perhaps other contexts, where a shorter timeout is
preferred.
The commit introduces a function parameter to indicate whether the
context is internal or not. For internal context, a long timeout is
chosen for the query. Otherwise, the timeout is shorter, the same as
before. When the distinction is not important, a default value is
chosen which maintains the same behavior.
The main purpose is to fix the case where the timeout is too short and causes
a failure that propagates and fails the group0 client.
Fixesscylladb/scylladb#20483Closesscylladb/scylladb#21748
Until this patch, the materialized view flow-control algorithm
(https://www.scylladb.com/2018/12/04/worry-free-ingestion-flow-control/)
used a constant delay_limit_us hard-coded to one second, which means
that when the size of view-update backlog reached the maximum (10%
of memory), we delay every request by an additional second - while
smaller amounts of backlog will result in smaller delays.
This hard-coded one maximum second delay was considered *huge* - it will
slow down a client with concurrency 1000 to just 1000 requests per
second - but we already saw some workloads where it was not enough -
such as a test workload running very slow reads at high concurrency
on a slow machine, where a latency of over one second was expected
for each read, so adding a one second latecy for writes wasn't having
any noticable affect on slowing down the client.
So this patch replaces the hard-coded default with a live-updateable
configuration parameter, `view_flow_control_delay_limit_in_ms`, which
defaults to 1000ms as before.
Another useful way in which the new `view_flow_control_delay_limit_in_ms`
can be used is to set it to 0. In that case, the view-update flow
control always adds zero delay, and in effect - does absolutely
nothing. This setting can be used in emergency situations where it
is suspected that the MV flow control is not behaving properly, and
the user wants to disable it.
The new parameter's help string mentions both these use cases of
the parameter.
Fixes#18187
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
"
This rather large patch series moves storage proxy and some adjacent
services (like migration manager) to use host ids to identify nodes rather
than ips. Messaging service gains a capability to address nodes by host
ids (which allows dropping translations from topology coordinator code
that worked on host ids already) and also makes sure that a node with
incorrect host id will reject a message (can happen during address
changes).
The series gets rid of the raft address map completely and replaces it with
the gossiper address map which is managed by the gossiper since translation
is now done in the layer below raft.
Fixes: scylladb/scylladb#6403
perf-simple-query -- smp 1 -m 1G output
Before:
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
64336.82 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41291 insns/op, 24485 cycles/op, 0 errors)
62669.58 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41277 insns/op, 24695 cycles/op, 0 errors)
69172.12 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41326 insns/op, 24463 cycles/op, 0 errors)
56706.60 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41143 insns/op, 24513 cycles/op, 0 errors)
56416.65 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 41186 insns/op, 24851 cycles/op, 0 errors)
throughput: mean=61860.35 standard-deviation=5395.48 median=62669.58 median-absolute-deviation=5153.75 maximum=69172.12 minimum=56416.65
instructions_per_op: mean=41244.62 standard-deviation=76.90 median=41276.94 median-absolute-deviation=58.55 maximum=41326.19 minimum=41142.80
cpu_cycles_per_op: mean=24601.35 standard-deviation=167.39 median=24512.64 median-absolute-deviation=116.65 maximum=24851.45 minimum=24462.70
After:
enable-cache=1
Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no}
Disabling auto compaction
Creating 10000 partitions...
65237.35 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 40733 insns/op, 23145 cycles/op, 0 errors)
59283.09 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40624 insns/op, 23948 cycles/op, 0 errors)
70851.03 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40625 insns/op, 23027 cycles/op, 0 errors)
70549.61 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40650 insns/op, 23266 cycles/op, 0 errors)
68634.96 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 40622 insns/op, 22935 cycles/op, 0 errors)
throughput: mean=66911.21 standard-deviation=4814.60 median=68634.96 median-absolute-deviation=3638.40 maximum=70851.03 minimum=59283.09
instructions_per_op: mean=40650.89 standard-deviation=47.55 median=40624.60 median-absolute-deviation=27.11 maximum=40733.37 minimum=40622.33
cpu_cycles_per_op: mean=23264.16 standard-deviation=402.12 median=23145.29 median-absolute-deviation=237.63 maximum=23947.96 minimum=22934.59
CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/13531/
SCT (longevity-100gb-4h with nemesis_selector: ['topology_changes']): https://jenkins.scylladb.com/view/staging/job/scylla-staging/job/gleb/job/move-to-host-id/3/
Tested mixed cluster manually.
"
* 'gleb/move-to-host-id-v2' of github.com:scylladb/scylla-dev: (55 commits)
group0: drop unused field from replace_info struct
test: rename raft_address_map_test to address_map_test and move if from raft tests
raft_address_map: remove raft address map
topology coordinator: do not modify expire state for left/new nodes any more in raft address map
topology coordinator: drop expiring entries in gossiper address map on error injections since raft one is no longer used
group0: drop raft address map dependency from raft_rpc
group0: move raft_ticker_type definition from raft_address_map.hh
storage_service: do not update raft address map on gossiper events
group0: drop raft address map dependency from raft_server_with_timeouts
group0: move group0 upgrade code to host ids
repair: drop raft address map dependency
group0: remove unused raft address map getter from raft_group0
group0: drop raft address map from group0_state_machine dependency since it is not used there any more
group0: remove dependency on raft address map from group0_state_id_handler
gossiper: add get_application_state_ptr that searches by host_id
gossiper: change get_live_token_owners to return host ids
view: move view building to host id
hints: use host id to send hints
storage_proxy: remove id_vector_to_addr since it is no longer used
db: consistency_level: change is_sufficient_live_nodes to work on host ids
...
now that we are allowed to use C++23. we now have the luxury of using
`std::views::transform`.
in this change, we:
- replace `boost::adaptors::transformed` with `std::views::transform`
- use `fmt::join()` when appropriate where `boost::algorithm::join()`
is not applicable to a range view returned by `std::view::transform`.
- use `std::ranges::fold_left()` to accumulate the range returned by
`std::view::transform`
- use `std::ranges::fold_left()` to get the maximum element in the
range returned by `std::view::transform`
- use `std::ranges::min()` to get the minimal element in the range
returned by `std::view::transform`
- use `std::ranges::equal()` to compare the range views returned
by `std::view::transform`
- remove unused `#include <boost/range/adaptor/transformed.hpp>`
- use `std::ranges::subrange()` instead of `boost::make_iterator_range()`,
to feed `std::views::transform()` a view range.
to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
limitations:
there are still a couple places where we are still using
`boost::adaptors::transformed` due to the lack of a C++23 alternative
for `boost::join()` and `boost::adaptors::uniqued`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21700
This commit addresses inconsistent spelling annotations that triggered
codespell warnings in our codebase.
Problem:
- Previous annotations like "CREATEing" and "DROPing" were flagged as
misspellings by the codespell workflow
- These annotations were used to describe CQL statement execution contexts
Solution:
- Updated annotations to "CREAT'ing" and "DROP'ing"
- Preserves the intent of the original annotations
- Silences codespell warnings without changing the underlying meaning
- Ensures consistent and spell-checker-friendly code documentation
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21741
In this rather large path we mode to address nodes in storage proxy by
host ids instead of ips. Some subsystems storage proxy calls to are
not yet converted to host ids, so we translate back and forth when we
interact with them.
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.
please note, because `mutation/mutation.hh` does not include
`seastar/coroutine/maybe_yield.hh` anymore, and quite a few source
files were relying on this header to bring in the declaration of
`maybe_yield()`, we have to include this header in the places where
this symbol is used. the same applies to `seastar/core/when_all.hh`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Once e.g. `ALTER KEYSPACE` is performed, all in-memory objects should be updated accordingly, but this is not entirely true for keyspace metadata object. The reason for that is that keyspace metadata are stored in 2 system tables: `system_schema.keyspaces` and `system_schema.scylla_keyspaces`. Up until now the in-memory keyspace metadata object has been updated only with entries from the first table, and missed updates when entries from the 2nd table changed. These entries were e.g. initial tablets or storage options.
This change fixes this oversight by considering both tables when checking if keyspace metadata need to be updated. From the implementation point of view, the change is simple: we're considering `system_schema.scylla_keyspaces` also in `merge_keyspaces()` and if old and new schemas have any differences, we include that when altering ks.
Fixes#20768
Backport: no need, I don't think the issue is severe, atm it seems like it can only influence the tablets number, which should not bring the cluster down nor result in returning bad data, it can mostly influence the speed of the db.
Closesscylladb/scylladb#20852
Before these changes, we dereferenced `app_state` in
`manager::endpoint_downtime_not_bigger_than()` before checking that it's
not a null pointer. We fix that.
Fixesscylladb/scylladb#21699Closesscylladb/scylladb#21676
`coroutine::parallel_for_each` accepts both a range and a pair of
iterators. let's use the former when appropriate. it is simpler this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21684
When users start an operation asynchronously with API, they are expected to check the operation's status. Hence, the status should be kept in task manager for reasonable time after the operation is done. The operations that are started internally usually don't need to stay in task manager for that long.
Add api_task_ttl that will be used for tasks started with API. By default it's 1 hour. The time for which non-API tasks stay in task manager isn't changed.
Fixes: #21499.
Refs: #21425.
No backport needed - previous versions may use task_ttl
Closesscylladb/scylladb#21505
* github.com:scylladb/scylladb:
test: add test to check user_task_ttl
tasks: api: move make_task method
docs: nodetool: update backup and restore commands docs
docs: update task manager docs
nodetool: add nodetool tasks user-ttl command
node_ops: use user task ttl for node ops virtual task
tasks: use user_task_ttl for tasks started by user
api: task_manager: add /task_manager/user_ttl to get and set user task ttl
tasks: add task_manager::task::is_user_task method
tasks: keep updateable_value of task_ttl in task manager
db: config: add user_task_ttl_seconds named value
Stop taking snapshots of MVs and allow taking snapshot of individual tables, now one can take a snapshot of any base table, any view or index. Also add tests to cover new cases both boost test (using cc code) and pytest (using the API)
Also, update documentation to reflect the change
fixes: #21339fixes: #20760Closesscylladb/scylladb#21433
Modernize the codebase by replacing Boost range adaptors with C++23 standard library views,
reducing external dependencies and leveraging modern C++ language features.
Key Changes:
- Replace `boost::adaptors::filtered` with `std::views::filter`
- Remove `#include <boost/range/adaptor/filtered.hpp>`
- Utilize standard library range views
Motivation:
- Reduce project's external dependency footprint
- Leverage standard library's range and view capabilities
- Improve long-term code maintainability
- Align with modern C++ best practices
Implementation Challenges and Considerations:
1. Range Conversion and Move Semantics
- `std::ranges::to` adaptor requires rvalue references
- Necessitated updates to variable and parameter constness
- Example: `cql3/restrictions/statement_restrictions.cc` modified to remove `const`
from `common` to enable efficient range conversion
2. Range Iteration and Mutation
- Range views may mutate internal state during iteration
- Cannot pass ranges by const reference in some scenarios
- Solution: Pass ranges by rvalue reference to explicitly indicate
state invalidation
Limitations:
- One instance of `boost::adaptors::filtered` temporarily preserved
due to lack of a C++23 alternative for `boost::join()`
- A comprehensive replacement will be addressed in a follow-up change
This change is part of our ongoing effort to modernize the codebase,
reducing external dependencies and adopting modern C++ practices.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21648
Add user_task_ttl_seconds config option and keep the value in task manager.
In the following patches tasks started by user will be kept in task
manager for user_task_ttl_seconds after they are finished.
The non local strategy system keyspaces usually contain very litte data.
All the tables within them have to be repaired for all the token ranges,
which could be large in clusters with a large number of nodes. In multiple
DC setup, the repair in RBNO is dominated by the network latency. As a
result, it takes a long time to repair those tables even if they are
almost empty.
To speed up the RBNO bootstrap, especially for starting empty clusters,
this patch enables small table optimization for RBNO for system tables.
We could enable it for small user tables as a follow up.
Tests:
1) A 5ms latency is added to simulate cross dc network delay, 256 tokens
per node, 10 nodes:
- Before
topology_custom dev topology_custom.test_boot_time.1 1287.06s
- After
topology_custom dev topology_custom.test_boot_time.1 12.48s
The test shows 100X boot time improvement
2) A SCT test to bootstrap 3 DCs, 3 nodes in each DC.
- Before
Time to bootstrap = 1h23m
- After
Time to bootstrap = 13m
The test shows 6X bootstrap time improvement
Fixes#19131
The semantics of Scylla's materialized views may vary depending on how their
primary keys correspond to the base table's one. One of the differences is
how we handle writes to columns in the base table that are not selected by
a view:
* Case 1: The view's PK is a permutation of the base table's PK:
Since the view's primary key cannot be changed in an update, a row in
the view remains alive as long as the corresponding row in the base table
is alive.
The tricky part comes when the base table has columns that are NOT selected
by the view. CQL3 used to not allow for defining a table that didn't have
any other columns besides its primary key. Also, when inserting a row into
a table, it was mandatory to provide at least one value aside from the
primary key. At some point it changed [1] and the implementation of the
solution relied on the notion of the row marker.
Putting the details aside, consider the following scenario:
(i) the base table has a primary key consisting of columns
c_1, ..., c_k, and it has regular columns rc_1, ..., rc_n,
(ii) the primary key of an MV defined on that table consists of
a permutation of c_1, ..., c_k. The MV doesn't select at least
one of the regular columns of the base table. Without loss of
generality, let that unselected column be rc_1.
(iii) the base table has a row R whose only non-null value is the one
in the regular column rc_1.
Now, what will R correspond to in the MV? The base table doesn't have a row
marker, but all of its regular columns in the MV will be NULLs. That's NOT
allowed.
To solve that problem, all unselected columns have corresponding virtual
columns in the MV; the only information they provide is whether there is
a value in the base table or not. This way, the MV knows if a row is still
alive or not.
For that reason, we send view updates to virtual columns in the following
cases:
(i) the value in the column changes from NULL to a value, i.e. it's
created,
(ii) the value in the column exists, but its TTL has been updated.
* Case 2: The view's PK has one more column that the base table's one:
Since the primary key of the view has a regular column C from the base
table, it is guaranteed that if there's a row in the MV, the corresponding
row in the base table can remain alive: since C is part of the view's PK,
it must have a value, so the row in the base table has a value in C too.
The problem with virtual columns from the previous case doesn't manifest
in this one. The liveness of the cell in C determines the liveness of
the whole row in the view.
The semantics gets more complex, but the conclusion is this: in case 1,
virtual columns exist and we may need to generate view updates for them,
while in case 2 virtual columns do NOT exist and so we don't generate
view updates for them.
What changes in this patch is we adjust the code to it. If a view has
a regular column from the base table as part of its primary key, we
no longer emit view updates when we change a column unselected by that
view. It is purely an OPTIMIZATION change.
[1]: https://issues.apache.org/jira/browse/CASSANDRA-4361Fixesscylladb/scylladb#21652Closesscylladb/scylladb#21653
Schema of system tables is defined statically and table_schema_version needs to be explicitly set in code like this:
```
builder.with_version(system_keyspace::generate_schema_version(table_id, version_offset));
```
Whenever schema is changed, the schema version needs to change, otherwise we hit undefined behavior when trying to interpret mutation data created with the old schema using the new schema.
It's not obvious that one needs to do that and developers often forget to do that. There were several instances of mistakes of omission, some caught during review, some not, e.g.: 31ea74b96e.
This patch changes definitions to call the new `schema_builder::with_hash_version()`, which will make the schema builder compute version from schema definition so that changes of the schema will automatically change the version. This way we no longer rely on the developer to remember to bump the version offset.
All nodes should arrive at the same version, which is verified by existing `test_group0_schema_versioning` and a new unit test: `test_system_schema_version_is_stable`.
Closesscylladb/scylladb#21602
* github.com:scylladb/scylladb:
system_tables: Compute schema version automatically
schema_builder: Introduce with_hash_version()
schema: Store raw_view_info in schema::raw_schema
schema: Remove dead comment
hashing: Add hasher for unordered_map
hashing: Add hasher for unique_ptr
hashing: Add hasher for double
[avi: add missing include <memory> to hashing.hh]
This adds a new tablet migration kind: repair. It allows tablet repair
scheduler to use this migration kind to schedule repair jobs.
The current repair scheduler implementation does the following:
- A tablet is picked to be repaired when is requested by user
- The tablet repair can be scheduled along with tablet migration and
rebuild. It runs in the tablet_migration track.
- Repair jobs are scheduled in a smart way so that at any point in time,
there are no more than configured jobs per shard, which is similar to
scylla manager's control.
New feature. No backport is needed.
Closesscylladb/scylladb#21088
* github.com:scylladb/scylladb:
test: Add tests for tablet repair scheduler
repair: Add restful API for tablet repair
repair: Add tablet repair scheduler internal API support
docs: Update system_keyspace.md for tablet repair related info
docs: Add docs for tablet repair migration
repair: Add core tablet repair scheduler support
messaging_service: Introduce TABLET_REPAIR verb
tablet_allocator: Introduce stream_weight for tablet_migration_streaming_info
network_topology_strategy: Preserve fields of task_info in reallocate_tablets
This adds a new tablet migration kind: repair. It allows tablet repair
scheduler to use this migration kind to schedule repair jobs.
The current repair scheduler implementation does the following:
- A tablet is picked to be repaired when the time since last repair is
bigger than a threshold (auto repair mode) or it is requested by user
(manual repair mode)
- The tablet repair can be scheduled along with tablet migration and
rebuild. It runs in the tablet_migration track.
- Repair jobs are scheduled in a smart way so that at any point in time,
there are no more than configured jobs per shard, which is similar to
scylla manager's control.
In this patch, both the manual repair and the auto repair are not
enabled yet.
now that we are allowed to use C++23. we now have the luxury of using
`std::ranges::find_if`.
in this change, we:
- replace `boost::find_if` with `std::ranges::find_if`
- remove all `#include <boost/range/algorithm/find_if.hpp>`
to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Our "sstring_view" is an historic alias for the standard std::string_view.
The patch changes the last remaining random uses of this old alias across
our source directory to the standard type name.
After this patch, there are no more uses of the "sstring_view" alias.
It will be removed in the following patch.
Refs #4062.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
For historic reasons, we have (in bytes.hh) a type sstring_view which
is an alias for std::string_view - since the same standard type can hold
a pointer into both a seastar::sstring and std::string.
This alias in unnecessary and misleading to new developers (who might
assume it is somehow different from std::string_view). This patch doesn't
yet remove all occurances of sstring_view (the request in #4062), but
begins to do it by renaming one commonly-used function, to_sstring_view(bytes)
to to_string_view() and of course changes all its uses to the new name.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This depends on the previous change to the schema_builder
which makes version computation depend on definition only
instead of being new time uuid.
This way we avoid the possibility for a common mistake
when schema of a system table is extended but we forget
to bump up its version passed to .with_version().
The later includes the former and in addition to `seastar::format()`,
`print.hh` also provides helpers like `seastar::fprint()` and
`seastar::print()`, which are deprecated and not used by scylladb.
Previously, we include `seastar/core/print.hh` for using
`seastar::format()`. and in seastar 5b04939e, we extracted
`seastar::format()` into `seastar/core/format.hh`. this allows us
to include a much smaller header.
In this change, we just include `seastar/core/format.hh` in place of
`seastar/core/print.hh`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21574
Add a buffer hint to the multishard reader. This is an internal hint, used by the multishard reader to provide a hint to the shard reader, on how much data exactly is needed by the multishard reader from the respective shard. This hint allows eliminating extraneous cross-shard round-trips and possible shard reader evict-recreate cycles. Building on this, repair sets its own row buffer size as the max buffer size on the multishard reader, ensuring that the row buffer is filled with the minimum amount of cross-shard round trips and minimal reader recreation.
To further eliminate unnecessary evictions, this PR also disables the multishard reader's read-ahead which is a mechanism that was designed to reduce latency for user-reads but it can be too aggressive for repair, causing unnecessary extra congestion on the already struggling streaming semaphores.
Refs: https://github.com/scylladb/scylladb/issues/18269
Fixes: https://github.com/scylladb/scylladb/issues/21113
The performance impact was measured with an SCT test, which creates a cluster of 3 nodes with 16 shards, then adds a 4th one with 12 shards.
Currently, it is the bootstrap time which is the worse in the case of mixed shard clusters, see below for the improvement measured during bootstrap:
| | master | buffer-hint | metric |
| ------------ | ------------- | ------------- | --------------------------------------------------- |
| evictions | 0.9M | 93.0K | scylla_database_paused_reads_permit_based_evictions |
| read (bytes) | 9.0T | 3.9T | scylla_reactor_aio_bytes_read |
| read (ops) | 88.0M | 33.5M | scylla_reactor_aio_reads |
| time | 56min | 20min | N/A |
This is a performance improvement, no backport required.
Closesscylladb/scylladb#20815
* github.com:scylladb/scylladb:
test/boost/mutation_reader_test: add test for multishard reader buffer hint
repair/row_level: disable read-ahead
db/config: introduce repair_multishard_reader_enable_read_ahead
readers/multishard: implement the read_ahead flag
replica/database: make_multishard_streaming_reader(): expose the read_ahead parameter
readers/multishard: add read_ahead parameter
repair/row_level: set max buffer size on multishard reader
replica/database: make_multishard_streaming_reader(): expose buffer_hint parameter
db/config: introduce enable_repair_multishard_reader_buffer_hint
readers/multishard: multishard_reader: pass hint to shard_reader
readers/multishard: shard_reader_v2::fill_reader_buffer(): respect the hint
readers/multishard: propagate fill_buffer_hint to shard_reader:fill_reader_buffer()
readers/multishard: shard_reader: extract buffer-fill into its own method
Separate the configuration for enabling the tablets feature from the enablement of tablets when creating new keyspaces.
This change always enables the TABLETS cluster feature and the tablets logic respectively.
The `enable_tablets` config option just controls whether tablets are enabled or disabled by default for new keyspaces.
If `enable_tablets` is set to `true`, tablets can be disabled using `CREATE KEYSPACE WITH tablets = { 'enabled': false }` as it is today.
If `enable_tablets` is set to `false`, tablets can be enabled using `CREATE KEYSPACE WITH tablets = { 'enabled': true }`.
The motivation for this change is to simplify the user experience of using tablets by setting the default for new keyspaces to false amd allowing the user to simply opt-in by using tablets = {enabled: true }.
This is not pissible today.
The user has to enable tablets by default for all new keyspaces (that use the NetworkTopologyStrategy) and then actively opt-out to use vnodes.
* Not required to be backported to OSS versions. May be backported to specific enterprise versions
* This PR resubmits https://github.com/scylladb/scylladb/pull/20729 that was reverted in 73b1f66b70 due to https://github.com/scylladb/scylladb/issues/21159 which is now fixed
Closesscylladb/scylladb#21451
* github.com:scylladb/scylladb:
data_dictionary: keyspace_metadata::describe: print tablets enabled also when defaulted
tablets_test: test enable/disable tablets when creating a new keyspace
treewide: always allow tablets keyspaces
feature_service: prevent enabling both tablets and gossip topology changes
alternator: create_keyspace_metadata: enable tablets using feature_service
With the tablets feature always enabled (Unless gossip toopology
changes are forced), the enable_tablets option now controls only
the default for newly created keyspaces.
Even when set to `false`, tablets are still enabled as a
feature and the user may explicitly enable tablets
using `CREATE KEYSPACE <name> WITH tablets = {'enabled': true}`
Note: best viewed with `git show -w`
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Tablets require raft consistent topology changes.
Therefore, document that they are incompatible in
the config help and prevent their usage in
`feature_config_from_db_config`
Fixesscylladb/scylladb#21075
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Fixes#20716
Adds optional abort_source to all s3 client operations. If provided, will propagate to actual HTTP client and allow for aborting actual net op.
Note: this uses an abort source per call, not a client-local one.
This is for two reasons:
1.) The usage pattern of the client object is to create it outside the eventual owning object (task) that hosts the relevant abort source
2.) It is quite possible to want to have different/no abort source for some operation usage.
Also adds forward usage of task abort_source in backup tasks upload s3 call, making it more readily abort-able.
Closesscylladb/scylladb#21431
* github.com:scylladb/scylladb:
backup_task: Use task abort source in s3 client call
s3::client: Make operations (individually) abortable
Fixes#20716
Propagates abort source in task object to actual network call,
thus making the upload workload more quickly abortable.
v2: Fix test to handle two versions after each other