Extension of data_source, with the ability to
a.) Seek in any direction, i.e. move backwards. Thus not pure stream.
b.) Read a limited number of bytes.
The very transparent reason for the interface is to have a base
abstraction for providing a read-only file layer for networked
resources.
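
As a rough illustration of the two abilities listed above, here is a minimal Python sketch. It is an analogy only, not the actual C++ interface: the names `SeekableSource` and `BytesSource` are hypothetical and do not come from the patch.

```python
from abc import ABC, abstractmethod


class SeekableSource(ABC):
    """Hypothetical read-only source that can seek and do bounded reads."""

    @abstractmethod
    def seek(self, offset: int) -> None:
        """Move the read position to an absolute offset (backwards is allowed)."""

    @abstractmethod
    def read(self, limit: int) -> bytes:
        """Return at most `limit` bytes from the current position."""


class BytesSource(SeekableSource):
    """In-memory stand-in for a networked resource, for illustration only."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def seek(self, offset: int) -> None:
        if not 0 <= offset <= len(self._data):
            raise ValueError("offset out of range")
        self._pos = offset

    def read(self, limit: int) -> bytes:
        chunk = self._data[self._pos:self._pos + limit]
        self._pos += len(chunk)
        return chunk


src = BytesSource(b"hello world")
first = src.read(5)   # bounded read of 5 bytes
src.seek(0)           # move backwards, which a pure stream cannot do
again = src.read(2)
```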
Moves the config wrapper to its own file (to reduce recompilation when it is
modified) and refactors it to handle extending this parameter to non-S3 endpoint configs.
The `describe_multi_item` function treated the last reference-captured
argument as the number of used RCU half units. The caller
`batch_get_item`, however, expected this parameter to hold an item size.
This RCU value was then passed to
`rcu_consumed_capacity_counter::get_half_units`, treating the
already-calculated RCU integer as if it were a size in bytes.
This caused a second conversion that undercounted the true RCU. During
conversion, the number of bytes is divided by `RCU_BLOCK_SIZE_LENGTH`
(=4KB), so the double conversion divided the number of bytes by 16 MB.
The fix removes the second conversion in `describe_multi_item` and
changes the API of `describe_multi_item`.
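
The arithmetic of the double conversion can be sketched as follows. This is a simplified model, not the real `get_half_units`: the `half_units` helper below is a hypothetical stand-in that only rounds bytes up to 4 KB blocks, and the 1 MiB item size is chosen purely for illustration.

```python
import math

RCU_BLOCK_SIZE_LENGTH = 4096  # 4 KB, as stated in the commit message


def half_units(size_in_bytes: int) -> int:
    """Hypothetical, simplified bytes-to-units conversion (rounds up)."""
    return math.ceil(size_in_bytes / RCU_BLOCK_SIZE_LENGTH)


item_size = 1024 * 1024  # illustrative 1 MiB response

correct = half_units(item_size)            # single conversion
buggy = half_units(half_units(item_size))  # units fed back in as if they were bytes

# Dividing by 4 KB twice is the same as dividing by 16 MB once.
effective_divisor = RCU_BLOCK_SIZE_LENGTH ** 2
```

In this model the single conversion yields 256 units while the double conversion collapses to 1, which is the undercount described above.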
Fixes: https://github.com/scylladb/scylladb/pull/25847
Closes scylladb/scylladb#25842
Expecting the group 0 read barrier to succeed with a timeout of 1s, just
after restarting 3 out of 5 voters, turned out to be flaky. In some
unlikely scenarios, such as multiple vote splits, the Raft leader
election could finish after the read barrier times out.
To deflake the test, we increase the timeout of Raft operations back to
300s for read barriers we expect to succeed.
Fixes #26457
Closes scylladb/scylladb#26489
Using the name regular as the incremental mode could be confusing, since
regular might be interpreted as the non-incremental repair. It is better
to use incremental directly.
Before:
- regular (standard incremental repair)
- full (full incremental repair)
- disabled (incremental repair disabled)
After:
- incremental (standard incremental repair)
- full (full incremental repair)
- disabled (incremental repair disabled)
Fixes #26503
Closes scylladb/scylladb#26504
Using `driver_connect()` after a cluster restart isn't enough to ensure
full CQL availability, but the test assumes that it is.
Fix that by making the test wait for CQL availability via `get_ready_cql()`.
Also, replace some manual usages of wait_for_cql_and_get_hosts with
`get_ready_cql()` too.
Fixes scylladb/scylladb#25362
Closes scylladb/scylladb#25366
db/view/view_building_worker: move discover_existing_staging_sstables() to the foreground
This patch moves `discover_existing_staging_sstables()` to be executed
from main level, instead of running it on the background fiber.
This method needs to be run only once during startup to collect
existing staging sstables, so there is no need to do it in the
background. This change will improve the debuggability of any further issues
related to it (like https://github.com/scylladb/scylladb/issues/26403).
Fixes https://github.com/scylladb/scylladb/issues/26417
The patch should be backported to 2025.4
Closes scylladb/scylladb#26446
* github.com:scylladb/scylladb:
db/view/view_building_worker: move discover_existing_staging_sstables() to the foreground
db/view/view_building_worker: futurize and rename `start_background_fibers()`
There was a race between the loop in `view_building_worker::run_view_building_state_observer()`
and the moment when a batch finishes its work (the `.finally()` callback
in `view_building_worker::batch::start()`).
The state observer waits on the `_vb_state_machine.event` CV and, when it is
awoken, takes the group0 read apply mutex and updates its state. While
updating the state, the observer looks at the `batch::state` field and
reacts to it accordingly.
On the other hand, when a batch finishes its work, it sets the `state` field
to `batch_state::finished` and does a broadcast on the
`_vb_state_machine.event` CV.
So if the batch executes the callback in `.finally()` while the
observer is updating its state, the observer may miss the event on the
CV and never notice that the batch has finished.
This patch fixes this by adding a `some_batch_finished` flag. Even if
the worker doesn't see an event on the CV, it will notice that the flag
was set and run the next iteration.
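
The flag-plus-CV pattern can be sketched in Python with `asyncio.Condition`. This is only an analogy to the Seastar code, under the assumption that a notification with no waiter is lost: the `Worker` class and its method names are hypothetical, and only `some_batch_finished` is taken from the patch description.

```python
import asyncio


class Worker:
    """Hypothetical stand-in for the view building worker."""

    def __init__(self):
        self.event = asyncio.Condition()
        self.some_batch_finished = False
        self.iterations = 0

    async def batch_finished(self):
        # Record the fact first, so it survives even if nobody is waiting
        # on the condition at the moment of the broadcast.
        self.some_batch_finished = True
        async with self.event:
            self.event.notify_all()

    async def observe_once(self):
        async with self.event:
            # wait_for() checks the predicate before sleeping, so a broadcast
            # that happened while the observer was busy is not lost.
            await self.event.wait_for(lambda: self.some_batch_finished)
            self.some_batch_finished = False
        self.iterations += 1


async def demo() -> int:
    worker = Worker()
    await worker.batch_finished()  # no waiter yet: the broadcast wakes nobody
    await asyncio.wait_for(worker.observe_once(), timeout=1)
    return worker.iterations


iterations = asyncio.run(demo())
```

Without the flag, `observe_once()` in this sketch would wait for a broadcast that has already happened and time out.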
Fixes scylladb/scylladb#26204
Closes scylladb/scylladb#26289
In f828fe0d59 ("setup: add the lazytime XFS version") we added the
lazytime mount option to /var/lib/scylla, but it was quickly reverted
(8f5e80e61a) as it caused a regression on CentOS 7.
We reinstate it now with a kernel version check. This will avoid
the lazytime mount option on CentOS 7, which is unsupported anyway.
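
A kernel version gate of this kind might look like the sketch below. The threshold `MIN_LAZYTIME_KERNEL`, the helper names, and the `noatime` base option are assumptions for illustration; the commit message does not state the exact version the real check uses.

```python
import re

# Hypothetical threshold; the actual value used by the setup script is not
# given in the commit message.
MIN_LAZYTIME_KERNEL = (4, 0)


def kernel_version(release: str) -> tuple:
    """Extract (major, minor) from a release string like '6.8.0-45-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def mount_options(release: str) -> str:
    """Build the option string, adding lazytime only on new enough kernels."""
    options = ["noatime"]  # assumed base option, for illustration
    if kernel_version(release) >= MIN_LAZYTIME_KERNEL:
        options.append("lazytime")
    return ",".join(options)


centos7 = mount_options("3.10.0-1160.el7.x86_64")
ubuntu = mount_options("6.8.0-45-generic")
```

On a live system the release string could be obtained from `platform.release()`.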
The lazytime option avoids marking the inode as dirty if it's only for the
purpose of updating mtime/ctime. This won't help much while writing sstables
(since the write also updates extent information), but may help a little
with commitlog writes, since those are pure overwrites.
It likely won't help with the RWF_NOWAIT violations seen in [1], since
those are likely due to in-memory locking, not flushing dirty inodes
to disk.
Tested with an install to Ubuntu 24.04 LTS followed by a scylla_setup run.
The lazytime option was added to the .mount file and showed up in
the live mount.
[1] https://github.com/scylladb/seastar/issues/2974
Closes scylladb/scylladb#26436
Fixes #26002
The test uses CQL tracing to check which files were read by a query.
This is flaky if the coordinator and the replica are different shards,
because the Python driver only waits for the coordinator, and not
for replicas, to finish writing their traces.
(So it might happen that the Python driver returns a result
with only coordinator events and no replica events).
Let's just dodge the issue by using --smp=1.
Fixes scylladb/scylladb#26432
Closes scylladb/scylladb#26434
We noticed during work on scylladb/seastar#2802 that on the i7i family
(later shown to hold for the i4i family as well),
the disks report their physical sector size incorrectly
as 512 bytes, whereas we showed that we get much better write IOPS with
4096 bytes.
This is not the case on the AWS i3en family, where the reported 512-byte
physical sector size is also the size with which we achieve the best write IOPS.
This patch works around this issue by changing `scylla_io_setup` to parse
the instance type out of `/sys/devices/virtual/dmi/id/product_name`
and run iotune with the correct request size based on the instance type.
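
The selection logic described above could be sketched as follows. The mapping table, the fallback value, and the function names are hypothetical; only the sysfs path and the per-family sizes come from the commit message, and the way the size is passed to iotune is deliberately left out.

```python
from pathlib import Path

PRODUCT_NAME_PATH = "/sys/devices/virtual/dmi/id/product_name"

# Hypothetical table built from the families named in the commit message.
REQUEST_SIZE_BY_FAMILY = {"i7i": 4096, "i4i": 4096, "i3en": 512}
DEFAULT_REQUEST_SIZE = 512  # assumed fallback for unknown instance types


def instance_family(product_name: str) -> str:
    """Turn an instance type such as 'i7i.4xlarge' into its family, 'i7i'."""
    return product_name.strip().split(".", 1)[0]


def request_size(product_name: str) -> int:
    """Pick the request size in bytes for the given instance type."""
    family = instance_family(product_name)
    return REQUEST_SIZE_BY_FAMILY.get(family, DEFAULT_REQUEST_SIZE)


def read_product_name(path: str = PRODUCT_NAME_PATH) -> str:
    """Read the instance type from sysfs (only meaningful on the instance)."""
    return Path(path).read_text()


size_i7i = request_size("i7i.4xlarge\n")
size_i3en = request_size("i3en.large")
size_other = request_size("m5.large")
```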
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Closes scylladb/scylladb#25315
pass an appropriate query state for auth queries called from service
level cache reload. we use the function qos_query_state to select a
query_state based on caller context - for internal queries, we set a
very long timeout.
the service level cache reload is called from group0 reload. we want it
to have a long timeout instead of the default 5 seconds for auth
queries, because we don't have a strict latency requirement on the one
hand, and on the other hand a timeout exception is undesired in the
group0 reload logic and can break group0 on the node.
Fixes https://github.com/scylladb/scylladb/issues/25290
backport possible to improve stability
Closes scylladb/scylladb#26180
* github.com:scylladb/scylladb:
service/qos: set long timeout for auth queries on SL cache update
auth: add query_state parameter to query functions
auth: refactor query_all_directly_granted
This patch moves `discover_existing_staging_sstables()` to be executed
from main level, instead of running it on the background fiber.
This method needs to be run only once during startup to collect
existing staging sstables, so there is no need to do it in the
background. This change will improve the debuggability of any further issues
related to it (like scylladb/scylladb#26403).
Fixes scylladb/scylladb#26417
Next commit will move `discover_existing_staging_sstables()`
to the foreground, so to prepare for this we need to futurize
`start_background_fibers()` method and change its name to better reflect
its purpose.
`sl:driver` is expected to be used for new and control connections,
but other connections that run user load should not use it after
the user is authenticated.
Refs: scylladb/scylladb#24411
Before `sl:driver` was introduced, service levels were assigned as
follows:
1. New connections were processed in `main`.
2. After user authentication was completed, the connection's SL was
changed to the user's SL (or `sl:default` if the user had no SL).
This commit introduces `service_level_state` to `client_state` and
implements the following logic in `transport/server`:
1. If `sl:driver` is not present in the system (for example, it was
removed), service levels behave as described above.
2. If `sl:driver` is present, the flow is:
I. New connections use `sl:driver`.
II. After user authentication is completed, the connection's SL is
changed to the user's SL (or `sl:default`).
III. If a REGISTER (to events) request is handled, the client is
processing the control connection. We mark the client_state
to permanently use `sl:driver`.
The aforementioned state `2.III` is represented by
`_control_connection` flag in `client_state`.
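
The decision flow above can be modelled with a small Python sketch. It is an illustration of the described rules only, not the `transport/server` code: the `ClientState` class, its fields, and its method names are hypothetical, apart from the idea of a control-connection flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClientState:
    """Hypothetical model of the per-connection service level decision."""

    sl_driver_exists: bool
    user_sl: Optional[str] = None     # SL attached to the authenticated user
    authenticated: bool = False
    control_connection: bool = False  # mirrors the `_control_connection` flag

    def on_authenticated(self, user_sl: Optional[str]) -> None:
        self.authenticated = True
        self.user_sl = user_sl

    def on_register(self) -> None:
        # A REGISTER request marks this as the driver's control connection.
        self.control_connection = True

    def effective_service_level(self) -> str:
        if self.sl_driver_exists and self.control_connection:
            return "sl:driver"                   # state 2.III
        if self.authenticated:
            return self.user_sl or "sl:default"  # states 1.2 and 2.II
        # New, not yet authenticated connection: states 2.I and 1.1.
        return "sl:driver" if self.sl_driver_exists else "main"


conn = ClientState(sl_driver_exists=True)
new_sl = conn.effective_service_level()
conn.on_authenticated(None)
authenticated_sl = conn.effective_service_level()
conn.on_register()
control_sl = conn.effective_service_level()
```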
Fixes: scylladb/scylladb#24411
Before this change, unauthorized connections stayed in the `main`
scheduling group. This is not ideal: in such a case `sl:default`
should be used instead, for consistency with the scenario
where a user is authenticated but has no service level
assigned.
This commit adds a call to `update_scheduling_group` at the end of
connection creation for an unauthenticated user, to make sure the
service level is switched to `sl:default`.
Fixes: scylladb/scylladb#26040
Before this change, new connections were handled in a default
scheduling group (`main`), because before the user is authenticated
we do not know which service level should be used. With the new
`sl:driver` service level, creation of new connections can be moved to
`sl:driver`.
We switch the service level as early as possible, in `do_accepts`.
There is a possibility that `sl:driver` does not exist yet, for
instance in specific upgrade cases, or that it was removed. Therefore,
we also switch to `sl:driver` after a connection is accepted.
Refs: scylladb/scylladb#24411
Driver service level is a special service level that is created
automatically by the system. Therefore, it requires special handling
in DESC SCHEMA WITH INTERNALS, and this test verifies that special
behavior.
Refs: scylladb/scylladb#24411
This commit:
- Increases the number of allowed scheduling groups to allow the
creation of `sl:driver`.
- Adds the `DRIVER_SERVICE_LEVEL` feature, which prevents creating
`sl:driver` until all nodes have increased the number of
scheduling groups.
- Starts using `get_create_driver_service_level_mutations`
to unconditionally create `sl:driver` on
`raft_initialize_discovery_leader`. The purpose of this code
path is to ensure the existence of `sl:driver` in new systems and tests.
- Starts using `migrate_to_driver_service_level` to create `sl:driver`
if it is not already present. The creation of `sl:driver` is
managed by `topology_coordinator`, similar to other system keyspace
updates, such as the `view_builder` migration. The purpose of this
code path is handling upgrades.
- Modifies related tests to pass after `sl:driver` is added.
Later in this patch series, `sl:driver` will be used by
`transport/server` to handle selected traffic, such as the driver's
schema and topology fetches.
Refs: scylladb/scylladb#24411
This commit implements `get_create_driver_service_level_mutations`
and `migrate_to_driver_service_level` in service_level_controller.
Both methods create `sl:driver` with shares=200 and store this fact
in `system.scylla_local`. Both methods will be used later in this
patch series for automatic creation of sl:driver.
Refs: scylladb/scylladb#24411
Later in this patch series, `sl:driver` will be added as a special
service level created automatically by the system. It needs special
handling in `DESC SCHEMA ...` to ensure that during backup restore:
1. CREATE SERVICE LEVEL does not fail if `sl:driver` already exists
2. If `sl:driver` exists, its configuration is fully restored (emit
ALTER SERVICE LEVEL).
3. If `sl:driver` was removed, the information is retained (emit
DROP SERVICE LEVEL instead of CREATE/ALTER).
Refs: scylladb/scylladb#24411
This adds a reference to sl_controller so that, later in this patch
series, topology_coordinator can manage creating `sl:driver` once
group0 is fully operational.
Refs: scylladb/scylladb#24411
This commit extends the system.scylla_local table with an additional
key/value pair that can be used later in this patch series to
record that `sl:driver` was already created. The purpose
of storing this information is to ensure that `sl:driver` is
not recreated after being intentionally removed.
A new mutation is included in `register_raft_pull_snapshot` to keep
`service_level_driver_created` in the state machine snapshot, which is
required for proper propagation of the value when a new node is added
to the cluster.
Refs: scylladb/scylladb#24411
Previously, tests used the hardcoded value 7 for the maximum number of
user service levels. This commit introduces a named variable that can
be shared across tests to avoid cases where this magic number goes
out of sync.
The current description is not accurate: the function doesn't throw
an exception if there's an invalid materialized view. Instead, it
simply logs the keyspaces that violate the requirement.
Furthermore, the experimental feature `views-with-tablets` is no longer
necessary for considering a materialized view as valid. It was dropped
in scylladb/scylladb@b409e85c20. The
replacement for it is the cluster feature `VIEWS_WITH_TABLETS`.
Fixes scylladb/scylladb#26420
Closes scylladb/scylladb#26421
This patch adds tests for:
- tablet migration during view building
- tablet merge during view building.
Those tests were missing from the original testing plan.
We want to backport it to 2025.4 to ensure the release is bug-free.
Closes scylladb/scylladb#26414
* github.com:scylladb/scylladb:
test/cluster/test_view_building_coordinator: add test for tablet merge
test/cluster/test_view_building_coordinator: add test for tablet migration
Seastar httpd has recommended that users stop using the contiguous request.content string and instead read the body they need from the request's input_stream. However, the "official" deprecation of request content was made only recently.
This PR patches the REST API server to turn this feature on and patches the few handlers that work with request bodies to read them from the request stream.
Using newer seastar API, no need to backport
Closes scylladb/scylladb#26418
* github.com:scylladb/scylladb:
api: Switch to request content streaming
api: Fix indentation after previous patch
api: Coroutinize set_relabel_config handler
api: Coroutinize set_error_injection handler
This dependency reference is carried into column_family handlers block to make get_built_views handler work. However, the handler in question should live in view_builder block, because it works with v.b. data. This PR moves the handler there, while at it, coroutinizes it, and removes the no longer needed sys.ks. reference from column_family.
API dependencies cleanup work, no need to backport
Closes scylladb/scylladb#26381
* github.com:scylladb/scylladb:
api: Fix indentation after previous patch
api: Coroutinize get_built_indexes handler code
api: Remove system_keyspace ref from column_family API block
api: Move get_built_indexes from column_family to view_builder
If mis-used, the script says
error: unrecognized option: ..., see ./scripts/pull_github_pr.sh -h for usage
but if using the suggested -h option it prints just the same.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#26378
The PR #26154 dropped the `-fvisibility=hidden` compiler flag and
replaced it with `-fvisibility-inlines-hidden` as the former caused
issues in how the `noncopyable_function::operator bool` method executed
leading to incorrect return values. Apply the same fix to cmake.
Fixes #26391
Closes scylladb/scylladb#26431
There are three handlers that need to be patched all at once, together with
the server itself being marked with set_content_streaming.
For the two simple handlers, just get the content string with the
read_entire_stream_contiguous helper. This is what the httpd server did
anyway.
The "start_restore" handler used the contiguous contents to parse JSON
with the rjson utility. This handler is patched to use
read_entire_stream(), which returns a vector of temporary buffers. The
rjson parser has a helper to parse from that vector, so the change is
also an optimization.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Without the invoke_on_all lambda, for simplicity
Also keep indentation "broken" for the ease of review
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In the Raft-based recovery procedure, we create a new group 0 and add
live nodes to it one by one. This means that for some time there are
nodes which belong to the topology, but not to the new group 0. The
voter handler running on the recovery leader incorrectly considers these
nodes while choosing voters.
The consequences:
- misleading logs, for example, "making servers {<ID of a non-member>}
voters", where the non-member won't become a voter anyway,
- increased chance of majority loss during the recovery procedure, for
example, all 3 nodes that first joined the new group 0 are in the same
dc and rack, but only one of them becomes a voter because the voter
handler tries to make non-members in other dcs/racks voters.
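
One plausible shape of the corrected selection is to restrict the candidate set to nodes that already belong to the new group 0. The sketch below only illustrates that idea; the function name and the node identifiers are hypothetical and the real voter handler is more involved.

```python
def eligible_voter_candidates(topology_nodes, group0_members):
    """Keep only topology nodes that have already joined the new group 0."""
    members = set(group0_members)
    return [node for node in topology_nodes if node in members]


# Five nodes are in the topology, but only three have joined the new group 0.
topology = ["n1", "n2", "n3", "n4", "n5"]
joined = {"n1", "n2", "n3"}

candidates = eligible_voter_candidates(topology, joined)
```

With the non-members filtered out, voters are chosen only among nodes that can actually become voters.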
Fixes #26321
Closes scylladb/scylladb#26327
Some code wants its TLS sockets to close immediately, without sending a BYE
message and waiting for the response. A recent seastar update changed the
way this functionality is requested (scylladb/seastar#2986).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#26253