When an sstable is unlinked, it remains in the _active list of the
sstable manager. Its memory might be reclaimed and later reloaded,
causing issues since the sstable is already unlinked. This patch updates
the on_unlink method to reclaim memory from the sstable upon unlinking,
remove it from memory tracking, and thereby prevent the issues described
above.
Added a testcase to verify the fix.
Fixes #21887
This is a bug fix in the bloom filter reload/reclaim mechanism and should be backported to older versions.
Closes scylladb/scylladb#21895
* github.com:scylladb/scylladb:
sstables_manager: reclaim memory from sstables on unlink
sstables_manager: introduce reclaim_memory_and_stop_tracking_sstable()
sstables: introduce disable_component_memory_reload()
sstables_manager: log sstable name when reclaiming components
(cherry picked from commit d4129ddaa6)
Closes scylladb/scylladb#21997
New logs allow us to easily distinguish two cases in which
waiting for apply times out:
- the node didn't receive the entry it was waiting for,
- the node received the entry but didn't apply it in time.
Distinguishing these cases simplifies reasoning about failures.
The first case indicates that something went wrong on the leader.
The second case indicates that something went wrong on the node
on which waiting for apply timed out.
As it turns out, many different bugs result in the `read_barrier`
(which calls `wait_for_apply`) timeout. This change should help
us in debugging bugs like these.
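The two timeout cases can be told apart from two indices, as in the following minimal Python sketch. The function and parameter names (`classify_apply_timeout`, `last_received_idx`, `last_applied_idx`) are hypothetical and only illustrate the reasoning; they are not the actual Raft server code, which is C++.

```python
def classify_apply_timeout(target_idx: int,
                           last_received_idx: int,
                           last_applied_idx: int) -> str:
    """Explain why waiting for `target_idx` to be applied timed out."""
    if last_applied_idx >= target_idx:
        # The entry was applied after all; nothing to diagnose.
        return "applied"
    if last_received_idx < target_idx:
        # The entry never arrived: suspect the leader.
        return "not received"
    # The entry arrived but was not applied in time: suspect this node.
    return "received but not applied"
```

Logging the result of such a check at timeout is what lets a reader of the logs decide which node to investigate first.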
We want to backport this change to all supported branches so that
it helps us in all tests.
Fixes scylladb/scylladb#22160
Closes scylladb/scylladb#22157
This series contains small fixes to the gossiper, one of which fixes #21930. I noticed the others while debugging the issue.
Fixes: #21930
- (cherry picked from commit 91cddcc17f)
Parent PR: #21956
Closes scylladb/scylladb#21990
* github.com:scylladb/scylladb:
gossiper: do not reset _just_removed_endpoints in non raft mode
gossiper: do not call apply for the node's old state
Currently, if a node crashes during startup after initiating gossip but before joining group0,
it keeps floating in the gossiper forever, because the raft-based gossiper purging logic is only effective
once the node joins group0. This orphan node prevents a successor node with the same IP from joining the cluster,
since the successor collides with it during the gossiper shadow round.
This commit fixes the issue by adding a background thread that periodically checks for such orphan entries in
the gossiper and removes them.
A test is also added to verify this logic. The test fails without the background thread enabled, hence
verifying the behavior.
Fixes: scylladb/scylladb#20082
Closes scylladb/scylladb#21600
(cherry picked from commit 6c90a25014)
Closes scylladb/scylladb#21821
Update the service level cache in the node startup sequence, after the
service level and auth service are initialized.
The cache update depends on the service level data accessor being set
and the auth service being initialized. Before the commit, it may happen that a
cache update is not triggered after the initialization. The commit adds
an explicit call to update the cache where it is guaranteed to be ready.
Fixes scylladb/scylladb#21763
Closes scylladb/scylladb#21773
(cherry picked from commit 373855b493)
The function get_service_levels is used to retrieve all service levels
and it is called from multiple different contexts.
Importantly, it is called internally from the context of group0 state reload,
where it should be executed with a long timeout, similarly to other
internal queries, because a failure of this function affects the entire
group0 client, and a longer timeout can be tolerated.
The function is also called in the context of the user command LIST
SERVICE LEVELS, and perhaps other contexts, where a shorter timeout is
preferred.
The commit introduces a function parameter to indicate whether the
context is internal or not. For internal context, a long timeout is
chosen for the query. Otherwise, the timeout is shorter, the same as
before. When the distinction is not important, a default value is
chosen which maintains the same behavior.
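A minimal Python sketch of the idea follows. The timeout values and the names `INTERNAL_TIMEOUT`, `USER_TIMEOUT`, and `service_levels_query_timeout` are assumptions made for illustration; the real parameter and timeouts live in the C++ code.

```python
from datetime import timedelta

# Hypothetical values, chosen only to show "long" versus "short".
INTERNAL_TIMEOUT = timedelta(minutes=5)
USER_TIMEOUT = timedelta(seconds=10)


def service_levels_query_timeout(internal: bool = False) -> timedelta:
    """Pick the query timeout from the calling context.

    The default (internal=False) keeps the previous, shorter timeout,
    so callers that do not care about the distinction are unaffected.
    """
    return INTERNAL_TIMEOUT if internal else USER_TIMEOUT
```

Only the group0 state reload path would pass `internal=True`; a user-issued LIST SERVICE LEVELS keeps the default.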
The main purpose is to fix the case where the timeout is too short and causes
a failure that propagates and fails the group0 client.
Fixes scylladb/scylladb#20483
Closes scylladb/scylladb#21748
(cherry picked from commit 53224d90be)
Otherwise, the read will be considered as on-cpu during promoted index
search, which will severely underutilize the disk because by default
on-cpu concurrency is 1.
I verified this patch on the worst case scenario, where the workload
reads missing rows from a large partition. So partition index is
cached (no IO) and there is no data file IO (relies on https://github.com/scylladb/scylladb/pull/20522).
But there is IO during promoted index search (via cached_file).
Before the patch this workload was doing 4k req/s, after the patch it does 30k req/s.
The problem is much less pronounced if there is data file or partition index IO involved
because that IO will signal read concurrency semaphore to invite more concurrency.
Fixes #21325
(cherry picked from commit 868f5b59c4)
(cherry picked from commit 0f2101b055)
Refs #21323
Closes scylladb/scylladb#21359
* github.com:scylladb/scylladb:
utils: cached_file: Mark permit as awaiting on page miss
utils: cached_file: Push resource_unit management down to cached_file
Otherwise, the read will be considered as on-cpu during promoted index
search, which will severely underutilize the disk because by default
on-cpu concurrency is 1.
I verified this patch on the worst case scenario, where the workload
reads missing rows from a large partition. So partition index is
cached (no IO) and there is no data file IO. But there is IO during
promoted index search (via cached_file). Before the patch this
workload was doing 4k req/s, after the patch it does 30k req/s.
The problem is much less pronounced if there is data file or index
file IO involved because that IO will signal read concurrency
semaphore to invite more concurrency.
(cherry picked from commit 0f2101b055)
It saves us permit operations on the hot path when we hit in cache.
Also, it will lay the ground for marking the permit as awaiting later.
(cherry picked from commit 868f5b59c4)
In commit 2596d157, we added a condition to run auto-backport.py only
when the GitHub Action is triggered by a push to the default branch.
However, this introduced an unexpected error due to incorrect condition
handling.
Problem:
- `github.event.before` evaluates to an empty string
- GitHub Actions' single-pass expression evaluation system causes
the step to always execute, regardless of `github.event_name`
Despite GitHub's documentation suggesting that ${{ }} can be omitted,
it recommends using explicit ${{}} expressions for compound conditions.
Changes:
- Use explicit ${{}} expression for compound conditions
- Avoid string interpolation in conditional statements
Root Cause:
The previous implementation failed because of how GitHub Actions
evaluates conditional expressions, leading to an unintended script
execution and a 404 error when attempting to compare commits.
Example Error:
```
python .github/scripts/auto-backport.py --repo scylladb/scylladb --base-branch refs/heads/master --commits ..2b07d93beac7bc83d955dadc20ccc307f13f20b6
shell: /usr/bin/bash -e {0}
env:
DEFAULT_BRANCH: master
GITHUB_TOKEN: ***
Traceback (most recent call last):
File "/home/runner/work/scylladb/scylladb/.github/scripts/auto-backport.py", line 201, in <module>
main()
File "/home/runner/work/scylladb/scylladb/.github/scripts/auto-backport.py", line 162, in main
commits = repo.compare(start_commit, end_commit).commits
File "/usr/lib/python3/dist-packages/github/Repository.py", line 888, in compare
headers, data = self._requester.requestJsonAndCheck(
File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck
return self.__check(
File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check
raise self.__createException(status, responseHeaders, output)
github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/commits/commits#compare-two-commits", "status": "404"}
```
Fixes scylladb/scylladb#21808
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21809
(cherry picked from commit e04aca7efe)
Closes scylladb/scylladb#21819
Scrub compaction can pick up input sstables from the maintenance sstable set,
but on compaction completion it doesn't update the maintenance set,
leaving the original sstable in the set after it has been scrubbed. To fix
this, compaction completion has to update the maintenance sstable set if
the input originated from there. This PR solves the issue by updating the
correct sstable_sets on compaction completion.
Fixes #20030
This issue has existed since the introduction of main and maintenance sstable sets into scrub compaction. It would be good to have the fix backported to versions 6.1 and 6.2.
Closes scylladb/scylladb#21582
* github.com:scylladb/scylladb:
compaction: remove unused `update_sstable_lists_on_off_strategy_completion`
compaction_group: replace `update_sstable_lists_on_off_strategy_completion`
compaction_group: rename `update_main_sstable_list_on_compaction_completion`
compaction_group: update maintenance sstable set on scrub compaction completion
compaction_group: store table::sstable_list_builder::result in replacement_desc
table::sstable_list_builder: remove old sstables only from current list
table::sstable_list_builder: return removed sstables from build_new_list
(cherry picked from commit 58baeac0ad)
Closes scylladb/scylladb#21789
This fixes a use-after-free bug when parsing clustering key across
pages.
Also includes a fix for allocating section retry, which is potentially not safe (not in practice yet).
Details of the first problem:
Clustering key index lookup is based on the index file page cache. We
do a binary search within the index, which involves parsing index
blocks touched by the algorithm. Index file pages are 4 KB chunks
which are stored in LSA.
To parse the first key of the block, we reuse clustering_parser, which
is also used when parsing the data file. The parser is stateful and
accepts consecutive chunks as temporary_buffers. The parser is
supposed to keep its state across chunks.
In 93482439, the promoted index cursor was optimized to avoid
a full page copy when parsing index blocks. Instead, the parser is
given a temporary_buffer which is a view on the page.
A bit earlier, in b1b5bda, the parser was changed to keep shared
fragments of the buffer passed to the parser in its internal state (across pages)
rather than copy the fragments into a new buffer. This is problematic
when buffers come from page cache because LSA buffers may be moved
around or evicted. So the temporary_buffer which is a view on the LSA
buffer is valid only around the duration of a single consume() call to
the parser.
If the blob which is parsed (e.g. variable-length clustering key
component) spans pages, the fragments stored in the parser may be
invalidated before the component is fully parsed. As a result, the
parsed clustering key may have incorrect component values. This never
causes parsing errors because the "length" field is always parsed from
the current buffer, which is valid, and component parsing will end at
the right place in the next (valid) buffer.
The problematic path for clustering_key parsing is the one which calls
primitive_consumer::read_bytes(), which is called for example for text
components. Fixed-size components are not parsed like this, they store
the intermediate state by copying data.
This may cause incorrect clustering keys to be parsed when doing
binary search in the index, diverting the search to an incorrect
block.
Details of the solution:
We adapt page_view to a temporary_buffer-like API. For this, a new concept
is introduced called ContiguousSharedBuffer. We also change parsers so that
they can be templated on the type of the buffer they work with (page_view vs
temporary_buffer). This way we don't introduce indirection to existing algorithms.
We use page_view instead of temporary_buffer in the promoted
index parser which works with page cache buffers. page_view can be safely
shared via share() and stored across allocating sections. It keeps hold of the
LSA buffer even across allocating sections by means of cached_file::page_ptr.
Fixes #20766
Closes scylladb/scylladb#20837
* github.com:scylladb/scylladb:
sstables: bsearch_clustered_cursor: Add trace-level logging
sstables: bsearch_clustered_cursor: Move definitions out of line
test, sstables: Verify parsing stability when allocating section is retried
test, sstables: Verify parsing stability when buffers cross page boundary
sstables: bsearch_clustered_cursor: Switch parsers to work with page_view
cached_file: Adapt page_view to ContiguousSharedBuffer
cached_file: Change meaning of page_view::_size to be relative to _offset rather than page start
sstables, utils: Allow parsers to work with different buffer types
sstables: promoted_index_block_parser: Make reset() always bring parser to initial state
sstables: bsearch_clustered_cursor: Switch read_block_offset() to use the read() method
sstables: bsearch_clustered_cursor: Fix parsing when allocating section is retried
(cherry picked from commit fb8743b2d6)
Closes scylladb/scylladb#20906
schema_change_test currently fails due to failure to start a cql test
env in unit tests after the point where this is called (in one of the
test cases):
forward_jump_clocks(std::chrono::seconds(60*60*24*31));
The problem manifests with a failure to join the cluster due to
missing_column exception ("missing_column: done") being thrown from
system_keyspace::get_topology_request_state(). It's a symptom of
join request being missing in system.topology_requests. It's missing
because the row is expired.
When request is created, we insert the
mutations with intended TTL of 1 month. The actual TTL value is
computed like this:
ttl_opt topology_request_tracking_mutation_builder::ttl() const {
return std::chrono::duration_cast<std::chrono::seconds>(std::chrono::microseconds(_ts)) + std::chrono::months(1)
- std::chrono::duration_cast<std::chrono::seconds>(gc_clock::now().time_since_epoch());
}
_ts comes from the request_id, which is supposed to be a timeuuid set
from current time when request starts. It's set using
utils::UUID_gen::get_time_UUID(). It reads the system clock without
adding the clock offset, so after forward_jump_clocks(), _ts and
gc_clock::now() may be far off. In some cases the accumulated offset
is larger than 1 month and the ttl becomes negative, causing the
request row to expire immediately and failing the boot sequence.
The fix is to use db_clock, which respects offsets and is consistent
with gc_clock.
The test doesn't fail in CI because there each test case runs in a
separate process, so there is no bootstrap attempt (by new cql test
env) after forward_jump_clocks().
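The arithmetic behind the negative TTL can be reproduced with a small Python sketch. It mirrors the ttl() expression quoted above; the concrete times (`system_now`, a 31-day `clock_offset`) are made-up inputs, and `request_ttl` is a hypothetical helper, not part of the code base.

```python
from datetime import timedelta

# std::chrono::months(1) is 1/12 of the average Gregorian year.
MONTH = timedelta(seconds=2629746)


def request_ttl(request_ts: timedelta, gc_now: timedelta) -> timedelta:
    """Same shape as ttl(): request timestamp + 1 month - gc_clock::now()."""
    return request_ts + MONTH - gc_now


system_now = timedelta(days=20000)  # arbitrary wall-clock time since epoch
clock_offset = timedelta(days=31)   # accumulated by forward_jump_clocks()

# Before the fix: _ts ignores the offset, gc_clock::now() includes it.
ttl_before = request_ttl(system_now, system_now + clock_offset)

# After the fix: both timestamps include the offset.
ttl_after = request_ttl(system_now + clock_offset, system_now + clock_offset)

print(ttl_before.total_seconds())  # negative: the row expires immediately
print(ttl_after == MONTH)          # True: the intended one-month TTL
```

A 31-day jump is enough because it exceeds the 2629746 seconds that std::chrono::months(1) represents.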
Closes scylladb/scylladb#21558
(cherry picked from commit 1d0c6aa26f)
Closes scylladb/scylladb#21583
Fixes #21581
tablet_repair_task_impl keeps a vector of tablet_repair_task_meta,
each of which keeps an effective_replication_map_ptr. So, after
the task completes, the token metadata version will not change for
task_ttl seconds.
Implement tablet_repair_task_impl::release_resources method that clears
tablet_repair_task_meta vector when the task finishes.
Set task_ttl to 1h in test_tablet_repair to check that the test
does not time out.
Fixes: #21503.
Closes scylladb/scylladb#21504
(cherry picked from commit 572b005774)
Closes scylladb/scylladb#21621
Building upon commit 69b47694, this change addresses a subtle synchronization
weakness in node visibility checks during recovery mode testing.
Previous Approach:
- Waited only for the first node to see its peers
- Insufficient to guarantee full cluster consistency
Current Solution:
1. Implement comprehensive node visibility verification
2. Ensure all nodes mutually recognize each other
3. Prevent potential schema propagation race conditions
Key Improvements:
- Robust cluster state validation before keyspace creation
- Eliminate partial visibility scenarios
Fixes scylladb/scylladb#21724
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21726
(cherry picked from commit 65949ce607)
Closes scylladb/scylladb#21733
Currently, task_manager_module::abort_all_repairs marks top-level repairs as aborted (but does not abort them) and aborts all existing shard tasks.
A running repair checks that its id isn't contained in _aborted_pending_repairs and then proceeds to create shard tasks. If abort_all_repairs is executed after _aborted_pending_repairs is checked but before the shard tasks are created, those new tasks won't be aborted. The issue is most severe for tablet_repair_task_impl, which checks the _aborted_pending_repairs content on different shards that do not see the top-level task. Hence the repair isn't stopped, and it creates shard repair tasks on all shards but the one that initialized the repair.
Abort top-level tasks in abort_all_repairs. Fix the shard on which the task abort is checked.
Fixes: #21612.
Needs backport to 6.1 and 6.2 as they contain the bug.
Closes scylladb/scylladb#21616
* github.com:scylladb/scylladb:
test: add test to check if repair is properly aborted
repair: add shard param to task_manager_module::is_aborted
repair: use task abort source to abort repair
repair: drop _aborted_pending_repairs and utilize tasks abort mechanism
repair: fix task_manager_module::abort_all_repairs
(cherry picked from commit 5ccbd500e0)
Closes scylladb/scylladb#21641
Alternator's "/localnodes" HTTP request is supposed to return the list
of nodes in the local DC to which the user can send requests.
Before commit bac7c33313 we used the
gossiper is_alive() method to determine if a node should be returned.
That commit changed the check to is_normal() - because a node can be
alive but in non-normal (e.g., joining) state and not ready for
requests.
However, it turns out that checking is_normal() is not enough, because
if node is stopped abruptly, other nodes will still consider it "normal",
but down (this is so-called "DN" state). So we need to check **both**
is_alive() and is_normal().
This patch also adds a test reproducing this case, where a node is
shut down abruptly. Before this patch, the test failed ("/localnodes"
continued to return the dead node), and after it, it passes.
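The combined check can be sketched in Python as follows. The `Node` class and `local_nodes` function are hypothetical stand-ins; the `alive` and `normal` fields only model what the gossiper's is_alive() and is_normal() would report.

```python
from dataclasses import dataclass


@dataclass
class Node:
    address: str
    alive: bool   # models gossiper is_alive()
    normal: bool  # models gossiper is_normal()


def local_nodes(nodes: list[Node]) -> list[str]:
    """Return only nodes that are both up and in normal state."""
    return [n.address for n in nodes if n.alive and n.normal]
```

With this filter, a joining node (alive but not normal) and an abruptly stopped node (normal but down, the "DN" state) are both excluded.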
Fixes #21538
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#21540
(cherry picked from commit 7607f5e33e)
Closes scylladb/scylladb#21633
During migration cleanup, there's a small window in which the storage
group was stopped but not yet removed from the list. So concurrent
operations traversing the list could work with stopped groups.
During a test which emitted schema changes during migrations,
a failure happened when updating the compaction strategy of a table,
but since the group was stopped, the compaction manager was unable
to find the state for that group.
In order to fix it, we'll skip stopped groups when traversing the
list since they're unused at this stage of migration and going away
soon.
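A minimal Python sketch of the traversal, assuming a hypothetical `StorageGroup` with a `stopped` flag and a hypothetical `for_each_live_group` helper (the real code is C++ and may be structured differently):

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class StorageGroup:
    name: str
    stopped: bool = False


def for_each_live_group(groups: Iterable[StorageGroup],
                        func: Callable[[StorageGroup], None]) -> None:
    """Apply func to every group that has not been stopped yet."""
    for group in groups:
        if group.stopped:
            # Stopped but not yet removed from the list: skip it.
            continue
        func(group)
```

An operation such as a compaction strategy update then never reaches a group whose compaction state is already gone.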
Fixes #20699.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit b8d6f864bc)
Closes scylladb/scylladb#21203
Fixes #21159
When an exception is thrown in sstable write etc such that
storage_manager::isolate is initiated, we start a shutdown chain
for message service, gossip etc. These are synced (properly) in
storage_manager::stop, but if we somehow call gossiper::shutdown
outside the normal service::stop cycle, we can end up running the
method simultaneously, intertwined (missing the guard because of
the state change between check and set). We then end up co_awaiting
an invalid future (_failure_detector_loop_done) - a second wait.
Fixed by
a.) Remove superfluous gossiper::shutdown in cql_test_env. This was added
in 20496ed, ages ago. However, it should not be needed nowadays.
b.) Ensure _failure_detector_loop_done is always waitable. Just to be sure.
(cherry picked from commit c28a5173d9)
Closes scylladb/scylladb#21394
The test is only sending a subset of the running servers for the rolling
restart. The rolling restart is checking the visibility of the restarted
node against the other nodes, but if that set is incomplete some of the
running servers might not have seen the restarted node yet.
Improved the manager client rolling restart method to consider all the
running nodes for checking the restarted node visibility.
Fixes: scylladb/scylladb#19959
Closes scylladb/scylladb#21477
(cherry picked from commit 92db2eca0b)
Closes scylladb/scylladb#21555
After 5a470b2bfb was merged, we found that scylla_raid_setup fails on offline mode
installation.
This is because pkg_install() just prints an error and exits the script in offline mode instead of installing packages, since offline mode is not supposed to be able to connect to the
internet.
It seems to occur because of the missing "policycoreutils-python-utils"
package, which provides the "semanage" command.
So we need to implement the relabeling patch without using the command.
Fixes https://github.com/scylladb/scylladb/issues/21441
Also, since Amazon Linux 2 has a different package name for semanage, we need to
adjust the package name.
Fixes https://github.com/scylladb/scylladb/issues/21351
Closes scylladb/scylladb#21474
* github.com:scylladb/scylladb:
scylla_raid_setup: support installing semanage on Amazon Linux 2
scylla_raid_setup: fix failure on SELinux package installation
(cherry picked from commit 1c212df62d)
Closes scylladb/scylladb#21546
The stream-session is the receiving end of streaming, it reads the
mutation fragment stream from an RPC stream and writes it onto the disk.
As such, this part does no disk IO and therefore, using a permit with
count resources is superfluous. Furthermore, after
d98708013c, the count resources on this
permit can cause a deadlock on the receiver end, via the
`db::view::check_view_update_path()`, which wants to read the content of
a system table and therefore has to obtain a permit of its own.
Switch to a tracking-only permit, primarily to resolve the deadlock, but
also because admission is not necessary for a read which does no IO.
Refs: scylladb/scylladb#20885 (partial fix, solves only one of the deadlocks)
Fixes: scylladb/scylladb#21264
Fixes: scylladb/scylladb#21570
Closes scylladb/scylladb#21059
(cherry picked from commit 7c75fc599f)
Closes scylladb/scylladb#21571
stop() methods, like destructors, must always succeed,
and returning errors from them is futile as there is
nothing else we can do with them but continue with shutdown.
stop_ongoing_compactions, in particular, currently returns the status
of stopped compaction tasks from `stop_tasks`, but still all tasks
must be stopped after it, even if they failed, so assert that
and ignore the errors.
Fixes scylladb/scylladb#21159
* Needs backport to 6.2 and 6.1, as commit 8cc99973eb causes handles storage that might cause compaction tasks to fail and eventually terminate on shutdown when the exceptions are thrown in noexcept context in the deferred stop destructor body
(cherry picked from commit e942c074f2)
(cherry picked from commit d8500472b3)
(cherry picked from commit c08ba8af68)
(cherry picked from commit a7a55298ea)
(cherry picked from commit 6cce67bec8)
Refs #21299
Closes scylladb/scylladb#21435
* github.com:scylladb/scylladb:
compaction_manager: stop: await _stop_future if engaged
compaction_manager: really_do_stop: assert that no tasks are left behind
compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors
compaction/compaction_manager: stop_tasks(): unlink stopped tasks
compaction/compaction_manager: make _tasks an intrusive list
The current condition that consults the compaction manager
state for awaiting `_stop_future` works since _stop_future
is assigned after the state is set to `stopped`, but it is
incidental. What matters is that `_stop_future` is engaged.
While at it, exchange _stop_future with a ready future
so that stop() can be safely called multiple times.
And dropped the superfluous co_return.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 6cce67bec8)
stop_ongoing_compactions now ignores any errors returned
by tasks, and it should leave no task left behind.
Assert that here, before the compaction_manager is destroyed.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit a7a55298ea)
stop() methods, like destructors, must always succeed,
and returning errors from them is futile as there is
nothing else we can do with them but continue with shutdown.
Leaked errors on the stop path may cause termination
on shutdown, when called in a deferred action destructor.
Fixes scylladb/scylladb#21298
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit c08ba8af68)
Stopped tasks currently linger in _tasks until the fiber that created
the task is scheduled again and unlinks the task. This window between
stop and remove prevents reliable checks for empty _tasks list after all
tasks are stopped.
Unlink the task early so really_do_stop() can safely check for an empty
_tasks list (next patch).
(cherry picked from commit d8500472b3)
_tasks is currently std::list<shared_ptr<compaction_task_executor>>, but
it has no role in keeping the instances alive, this is done by the
fibers which create the task (and pin a shared ptr instance).
This lends itself to an intrusive list, avoiding that extra
allocation upon push_back().
Using an intrusive list also makes it simpler and much cheaper (O(1) vs.
O(N)) to remove tasks from the _tasks list. This will be made use of in
the next patch.
Code using _tasks has to be updated because the value_type changes from
shared_ptr<compaction_task_executor> to compaction_task_executor&.
(cherry picked from commit e942c074f2)
For performance reasons, mutation_partition_v2::maybe_drop(), and by extension
also mutation_partition_v2::apply_monotonically(mutation_partition_v2&&)
can evict empty row entries, and hence change the continuity of the merged
entry.
For checking that apply_to_incomplete respects continuity,
test_apply_to_incomplete_respects_continuity obtains the continuity of
the partition entry before and after apply_to_incomplete by calling
e.squashed().get_continuity(). But squashed() uses apply_monotonically(),
so in some circumstances the result of squashed() can have smaller
continuity than the argument of squashed(), which messes with the thing
that the test is trying to check, and causes spurious failures.
This patch changes the method of calculating the continuity set,
so that it matches the entry exactly, fixing the test failures.
Fixes scylladb/scylladb#13757
Closes scylladb/scylladb#21459
(cherry picked from commit 35921eb67e)
Closes scylladb/scylladb#21496
Since Scylla is a public repo, when we create a fork, it doesn't fork the team and permissions (unlike private repos where it does).
When we have a backport PR with conflicts, the developers need to be able to update the branch to fix the conflicts. To do so, we modified the logic of the backport automation as follows:
- Every backport PR (with and without conflicts) will be opened directly on the `scylladbbot` fork repo
- When there are conflicts, an email will be sent to the original PR author with an invitation to become a contributor in the `scylladbbot` fork with `push` permissions. This will happen only once, if the author is not yet a contributor.
- Together with sending the invite, all backport labels will be removed and a comment will be added to the original PR with instructions
- The PR author must add the backport labels after the invitation is accepted
Fixes: https://github.com/scylladb/scylladb/issues/18973
Closes scylladb/scylladb#21401
(cherry picked from commit 77604b4ac7)
Closes scylladb/scylladb#21465
Adding an auto-backport.py script to handle backport automation instead of Mergify.
The rules of backport are as follows:
* Merged or Closed PRs with any backport/x.y label (one or more) and promoted-to-master label
* Backport PR will be automatically assigned to the original PR author
* In case of conflicts the backport PR will be opened in the original author's fork in draft mode. This will give the PR owner the option to resolve conflicts and push those changes to the PR branch (today in Scylla, when we have conflicts, the developers are forced to open another PR and manually close the backport PR opened by Mergify)
* Fixes cherry-picking the wrong commit SHA. With the new script, we always take the SHA from the stable branch
* Support backport for enterprise releases (from Enterprise branch)
Fixes: https://github.com/scylladb/scylladb/issues/18973
(cherry picked from commit f9e171c7af)
Closes scylladb/scylladb#21470
The skipped ranges should be multiplied by the number of tables.
Otherwise the finished ranges ratio will not reach 100%.
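The effect can be seen with a small Python sketch. It assumes the metric counts one unit of work per (range, table) pair, which is an assumption drawn from the sentence above rather than from the code; `finished_ratio` and its parameters are hypothetical names.

```python
def finished_ratio(total_ranges: int, nr_tables: int,
                   repaired_ranges: int, skipped_ranges: int,
                   scale_skipped: bool = True) -> float:
    """Finished / total, where each range counts once per table."""
    total = total_ranges * nr_tables
    finished = repaired_ranges * nr_tables
    # The fix: skipped ranges must be scaled by the table count as well.
    finished += skipped_ranges * (nr_tables if scale_skipped else 1)
    return finished / total


# 10 ranges, 4 tables, 7 ranges repaired and 3 skipped.
print(finished_ratio(10, 4, 7, 3))                       # reaches 1.0
print(finished_ratio(10, 4, 7, 3, scale_skipped=False))  # stays below 1.0
```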
Fixes #21174
(cherry picked from commit cffe3dc49f)
(cherry picked from commit 1392a6068d)
(cherry picked from commit 9868ccbac0)
Refs #21252
Closes scylladb/scylladb#21314
* github.com:scylladb/scylladb:
test: Add test_node_ops_metrics.py
repair: Make the ranges more consistent in the log
repair: Fix finished ranges metrics for removenode
When a compaction_group is removed via `compaction_manager::remove`,
it is erased from `_compaction_state`, and therefore compaction
is definitely not enabled on it.
This triggers an internal error if tablets are cleaned up
during drop/truncate, which checks that compaction is disabled
in all compaction groups.
Note that the callers of `compaction_disabled` aren't really
interested in compaction being actively disabled on the
compaction_group, but rather if it's enabled or not.
A follow-up patch can be considered to reverse the logic
and expose `compaction_enabled` rather than `compaction_disabled`.
Fixes scylladb/scylladb#20060
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 78ceaeabca)
Closes scylladb/scylladb#21405
ALTER tablets-enabled KEYSPACES (KS) may fail due to
group0_concurrent_modification, in which case it's repeated by a for
loop surrounding the code. But because raft's add_entry consumes the
raft's guard (by std::move'ing the guard object), retries of ALTER KS
will use a moved-from guard object, which is UB, potentially a crash.
The fix is to remove the aforementioned for loop altogether and rethrow the exception, as the rf_change event
will be repeated by the topology state machine if it receives the
concurrent modification exception, because the event will remain present
in the global requests queue, hence it's going to be executed as the
very next event.
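The failure mode and the fix can be modeled in Python. This is only an analogy: in C++ reusing a moved-from guard is undefined behavior, while the hypothetical `Guard` class below raises an error instead. `add_entry`, `retry_with_same_guard`, and `run_request` are likewise illustrative names, not the real functions.

```python
class ConcurrentModification(Exception):
    """Models group0_concurrent_modification."""


class Guard:
    """Single-use token, standing in for a guard that add_entry consumes."""

    def __init__(self) -> None:
        self._valid = True

    def consume(self) -> None:
        if not self._valid:
            raise RuntimeError("guard already consumed")
        self._valid = False


def add_entry(guard: Guard, fail: bool) -> None:
    guard.consume()  # like std::move'ing the guard into add_entry
    if fail:
        raise ConcurrentModification()


def retry_with_same_guard(guard: Guard) -> None:
    """The buggy pattern: the loop reuses a guard that was consumed."""
    for attempt in range(2):
        try:
            add_entry(guard, fail=(attempt == 0))
            return
        except ConcurrentModification:
            continue


def run_request(fail_times: int) -> int:
    """The fixed pattern: each retry of the queued request takes a new guard."""
    attempts = 0
    while True:
        attempts += 1
        guard = Guard()
        try:
            add_entry(guard, fail=(attempts <= fail_times))
            return attempts
        except ConcurrentModification:
            continue  # the request stays queued and is executed again
```

Here `run_request` plays the role of the topology state machine re-executing the queued event, so the handler itself no longer needs a retry loop.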
Note: refactor is implemented in the follow-up commit.
Fixes: https://github.com/scylladb/scylladb/issues/21102
Should be backported to every 6.x branch, as it may lead to a crash.
(cherry picked from commit de511f56ac)
(cherry picked from commit 3f4c8a30e3)
(cherry picked from commit 522bede8ec)
Refs https://github.com/scylladb/scylladb/pull/21121
Closes scylladb/scylladb#21340
* github.com:scylladb/scylladb:
test: topology: add disable_schema_agreement_wait utility function
test: add UT to test retrying ALTER tablets KEYSPACE
cql/tablets: fix indentation in `rf_change` event handler
cql/tablets: fix retrying ALTER tablets KEYSPACE
This collector reads nvme temperature sensor, which was observed to
cause bad performance on Azure cloud following the reading of the
sensor for ~6 seconds. During the event, we can see elevated system
time (up to 30%) and softirq time. CPU utilization is high, with
nvme_queue_rq taking several orders of magnitude more time than
normally. There are signs of contention, we can see
__pv_queued_spin_lock_slowpath in the perf profile, called. This
manifests as latency spikes and potentially also throughput drop due
to reduced CPU capacity.
By default, the monitoring stack queries it once every 60s.
(cherry picked from commit 93777fa907)
Closes scylladb/scylladb#21305