Before 17f4a151ce the node was marked as
been replaced in join_group0 state, before it actually joins the group0,
so by the time it actually joins and starts transferring snapshot/log no
traffic is sent to it. The commit changed this to mark the node as
being replaced after the snapshot/log is already transferred so we can
get the traffic to the node while it sill did not caught up with a
leader and this may causes problems since the state is not complete.
Mark the node as being replaced earlier, but still add the new node to
the topology later as the commit above intended.
(cherry picked from commit c0939d86f9)
During replace with the same IP a node may get queries that were intended
for the node it was replacing since the new node declares itself UP
before it advertises that it is a replacement. But after the node
starts replacing procedure the old node is marked as "being replaced"
and queries no longer sent there. It is important to do so before the
new node start to get raft snapshot since the snapshot application is
not atomic and queries that run parallel with it may see partial state
and fail in weird ways. Queries that are sent before that will fail
because schema is empty, so they will not find any tables in the first
place. The is pre-existing and not addressed by this patch.
(cherry picked from commit 644e7a2012)
The test performs consecutive schema changes in RECOVERY mode. The
second change relies on the first. However the driver might route the
changes to different servers and we don't have group 0 to guarantee
linearizability. We must rely on the first change coordinator to push
the schema mutations to other servers before returning, but that only
happens when it sees other servers as alive when doing the schema
change. It wasn't guaranteed in the test. Fix this.
Fixesscylladb/scylladb#20791
Should be backported to all branches containing this test to reduce
flakiness.
(cherry picked from commit f390d4020a)
Closesscylladb/scylladb#20810
In the current scenario, We check if a node being removed is normal
on the node initiating the removenode request. However, we don't have a
similar check on the topology coordinator. The node being removed could be
normal when we initiate the request, but it doesn't have to be normal when
the topology coordinator starts handling the request.
For example, the topology coordinator could have removed this node while handling
another removenode request that was added to the request queue earlier.
This commit intends to fix this issue by adding more checks in the enqueuing phase
and return errors for duplicate requests for node removal.
This PR fixes a bug. Hence we need to backport it.
Fixes: scylladb/scylladb#20271
(cherry picked from commit b25b8dccbd)
Closesscylladb/scylladb#20801
The test configures write timeout to much smaller value to make the test
run faster since for some writes sleep is inserted to hit the timeout,
but it makes aarch64 debug flaky since timeout happens when it should
not because of a natural slowness.
(cherry picked from commit 71a5b1c6dd)
Closesscylladb/scylladb#20778
Currently the function calls boost::partial_sort with a middle
iterator that might be out of bound and cause undefined behavior.
Check the vector size, and do a partial sort only if its longer
than `max_sstables`, otherwise sort the whole vector.
Fixesscylladb/scylladb#20608
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 39ce358d82)
Closesscylladb/scylladb#20664
This is a manual backport of #20212 to 6.0, superseding #20346 (which had run into conflicts).
Please see the individual commit messages for backport notes.
Fixes#10305Closesscylladb/scylladb#20349
* github.com:scylladb/scylladb:
generic_server: make server::stop() idempotent
generic_server: coroutinize server::shutdown()
generic_server: make server::shutdown() idempotent
test/generic_server: add test case
configure, cmake: sort the lists of boost unit tests
generic_server: convert connection tracking to seastar::gate
Consider the following:
```
T
0 split prepare starts
1 repair starts
2 split prepare finishes
3 repair adds unsplit sstables
4 repair ends
5 split executes
```
If repair produces sstable after split prepare phase, the replica will not split that sstable later, as prepare phase is considered completed already. That causes split execution to fail as replicas weren't really prepared. This also can be triggered with load-and-stream which shares the same write (consumer) path.
The approach to fix this is the same employed to prevent a race between split and migration. If migration happens during prepare phase, it can happen source misses the split request, but the tablet will still be split on the destination (if needed). Similarly, the repair writer becomes responsible for splitting the data if underlying table is in split mode. That's implemented in replica::table for correctness, so if node crashes, the new sstable missing split is still split before added to the set.
Fixes https://github.com/scylladb/scylladb/issues/19378.
Fixes https://github.com/scylladb/scylladb/issues/19416.
Please replace this line with justification for the backport/* labels added to this PR
(cherry picked from commit 239344ab55)
(cherry picked from commit 74612ad358)
Refs https://github.com/scylladb/scylladb/pull/19427Closesscylladb/scylladb#20593
* github.com:scylladb/scylladb:
tablets: Fix race between repair and split
compaction: Allow "offline" sstable to be split
To drop a semaphore it should not be held by anyone, so we need to
release out units before checking if a semaphore can be dropped.
Fixes: scylladb/scylladb#20602
(cherry picked from commit 9cc54932ae)
Closesscylladb/scylladb#20622
Currently, when attempting to send a hint, we might choose its recipients in one of two ways:
- If the original destination is a natural endpoint of the hint, we only send the hint to that node and none other,
- Otherwise, we send the hint to all current replicas of the mutation.
There is a problem when we decommission a node: while data is streamed away from that node, it is still considered to be a natural endpoint of the data that it used to own. Because of that, it might happen that a hint is sent directly to it but streaming will miss it, effectively resulting in the hint being discarded.
As sending the hint _only_ to the leaving replica is a rather bad idea, send the hint to all replicas also in the case when the original destination of the hint is leaving.
Note that this is a conservative fix written only with the decommission + vnode-based keyspaces combo in mind. In general, such "data loss" can occur in other situations where the replica set is changing and we go through a streaming phase, i.e. other topology operations in case of vnodes and tablet load balancing. However, the consistency guarantees of hinted handoff in the face of topology changes are not defined and it is not clear what they should be, if there should be any at all. The picture is further complicated by the fact that hints are used by materialized views, and sending view updates to more replicas than necessary can introduce inconsistencies in the form of "ghost rows". This fix was developed in response to a failing test which checked the hint replay + decommission scenario, and it makes it work again.
Fixesscylladb/scylladb#20558Fixesscylladb/scylla-dtest#4582
Refs scylladb/scylladb#19835
This is a backport of the original PR without the tests, done avoid the need of resolving merge conflicts in that area.
Closesscylladb/scylladb#20559
* github.com:scylladb/scylladb:
hints: send hints with CL=ALL if target is leaving
hints: inline do_send_one_mutation
Consider the following:
T
0 split prepare starts
1 repair starts
2 split prepare finishes
3 repair adds unsplit sstables
4 repair ends
5 split executes
If repair produces sstable after split prepare phase, the replica
will not split that sstable later, as prepare phase is considered
completed already. That causes split execution to fail as replicas
weren't really prepared. This also can be triggered with
load-and-stream which shares the same write (consumer) path.
The approach to fix this is the same employed to prevent a race
between split and migration. If migration happens during prepare
phase, it can happen source misses the split request, but the
tablet will still be split on the destination (if needed).
Similarly, the repair writer becomes responsible for splitting
the data if underlying table is in split mode. That's implemented
in replica::table for correctness, so if node crashes, the new
sstable missing split is still split before added to the set.
Fixes#19378.
Fixes#19416.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 74612ad358)
Fixes: scylladb/scylladb#18902
This PR is intended to make debugging easier, hence backporting it to
previous versions shall be useful while debugging issues there
(cherry picked from commit a616f10)
For fixing the backport, parentheses () were added after variable captures
in lambdas, absence of which wasn't supported in earlier versions of C++.
Closesscylladb/scylladb#20564
ScyllaDB doesn't support custom compressors. The available compressors
are the only available ones, not the default ones.
Adjust the text to reflect this.
(cherry picked from commit 08f109724b)
Closesscylladb/scylladb#20525
Because of https://github.com/scylladb/scylladb/issues/9285 hit weighted
load balancer may sometimes return same node twice. It may cause wrong
data to be read or unexpected errors to be returned to a client. Since
the original bug is not easy to fix and it is rare lets introduce a
workaround. We will check for duplicates and will use non HWLB one if
one is found.
(cherry picked from commit 807e37502a)
Closesscylladb/scylladb#20470
This commit adds a page listing the ScyllDB limits
we know today.
The page can and should be extended when other limits
are confirmed.
Closesscylladb/scylladb#19399
(cherry picked from commit 072542a5cc)
Currently, when attempting to send a hint, we might choose its
recipients in one of two ways:
- If the original destination is a natural endpoint of the hint, we only
send the hint to that node and none other,
- Otherwise, we send the hint to all current replicas of the mutation.
There is a problem when we decommission a node: while data is streamed
away from that node, it is still considered to be a natural endpoint of
the data that it used to own. Because of that, it might happen that a
hint is sent directly to it but streaming will miss it, effectively
resulting in the hint being discarded.
As sending the hint _only_ to the leaving replica is a rather bad idea,
send the hint to all replicas also in the case when the original
destiantion of the hint is leaving.
Note that this is a conservative fix written only with the decommission
+ vnode-based keyspaces combo in mind. In general, such "data loss" can
occur in other situations where the replica set is changing and we go
through a streaming phase, i.e. other topology operations in case of
vnodes and tablet load balancing. However, the consistency guarantees of
hinted handoff in the face of topology changes are not defined and it is
not clear what they should be, if there should be any at all. The
picture is further complicated by the fact that hints are used by
materialized views, and sending view updates to more replicas than
necessary can introduce inconsistencies in the form of "ghost rows".
This fix was developed in response to a failing test which checked the
hint replay + decommission scenario, and it makes it work again.
Fixesscylladb/scylla-dtest#4582
Refs scylladb/scylladb#19835
(cherry picked from commit 61ac0a336d)
It's a small method and it is only used once in send_one_mutation.
Inlining it lets us get rid of its declaration in the header - now, if
one needs to change the variables passed from one function to another,
it is no longer necessary to change the header.
(cherry picked from commit 8abb06ab82)
Bind variables in CQL have two formats: positional (?) where a variable is referred to by its relative position in the statement, and named (:var), where the user is expected to supply a name->value mapping.
In 19a6e69001 we identified the case where a named bind variable appears twice in a query, and collapsed it to a single entry in the statement metadata. Without this, a driver using the named variable syntax cannot disambiguate which variable is referred to.
However, it turns out that users can use the positional call form even with the named variable syntax, by using the positional API of the driver. To support this use case, we add a configuration variable to disable the same-variable detection.
Because the detection has to happen when the entire statement is visible, we have to supply the configuration to the parser. We call it the dialect and pass it from all callers. The alternative would be to add a pre-prepare call similar to fill_prepare_context that rewrites all expressions in a statement to deduplicate variables.
A unit test is added.
Fixes https://github.com/scylladb/scylladb/issues/15559
This may be useful to users transitioning from Cassandra, so merits a backport.
(cherry picked from commit f9322799af)
(cherry picked from commit d69bf4f010)
(cherry picked from commit ea8441dfa3)
Refs https://github.com/scylladb/scylladb/pull/19493
Subsumes #20389Closesscylladb/scylladb#20551
* github.com:scylladb/scylladb:
cql3: add option to not unify bind variables with the same name
cql3: introduce dialect infrastructure
cql3: prepared_statement_cache: drop cache key default constructor
test: cql-pytest: config_value_context: remove strange ast.literal_eval call
Merge 'config: round-trip boolean configuration variables' from Avi Kivity
Bind variables in CQL have two formats: positional (`?`) where a
variable is referred to by its relative position in the statement,
and named (`:var`), where the user is expected to supply a
name->value mapping.
In 19a6e69001 we identified the case where a named bind variable
appears twice in a query, and collapsed it to a single entry in the
statement metadata. Without this, a driver using the named variable
syntax cannot disambiguate which variable is referred to.
However, it turns out that users can use the positional call form
even with the named variable syntax, by using the positional
API of the driver. To support this use case, we add a configuration
variable to disable the same-variable detection.
Because the detection has to happen when the entire statement is
visible, we have to supply the configuration to the parser. We
call it the `dialect` and pass it from all callers. The alternative
would be to add a pre-prepare call similar to fill_prepare_context that
rewrites all expressions in a statement to deduplicate variables.
A unit test is added.
Fixes#15559
(cherry picked from commit ea8441dfa3)
(cherry picked from commit edb3068ecf)
A dialect is a different way to interpret the same CQL statement.
Examples:
- how duplicate bind variable names are handled (later in this series)
- whether `column = NULL` in LWT can return true (as is now) or
whether it always returns NULL (as in SQL)
Currently, dialect is an empty structure and will be filled in later.
It is passed to query_processor methods that also accept a CQL string,
and from there to the parser. It is part of the prepared statement cache
key, so that if the dialect is changed online, previous parses of the
statement are ignored and the statement is prepared again.
The patch is careful to pick up the dialect at the entry point (e.g.
CQL protocol server) so that the dialect doesn't change while a statement
is parsed, prepared, and cached.
(cherry picked from commit d69bf4f010)
cql-pytest's config_value_context is used to run a code sequence with
different ScyllaDB configuration applied for a while. When it reads
the original value (in order to restore it later), it applies
ast.literal_eval() to it. This is strange, since the config variable isn't
a Python literal.
It was added in 8c464b2ddb ("guardrails: restrict replication
strategy (RS)"). Presumably, as a workaround for #19604 - it sufficiently
massaged the input we read via SELECT to be acceptable later via UPDATE.
Now that #19604 is fixed, we can remove the call to ast.literal_eval,
but have to fix up the parameters to config_value_context to something
that will be accepted without further massaging.
This is a step towards fixing #15559, where we want to run some tests
with a boolean configuration variable changed, and literal_eval is
transforming the string representation of integers to integers and
confusing the driver.
Closesscylladb/scylladb#19696
(cherry picked from commit d5af86bd8a)
When you SELECT a boolean from system.config, it reads as true/false, but this isn't accepted
on UPDATE (instead, we accept 1/0). This is surprising and annoying, so accept true/false in
both directions.
Not a regression, so a backport isn't strictly necessary.
Closesscylladb/scylladb#19792
* github.com:scylladb/scylladb:
config: specialize from-string conversion for bool
config: wrap boost::lexical_cast<> when converting from strings
(cherry picked from commit 9eb47b3ef0)
The test cases in this file use an error injection to reduce raft group
0 timeouts (from the default 1 minute), in order to speed up the tests;
the scenarios expect these timeouts to happen, so we want them to happen
as quick as possible, but we don't want to reduce timeouts so much that
it will make other operations fail when we don't expect them to (e.g.
when the test wants to add a node to the cluster).
Unfortunately the selected 5 seconds in debug mode was not enough and
made the tests flaky: scylladb/scylladb#20111.
Increase it to 10 seconds. This unfortunately will slow down these tests
as they have to sometimes wait for 10 seconds for the timeout to happen.
But better to have this than a flaky test.
Fixes: scylladb/scylladb#20111
(cherry picked from commit 52fdf5b4c9)
Closesscylladb/scylladb#20478
for following reasons:
1. the ppa in question does not provide the build for the latest ubuntu's LTS release. it only builds for trusty, xenial, bionic and jammy. according to https://wiki.ubuntu.com/Releases, the latest LTS release is ubuntu noble at the time of writing.
2. the ppa in question does not provide the packages used in production. it does provides the package for *building* scylla
3. after we introduced the relocatable package, there is no need to provide extra user space dependencies apart from scylla packages.
so, in this change, we remove all references to enabling the Scylla/PPA repository.
Fixesscylladb/scylladb#20449
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit fe0e961856)
Closesscylladb/scylladb#20454
The Alternator TTL scanning code uses an object "scan_ranges_context"
to hold the scanning context. One of the members of this object is
a service::query_state, and that in turn holds a reference to a
service::client_state. The existing constructor created a temporary
client_state object and saved a reference to it - which can result
in use after free as the temporary object is freed as soon as the
constructor ends.
The fix is to save a client_state in the scan_ranges_context object,
instead of a temporary object.
Fixes#19988
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 15f8046fcb)
Closesscylladb/scylladb#20437
in 372a4d1b79, we introduced a change
which was for debugging the logging message. but the logging message
intended for printing the temp_dir not prints an `optional<int>`. this
is both confusing, and more importantly, it hurts the debuggability.
in this change, the related change is reverted.
Fixesscylladb/scylladb#20408
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit d26bb9ae30)
Closesscylladb/scylladb#20435
Even after 13caac7, we still have more files incorrect permission, since
we use "cp -r" and creating new file with redirect.
To fix this, we need to replace "cp -r" with "cp -pr", and "chmod <perm>" on
newly created files.
Fixes#14383
Related #19775
(cherry picked from commit 9d7fed40b5)
Closesscylladb/scylladb#20433
before this change, if user does not have `/bin/sh` around, when
installing scylla packages, the script in `%pretrans" is executed,
and fails due to missing `/bin/sh`. per
https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#pretrans
> Note that the %pretrans scriptlet will, in the particular case of
> system installation, run before anything at all has been installed.
> This implies that it cannot have any dependencies at all. For this
> reason, %pretrans is best avoided, but if used it MUST (by necessity)
> be written in Lua. See
> https://rpm-software-management.github.io/rpm/manual/lua.html for more
> information.
but we were trying to warn users upgrading from scylla < 1.7.3, which
was released 7 years ago at the time of writing.
in this change, we drop the `%pretrans` section. hopefuly they will
find their way out if they still exist.
Fixesscylladb/scylladb#20321
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 6970c502c9)
Closesscylladb/scylladb#20386
If there's a token metadata for a given table, and it is in split mode,
it will be registered such that split monitor can look at it, for
example, to start split work, or do nothing if table completed it.
during topology change, e.g. drain, split is stalled since it cannot
take over the state machine.
It was noticed that the log is being spammed with a message saying the
table completed split work, since every tablet metadata update, means
waking up the monitor on behalf of a table. So it makes sense to
demote the logging level to debug. That persists until drain completes
and split can finally complete.
Another thing that was noticed is that during drain, a table can be
submitted for processing faster than the monitor can handle, so the
candidate queue may end up with multiple duplicated entries for same
table, which means unnecessary work. That is fixed by using a
sequenced set, which keeps the current FIFO behavior.
Fixes#20339.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit 26facd807e)
Closesscylladb/scylladb#20344
repair_service::repair_flush_hints_batchlog_handler may access batchlog
manager while it is uninitialized.
Throw if batchlog manager isn't initialized.
Fixes: https://github.com/scylladb/scylladb/issues/20236.
Needs backport to 6.0 and 6.1 as they suffer from the uninitialized bm access.
(cherry picked from commit d8e4393418)
(cherry picked from commit f38bb6483a)
Refs https://github.com/scylladb/scylladb/pull/20251Closesscylladb/scylladb#20392
* github.com:scylladb/scylladb:
test: add test to ensure repair won't fail with uninitialized bm
repair: throw if batchlog manager isn't initialized
Currently if a coordinator and a node being replaced are in the same DC
while inter-dc encryption is enabled (connections between nodes in the
same DC should not be encrypted) the replace operation will fail. It
fails because a coordinator uses non encrypted connection to push raft
data to the new node, but the new node will not accept such connection
until it knows which DC the coordinator belongs to and for that the raft
data needs to be transferred.
The series adds the test for this scenario and the fix for the
chicken&egg problem above.
The series (or at least the fix itself) needs to be backported because
this is a serious regression.
Fixes: https://github.com/scylladb/scylladb/issues/19025
(cherry picked from commit 84757a4ed3)
(cherry picked from commit b98282a976)
(cherry picked from commit 2f1b1fd45e)
(cherry picked from commit 17f4a151ce)
(cherry picked from commit 32a59ba98f)
Refs https://github.com/scylladb/scylladb/pull/20290Closesscylladb/scylladb#20399
* github.com:scylladb/scylladb:
topology coordinator: fix indentation after the last patch
topology coordinator: do not add replacing node without a ring to topology
test: add test for replace in clusters with encryption enabled
test.py: add server encryption support to cluster manager
.gitignore: fix pattern for resources to match only one specific directory
When only inter dc encryption is enabled a non encrypted connection
between two nodes is allowed only if both nodes are in the same dc.
If a nodes that initiates the connection knows that dst is in the same
dc and hence use non encrypted connection, but the dst not yet knows the
topology of the src such connection will not be allowed since dst cannot
guaranty that dst is in the same dc.
Currently, when topology coordinator is used, a replacing node will
appear in the coordinator's topology immediately after it is added to the
group0. The coordinator will try to send raft message to the new node
and (assuming only inter dc encryption is enabled and replacing node and
the coordinator are in the same dc) it will try to open regular, non encrypted,
connection to it. But the replacing node will not have the coordinator
in it's topology yet (it needs to sync the raft state for that). so it
will reject such connection.
To solve the problem the patch does not add a replacing node that was
just added to group0 to the topology. It will be added later, when
tokens will be assigned to it. At this point a replacing node will
already make sure that its topology state is up-to-date (since it will
execute a raft barrier in join_node_response_params handler) and it knows
coordinator's topology. This aligns replace behaviour with bootstrap
since bootstrap also does not add a node without a ring to the topology.
The patch effectively reverts b8ee8911caFixes: scylladb/scylladb#19025
(cherry picked from commit 17f4a151ce)
repair_service::repair_flush_hints_batchlog_handler may access batchlog
manager while it is uninitialized.
Batchlog manager cannot be initialized before repair as we have the
dependencies chain:
repair_service -> storage_service::join_cluster -> batchlog_manager.
Throw if batchlog manager isn't initialized. That won't cause repair
to fail.
(cherry picked from commit d8e4393418)
Check whether we can stop a generic server without first asking it to
listen.
The test fails currently; the failure mode is a hang, which triggers the 5
minute timeout set in the test:
> unknown location(0): fatal error: in "stop_without_listening":
> seastar::timed_out_error: timedout
> seastar/src/testing/seastar_test.cc(43): last checkpoint
> test/boost/generic_server_test.cc(34): Leaving test case
> "stop_without_listening"; testing time: 300097447us
Backport notes for 6.0:
- Replace
#include "utils/assert.hh"
SCYLLA_ASSERT(false);
with
#include <cassert>
assert(false);
due to 6.0 lacking commit aa1270a00c ("treewide: change assert() to
SCYLLA_ASSERT()", 2024-08-05). The header file "utils/assert.hh"
wouldn't be difficult to backport, but separating it from the treewide
changes in commit aa1270a00c might not be the best idea.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
(cherry picked from commit dbc0ca6354)
Both lists were obviously meant to be sorted originally, but by today
we've introduced many instances of disorder -- thus, inserting a new test
in the proper place leaves the developer scratching their head. Sort both
lists.
Backport notes for 6.0:
- Conflicts in "configure.py" and "test/boost/CMakeLists.txt",
unsurprisingly. For the backport, I sorted the boost unit test list in
each file manually, from scratch.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
(cherry picked from commit 931f2f8d73)
If we call server::stop() right after "server" construction, it hangs:
With the server never listening (never accepting connections and never
serving connections), nothing ever calls server::maybe_stop().
Consequently,
co_await _all_connections_stopped.get_future();
at the end of server::stop() deadlocks.
Such a server::stop() call does occur in controller::do_start_server()
[transport/controller.cc], when
- cserver->start() (sharded<cql_server>::start()) constructs a
"server"-derived object,
- start_listening_on_tcp_sockets() throws an exception before reaching
listen_on_all_shards() (for example because it fails to set up client
encryption -- certificate file is inaccessible etc.),
- the "deferred_action"
cserver->stop().get();
is invoked during cleanup.
(The cserver->stop() call exposing the connection tracking problem dates
back to commit ae4d5a60ca ("transport::controller: Shut down distributed
object on startup exception", 2020-11-25), and it's been triggerable
through the above code path since commit 6b178f9a4a
("transport/controller: split configuring sockets into separate
functions", 2024-02-05).)
Tracking live connections and connection acceptances seems like a good fit
for "seastar::gate", so rewrite the tracking with that. "seastar::gate"
can be closed (and the returned future can be waited for) without anyone
ever having entered the gate.
NOTE: this change makes it quite clear that neither server::stop() nor
server::shutdown() must be called multiple times. The permitted sequences
are:
- server::shutdown() + server::stop()
- or just server::stop().
Fixes#10305
Backport notes for 6.0:
- Conflict in "generic_server.hh", due to 6.0 not having commit
324b3c43c0 ("generic_server: use async function in
`for_each_gently()`", 2024-08-08), which is part of the feature series
"service levels: update connections parameters automatically"
<https://github.com/scylladb/scylladb/pull/19085>.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
(cherry picked from commit 5a04743663)
This series fixes an issue where histogram Summaries return an infinite value.
It updated the quantile calculation logic to address cases where values fall into the infinite bucket of a histogram.
Now, instead of returning infinite (max int), the calculation will return the last bucket limit, ensuring finite outputs in all cases.
The series adds a test for summaries with a specific test case for this scenario.
Fixes#20255
Need backport to 6.0, 6.1 and 2023.1 and above
(cherry picked from commit 011aa91a8c)
(cherry picked from commit 644e6f0121)
Refs #20257Closesscylladb/scylladb#20304
* github.com:scylladb/scylladb:
test/estimated_histogram_test Add summary tests
utils/histogram.hh: Make summary support inifinite bucket.
It is unsafe to restrict the sync nodes for repair to the source data center if it has too low replication factor in network_topology_replication_strategy, or if other nodes in that DC are ignored.
Also, this change restricts the usage of source_dc to `network_topology` and `everywhere_topology`
strategies, as with simple replication strategy
there is no guarantee that there would be any
more replicas in that data center.
Fixes#16826
Reproducer submitted as https://github.com/scylladb/scylla-dtest/pull/3865
It fails without this fix and passes with it.
* Requires backport to live versions. Issue hit in the filed with 2022.2.14
(cherry picked from commit 8b1877f3ca)
(cherry picked from commit 0419b1d522)
(cherry picked from commit b5d0ab092c)
(cherry picked from commit 9729dd21c3)
(cherry picked from commit 8665eef98c)
(cherry picked from commit 5f655e41e3)
Refs #16827Closesscylladb/scylladb#20229
* github.com:scylladb/scylladb:
raft_rebuild: propagate source_dc force option to rebuild_option
repair: do_rebuild_replace_with_repair: use source_dc only when safe
repair: replace_with_repair: pass the replace_node downstream
repair: replace_with_repair: pass ignore_nodes as a set of host_id:s
repair: replace_rebuild_with_repair: pass ks_erms from caller
nodetool: rebuild: add force option
Add and use utils::optional_param to pass source_dc