44583 Commits

Author SHA1 Message Date
Benny Halevy
eebf97c545 view: check_needs_view_update_path: get token_metadata_ptr
check_needs_view_update_path is async and might yield
so the token_metadata reference passed to it must be kept
alive throughout the call.

Fixes scylladb/scylladb#20979

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit eaa3b774a6)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#21038
2024-10-21 10:27:52 +02:00
Artsiom Mishuta
a728695d10 test.py: deselect remove_data_dir_of_dead_node event
deselect remove_data_dir_of_dead_node event from test_random_failures
due to issue #20751

(cherry picked from commit 9b0e15678e)

Closes scylladb/scylladb#21138
2024-10-17 11:38:35 +02:00
Piotr Smaron
82a34aa837 test: fix flaky test_multidc_alter_tablets_rf
The testcase is flaky due to a known python driver issue:
https://github.com/scylladb/python-driver/issues/317.
This issue causes the `CREATE KEYSPACE` statement to be sometimes
executed twice in a row, and the 2nd CREATE statement causes the test to
fail.
In order to work around it, it's enough to add `if not exists` when
creating a ks.

Fixes: #21034

Needs to be backported to all 6.x branches, as the PR introducing this flakiness is backported to every 6.x branch.

(cherry picked from commit f8475915fb)

Closes scylladb/scylladb#21107
2024-10-15 09:26:28 +03:00
Piotr Dulikowski
d10c6a86cc SCYLLA-VERSION-GEN: correct the logic for skipping SCYLLA-*-FILE
The SCYLLA-VERSION-GEN file skips updating the SCYLLA-*-FILE files if
the commit hash from SCYLLA-RELEASE-FILE is the same. The original
reason for this was to prevent the date in the version string from
changing if multiple modes are built across midnight
(scylladb/scylla-pkg#826). However - intentionally or not - it serves
another purpose: it prevents an infinite loop in the build process.

If the build.ninja file needs to be rebuilt, the configure.py script
unconditionally calls ./SCYLLA-VERSION-GEN. On the other hand, if one
of the SCYLLA-*-FILE files is updated then this triggers rebuild
of build.ninja. Apparently, this is sufficient for ninja to enter an
infinite loop.

However, the check assumes that the RELEASE is in the format

  <build identifier>.<date>.<commit hash>

and assumes that none of the components have a dot inside - otherwise it
breaks and just works incorrectly. Specifically, when building a private
version, it is recommended to set the build identifier to
`count.yourname`.

Previously, before 85219e9, this problem wasn't noticed, most likely
because the reconfigure process was broken and stopped overwriting
the build.ninja file after the first iteration.

Fix the problem by fixing the logic that extracts the commit hash -
instead of looking at the third dot-separated field counting from the
left side, look at the last field.
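
A minimal Python sketch of the difference between the two extraction rules. The real script is a shell script; the function names and the sample RELEASE strings here are made up for illustration only.

```python
def commit_hash_third_field(release: str) -> str:
    """Old logic: take the third dot-separated field counting from the left."""
    return release.split(".")[2]


def commit_hash_last_field(release: str) -> str:
    """Fixed logic: take the last dot-separated field."""
    return release.rsplit(".", 1)[-1]


# Hypothetical RELEASE values in <build identifier>.<date>.<commit hash> form.
plain = "0.20241015.d10c6a86cc"
private = "1.yourname.20241015.d10c6a86cc"  # build identifier contains a dot

print(commit_hash_third_field(plain))    # d10c6a86cc
print(commit_hash_third_field(private))  # 20241015 (the date, not the hash)
print(commit_hash_last_field(private))   # d10c6a86cc
```

With a dotted build identifier the old rule compares dates instead of commit hashes, so the "same commit" check no longer means what it was meant to.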

Fixes: scylladb/scylladb#21027
(cherry picked from commit 64ca58125e)

Closes scylladb/scylladb#21103
2024-10-15 09:26:00 +03:00
Botond Dénes
554838691b Merge '[Backport 6.2] compaction: fix potential data resurrection with file-based migration' from Ferenc Szili
This is a manual backport of #20788

When tablets are migrated with file-based streaming, we can have a situation where a tombstone is garbage collected before the data it shadows lands. For instance, if we have a tablet replica with 3 sstables:

1. sstable containing an expired tombstone
2. sstable with additional data
3. sstable containing data which is shadowed by the expired tombstone in sstable 1

If this tablet is migrated, and the sstables are streamed in the order listed above, the first two sstables can be compacted before the third sstable arrives. In that case, the expired tombstone will be garbage collected, and data in the third sstable will be resurrected after it arrives at the pending replica.

This change fixes this problem by disabling tombstone garbage collection for pending replicas.
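
The scenario can be reproduced with a toy model. This is only a sketch: sstables are plain dicts, every tombstone is treated as already expired, and `compact`, `read` and `migrate` are hypothetical helpers, not ScyllaDB functions.

```python
TOMBSTONE = None  # marker for a deleted cell


def compact(sstables, tombstone_gc_enabled):
    """Merge sstables, keeping the newest write per key.

    Each sstable is a dict: key -> (timestamp, value). With tombstone GC
    enabled, tombstones are purged from the output.
    """
    merged = {}
    for sstable in sstables:
        for key, (ts, value) in sstable.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    if tombstone_gc_enabled:
        merged = {k: v for k, v in merged.items() if v[1] is not TOMBSTONE}
    return merged


def read(sstables, key):
    """Return the live value for key, or None if it is deleted or absent."""
    cell = compact(sstables, tombstone_gc_enabled=False).get(key)
    return None if cell is None else cell[1]


sstable1 = {"k": (20, TOMBSTONE)}  # expired tombstone
sstable2 = {"other": (15, "x")}    # additional data
sstable3 = {"k": (10, "stale")}    # data shadowed by the tombstone


def migrate(tombstone_gc_enabled):
    # sstables 1 and 2 arrive and are compacted before sstable 3 lands
    early = compact([sstable1, sstable2], tombstone_gc_enabled)
    return read([early, sstable3], "k")


print(migrate(tombstone_gc_enabled=True))   # stale (resurrected)
print(migrate(tombstone_gc_enabled=False))  # None (still deleted)
```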

This fixes a problem in Enterprise, but the change is in OSS in order to keep the differences between OSS and Enterprise to a minimum and to have a common infrastructure for disabling tombstone GC on pending replicas.

Fixes #21090

Closes scylladb/scylladb#21061

* github.com:scylladb/scylladb:
  test: test tombstone GC disabled on pending replica
  tablet_storage_group_manager: update tombstone_gc_enabled in compaction group
  database::table: add tombstone_gc_enabled(locator::tablet_id)
2024-10-15 09:25:22 +03:00
Kefu Chai
b691dddf6b install.sh: install seastar/scripts/addr2line.py as well
seastar extracted `addr2line` python module out back in
e078d7877273e4a6698071dc10902945f175e8bc. but `install.sh` was
not updated accordingly. it still installs `seastar-addr2line`
without installing its new dependency. this leaves us with a
broken `seastar-addr2line` in the relocatable tarball.
```console
$ /opt/scylladb/scripts/seastar-addr2line
Traceback (most recent call last):
  File "/opt/scylladb/scripts/libexec/seastar-addr2line", line 26, in <module>
    from addr2line import BacktraceResolver
ModuleNotFoundError: No module named 'addr2line'
```

in this change, we redistribute `addr2line.py` as well. this
should address the issue above.

Fixes scylladb/scylladb#21077

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit da433aad9d)

Closes scylladb/scylladb#21085
2024-10-14 09:52:21 +03:00
Botond Dénes
85b1c64a33 Merge '[Backport 6.2] storage_proxy: Add conditions checking to avoid UB in speculating read executors.' from ScyllaDB
During the investigation of scylladb/scylladb#20282, it was discovered that implementations of speculating read executors have undefined behavior when called with an incorrect number of read replicas. This PR introduces two levels of condition checking:

- Condition checking in speculating read executors for the number of replicas.
- Checking the consistency of the Effective Replication Map in filter_for_query(): the map is considered incorrect if the list of replicas contains a node from a data center whose replication factor is 0.

Please note: This PR does not fix the issue found in scylladb/scylladb#20282; it only adds condition checks to prevent undefined behavior in cases of inconsistent inputs.

Refs scylladb/scylladb#20625

As this issue applies to the released versions and can affect clients, we need backports to 6.0, 6.1, 6.2.

(cherry picked from commit 132358dc92)

(cherry picked from commit ae23d42889)

(cherry picked from commit ad93cf5753)

(cherry picked from commit 8db6d6bd57)

(cherry picked from commit c373edab2d)

Refs #20851

Closes scylladb/scylladb#21067

* github.com:scylladb/scylladb:
  Add conditions checking for get_read_executor
  Avoid an extra call to block_for in db::filter_for_query.
  Improve code readability in consistency_level.cc and storage_proxy.cc
  tools: Add build_info header with functions providing build type information
  tests: Add tests for alter table with RF=1 to RF=0
2024-10-14 09:51:50 +03:00
Benny Halevy
6e67a993ba storage_service: rebuild: warn about tablets-enabled keyspaces
Until we automatically support rebuild for tablets-enabled
keyspaces, warn the user about them.

The reason this is not an error, is that after
increasing RF in a new datacenter, the current procedure
is to run `nodetool rebuild` on all nodes in that dc
to rebuild the new vnode replicas.
This is not required for tablets, since the additional
replicas are rebuilt automatically as part of ALTER KS.

However, `nodetool rebuild` is also run after local
data loss (e.g. due to corruption and removal of sstables).
In this case, rebuild is not supported for tablets-enabled
keyspaces, as tablet replicas that had lost data may have
already been migrated to other nodes, and rebuilding the
requested node will not know about it.
It is advised to repair all nodes in the datacenter instead.

Refs scylladb/scylladb#17575

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit ed1e9a1543)

Closes scylladb/scylladb#20722
2024-10-14 09:47:35 +03:00
Michał Chojnowski
b8a9fd4e49 reader_concurrency_semaphore: in stats, fix swapped count_resources and memory_resources
can_admit_read() returns reason::memory_resources when the permit is queued due
to lack of count resources, and it returns reason::count_resources when the
permit is queued due to lack of memory resources. It's supposed to be the other
way around.

This bug is causing the two counts to be swapped in the stat dumps printed to
the logs when semaphores time out.

(cherry picked from commit 6cf3747c5f)

Closes scylladb/scylladb#21030
scylla-6.2.0-candidate-20241013105935 scylla-6.2.0
2024-10-13 18:34:18 +03:00
Jenkins Promoter
363cf881d4 Update ScyllaDB version to: 6.2.0
2024-10-13 14:15:40 +03:00
Sergey Zolotukhin
68a55facdf Add conditions checking for get_read_executor
During the investigation of scylladb/scylladb#20282, it was discovered that
implementations of speculating read executors have undefined behavior
when called with an incorrect number of read replicas. This PR
introduces two levels of condition checking:

- Condition checking in speculating read executors for the number of replicas.
- Checking the consistency of the Effective Replication Map in
  get_endpoints_for_reading(): the map is considered incorrect if the number of
  read replica nodes is higher than the replication factor. The check is
  applied only when built in non-release mode.

Please note: This PR does not fix the issue found in scylladb/scylladb#20282;
it only adds condition checks to prevent undefined behavior in cases of
inconsistent inputs.

Refs scylladb/scylladb#20625

(cherry picked from commit c373edab2d)
2024-10-11 18:20:43 +00:00
Sergey Zolotukhin
9010d0a22f Avoid an extra call to block_for in db::filter_for_query.
(cherry picked from commit 8db6d6bd57)
2024-10-11 18:20:43 +00:00
Sergey Zolotukhin
3c0f43b6eb Improve code readability in consistency_level.cc and storage_proxy.cc
Add const correctness and rename some variables to improve code readability.

(cherry picked from commit ad93cf5753)
2024-10-11 18:20:43 +00:00
Sergey Zolotukhin
a22e4476ac tools: Add build_info header with functions providing build type information
A new header provides `constexpr` functions to retrieve build
type information: `get_build_type()`, `is_release_build()`,
and `is_debug_build()`. These functions are useful when adding
changes that should be enabled at compile time only for
specific build types.

(cherry picked from commit ae23d42889)
2024-10-11 18:20:42 +00:00
Sergey Zolotukhin
14650257c0 tests: Add tests for alter table with RF=1 to RF=0
Adding Vnodes and Tablets tests for alter keyspace operation that decreases replication factor
from 1 to 0 for one of two data centers. Tablet version fails due to issue described in
scylladb/scylladb#20625.

Test for scylladb/scylladb#20625

(cherry picked from commit 132358dc92)
2024-10-11 18:20:42 +00:00
Ferenc Szili
2a318817ba test: test tombstone GC disabled on pending replica
This tests if tombstone GC is disabled on pending replicas
2024-10-11 14:10:30 +02:00
Ferenc Szili
5f052a2b52 tablet_storage_group_manager: update tombstone_gc_enabled in compaction group
In order to avoid cases during tablet migrations where we garbage
collect tombstones before the data they shadow arrives, we will
disable tombstone GC on pending replicas.

To achieve this we added a tombstone_gc_enabled flag to compaction_group.
This flag is updated from the update_effective_replication_map method of the
tablet_storage_group_manager class.
2024-10-11 14:09:30 +02:00
David Garcia
e018b38a54 docs: Fix confgroup links
It was not possible to link to configuration parameters groups in docs/reference/configuration-parameters.rst if they contained a space.

(cherry picked from commit 2247bdbc8c)

Closes scylladb/scylladb#21037
2024-10-11 14:31:28 +03:00
Ferenc Szili
14ce5e14d0 database::table: add tombstone_gc_enabled(locator::tablet_id)
This change adds the flag tombstone_gc_enabled to compaction_group.
The value of this flag will be set in
tablet_storage_group_manager::update_effective_replication_map().
2024-10-11 13:29:30 +02:00
Piotr Smaron
d1a31460a0 cql/tablets: handle MVs in ALTER tablets KEYSPACE
ALTERing tablets-enabled KEYSPACES (KS) didn't account for materialized
views (MV), and only produced tablets mutations changing tables.
With this patch we're producing tablets mutations for both tables and
MVs, hence when e.g. we change the replication factor (RF) of a KS, both the
tables' RFs and MVs' RFs are updated along with tablets replicas.
The `test_tablet_rf_change` testcase has been extended to also verify
that MVs' tablets replicas are updated when RF changes.

Fixes: #20240
(cherry picked from commit e0c1a51642)

Closes scylladb/scylladb#21022
2024-10-11 14:14:09 +03:00
Botond Dénes
9175cc528b Merge '[Backport 6.2] cql: improve validating RF's change in ALTER tablets KS' from ScyllaDB
This patch series fixes a couple of bugs around validating if RF is not changed by too much when performing ALTER tablets KS.
RF cannot change by more than 1 in total, because tablets load balancer cannot handle more work at once.

Fixes: #20039

Should be backported to 6.0 & 6.1 (wherever tablets feature is present), as this bug may break the cluster.

(cherry picked from commit 042825247f)

(cherry picked from commit adf453af3f)

(cherry picked from commit 9c5950533f)

(cherry picked from commit 47acdc1f98)

(cherry picked from commit 93d61d7031)

(cherry picked from commit 6676e47371)

(cherry picked from commit 2aabe7f09c)

(cherry picked from commit ee56bbfe61)

Refs #20208

Closes scylladb/scylladb#21009

* github.com:scylladb/scylladb:
  cql: sum of abs RFs diffs cannot exceed 1 in ALTER tablets KS
  cql: join new and old KS options in ALTER tablets KS
  cql: fix validation of ALTERing RFs in tablets KS
  cql: harden `alter_keyspace_statement.cc::validate_rf_difference`
  cql: validate RF change for new DCs in ALTER tablets KS
  cql: extend test_alter_tablet_keyspace_rf
  cql: refactor test_tablets::test_alter_tablet_keyspace
  cql: remove unused helper function from test_tablets
2024-10-11 14:13:43 +03:00
Botond Dénes
18be4f454e Merge '[Backport 6.2] Node replace and remove operations: Add deprecate IP addresses usage warning.' from ScyllaDB
- As part of the deprecation of IP address usage, warning messages were added for IP addresses specified in the `ignore-dead-nodes` and `--ignore-dead-nodes-for-replace` options of scylla and nodetool.
- Slight optimizations for `utils::split_comma_separated_list`, `host_id_or_endpoint` lists and `storage_service` remove node operations, replacing `std::list` usage with `std::vector`.

Fixes scylladb/scylladb#19218

Backport: 6.2 as it's not yet released.

(cherry picked from commit 3b9033423d)

(cherry picked from commit a871321ecf)

(cherry picked from commit 9c692438e9)

(cherry picked from commit 6398b7548c)

Refs #20756

Closes scylladb/scylladb#20958

* github.com:scylladb/scylladb:
  config: Add a warning about use of IP address for join topology and replace operations.
  nodetool: Add IP address usage warning for 'ignore-dead-nodes'.
  tests: Fix incorrect UUIDs in test_nodeops
  utils: Optimizations for utils::split_comma_separated_list and usage of host_id_or_endpoint lists
2024-10-11 14:12:51 +03:00
Botond Dénes
f35a083abe repair/row_level: remove reader timeout
This timeout was added to catch reader-related deadlocks. We have not
seen such deadlocks for a long time, but we did see false timeouts
caused by it (see the explanation below). Since the cost now outweighs the
benefit, remove the timeout altogether.

The false timeout happens during mixed-shard repair. The
`reader_permit::set_timeout()` method is called on the top-level permit
which repair has a handle on. In the case of the mixed-shard repair,
this belongs to the multishard reader. Calling set_timeout() on the
multishard reader has no effect on the actual shard readers, except in
one case: when the shard reader is created, it inherits the multishard
reader's current timeout. As the shard reader can be alive for a long
time, this timeout is not refreshed and ultimately causes a timeout and
fails the repair.

Refs: #18269
(cherry picked from commit 3ebb124eb2)

Closes scylladb/scylladb#20955
2024-10-11 14:11:03 +03:00
Anna Stuchlik
57affc7fad doc: document the option to run ScyllaDB in Docker on macOS
This commit adds a description of a workaround to create a multi-node ScyllaDB cluster
with Docker on macOS.

Refs https://github.com/scylladb/scylladb/issues/16806
See https://forum.scylladb.com/t/running-3-node-scylladb-in-docker/1057/4

(cherry picked from commit 7eb1dc2ae5)

Closes scylladb/scylladb#20931
2024-10-11 14:10:06 +03:00
Raphael S. Carvalho
927e526e2d replica: Fix schema change during migration cleanup
During migration cleanup, there's a small window in which the storage
group was stopped but not yet removed from the list. So concurrent
operations traversing the list could work with stopped groups.

During a test which emitted schema changes during migrations,
a failure happened when updating the compaction strategy of a table,
but since the group was stopped, the compaction manager was unable
to find the state for that group.

In order to fix it, we'll skip stopped groups when traversing the
list since they're unused at this stage of migration and going away
soon.

Fixes #20699.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
(cherry picked from commit cf58674029)

Closes scylladb/scylladb#20899
2024-10-11 14:07:42 +03:00
Calle Wilund
b224665575 database: Also forced new schema commitlog segment on user initiated memtable flush
Refs #20686
Refs #15607

In #15060 we added a forced new commitlog segment on user-initiated flush,
mainly so that tests can verify tombstone GC and other compaction-related
things without having to wait for "organic" segment deletion.
The schema commitlog was not included, mainly because we did not have tests
featuring compaction checks of schema-related tables, but also because
it was assumed to have lower general throughput.
There is, however, no real reason not to include it, and it will make some
testing much quicker and more predictable.

(cherry picked from commit 60f8a9f39d)

Closes scylladb/scylladb#20705
2024-10-11 14:03:17 +03:00
Gleb Natapov
9afb1afefa storage_proxy: make sure there is no end iterator in _live_iterators array
storage_proxy::cancellable_write_handlers_list::update_live_iterators
assumes that iterators in _live_iterators can be dereferenced, but
the code does not make any attempt to make sure this is the case. The
iterator can be the end iterator which cannot be dereferenced.

The patch makes sure that there is no end iterator in _live_iterators.

Fixes scylladb/scylladb#20874

(cherry picked from commit da084d6441)

Closes scylladb/scylladb#21003
2024-10-10 17:09:27 +03:00
Kefu Chai
72153cac96 auth: capture boost::regex_error not std::regex_error
in a3db5401, we introduced the TLS certificate authenticator, which is
configured using the `auth_certificate_role_queries` option. the
value of this option contains a regular expression, so there are
chances the regular expression is malformed. in that case,
when converting the value representing the regular expression to an
instance of `boost::regex`, Boost.Regex throws a `boost::regex_error`
exception, not `std::regex_error`.

since we decided to use Boost.Regex, let's catch `boost::regex_error`.

Refs a3db5401
Fixes scylladb/scylladb#20941
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 439c52c7c5)

Closes scylladb/scylladb#20952
2024-10-09 21:58:40 +03:00
Michał Chojnowski
f988980260 utils/rjson.cc: correct a comment about assert()
Commit aa1270a00c changed most uses
of `assert` in the codebase to `SCYLLA_ASSERT`.

But the comment fixed in this patch is talking specifically about
`assert`, and shouldn't have been changed. It doesn't make sense
after the change.

(cherry picked from commit da7edc3a08)

Closes scylladb/scylladb#20976
2024-10-09 21:50:26 +03:00
Anna Stuchlik
1d11adf766 doc: remove outdated JMX references
This commit removes references to JMX from the docs.

Context:
The JMX server has been dropped and removed from installation. The user can
install it manually if needed, as documented with https://github.com/scylladb/scylladb/issues/18687.

This commit removes the outdated information about JMX from other pages
in the documentation, including the docs for nodetool, the list of ports,
and the admin section.

Also, the no longer relevant JMX information is removed from
the Docker Hub docs.

Fixes https://github.com/scylladb/scylladb/issues/18687
Fixes https://github.com/scylladb/scylladb/issues/19575

(cherry picked from commit 4e43d542cd)

Closes scylladb/scylladb#20988
2024-10-09 20:57:49 +03:00
Jenkins Promoter
dae1d18145 Update ScyllaDB version to: 6.2.0-rc3
2024-10-09 15:10:48 +03:00
Kamil Braun
e9588a8a53 Merge '[Backport 6.2] Wait for all users of group0 server to complete before destroying it' from ScyllaDB
The group0 server is often used in asynchronous contexts, but we do not wait
for those users to complete before destroying the server. We already have a
shutdown gate for it, so let's use it in those async functions.

Also make sure to signal group0 abort source if initialization fails.

Fixes scylladb/scylladb#20701

Backport to 6.2 since it contains af83c5e53e and it made the race easier to hit, so tests became flaky.

(cherry picked from commit ba22493a69)

(cherry picked from commit e642f0a86d)

Refs #20891

Closes scylladb/scylladb#21008

* github.com:scylladb/scylladb:
  group: hold group0 shutdown gate during async operations
  group0: Stop group0 if node initialization fails
2024-10-09 12:19:16 +02:00
Piotr Smaron
c73d0ffbaa cql: sum of abs RFs diffs cannot exceed 1 in ALTER tablets KS
The tablets load balancer is unable to process more than a single pending
replica, so ALTER tablets KS cannot accept a statement that would
create two or more pending replicas. It therefore has to validate
that the sum of absolute differences of the RFs specified in the statement is
not greater than 1.
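
A rough Python sketch of such a check. The function name and the error text are hypothetical, and `new_rfs` is assumed to be the complete per-DC mapping that would be in effect after the ALTER; a DC missing from either mapping is treated as RF=0.

```python
def validate_rf_change(old_rfs: dict[str, int], new_rfs: dict[str, int]) -> None:
    """Reject an ALTER whose total RF change across all DCs exceeds 1."""
    dcs = old_rfs.keys() | new_rfs.keys()
    total_diff = sum(abs(new_rfs.get(dc, 0) - old_rfs.get(dc, 0)) for dc in dcs)
    if total_diff > 1:
        raise ValueError(
            f"RF may change by at most 1 in total, requested change: {total_diff}"
        )


current = {"dc1": 3, "dc2": 3}
validate_rf_change(current, {"dc1": 4, "dc2": 3})  # total change 1: accepted

# Two DCs changed at once, and a new DC jumping from 0 to 2: both rejected.
for requested in ({"dc1": 4, "dc2": 4}, {"dc1": 3, "dc2": 3, "dc3": 2}):
    try:
        validate_rf_change(current, requested)
    except ValueError as err:
        print("rejected:", err)
```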

(cherry picked from commit ee56bbfe61)
2024-10-08 18:06:52 +00:00
Piotr Smaron
c7b5571766 cql: join new and old KS options in ALTER tablets KS
A bug was discovered while trying to ALTER a tablets KS and
specifying only 1 out of 2 DCs: the RF of the unspecified DC was
zeroed. This is because ALTER tablets KS updated the KS only with the
RF-per-DC mapping specified in the ALTER tablets KS statement, so if a
DC was omitted, it was assigned a value of RF=0.
This commit fixes that and additionally passes all the KS options, not
only the replication options, to the topology coordinator, where the KS
update is performed.
`initial_tablets` is a special case that requires special handling
in the source code, as we cannot simply update the old `initial_tablets`
settings with the new ones: if only `and TABLETS = {'enabled': true}`
is specified in the ALTER tablets KS statement, we should not zero
`initial_tablets`, but rather keep the old value. This is tested by the
`test_alter_preserves_tablets_if_initial_tablets_skipped` testcase.
Other than that, the above-mentioned testcase started to fail with
these changes. The cause turned out to be the test not waiting
until ALTER completed, and thus reading the old value, so the
test's body has been modified to wait for ALTER to complete before
performing validation.
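
The intended outcome can be sketched in Python as a dictionary merge. The flat option layout and both function names are assumptions made for this illustration; the actual code keeps replication and tablets options in separate structures.

```python
def alter_buggy(old: dict[str, str], altered: dict[str, str]) -> dict[str, str]:
    """Old behaviour: only the options named in the ALTER statement survive."""
    return dict(altered)


def alter_fixed(old: dict[str, str], altered: dict[str, str]) -> dict[str, str]:
    """New behaviour: options from the statement override the old ones,
    and everything that was not mentioned keeps its old value."""
    return {**old, **altered}


old = {"dc1": "3", "dc2": "3", "initial_tablets": "8"}
altered = {"dc1": "4"}  # the statement mentions only dc1

print(alter_buggy(old, altered).get("dc2", "0"))  # 0: dc2 was zeroed
print(alter_fixed(old, altered))
```

In the fixed variant both `dc2` and `initial_tablets` keep their previous values because the statement did not mention them.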

(cherry picked from commit 2aabe7f09c)
2024-10-08 18:06:48 +00:00
Piotr Smaron
92325073a9 cql: fix validation of ALTERing RFs in tablets KS
The validation has been corrected with:
1. Checking if a DC specified in ALTER exists.
2. Removing the `REPLICATION_STRATEGY_CLASS_KEY` key from the map of
   options whose RFs need to be validated.

(cherry picked from commit 6676e47371)
2024-10-08 18:06:47 +00:00
Piotr Smaron
f5c0969c06 cql: harden alter_keyspace_statement.cc::validate_rf_difference
This function assumed that the strings passed as arguments would always
represent integers, but that wasn't the case, and we missed it because the
function didn't have any validation. This change adds proper
validation and error logging.
The arguments passed to this function were forwarded from a call to
`ks_prop_defs::get_replication_options`, which, in addition to the rf-per-dc mapping, also returns a
`class:replication_strategy` pair. The second member of that pair was cast
to an `int` and somehow the code still ran fine; only
extra testing added later discovered the bug.
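
A hedged Python sketch of this kind of validation. `parse_rfs` is a hypothetical helper, and the value `"class"` for the strategy key is an assumption based on the `class:replication_strategy` pair mentioned above.

```python
REPLICATION_STRATEGY_CLASS_KEY = "class"  # assumed key name


def parse_rfs(replication_options: dict[str, str]) -> dict[str, int]:
    """Turn replication options into a DC -> RF map, rejecting non-integer RFs."""
    rfs = {}
    for key, value in replication_options.items():
        if key == REPLICATION_STRATEGY_CLASS_KEY:
            continue  # the strategy class is not a replication factor
        try:
            rf = int(value)
        except ValueError:
            raise ValueError(
                f"replication factor for {key!r} is not an integer: {value!r}"
            ) from None
        if rf < 0:
            raise ValueError(f"replication factor for {key!r} is negative: {rf}")
        rfs[key] = rf
    return rfs


options = {"class": "NetworkTopologyStrategy", "dc1": "3", "dc2": "2"}
print(parse_rfs(options))  # {'dc1': 3, 'dc2': 2}
```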

(cherry picked from commit 93d61d7031)
2024-10-08 18:06:46 +00:00
Gleb Natapov
90ced080a8 group: hold group0 shutdown gate during async operations
Wait for all outstanding async work that uses group0 to complete before
destroying group0 server.

Fixes scylladb/scylladb#20701

(cherry picked from commit e642f0a86d)
2024-10-08 18:06:45 +00:00
Piotr Smaron
7674d80c31 cql: validate RF change for new DCs in ALTER tablets KS
ALTER tablets KS validated if RF is not changed by more than 1 for DCs
that already had replicas, but not for DCs that didn't have them yet, so
specifying an RF jump from 0 to 2 was possible when listing a new DC in
ALTER tablets KS statement, which violated internal invariants of
tablets load balancer.
This PR fixes that bug and adds multi-DC testcases to check that adding
replicas to a new DC and removing replicas from a DC honor the RF
change constraints.

Refs: #20039
(cherry picked from commit 47acdc1f98)
2024-10-08 18:06:45 +00:00
Gleb Natapov
06ceef34a7 group0: Stop group0 if node initialization fails
Commit af83c5e53e moved aborting of group0 into the storage service
drain function. But it is not called if the node fails during initialization
(if it failed to join the cluster, for instance). So let's abort on both
paths (but only once).

(cherry picked from commit ba22493a69)
2024-10-08 18:06:44 +00:00
Piotr Smaron
ec83367b45 cql: extend test_alter_tablet_keyspace_rf
Added cases to also test decreasing RF and setting the same RF.
Also added extra explanatory comments.

(cherry picked from commit 9c5950533f)
2024-10-08 18:06:44 +00:00
Piotr Smaron
dfe2e20442 cql: refactor test_tablets::test_alter_tablet_keyspace
1. Renamed the testcase to emphasize that it only focuses on testing
   changing RF - there are other tests that test ALTER tablets KS
   in general.
2. Fixed whitespace according to PEP8

(cherry picked from commit adf453af3f)
2024-10-08 18:06:42 +00:00
Piotr Smaron
ad2191e84f cql: remove unused helper function from test_tablets
`change_default_rf` is not used anywhere, moreover it uses
`replication_factor` tag, which is forbidden in ALTER tablets KS
statement.

(cherry picked from commit 042825247f)
2024-10-08 18:06:41 +00:00
Sergey Zolotukhin
855abd7368 config: Add a warning about use of IP address for join topology and replace operations.

When the '--ignore-dead-nodes-for-replace' config option contains
IP addresses, a warning will be logged, notifying the user that
using IP addresses with this option is deprecated and will no
longer be supported in the next release.
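
A small Python sketch of such a check, using the standard `ipaddress` module to tell IP addresses apart from host IDs. The function names, the logger name and the warning text are illustrative assumptions, not the wording used by ScyllaDB.

```python
import ipaddress
import logging

logger = logging.getLogger("ignore_dead_nodes")  # hypothetical logger name


def is_ip_address(node: str) -> bool:
    """Return True if the string parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(node)
        return True
    except ValueError:
        return False


def warn_on_ip_addresses(ignore_dead_nodes: str) -> list[str]:
    """Log a deprecation warning for every IP address in a comma-separated
    list and return the offending entries."""
    nodes = [n.strip() for n in ignore_dead_nodes.split(",") if n.strip()]
    ips = [n for n in nodes if is_ip_address(n)]
    for ip in ips:
        logger.warning(
            "IP address %s given where a host ID is expected; "
            "this usage is deprecated", ip)
    return ips


print(warn_on_ip_addresses("127.0.0.1, 8d5ed9f4-7764-4dbd-bad8-43fddce94b7c"))
```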

Fixes scylladb/scylladb#19218

(cherry picked from commit 6398b7548c)
2024-10-03 14:10:30 +00:00
Sergey Zolotukhin
086dc6d53c nodetool: Add IP address usage warning for 'ignore-dead-nodes'.
Since we are deprecating the use of IP addresses, a warning message will be printed
if 'nodetool removenode --ignore-dead-nodes' is used with IP addresses.

(cherry picked from commit 9c692438e9)
2024-10-03 14:10:29 +00:00
Sergey Zolotukhin
09b0b3f7d6 tests: Fix incorrect UUIDs in test_nodeops
It was found that the UUIDs used in test_nodeops were
invalid. This update replaces those UUIDs with newly generated
random UUIDs.

(cherry picked from commit a871321ecf)
2024-10-03 14:10:28 +00:00
Sergey Zolotukhin
3bbb7a24b1 utils: Optimizations for utils::split_comma_separated_list and usage of host_id_or_endpoint lists
- utils::split_comma_separated_list now accepts a reference to sstring instead
  of a copy to avoid extra memory allocations. Additionally, the results of
  trimming are moved to the resulting vector instead of being copied.
- service/storage_service removenode, raft_removenode, find_raft_nodes_from_hoeps,
  parse_node_list and api/storage_service::set_storage_service were changed to use
  std::vector<host_id_or_endpoint> instead of std::list<host_id_or_endpoint> as
  std::vector is a more cache-friendly structure, resulting in better performance.

(cherry picked from commit 3b9033423d)
2024-10-03 14:10:27 +00:00
Pavel Emelyanov
b43454c658 cql: Check that CREATEing tablets/vnodes is consistent with the CLI
There are two bits that control whether the replication strategy for a
keyspace will use tablets or not -- the configuration option and CQL
parameter. This patch tunes its parsing to implement the logic shown
below:

    if (strategy.supports_tablets) {
        if (cql.with_tablets == on) {
            if (cfg.enable_tablets) {
                return create_keyspace_with_tablets();
            } else {
                throw "tablets are not enabled";
            }
        } else if (cql.with_tablets == off) {
            return create_keyspace_without_tablets();
        } else { // cql.with_tablets is not specified
            if (cfg.enable_tablets) {
                return create_keyspace_with_tablets();
            } else {
                return create_keyspace_without_tablets();
            }
        }
    } else { // strategy doesn't support tablets
        if (cql.with_tablets == on) {
            throw "invalid cql parameter";
        } else if (cql.with_tablets == off) {
            return create_keyspace_without_tablets();
        } else { // cql.with_tablets is not specified
            return create_keyspace_without_tablets();
        }
    }
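
The same decision table as a runnable Python sketch. The function and parameter names are made up for this illustration; `cql_with_tablets` is `True`, `False`, or `None` when the CQL parameter is not specified.

```python
from typing import Optional


def keyspace_uses_tablets(strategy_supports_tablets: bool,
                          cql_with_tablets: Optional[bool],
                          cfg_enable_tablets: bool) -> bool:
    """Return True if the new keyspace should be created with tablets."""
    if strategy_supports_tablets:
        if cql_with_tablets is True:
            if not cfg_enable_tablets:
                raise ValueError("tablets are not enabled")
            return True
        if cql_with_tablets is False:
            return False
        # CQL parameter not specified: follow the configuration option
        return cfg_enable_tablets
    # strategy doesn't support tablets
    if cql_with_tablets is True:
        raise ValueError("invalid cql parameter")
    return False


print(keyspace_uses_tablets(True, None, True))    # True
print(keyspace_uses_tablets(True, False, True))   # False
print(keyspace_uses_tablets(False, None, True))   # False
```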

closes: #20088

In order to enable tablets "by default" for NetworkTopologyStrategy
there's an explicit check near ks_prop_defs::get_initial_tablets(), which is
not very nice. It needs more care to fix it, e.g. provide a feature
service reference to the abstract_replication_strategy constructor. But
since the ks_prop_defs code already hijacks options specifically for that
strategy type (see the prepare_options() helper), it's OK for now.

The misbehavior described in #20768 is also preserved in this patch, but
should be fixed eventually as well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
(cherry picked from commit ebedc57300)

Closes scylladb/scylladb#20927
2024-10-03 17:09:49 +03:00
Jenkins Promoter
93700ff5d1 Update ScyllaDB version to: 6.2.0-rc2
scylla-6.2.0-rc2-candidate-20241002105948 scylla-6.2.0-rc2
2024-10-02 14:58:37 +03:00
Anna Stuchlik
5e2b4a0e80 doc: add metric updates from 6.1 to 6.2
This commit specifies metrics that are new in version 6.2 compared to 6.1,
as specified in https://github.com/scylladb/scylladb/issues/20176.

Fixes https://github.com/scylladb/scylladb/issues/20176

(cherry picked from commit a97db03448)

Closes scylladb/scylladb#20930
2024-10-02 12:07:06 +03:00
Calle Wilund
bb5dc0771c commitlog: Fix buffer_list_bytes not updated correctly
Fixes #20862

With the change in 60af2f3cb2 the bookkeeping for buffer memory was
changed subtly. The problem is that we would shrink the buffer size
before using that size, after the flush, to decrement the
buffer_list_bytes value, which had previously been incremented by the
full allocated size. I.e. we would slowly grow this value instead of
adjusting it properly to the actually used bytes.
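
A toy Python model of this kind of accounting drift. The class and method names are hypothetical, and `flush_fixed` shows only one possible way to keep the counter balanced, not necessarily how the actual patch does it.

```python
class BufferAccounting:
    """Tracks the total bytes held by outstanding buffers."""

    def __init__(self):
        self.buffer_list_bytes = 0

    def allocate(self, size):
        self.buffer_list_bytes += size  # incremented by the full allocated size
        return {"size": size}           # hypothetical buffer record

    def flush_buggy(self, buf, used):
        buf["size"] = used                     # shrink first...
        self.buffer_list_bytes -= buf["size"]  # ...then release only `used` bytes

    def flush_fixed(self, buf, used):
        allocated = buf["size"]         # remember the size that was accounted
        buf["size"] = used
        self.buffer_list_bytes -= allocated


buggy = BufferAccounting()
buggy.flush_buggy(buggy.allocate(128), used=100)
print(buggy.buffer_list_bytes)  # 28 bytes are never given back

fixed = BufferAccounting()
fixed.flush_fixed(fixed.allocate(128), used=100)
print(fixed.buffer_list_bytes)  # 0
```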

Test included.

(cherry picked from commit ee5e71172f)

Closes scylladb/scylladb#20902
2024-10-01 17:41:02 +03:00