Commit Graph

47627 Commits

Author SHA1 Message Date
Patryk Jędrzejczak
0cdcf82cd0 Merge 'topology coordinator: do not proceed further on invalid boostrap tokens' from Piotr Dulikowski
In case when dht::boot_strapper::get_boostrap_tokens fail to parse the
tokens, the topology coordinator handles the exception and schedules a
rollback. However, the current code tries to continue with the topology
coordinator logic even if an exception occurs, leaving boostrap_tokens
empty. This does not make sense and can actually cause issues,
specifically in prepare_and_broadcast_cdc_generation_data which
implicitly expect that the bootstrap_tokens of the first node in the
cluster will not be empty.

Fix this by adding the missing break.

Fixes: scylladb/scylladb#23897

From the code inspection alone it looks like 2025.1 and 6.2 have this problem, so marking for backport to both of them.

Closes scylladb/scylladb#23914

* https://github.com/scylladb/scylladb:
  test: cluster: add test_bad_initial_token
  topology coordinator: do not proceed further on invalid boostrap tokens
  cdc: add sanity check for generating an empty generation
2025-04-28 12:45:33 +02:00
Botond Dénes
d582c436e5 Merge 'tasks: check whether a node is alive before rpc' from Aleksandra Martyniuk
Check whether a node is alive before making an rpc that gathers children
infos from the whole cluster in virtual_task::impl::get_children.

Fixes: https://github.com/scylladb/scylladb/issues/22514.

Needs backport to 2025.1 and 6.2 as they contain the bug.

Closes scylladb/scylladb#23787

* github.com:scylladb/scylladb:
  test: add test for getting tasks children
  tasks: check whether a node is alive before rpc
2025-04-28 09:32:45 +03:00
Nadav Har'El
262530f27c Merge 'mv: make base_info in view schemas immutable' from Wojciech Mitros
Currently, the base_info may or may not be set in view schemas.
Even when it's set, it may be modified. This necessitates extra
checks when handling view schemas, as we'll as potentially causing
errors when we forget to set it at some point.

Instead, we want to make the base info an immutable member of view
schemas (inside view_info). To achieve this, in this series we remove
all base_info members that can change due to a base schema update,
and we calculate the remaining values during view update generation,
using the most up-to-date base schema version.

To calculate the values that depend on the base schema version, we
need to iterate over the view primary key and find the corresponding
columns, which adds extra overhead for each batch of view updates.
However, this overhead should be relatively small, as when creating
a view update, we need to prepare each of its columns anyway. And
if we need to read the old value of the base row, the relative
overhead is even lower.

After this change, the base info in view schemas stays the same
for all base schema updates, so we'll no longer get issues with
base_info being incompatible with a base schema version. Additionally,
it's a step towards making the schema objects immutable, which
we sometimes incorrectly assumed in the past (they're still not
completely immutable yet, as some other fields in view_info other
than base_info are initialized lazily and may depend on the base
schema version).

Fixes https://github.com/scylladb/scylladb/issues/9059
Fixes https://github.com/scylladb/scylladb/issues/21292
Fixes https://github.com/scylladb/scylladb/issues/22194
Fixes https://github.com/scylladb/scylladb/issues/22410

Closes scylladb/scylladb#23337

* github.com:scylladb/scylladb:
  test: remove flakiness from test_schema_is_recovered_after_dying
  mv: add a test for dropping an index while it's building
  base_info: remove the lw_shared_ptr variant
  view_info: don't re-set base_info after construction
  base_info: remove base_info snapshot semantics
  base_info: remove base schema from the base_info
  schema_registry: store base info instead of base schema for view entries
  base_info: make members non-const
  view_info: move the base info to a separate header
  view_info: move computation of view pk columns not in base pk to view_updates
  view_info: move base-dependent variables into base_info
  view_info: set base info on construction
2025-04-27 19:12:12 +03:00
David Garcia
cf7d846b9e docs: update dependencies
This is a mandatory dependency update to resolve a critical Dependabot alert. For more details, see the [Dependabot alerts](https://docs.github.com/en/code-security/dependabot/dependabot-alerts/viewing-and-updating-dependabot-alerts).

Closes scylladb/scylladb#23918

Fixes #23935
2025-04-27 18:45:11 +03:00
Piotr Szymaniak
e588c8667f alternator: Limit attribute name lengths
Attribute names are now checked against DynamoDB-compatible length
limits. When exceeded, Alternator emits exception identical or similar
to the DDB one. It might be worth noting that DDB emits more than a
single kind of an exception string for some exceptions. The tests'
catch clauses handle all the observed kinds of messages from DynamoDB.
The validation differentiates between key and non-key attributes and
applies the limit accordingly.

AWS DDB raises exceptions with somewhat different contents when the
get request contains ProjectionExpression, so this case needed separate
treatment to emit the corresponding exception string. The
length-validating function was declared and defined in
expressions.hh/.cc respectively, because that's where the relevant
parsing happens.

** Tests

The following tests were validated when handling this issue:
test_limit_attribute_length_nonkey_good,
test_limit_attribute_length_nonkey_bad,
test_limit_attribute_length_key_good,
test_limit_attribute_length_key_bad,
test_limit_attribute_length_gsi_lsi_good,
test_limit_attribute_length_gsi_lsi_bad,
test_limit_attribute_length_gsi_lsi_projection_bad.

Some of the tests were expanded into being more granular. Namely, there
is a new test function
`test_limit_attribute_length_key_bad_incoherent_names`
which groups tests with too long attribute names in the case of
incorrect (incoherent) user requests.
Similarily, there is a new test function
`test_limit_attribute_length_gsi_lsi_bad_incoherent_names`
All the tests cover now each combination of the key/keys being too long.
Both the new fuctions contain tests that verify that ScyllaDB throws
length-related exceptions (instead of the coherency-related), similar
to what DynamoDB does.

The new test test_limit_gsiu_key_len_bad covers the case of too long
attribute name inside GlobalSecondaryIndexUpdates.
The new test test_limit_gsiu_key_len_bad_incoherent_names covers the
case of incorrect (incoherent) user requests containing too long
attribute names and GlobalSecondaryIndexUpdates.

test_limit_attribute_length_key_bad was found to have contaned an
illegal KeySchema structure.

Some of the tests were corrected their match clause.

All the tests are stripped of the xfail flag except
test_limit_attribute_length_key_bad, which has it changed since it
still fails due to Projection in GSI and LIS not implemented in Alternator.
The xfail now points to #5036.

Fixes scylladb/scylladb#9169

Closes scylladb/scylladb#23097
2025-04-27 18:39:20 +03:00
Piotr Dulikowski
82e1678fbe test: mv: skip test_mv_tablets_empty_ip in debug mode
This test shuts down a node and then replaces it with another one while
continuously writing to the cluster. The test has been observed to take
a lot of time in debug mode and time out on the replace operation.
Replace takes very long because rebuilding tablets on the new node is
very slow, and the slowest part is memtable flush which happens at the
beginning of streaming. The slowness seems to be specific to the debug
mode.

Turn off the test in debug mode to deflake the CI. As a follow-up, the
test is planned to be reworked into an quicker error injection test so
that the code path tested by this test will be again exercised in debug
unit tests (scylladb/scylladb#23898)

Fixes: scylladb/scylladb#20316

Closes scylladb/scylladb#23900
2025-04-27 18:06:08 +03:00
Piotr Dulikowski
670a69007e test: cluster: add test_bad_initial_token
Adds a test which checks that rollback works properly in case when a bad
value of the initial_token function is provided.
2025-04-25 12:25:15 +02:00
Piotr Dulikowski
845cedea7f topology coordinator: do not proceed further on invalid boostrap tokens
In case when dht::boot_strapper::get_boostrap_tokens fail to parse the
tokens, the topology coordinator handles the exception and schedules a
rollback. However, the current code tries to continue with the topology
coordinator logic even if an exception occurs, leaving boostrap_tokens
empty. This does not make sense and can actually cause issues,
specifically in prepare_and_broadcast_cdc_generation_data which
implicitly expect that the bootstrap_tokens of the first node in the
cluster will not be empty.

Fix this by adding the missing break.

Fixes: scylladb/scylladb#23897
2025-04-25 11:30:01 +02:00
Piotr Dulikowski
66acaa1bf8 cdc: add sanity check for generating an empty generation
It doesn't make sense to create an empty CDC generation because it does
not make sense to have a cluster with no tokens. Add a sanity check to
cdc::make_new_generation_description which fails if somebody attempts to
do that (i.e. when the set of current tokens + optionally bootstrapping
node's tokens is empty).

The function does not work correctly if it is misused, as we saw in
scylladb/scylladb#23897. While the function should not be misused in the
first place, it's better to throw an exception rather than crash -
especially that this crash could happen on the topology coordinator.
2025-04-25 11:25:07 +02:00
Aleksandra Martyniuk
76cd707b18 test: test_tablets: wait for cql
Wait for cql after rolling restart in test_two_tablets_concurrent_repair_and_migration_repair_writer_level
to prevent failing queries.

Fixes: #23620.

Closes scylladb/scylladb#23796
2025-04-24 21:25:29 +03:00
Patryk Jędrzejczak
2a8bb47cfb test: test_zero_token_nodes_topology_ops: use host IDs for ignored nodes
Providing IP of an ignored node during removenode made the test flaky.
It could happen that the address map contained mappings of two
nodes with the same IP:
1. the node being ignored,
2. the node that expectedly failed replacing earlier in the test.

So, `address_map::find_by_addr()` called in `find_raft_nodes_from_hoeps`
could return the host ID of the second node instead of the first node
and cause removenode to fail.

We fix flakiness in this patch by providing the host ID of the ignored
node instead of its IP. We would have to do it anyway sooner or later
because providing IP is deprecated.

The bug in `find_raft_nodes_from_hoeps` is tracked by
scylladb/scylladb#23846.

The test became flaky because of f0af3f261e.
That patch is not present in 2025.1, so the test isn't flaky outside
master, and hence there is no reason to backport this patch.

Fixes scylladb/scylladb#23499

Closes scylladb/scylladb#23863
2025-04-24 20:17:19 +03:00
Pavel Emelyanov
68a178eba9 Merge 'replica: skip flush of dropped table' from Aleksandra Martyniuk
Currently, flush throws no_such_column_family if a table is dropped. Skip the flush of dropped table instead.

Fixes: #16095.

Needs backport to 2025.1 and 6.2 as they contain the bug

Closes scylladb/scylladb#23876

* github.com:scylladb/scylladb:
  test: test table drop during flush
  replica: skip flush of dropped table
2025-04-24 20:02:59 +03:00
Wojciech Mitros
ee5883770a test: remove flakiness from test_schema_is_recovered_after_dying
Due to the changes in creating schemas with base info the
test_schema_is_recovered_after_dying seems to be flaky when checking
that the schema is actually lost after 'grace_period'. We don't
actually guarantee that the the schema will be lost at that exact
moment so there's no reason to test this. To remove the flakiness,
we remove the check and the related sleep, which should also slightly
improve the speed of this test.
2025-04-24 01:09:35 +02:00
Wojciech Mitros
bf7bba9634 mv: add a test for dropping an index while it's building
Dropping an index is a schema change of its base table and
a schema drop of the index's materialized view. This combination
of schema changes used to cause issues during view building, because
when a view schema was dropped, it wasn't getting updated with the
new version of the base schema, and while the view building was
in progress, we would update the base schema for the base table
mutation reader and try generating updates with a view schema that
wasn't compatible with the base schema, failing on an `on_internal_error`.

In this patch we add a test for this scenario. We create an index,
halt its view building process using an injection, and drop it.
If no errors are thrown, the test succeeds.

The test was failing before https://github.com/scylladb/scylladb/pull/23337
and is passing afterwards.
2025-04-24 01:09:32 +02:00
Wojciech Mitros
d77f11d436 base_info: remove the lw_shared_ptr variant
The base_dependent_view_info is no longer needed to be shared or
modified in the view_info, so we no longer need to keep it as
a shared pointer.
2025-04-24 01:08:40 +02:00
Wojciech Mitros
d7bd86591e view_info: don't re-set base_info after construction
In the previous commits we made sure that the base info is not dependent
on the base schema version, and the info dependent on the base schema
version is calculated when it's needed. In this patch we remove the
unnecessary re-setting of the base_info.

The set_base_info method isn't removed completely, because it also has
a secondary function - zeroing the view_info fields other than base_info.
Because of this, in this patch we rename it accordingly and limit its
use to the updates caused by a base schema change.
2025-04-24 01:08:40 +02:00
Wojciech Mitros
ea462efa3d base_info: remove base_info snapshot semantics
The base info in view schemas no longer changes on base schema
updates, so saving the base info with a view schema from a specific
point in time doesn't provide any additional benefits.
In this patch we remove the code using the base_and_view snapshots
as it's no longer useful.
2025-04-24 01:08:40 +02:00
Wojciech Mitros
ad55935411 base_info: remove base schema from the base_info
The base info now only contains values which are not reliant on the
base schema version. We remove the the base schema from the base info
to make it immutable regardless of base schema version, at the point
of this patch it's also not needed anywhere - the new base info can
replace the base schema in most places, and in the few (view_updates)
where we need it, we pull the most recent base schema version from
the database.

After this change, the base info no longer changes in a view schema
after creation, so we'll no longer get errors when we try generating
view updates with a base_info that's incompatible with a specific
base schema version.

Fixes #9059
Fixes #21292
Fixes #22410
2025-04-24 01:08:39 +02:00
Wojciech Mitros
05fce91945 schema_registry: store base info instead of base schema for view entries
In the following patch we plan to remove the base schema from the base_info
to make the base_info immutable. To do that, we first prepare the schema
registry for the change; we need to be able to create view schemas from
frozen schemas there and frozen schemas have no information about the base
table. Unless we do this change, after base schemas are removed from the
base info, we'll no longer be able to load a view schema to the schema registry
without looking up the base schema in the database.

This change also required some updates to schema building:
* we add a method for unfreezing a view schema with base info instead of
a base schema
* we make it possible to use schema_builder with a base info instead of
a base schema
* we add a method for creating a view schema from mutations with a base info
instead of a base schema
* we add a view_info constructor withat base info instead of a base schema
* we update the naming in schema_registry to reflect the usage of base info
instead of base schema
2025-04-24 01:08:39 +02:00
Wojciech Mitros
6e539c2b4d base_info: make members non-const
In the following patches we'll add the base info instead of the
base schema to various places (schema building, schema registry).
There, we'll sometimes need to update the base_info fields, which
we can't do with const members. There's also a place (global_schema_ptr)
where we won't be able to use the base_info_ptr (a shared pointer to the
base_info), so we can't just use the base_info_ptr everywhere instead.

In this patch we unmark these members as const.
In the following patches we'll remove the methods for changing the
base_info in the view schema, so it will remain effectively const.
2025-04-24 01:08:39 +02:00
Wojciech Mitros
32258d8f9a view_info: move the base info to a separate header
In the following commits the base_depenedent_view_info will be needed
in many more places. To avoid including the whole db/view/view.hh
or forward declaring (where possible) the base info, we move it to
a separate header which can be included anywhere at almost no cost.
2025-04-24 01:08:39 +02:00
Wojciech Mitros
a3d2cd6b5e view_info: move computation of view pk columns not in base pk to view_updates
In preparation of making the base_info immutable, we want to get rid of
any base_dependent_view_info fields that can change when base schema
is updated.
The _base_regular_columns_in_view_pk and _base_static_columns_in_view_pk
base column_ids of corresponding base columns and they can change
(decrease) when an earlier column is dropped in the base table.
view_updates is the only location where these values are used and calculating
them is not expensive when comparing to the overall work done while performing
a view update - we iterate over all view primary key columns and look them up
in the base table.
With this in mind, we can just calculate them when creating a view_updates
object, instead of keeping them in the base_info. We do that in this patch.
2025-04-24 01:08:39 +02:00
Wojciech Mitros
a33963daef view_info: move base-dependent variables into base_info
The has_computed_column_depending_on_base_non_primary_key
and is_partition_key_permutation_of_base_partition_key variables
in the view_info depend on the base table so they should be in the
base_dependent_view_info instead of view_info.
2025-04-24 01:08:39 +02:00
Wojciech Mitros
900687c818 view_info: set base info on construction
Currently, the base_info may or may not be set in view schemas.
Even when it's set, it may be modified. This necessitates extra
checks when handling view schemas, as well as potentially causing
errors when we forget to set it at some point.

Instead, we want to make the base info an immutable member of view
schemas (inside view_info). The first step towards that is making
sure that all newly created schemas have the base info set.
We achieve that by requiring a base schema when constructing a view
schema. Unfortunately, this adds complexity each time we're making
a view schema - we need to get the base schema as well.
In most cases, the base schema is already available. The most
problematic scenario is when we create a schema from mutations:
- when parsing system tables we can get the schema from the
database, as regular tables are parsed before views
- when loading a view schema using the schema loader tool, we need
to load the base additionally to the view schema, effectively
doubling the work
- when pulling the schema from another node - in this case we can
only get the current version of the base schema from the local
database

Additionally, we need to consider the base schema version - when
we generate view updates the version of the base schema used for
reads should match the version of the base schema in view's base
info.
This is achieved by selecting the correct (old or new) schema in
`db::schema_tables::merge_tables_and_views` and using the stored
base schema in the schema_registry.
2025-04-24 01:08:39 +02:00
Benny Halevy
f279625f59 test_tablets_cql: test_alter_dropped_tablets_keyspace: extend expected error
The query may fail also on a no_such_keyspace
exception, which generates the following cql error:
```
Error from server: code=2200 [Invalid query] message="Can\'t find a keyspace test_1745198244144_qoohq"
```
Extend the pytest.raises match expression to include
this error as well.

Fixes #23812

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#23875
2025-04-23 18:54:22 +03:00
Benny Halevy
2bbdaeba1c Update seastar submodule
* seastar e44af9b0...d7ff58f2 (2):
  > rpc: client: support timeout and cancellation
  > doc/io-properties-file.md: correct a typo

Closes scylladb/scylladb#23865
2025-04-23 16:10:51 +03:00
Aleksandra Martyniuk
c1618c7de5 test: test table drop during flush 2025-04-23 14:29:28 +02:00
Aleksandra Martyniuk
91b57e79f3 replica: skip flush of dropped table 2025-04-23 14:29:28 +02:00
Kefu Chai
0d7752b010 build: cmake: generalize update_cxx_flags()
Refactor our CMake flag handling to make it more flexible and reduce
repetition:

- Rename update_cxx_flags() to update_build_flags() to better reflect
  its expanded purpose
- Generate CMake variable names internally based on configuration type
  instead of requiring callers to specify full variable names
- Follow CMake's standard naming conventions for configuration-specific
  flags, see
  https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html#variable:CMAKE_%3CLANG%3E_FLAGS
- Prepare groundwork for handling linker flags in addition to compiler
  flags in future changes

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23842
2025-04-23 12:06:04 +03:00
Nadav Har'El
64a5eee6b9 test/cqlpy: insert test names into Scylla logs
Both test.py and test/cqlpy/run run many test functions against the same
Scylla process. In the resulting log file, it is hard to understand which
log messages are related to which test. In this patch, we log a message
(using the "/system/log" REST API) every time a test is started or ends.

The messages look like this:

    INFO  2025-04-22 15:10:44,625 [shard 1:strm] api - /system/log:
    test/cqlpy: Starting test_lwt.py::test_lwt_missing_row_with_static
    ...
    INFO  2025-04-22 15:10:44,631 [shard 0:strm] api - /system/log:
    test/cqlpy: Ended test_lwt.py::test_lwt_missing_row_with_static

We already had a similar feature in test/alternator, added three years
ago in commit b0371b6bf8. The implementation
is similar but not identical due to different available utility functions,
and in any case it's very simple.

While at it, this patch also fixes the has_rest_api() to timeout after
one second. Without this, if the REST API is blocked in a way that
a connection attempt just hangs, the tests can hang. With the new
timeout, the test will hang for a second, realize the REST API is
not available, and remember this decision (the next tests will not
wait one second again). We had the same bug in Alternator, and fixed
it in 758f8f01d7. This one second "pause"
will only happen if the REST API port is blocked - in the more typical
case the REST API port is just not listening but not blocked, and the
failure will be noticed immediately and won't wait a whole second.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23857
2025-04-23 12:04:14 +03:00
Piotr Dulikowski
3d73c79a72 test: mv: skip test_view_building_scheduling_group in debug
The test populates a table with 50k rows, creates a view on that table
and then compares the time spent in streaming vs. gossip scheduling
groups. It only takes 10s in dev mode on my machine, but is much slower
in debug mode in CI - building the view doesn't finish within 2 minutes.

The bigger the view to build, the more accurrate the measurement;
moreover, the test scenario isn't interesting enough to be worth running
it in debug mode as this should be covered by other tests. Therefore,
just skip this test in debug mode.

Fixes: scylladb/scylladb#23862

Closes scylladb/scylladb#23866
2025-04-23 11:29:35 +03:00
Pavel Emelyanov
a6ba535c3c Merge 'test.py: refactoring before boost pytest integration' from Andrei Chekun
This PR contains changes that do not add new functionality, and have small refactoring of the existing code.
The most significant change though is switching the SQLite writer from a singleton to a thread locking mechanism that will be needed later on.

This PR is an extraction of several commits from https://github.com/scylladb/scylladb/pull/22894 as reviewer [request](https://github.com/scylladb/scylladb/pull/22894?notification_referrer_id=NT_kwDOACiLR7MxNDg0ODk2MDU1MjoyNjU3MDk1&notifications_query=reason%3Aparticipating#pullrequestreview-2778582278).

Closes scylladb/scylladb#23867

* github.com:scylladb/scylladb:
  test.py: move the readme file for LDAP tests to the correct location
  test.py: eliminate deprecation warning for xml.etree.ElementTree.Element
  test.py: align the behavior of max-failures parameter with pytest maxfail
  test.py: fix typo in toxiproxy name parameter
  test.py: add locking to the sqlite writer for resource gather
  test.py: add sqlite datetime adapter for resource gather
  test.py: change the parameter for get_modes_to_run()
2025-04-23 11:10:56 +03:00
Andrei Chekun
57b66e6b2e test.py: move the readme file for LDAP tests to the correct location
README file was created in incorrect location, now it moved to the
directory with source files where it intended to be.
2025-04-22 19:03:28 +02:00
Andrei Chekun
cf4747c151 test.py: eliminate deprecation warning for xml.etree.ElementTree.Element
Testing the truth value of an Element emits DeprecationWarning. This check is done correctly
2025-04-22 19:03:21 +02:00
Andrei Chekun
bc49cd5214 test.py: align the behavior of max-failures parameter with pytest maxfail
This will allow to just transfer the existing max-failures values to the
pytest without any modification. As a downside test.py logic of handling
these changes slightly.
2025-04-22 19:03:08 +02:00
Andrei Chekun
5c3501e4bf test.py: fix typo in toxiproxy name parameter
Fix typo in toxiproxy name parameter. No any functional changes just
cosmetic fix.
2025-04-22 19:02:12 +02:00
Andrei Chekun
2c37a793d1 test.py: add locking to the sqlite writer for resource gather
SQLite blocking the DB during writes, so it's not possible to make writes from
several thread. To be able to gather metrics in several threads, we need a
locking mechanism for threads during writes. So thread will not try to
write metrics while another thread is performing writes.
2025-04-22 19:01:30 +02:00
Andrei Chekun
800710dc2c test.py: add sqlite datetime adapter for resource gather
Add sqlite datetime adapter for resource gather since default adapters are deprecated from 3.12
2025-04-22 18:59:49 +02:00
Andrei Chekun
bf2a9e267e test.py: change the parameter for get_modes_to_run()
Change the parameter for get_modes_to_run() from session to config to
narrow the scope, and prepare it to later use in method that do not have
access to the session, but have access to the config object
2025-04-22 18:58:33 +02:00
Kefu Chai
7254c0c515 db/config.cc: correct a typo in option's description
s/incomming/incoming/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23826
2025-04-22 16:55:04 +03:00
Pavel Emelyanov
65efd2b2f6 Merge 'Refactor and enhance s3_tests' from Ernest Zaslavsky
This PR introduces a cleanup mechanism in s3_tests to remove uploaded objects after the test completes, ensuring a clean testing environment. Additionally, the recently added test has been refactored and split into smaller, more maintainable parts, improving readability and extending its coverage to include the "proxied" case.

As these changes primarily improve code aesthetics and maintainability, backporting is not necessary.

Refs: https://github.com/scylladb/scylladb/issues/23830

Closes scylladb/scylladb#23828

* github.com:scylladb/scylladb:
  s3_tests: Improve and extend copy object test coverage
  s3_tests: Implement post-test cleanup for uploaded objects
2025-04-22 16:40:37 +03:00
Nadav Har'El
5fd2eabd48 Merge 'Generalize the diversity of parse_table_infos() callers in API' from Pavel Emelyanov
The helper in question is used in several different ways -- by handlers directly (most of the callers), as a part of wrap_ks_cf() helper and by one of its overloads that unpack the "cf" query parameter from request. This PR generalizes most of the described callers thus reducing the number differently-looking of ways API handlers parse "keyspace" and "cf" request parameters.

Continuation of #22742

Closes scylladb/scylladb#23368

* github.com:scylladb/scylladb:
  api: Squash two parse_table_infos into one
  api: Generalize keyspaces:tables parsing a little bit more
  api: Provide general pair<keyspace, vector<table>> parsing
  api: Remove ks_cf_func and related code
2025-04-22 15:40:06 +03:00
Nadav Har'El
8d1a413357 test/scylla_gdb: better error message when running on dev build mode
The test/scylla_gdb suite needs Scylla to have been built with debug
symbols - which is NOT the case for the dev build. So the script
test/scylla_gdb/run attempts to recognize when a developer runs it
on an executable with the debug symbols missing - and prints a clear error.

Unfortunately, as we noticed in #10863, and again in #23832, because
wasmtime is compiled with debug symbols and linked with Scylla,
build/dev/scylla "pretends" to have debug symbols, foiling the check
in test/scylla_gdb/run. Reviewers rejected two solutions to this problem
(pull requests #10865 and #10923), so in pull request #10937 I added
a cosmetic solution just for test/scylla_gdb: in test/scylla_gdb/conftest.py
we check that there are **really** debug symbols that interest us,
and if not, exit immediately instead of failing each test separately.

For some reason, the sys.exit() we used is no longer effective - it
no longer exits pytest, so in this patch we use pytest.exit() instead.

Fixes #23832 (sort of, we leave build/dev/scylla with the fake claim
that it has debug symbols, but test/scylla_gdb will handle this
situation more gracefully).

Closes scylladb/scylladb#23834
2025-04-22 15:02:06 +03:00
Michael Litvak
5c1d24f983 test: test_mv_topology_change: increase timeout for remove_node
The test `test_mv_write_to_dead_node` currently uses a timeout of 60
seconds for remove_node, after it was increased from 30 seconds to fix
scylladb/scylladb#22953. Apparently it is still too low, and it was
observed to fail in debug mode.

Normally remove_node uses a default timeout of TOPOLOGY_TIMEOUT = 1000
seconds, but the test requires a timeout which is shorter than 5
minutes, because it is a regression test for an issue where MV updates
hold topology changes for more than 5 minutes, and we want to verify in
the test that the topology change completes in less than 5 minutes.

To resolve the issue, we set the test to skip in debug mode, because the
remove node operation is unpredictably slow, and we increase the timeout
to 180 seconds which is hopefully enough time for remove_node in
non-debug modes, and still sufficient to satisfy the test requirements.

Fixes scylladb/scylladb#22530

Closes scylladb/scylladb#23833
2025-04-22 10:51:19 +02:00
Ernest Zaslavsky
edaa3f4bdd s3_tests: Improve and extend copy object test coverage
Refactored the copy object test to enhance readability and maintainability.
The test was simplified and split into smaller, more focused parts.
Additionally, a "proxied" variant of the test was introduced to expand
coverage.
2025-04-21 20:54:14 +03:00
Ernest Zaslavsky
252a0a14af s3_tests: Implement post-test cleanup for uploaded objects
Ensure cleanup after tests by deleting objects uploaded to MinIO.
This improves resource management and maintains a clean test environment.
2025-04-21 20:54:14 +03:00
Avi Kivity
2dcd2b21ae Merge 'tablets: Equalize per-table balance when allocating tablets for a new table' from Tomasz Grabiec
Fixes the following scenario:

1. Scale out adds new nodes to each rack
2. Table is created - all tablets are allocated to new nodes because they have low load
3. Rebalancing moves tablets from old nodes to new nodes - table balance for the new table is not fixed

We're wrong to try to equalize global load when allocating tablets,
and we should equalize per-table load instead, and let background load
balancing fix it in a fair way. It will add to the allocated storage
imbalance, but:

1. The table is initially empty, so doesn't impact actual storage imbalance.
2. It's more important to avoid overloading CPU on the nodes - imbalance hurts this aspect immediately.
3. If the table was created before imbalance was formed, we would end up in the same situation as in the problematic scenario after the patch.
4. It's the job of the load balancing to keep up with storage growing, and if it's not, scale out should kick in.

Before we have CPU-aware tablet allocation, and thus can prove we have
CPU capacity on the small nodes, we should respect per-table balance
as this is the way in which we achieve full CPU utilization.

Fixes #23631

Backport to 2025.1 because load imbalance is a serious problem in production.

Closes scylladb/scylladb#23708

* github.com:scylladb/scylladb:
  tablets: Equalize per-table balance when allocating tablets for a new table
  load_sketch: Tolerate missing tablet_map when selecting for a given table
  tests: tablets: Simplify tests by moving common code to topology_builder
2025-04-21 17:06:30 +03:00
Pavel Emelyanov
eb5b52f598 Merge 'main: make DC and rack immutable after bootstrap' from Piotr Dulikowski
Changing DC or rack on a node which was already bootstrapped is, in
case of vnodes, very unsafe (almost guaranteed to cause data loss or
unavailability), and is outright not supported if the cluster has
a tablet-backed keyspaces. Moreover, the possibility of doing that
makes it impossible to uphold some of the invariants promised by
the RF-rack-valid flag, which is eventually going to become
unconditionally enabled.

Get rid of the above problems by removing the possibility of changing
the DC / rack of a node. A node will now fail to start if its snitch
reports a different DC or rack than the one that was reported during the
first boot.

Fixes: scylladb/scylladb#23278
Fixes: scylladb/scylladb#22869

Marking for backport to 2025.1, as this is a necessary part of the RF-rack-valid saga

Closes scylladb/scylladb#23800

* github.com:scylladb/scylladb:
  doc: changing topology when changing snitches is no longer supported
  test: cluster: introduce test_no_dc_rack_change
  storage_service: don't update DC/rack in update_topology_with_local_metadata
  main: make dc and rack immutable after bootstrap
  test: cluster: remove test_snitch_change
2025-04-21 15:52:55 +03:00
Yaniv Michael Kaul
b374f94b15 pip installation: use --no-cache-dir
There are two reasons we may want NOT to use caching of pip deps:
1. When building a container, unless we specifically clean it up, it'll remain, even when we squash the image layers later.
2. When building a container, that cache is not useful, as we squash our containers later (so that layer is not cached really). And our CI cleans up the layers repo anyway.
3. Caching sometimes isn't great, and doesn't ensure we pick up the exact version (or latest) that we wish to...

This PR changes two locations in Scylla, both of which (also) build containers, so certainly relevant for 1, 2 above and possibly 3.
No real need to backport.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#23822
2025-04-21 13:46:57 +03:00
Avi Kivity
0ba3ce1741 test: gdb: avoid using file(1) to determine if debug information is present
The scylla_gdb tests verify, as a sanity check, that the executable
was built with debug information. They do so via file(1).

In Fedora 42, file(1) crashes on ELF files that have interpreter pathnames
larger than 128 characters[1]. This was later fixed[2], but the fix is not
in any release.

Work around the problem by using objdump instead of file.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2354970
[2] b3384a1fbf

Closes scylladb/scylladb#23823
2025-04-21 13:29:27 +03:00