Commit Graph

45889 Commits

Author SHA1 Message Date
Michał Chojnowski
f6ebd445e4 test_tablets.py: limit concurrency in test_tablet_storage_freeing
Apparently the python driver can't deal with the current concurrency sometimes.
Lower it from 1000 to 100.

Fixes scylladb/scylladb#20489

Closes scylladb/scylladb#20494
2024-12-19 15:14:41 +02:00
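A minimal sketch of the kind of concurrency cap f6ebd445e4 describes, assuming an asyncio-style test. `run_bounded` and `fake_query` are hypothetical names for illustration; they are neither the test's actual helpers nor part of the Python driver's API.

```python
import asyncio

async def run_bounded(factories, limit=100):
    """Run the coroutine factories, keeping at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def guarded(factory):
        async with sem:
            return await factory()

    return await asyncio.gather(*(guarded(f) for f in factories))


# Hypothetical stand-in for a driver query; it records the peak number
# of requests that were in flight at the same time.
in_flight = 0
peak = 0

async def fake_query():
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0)  # give other tasks a chance to run
    in_flight -= 1
    return True

results = asyncio.run(run_bounded([fake_query] * 1000, limit=100))
```

All 1000 requests still complete; only the number issued simultaneously is capped.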
Kefu Chai
df36985fc3 raft: do not include unused headers
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21838
2024-12-19 14:57:22 +02:00
Kefu Chai
93be8f3a0c db,sstables: migrate boost::range::stable_partition to std library
now that we are allowed to use C++23, we have the luxury of using
`std::ranges::stable_partition`.

in this change, we:

- replace `boost::range::stable_partition()` with
  `std::ranges::stable_partition()`
- since `std::ranges::stable_partition()` returns a subrange instead of
  an iterator, change the names of variables which were previously used
  for holding the return value of `boost::range::stable_partition()`
  accordingly for better readability.
- remove unused `#include` of boost headers

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21911
2024-12-19 14:56:07 +02:00
Avi Kivity
a4440392d7 build: update dependencies for features to be ported from enterprise
ldap/slapd/toxiproxy/cyrus-sasl - for ldap authentication and authorization
git-lfs/bolt - for profile-guided optimization
lz4-static - for dictionary based network compression
jwt - for Oauth/GCP connectivity (for key management)
openkmip - for kmip testing
fipscheck - for FIPS validation

Frozen toolchain regenerated, with optimized clang from

  https://devpkg.scylladb.com/clang/clang-18.1.8-Fedora-40-aarch64.tar.gz
  https://devpkg.scylladb.com/clang/clang-18.1.8-Fedora-40-x86_64.tar.gz
2024-12-19 14:26:31 +02:00
Wojciech Mitros
37a25d3af4 mv: avoid stalls when calculating affected clustering ranges
Currently, when finishing db::view::calculate_affected_clustering_ranges
we deoverlap, transform and copy all ranges prepared before. This
is all done within a single continuation and can cause stalls.

We fix this by adding yields after each transform and moving elements
to the final vector one by one instead of copying them all at the end.

After this change, the longest continuation in this code will be
deoverlapping the initial ranges (and one transform). While it has
a relatively high computational complexity (we sort all ranges), it
should execute quickly because we're operating on views there and
we don't need to copy the actual bytes. If we encounter a stall there,
we'll need to implement an asynchronous `deoverlap` method.

Fixes scylladb/scylladb#21843

Closes scylladb/scylladb#21846
2024-12-19 12:50:30 +01:00
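The approach in 37a25d3af4 (yield after each transform, move elements to the final vector one by one) can be sketched as a Python analogue. This is not the C++ code in `db::view`; `transform_ranges` is a hypothetical name, and `asyncio.sleep(0)` merely stands in for a Seastar yield.

```python
import asyncio

async def transform_ranges(ranges, transform):
    """Transform ranges one at a time, yielding to the event loop after each."""
    result = []
    for r in ranges:
        result.append(transform(r))  # move into the final list one by one
        await asyncio.sleep(0)       # cooperative yield between elements
    return result

# Example: widen the upper bound of each (start, end) range by one.
out = asyncio.run(transform_ranges([(1, 3), (5, 8)], lambda r: (r[0], r[1] + 1)))
```

Each loop iteration is a short unit of work, so no single continuation has to process the whole input.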
Kamil Braun
91cddcc17f Merge 'Do not reset quarantine list in non raft mode' from Gleb Natapov
The series contains small fixes to the gossiper, one of which fixes #21930. The others I noticed while debugging the issue.

Fixes: scylladb/scylladb#21930

Closes scylladb/scylladb#21956

* github.com:scylladb/scylladb:
  gossiper: do not reset _just_removed_endpoints in non raft mode
  gossiper: do not send echo message to yourself
  gossiper: do not call apply for the node's old state
2024-12-19 11:03:35 +01:00
Pavel Emelyanov
bb094cc099 Merge 'Make restore task abortable' from Calle Wilund
Fixes #20717

Enables abortable interface and propagates abort_source to all s3 objects used for reading the restore data.

Note: because restore is done on each shard, we have to maintain a per-shard abort source proxy for each, and do a background per-shard abort on abort call. This is synced at the end of "run()".

Abort source is added as an optional parameter to s3 storage and the s3 path in distributed loader.

There is no attempt to "clean up" an aborted restore. As we read on a mutation level from remote sstables, we should not cause incomplete sstables as such, even though we might end up of course with partial data restored.

Closes scylladb/scylladb#21567

* github.com:scylladb/scylladb:
  test_backup: Add restore abort test case
  sstables_loader: Make restore task abortable
  distributed_loader: Add optional abort_source to get_sstables_from_object_store
  s3_storage: Add optional abort_source to params/object
  s3::client: Make "readable_file" abortable
2024-12-19 12:23:33 +03:00
Kefu Chai
2a31a82ae2 .github: Ensure header generation before include analysis
When running clang-include-cleaner, the tool performs static analysis by
"compiling" specified source files. Previously, non-existent included headers
caused the tool to skip source files, reducing the effectiveness of unused
include detection.

Problem:
- Header files like 'rust/wasmtime_bindings.hh' were not pre-generated
- Compilation errors led to skipping source file analysis

  ```
  /__w/scylladb/scylladb/lang/wasm.hh:15:10: fatal error: 'rust/wasmtime_bindings.hh' file not found
     15 | #include "rust/wasmtime_bindings.hh"
        |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~
  Skipping file /__w/scylladb/scylladb/lang/wasm.hh due to compiler errors. clang-include-cleaner expects to work on compilable source code.
  1 error generated.
```

- This significantly reduced clang-include-cleaner's coverage

Solution:
- Build the `wasmtime_bindings` target to generate required header files
- Ensure all necessary headers are created before running static analysis
- Enable full source file checking for unused includes

By generating headers before analysis, we prevent skipping of source files
and improve the comprehensiveness of our include cleaner workflow.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21739
2024-12-19 09:41:46 +02:00
Ferenc Szili
dc375b8cd3 test: enable test_truncate_with_coordinator_crash
This test was added in PR #19789 but was disabled with xfail because of
a bug in the way truncate saved the commit log replay positions. More
specifically, the replay positions for shards that had no mutations were
saved to system.truncated with shard_id == 0, regardless of which shard
they were actually saved for (see #21719).
The bug was fixed in #21722, so this change removes the xfail tag from
the test.

Closes scylladb/scylladb#21902
2024-12-18 18:02:52 +01:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Botond Dénes
1a717f3014 service/storage_proxy: data_resolver::resolve(): apply mutations gently
The data resolver has to apply all mutations from all replicas to a
single mutation. In the extreme case, when all rows are dead, the
mutations can have around 10K rows in them. This is not a huge amount,
but it is enough to cause moderate stalls of <20ms.
To avoid this, use the gentle variant of apply(), which can yield in the
middle.

Fixes: scylladb/scylladb#21818

Closes scylladb/scylladb#21884
2024-12-18 15:21:19 +01:00
Kefu Chai
e65fc35b5e replica: do not include unused headers
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21836
2024-12-18 13:52:57 +02:00
Avi Kivity
5a849b0a6a Merge "Move more subsystems to use host ids instead of ips" from Gleb
"
This series converts repair, streaming and node_ops (and some parts of
alternator) to work on host ids instead of ips. This allows to remove
a lot of (but not all) functions that work on ips from effective
replication map.

CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/13830/

Refs: scylladb/scylladb#21777
"

* 'gleb/move-to-host-id-more' of github.com:scylladb/scylla-dev:
  locator: topology: remove no longer used get_all_ips()
  gossiper: change get_unreachable_nodes to host ids
  locator: drop no longer used ip based functions from effective replication map and friends
  test: move network_topology_strategy_test and token_metadata_test to use host id based APIs
  replica/database: drop usage of ip in favor of host id in get_keyspace_local_ranges
  replica/mutation_dump: use host ids instead of ips
  alternator: move ttl to work with host ids instead of ips
  storage_service: move node_ops code to use host ids instead of host ips
  streaming: move streaming code to use host ids instead of host ips
  repair: move repair code to use host ids instead of host ips
  gossiper: add get_unreachable_host_ids() function
  locator: topology: add more functions that return host ids to effective replication map
  locator: add more functions that return host ids to effective replication map
2024-12-18 13:48:22 +02:00
Piotr Dulikowski
d067d8caef Merge 'More Python tests for materialized view and Alternator GSI feature' from Nadav Har'El
This patch includes more tests (in Python) that I wrote while implementing the Alternator UpdateTable feature for adding a GSI to an existing table (https://github.com/scylladb/scylladb/issues/11567).

I explain each of these tests in the separate patches below, but basically they fall into two types:
1. Tests which pass with today's materialized views and Alternator GSI/LSI, and serve to ensure that whatever changes I make to the view update implementation don't break corner cases that already worked.
2. Tests for the UpdateTable feature in Alternator which doesn't work today so xfail - and will need to work for #11567. We already had a few tests for this, but here I add more and improve coverage of various corner cases I discovered while implementing the feature.

I already have a working prototype for #11567 which passes all these tests. Many of these tests helped expose various bugs in earlier versions of my code.

Closes scylladb/scylladb#21927

* github.com:scylladb/scylladb:
  test/cqlpy: a few more functional tests for materialized views
  test/alternator: more tests for UpdateTable create and delete GSI
  test/alternator: make UpdateTable tests wait less
  test/alternator: move UpdateTable tests to a separate file
  test/alternator: add another test for elaborate GSI updates
  test/alternator: test that DescribeTable returns IndexStatus for GSI
  test/alternator: fix wrong test for UpdateTable metrics
  test/alternator: add test for missing attribute in item in LSI
  test/alternator: test that DescribeTable doesn't return IndexStatus for LSI
  test/alternator: add tests for RBAC for create and delete GSI
2024-12-17 20:43:07 +01:00
Yaron Kaikov
3a00ffd2eb build_docker.sh: remove rsyslog installation and conf
It seems that no one is using rsyslog, so there is no point having it
inside our container (see https://github.com/scylladb/scylladb/issues/21923#issuecomment-2545191667)

Refs: https://github.com/scylladb/scylladb/issues/21923

Closes scylladb/scylladb#21953
2024-12-17 17:34:35 +02:00
Gleb Natapov
e318dfb83a gossiper: do not reset _just_removed_endpoints in non raft mode
By the time the function is called during start it may already be
populated.

Fixes: scylladb/scylladb#21930
2024-12-17 16:57:13 +02:00
Gleb Natapov
3368019982 gossiper: do not send echo message to yourself
When sending by ID we should check that we do not translate our old
address to our ID and send locally. mark_alive should not be called
with the node's old ip anyway.
2024-12-17 16:57:13 +02:00
Gleb Natapov
e80355d3a1 gossiper: do not call apply for the node's old state
If a node changed its address, an old state may still be in the gossiper,
so ignore it.
2024-12-17 16:57:13 +02:00
Avi Kivity
01cdba9a98 Merge 'cache_algorithm_test: fix flaky failures' from Michał Chojnowski
This series attempts to get rid of flakiness in `cache_algorithm_test` by solving two problems.

Problem 1:

The test needs to create some arbitrary partition keys of a given size. It intends to create keys of the form:
0x0000000000000000000000000000000000000000...
0x0100000000000000000000000000000000000000...
0x0200000000000000000000000000000000000000...
But instead, unintentionally, it creates partially initialized keys of the form:
0x0000000000000000garbagegarbagegarbagegar...
0x0100000000000000garbagegarbagegarbagegar...
0x0200000000000000garbagegarbagegarbagegar...

Each of these keys is created several times and -- for the test to pass -- the result must be the same each time.
By coincidence, this is usually the case, since the same allocator slots are used. But if some background task happens to overwrite the allocator slot during a preemption, the keys used during "SELECT" will be different than the keys used during "INSERT", and the test will fail due to extra cache misses.

Problem 2:

Cache stats are global, so there's no good way to reliably
verify that e.g. a given read causes 0 cache misses,
because something done by Scylla in the background can trigger a cache miss.

This can cause the test to fail spuriously.

With how the test framework and the cache are designed, there's probably
no good way to test this properly. It would require ensuring that cache
stats are per-read, or at least per-table, and that Scylla's background
activity doesn't cause enough memory pressure to evict the tested rows.

This patch tries to deal with the flakiness without deleting the test
altogether by letting it retry after a failure if it notices that it
can be explained by a read which wasn't done by the test.
(Though, if the test can't be written well, maybe it just shouldn't be written...)

Fixes #21536

Should be backported to prevent flaky failures in older branches.

Closes scylladb/scylladb#21948

* github.com:scylladb/scylladb:
  cache_algorithm_test: harden against stats being confused by background activity
  cache_algorithm_test: fix a use of an uninitialized variable
2024-12-17 14:46:43 +02:00
Botond Dénes
73fc135e02 Merge 'test.py: make sure topology/ and topology_custom/ passes with tablets on.' from Konstantin Osipov
Explicitly disable tablets in a few tests that rely on features not yet supported with tablets.

Closes scylladb/scylladb#21070

* github.com:scylladb/scylladb:
  test: disable tablets in test_raft_fix_broken_snapshot
  test: disable tablets in test_raft_recovery_stuck
  test: disable tablets in test_raft_recovery_majority_lost
  test: don't run test_raft_recovery_basic with tablets
  test: fix test_writes_to_previous_cdc_generations to work with tablets
  test: fix topology_custom/test_mv_topology_change.py to work with tablets
  test: correct replication factor in test_multidc.py
  test: update test_view_build_status to work with tablets
  test: fix test_change_rpc_address with tablets.
  test: explicitly disable tablets in test_group0_schema_versioning
  test: disable tablets in topology/test_mutation_schema_change.py
  test: disable tablets in topology/test_mv.py
2024-12-17 08:38:10 +02:00
Aleksandra Martyniuk
d0cda8ebef replica: check enabled features in tablet_map_to_mutation
Before adding a value to a new column in tablet_map_to_mutation
check if the column is supported by the whole cluster.

Closes scylladb/scylladb#21941
2024-12-17 07:02:11 +02:00
Michał Chojnowski
6caaead4ac cache_algorithm_test: harden against stats being confused by background activity
Cache stats are global, so there's no good way to reliably
verify that e.g. a given read causes 0 cache misses,
because something done by Scylla in the background can trigger a cache miss.

This can cause the test to fail spuriously.

With how the test framework and the cache are designed, there's probably
no good way to test this properly. It would require ensuring that cache
stats are per-read, or at least per-table, and that Scylla's background
activity doesn't cause enough memory pressure to evict the tested rows.

This patch tries to deal with the flakiness without deleting the test
altogether by letting it retry after a failure if it notices that it
can be explained by a read which wasn't done by the test.
(Though, if the test can't be written well, maybe it just shouldn't be written...)
2024-12-16 23:14:30 +01:00
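The retry idea in 6caaead4ac can be sketched as follows: compare the global counters before and after the read under test, and retry only when the counters show reads that the test itself did not issue. Everything here is hypothetical (`run_with_retry`, the `reads`/`misses` counter names, the attempt limit); the actual test may detect foreign reads differently.

```python
def run_with_retry(read_under_test, get_stats, expected_misses=0, attempts=5):
    """Retry when the global stats show reads the test itself did not issue."""
    for _ in range(attempts):
        before = get_stats()
        own_reads = read_under_test()  # number of reads issued by the test
        after = get_stats()
        misses = after["misses"] - before["misses"]
        reads = after["reads"] - before["reads"]
        if misses == expected_misses:
            return True
        if reads == own_reads:
            return False  # no foreign reads, so the failure is genuine
        # Extra reads were observed: background activity may explain
        # the unexpected misses, so try again.
    return False


# Simulated global counters with one background read on the first attempt.
stats = {"reads": 0, "misses": 0}
noise = iter([True, False])

def get_stats():
    return dict(stats)

def read_under_test():
    stats["reads"] += 1
    if next(noise):
        stats["reads"] += 1   # background read
        stats["misses"] += 1  # and the miss it causes
    return 1

ok = run_with_retry(read_under_test, get_stats)
```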
Michał Chojnowski
1fffd976a4 cache_algorithm_test: fix a use of an uninitialized variable
The test needs to create some arbitrary partition keys of a given size.
It intends to create keys of the form:
0x0000000000000000000000000000000000000000...
0x0100000000000000000000000000000000000000...
0x0200000000000000000000000000000000000000...
But instead, unintentionally, it creates partially initialized keys of the form:
0x0000000000000000garbagegarbagegarbagegar...
0x0100000000000000garbagegarbagegarbagegar...
0x0200000000000000garbagegarbagegarbagegar...

Each of these keys is created several times and -- for the test to pass --
the result must be the same each time.
By coincidence, this is usually the case, since the same allocator slots are used.
But if some background task happens to overwrite the allocator slot during a
preemption, the keys used during "SELECT" will be different than the keys used
during "INSERT", and the test will fail due to extra cache misses.
2024-12-16 23:14:13 +01:00
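For illustration, the intended key shape from 1fffd976a4 (an index followed by zero padding) can be sketched as below. The 8-byte little-endian prefix is inferred from the examples in the message, and `make_key` is a hypothetical helper, not the test's code.

```python
def make_key(i: int, size: int) -> bytes:
    """Return a `size`-byte key: 8-byte little-endian index, then zero padding."""
    if size < 8:
        raise ValueError("size must be at least 8")
    # bytes(n) yields n zero bytes, so the tail is always fully initialized.
    return i.to_bytes(8, "little") + bytes(size - 8)

key = make_key(1, 20)
```

Because every byte is written explicitly, building the same key twice always gives the same result, which is the property the test relies on.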
Nadav Har'El
99e7fdef6d test/cqlpy: a few more functional tests for materialized views
This patch adds a few more functional tests for the CQL materialized
view feature in the cqlpy. The new tests pass, but helped me catch bugs (and
understand what are *not* bugs) while refactoring some view update code.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 19:36:47 +02:00
Nadav Har'El
d9af154772 test/alternator: more tests for UpdateTable create and delete GSI
We already have in test_gsi_updatetable.py several functional tests for
the Alternator feature of adding or deleting a GSI on an existing table,
through the UpdateTable operation.
This patch adds many more tests for various corner cases of this feature -
tests developed in parallel with actually implementing that feature.

All tests in test_gsi_updatetable.py pass on Amazon DynamoDB but currently
xfail on Alternator, due to the following issues:

 * #11567: Alternator: allow adding a GSI to a pre-existing table
 * #9424: Alternator GSIs should exclude items with empty-string key components

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 19:36:47 +02:00
Nadav Har'El
5c7b8c8e4d test/alternator: make UpdateTable tests wait less
The UpdateTable tests for creating and deleting a GSI need to wait for
the asynchronous operation of the view's building and deletion, using
two utility functions wait_for_gsi() and wait_for_gsi_gone().

Because I originally wrote these tests for DynamoDB and its extremely
high latency for these operations, these functions waited a whole second
before checking for the end of the wait. This whole-second sleep is
absurd in Alternator where building a small view takes just a fraction of
a second. So let's lower the sleep time from 1 second to 0.1 seconds,
and allow these tests to pass much faster on Alternator (once this
feature is implemented in Alternator, of course - until then all these
tests still fail immediately on an unimplemented operation).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 19:36:47 +02:00
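A polling helper of the kind 5c7b8c8e4d describes might look like the sketch below. `wait_until` and `index_ready` are hypothetical; the real `wait_for_gsi()` and `wait_for_gsi_gone()` are not shown in this log and may be structured differently.

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll `predicate` every `interval` seconds until it is true or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulated index that becomes ready on the third check.
calls = []

def index_ready():
    calls.append(1)
    return len(calls) >= 3

ready = wait_until(index_ready, timeout=5.0, interval=0.1)
```

With a 0.1 second interval, an operation that finishes in a fraction of a second is noticed almost immediately instead of after a full second.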
Nadav Har'El
b1bd5cdf0f test/alternator: move UpdateTable tests to a separate file
The source file test/alternator/test_gsi.py has already grown very
large, so this patch moves all the existing tests related to using
UpdateTable to add or delete a GSI to a separate file:
test_gsi_updatetable.py.

We just move tests here - no new tests or functional changes to the
tests - but did use the opportunity for some small improvements in
the comments.

In the next patch we'll add more tests to this new file.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 19:36:47 +02:00
Nadav Har'El
cc308bd0cc test/alternator: add another test for elaborate GSI updates
We have a test, test/alternator/test_gsi.py::test_update_gsi_pk which
created a GSI whose *partition key* was a regular column in the base
table, and exercised various elaborate updates requiring adding,
updating and deleting of rows from the materialized view.

In this patch, we add another similar test case, just for a *clustering
key*.

Both these tests are important regression tests - when we later
reimplement GSI we'll want to verify that none of the complex update
scenarios got broken (and indeed, some broken code did break these
tests).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 18:56:28 +02:00
Nadav Har'El
9094fe1608 test/alternator: test that DescribeTable returns IndexStatus for GSI
This patch adds a test reproducing issue #11471 - where DescribeTable
on a table that has an already built GSI (created with the table itself)
must return IndexStatus == "ACTIVE".

This test passes on DynamoDB, but xfails on Alternator because of
issue #11471.

We actually had this check earlier, but it was part of a bigger xfailing
test that checked multiple features. It's better to have it as a
separate test just for this feature, as we'll soon fix this issue and
make this test pass.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 18:56:28 +02:00
Nadav Har'El
1b120e3c7e test/alternator: fix wrong test for UpdateTable metrics
The test we had for counting Alternator operations metrics ran the
UpdateTable request without any parameters, which isn't actually a
valid call - Amazon DynamoDB rejects such a call, saying one of the
different parameters must be present, and we'll want to do that
later too.

So let's fix the test to use a valid UpdateTable request, one that
does the silly BillingMode='PAY_PER_REQUEST'. This is already the
current setting, so nothing is really changed, but it's still counted
as an operation in the metric.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 18:56:28 +02:00
Nadav Har'El
85088516b2 test/alternator: add test for missing attribute in item in LSI
Test that when a table has an LSI, then if the indexed attribute is
missing, the item is added to the base table but not the index.

We already have exactly the same test for GSI in test_gsi.py, but forgot
to write the same test for LSI. It's important to test this scenario
separately for GSIs and LSIs because in an upcoming GSI reimplementation
we plan to make the GSI and LSI implementation slightly different, and
they can have separate bugs (and in fact, we had such an LSI-specific
bug in one broken implementation).

We also have the same scenario that is tested here in the test
test_streams.py::test_streams_updateitem_old_image_lsi_missing_column
but that was an Alternator Streams test and we should have a more basic
test for this scenario in test_lsi.py.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 18:56:28 +02:00
Nadav Har'El
b00f5a6070 test/alternator: test that DescribeTable doesn't return IndexStatus for LSI
Whereas GSIs have an IndexStatus when described by DescribeTable,
LSIs do not. The purpose of IndexStatus is to tell when the index is live,
and this is not needed for LSIs because they cannot be added to a base
table that already exists.

We already had a test for this, but it was hidden in an xfailing test
for many different DescribeTable attributes - so let's move it into its
own, *passing*, test. The new test passes on both Alternator and
Amazon DynamoDB.

This test is an important regression test for when we later add
IndexStatus support to GSI, and this test will ensure that we don't
accidentally introduce IndexStatus to LSIs as well - DynamoDB doesn't
generate it for LSIs so neither should Alternator.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 18:56:28 +02:00
Nadav Har'El
373b37b5da test/alternator: add tests for RBAC for create and delete GSI
In later patches we will implement (as requested in issue #11567) the
UpdateTable operation for creating a new GSI or removing a GSI on an
existing table. In this patch we add to test/alternator/test_cql_rbac.py
tests to exhaustively check that the new operations will behave as expected
in respect to role-based access control (RBAC):

1. UpdateTable requires the ALTER permissions on the affected table -
   as was already the case before (and was documented in compatibility.md).
   This should also be true for the newly-implemented UpdateTable
   operations that create a GSI and delete a GSI, and we test that.

   The above statement may sound counter-intuitive - why does creating
   or deleting a GSI require ALTER permissions (on the base table), not
   CREATE or DROP permissions? But this makes sense when you consider
   that CREATE permissions should allow you to create new independent tables,
   not to change the behavior or performance of existing tables (which
   adding a GSI does).

2. When a role has permissions to create a GSI, it should be able to
   read the new GSI (SELECT permissions). This is known as "auto-grant".

3. When a GSI is deleted, whatever permissions were set on it are revoked,
   so that if it's later recreated, the old permissions don't resurface.
   This is known as "auto-revoke".

Because the UpdateTable feature for creating and deleting a GSI is not
yet enabled, the new tests are all marked "xfail".

The new tests, like all tests in the file test/alternator/test_cql_rbac.py
are Scylla-only and are skipped on Amazon DynamoDB - because they test
the Scylla-only CQL-based role-based access control API.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-12-16 18:55:28 +02:00
Konstantin Osipov
686c0e517f test: disable tablets in test_raft_fix_broken_snapshot
The test is using force_gossip_topology_changes which
doesn't work with tablets.
2024-12-16 08:38:05 -05:00
Konstantin Osipov
bba034202d test: disable tablets in test_raft_recovery_stuck
The test is using force_gossip_topology_mode which doesn't
work with tablets.
2024-12-16 08:38:05 -05:00
Konstantin Osipov
3767a54696 test: disable tablets in test_raft_recovery_majority_lost
The test is using force_gossip_topology_mode which doesn't
work with tablets.
2024-12-16 08:38:05 -05:00
Konstantin Osipov
e961d692e6 test: don't run test_raft_recovery_basic with tablets
It uses force_gossip_topology_changes, which doesn't work
with tablets.
2024-12-16 08:38:05 -05:00
Konstantin Osipov
d6fc0d5512 test: fix test_writes_to_previous_cdc_generations to work with tablets
The test is testing CDC. CDC doesn't work with tablets.
Explicitly disable tablets in the keyspaces used by the test.
2024-12-16 08:38:05 -05:00
Konstantin Osipov
169c2e62b8 test: fix topology_custom/test_mv_topology_change.py to work with tablets
test_mv_topology_change runs in gossip mode, so disable tablets as well.
2024-12-16 08:38:05 -05:00
Konstantin Osipov
ff43f8d9f6 test: correct replication factor in test_multidc.py
In tablets mode, it is not allowed to CREATE a table
if the replication factor cannot be satisfied. E.g. if the keyspace
is defined to have replication_factor = 3 and there
are only 2 replicas, in vnodes mode one still can
CREATE the table and write to it, whereas in tablets
mode one gets an error.

The confusion is what 'replication_factor' means.
When NetworkTopologyStrategy is used, in multi-dc mode, each DC must
have at least 'replication_factor' replicas and stores
'replication_factor' copies of data.

The test author (as well as the author of this "fix", see
my confused report of gh-21166) assumed that 'replication_factor'
means the total number of replicas, not the number of replicas
per DC.

Correct the test to use only one replica per DC, as this is the
topology the test is working with. The test is not specific
to the number of replicas, so the change does not impact
the logic of the test.
2024-12-16 08:38:05 -05:00
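The per-DC meaning of 'replication_factor' described in ff43f8d9f6 can be made concrete with a small sketch. `unsatisfied_dcs` is a hypothetical helper and the two-DC topology is an assumed example; this is not how Scylla itself performs the check.

```python
def unsatisfied_dcs(replication_factor, nodes_per_dc):
    """Return the DCs that have fewer nodes than the per-DC replication factor."""
    return sorted(dc for dc, n in nodes_per_dc.items() if n < replication_factor)

# Assumed topology: two datacenters with one node each.
topology = {"dc1": 1, "dc2": 1}

bad_rf3 = unsatisfied_dcs(3, topology)  # rf=3 needs 3 nodes in every DC
bad_rf1 = unsatisfied_dcs(1, topology)  # rf=1 fits one node per DC
total_copies = 1 * len(topology)        # rf is per DC, so copies = rf * number of DCs
```

With rf=1 the cluster as a whole still stores two copies of the data, one in each DC.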
Konstantin Osipov
1e582b4c0f test: update test_view_build_status to work with tablets
The test runs a bunch of tests in gossip only mode, which doesn't
work with tablets, so disable tablets explicitly in these tests.
2024-12-16 08:38:05 -05:00
Konstantin Osipov
3e55f1c033 test: fix test_change_rpc_address with tablets.
With tablets, it's not allowed to create a table in a keyspace
whose replication factor exceeds the actual number of nodes in the
cluster.

Pass the replication factor to random_tables fixture so that
a keyspace with a correct replication_factor is created.
2024-12-16 08:38:05 -05:00
Konstantin Osipov
4b10c10c1b test: explicitly disable tablets in test_group0_schema_versioning
This is a gossip-based topology changes test, and tablets
don't work with gossip based topology.
2024-12-16 08:38:05 -05:00
Konstantin Osipov
4aa7dca862 test: disable tablets in topology/test_mutation_schema_change.py
This test uses lightweight transactions, which are not enabled
with tablets keyspaces.
2024-12-16 08:38:05 -05:00
Konstantin Osipov
2866b4f550 test: disable tablets in topology/test_mv.py
The test file contains two test cases, which both test
materialized view tombstone gc settings. With tablets the default
is "repair" which is different from vnodes.

The tests are testing that the gc settings are not inherited. With
tablets, the gc settings are forced. This is indistinguishable from
inheriting, so the tests are failing when run with tablets.
2024-12-16 08:38:05 -05:00
Botond Dénes
e6447f60c2 Merge 'db,auth,locator: Remove unused member variables' from Kefu Chai
this issue was identified by clang-20.

---

it's a cleanup, hence no need to backport.

Closes scylladb/scylladb#21835

* github.com:scylladb/scylladb:
  locator: remove unused member variable
  auth: remove unused member variable
  db: remove unused member variable
2024-12-16 15:16:17 +02:00
Kefu Chai
f2638c3d18 test: topology_custom: restructure comment as ordered list
When investigating issue #21724, the docstring for
`test_recover_stuck_raft_recovery` was found to be difficult to follow.
Restructured the docstring into an ordered list to:

1. Improve readability
2. Clearly outline the test steps
3. Make the test's logic and flow more immediately comprehensible

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21728
2024-12-16 14:30:13 +02:00
Pavel Emelyanov
7db9132b56 test: Add validation of getting/changing compaction strategy via REST API
The /column_family/compaction_strategy has GET and POST implemented, the
latter changes the strategy on the table.

Unknown strategy name implicitly renders internal server error code by
catching exception from compaction_strategy::type() that tries to
convert strategy name string to strategy enum class type.

This is to finish validation of #21533

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#21569
2024-12-16 14:28:23 +02:00
Botond Dénes
34a8b492be Merge 'materialized view: make flow-control maximum delay configurable' from Piotr Dulikowski
This pull request is a continuation of scylladb/scylladb#20688 - the contents of the main commit are the same; the only change is the additional commit with a test.

Until this patch, the materialized view flow-control algorithm (https://www.scylladb.com/2018/12/04/worry-free-ingestion-flow-control/) used a constant delay_limit_us hard-coded to one second, which means that when the size of view-update backlog reached the maximum (10% of memory), we delay every request by an additional second - while smaller amounts of backlog will result in smaller delays.

This hard-coded one-second maximum delay was considered *huge* - it will slow down a client with concurrency 1000 to just 1000 requests per second - but we already saw some workloads where it was not enough - such as a test workload running very slow reads at high concurrency on a slow machine, where a latency of over one second was expected for each read, so adding a one-second latency for writes had no noticeable effect on slowing down the client.

So this patch replaces the hard-coded default with a live-updateable configuration parameter, `view_flow_control_delay_limit_in_ms`, which defaults to 1000ms as before.

Another useful way in which the new `view_flow_control_delay_limit_in_ms` can be used is to set it to 0. In that case, the view-update flow control always adds zero delay, and in effect - does absolutely nothing. This setting can be used in emergency situations where it is suspected that the MV flow control is not behaving properly, and the user wants to disable it.

The new parameter's help string mentions both these use cases of the parameter.

Fixes #18187

This is new functionality, no need to backport to any open source release.

Closes scylladb/scylladb#21647

* github.com:scylladb/scylladb:
  materialized views: test for the MV delay configuration parameter
  service: add injection for skipping view update backlog
  materialized view: make flow-control maximum delay configurable
2024-12-16 14:20:33 +02:00
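The arithmetic in 34a8b492be can be sketched as follows. The message only states that a full backlog yields the maximum delay and that smaller backlogs yield smaller delays; the cubic curve below is an assumption chosen for illustration, and both function names are hypothetical.

```python
def view_update_delay_ms(backlog, max_backlog, delay_limit_ms=1000.0):
    """Delay grows with the relative backlog and reaches delay_limit_ms when full."""
    relative = min(max(backlog / max_backlog, 0.0), 1.0)
    return delay_limit_ms * relative ** 3  # cubic shape is an assumption

def max_throughput(concurrency, latency_s):
    """Closed-loop client: at most concurrency / latency requests per second."""
    return concurrency / latency_s

full = view_update_delay_ms(100, 100)                   # full backlog
half = view_update_delay_ms(50, 100)                    # half-full backlog
off = view_update_delay_ms(100, 100, delay_limit_ms=0)  # limit 0 disables the delay
rate = max_throughput(1000, 1.0)                        # 1000 clients, 1 s per request
```

Setting the limit to 0 makes the delay zero regardless of the backlog, which matches the emergency "disable" use case described in the message.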
Yaron Kaikov
2e6755ecca .github/scripts/auto-backport.py: Add comment to PR when conflicts apply
When we open a PR with conflicts, the PR owner gets a notification about the assignment but has no idea whether the PR has conflicts (in Scylla this is important, since CI will not start on a draft PR)

Let's add a comment to notify the user we have conflicts

Closes scylladb/scylladb#21939
2024-12-16 14:17:40 +02:00