these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21838
now that we are allowed to use C++23. we now have the luxury of using
`std::ranges::stable_partition`.
in this change, we:
- replace `boost::range::stable_parition()` to
`std::ranges::stable_parition()`
- since `std::ranges::stable_parition()` returns a subrange instead of
an iterator, change the names of variables which were previously used
for holding the return value of `boost::range::stable_partition()`
accordingly for better readability.
- remove unused `#include` of boost headers
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21911
Currently, when finishing db::view::calculate_affected_clustering_ranges
we deoverlap, transform and copy all ranges prepared before. This
is all done within a single continuation and can cause stalls.
We fix this by adding yields after each transform and moving elements
to the final vector one by one instead of copying them all at the end.
After this change, the longest continuation in this code will be
deoverlapping the initial ranges (and one transform). While it has
a relatively high computational complexity (we sort all ranges), it
should execute quickly because we're operating on views there and
we don't need to copy the actual bytes. If we encounter a stall there,
we'll need to implement an asynchronous `deoverlap` method.
Fixesscylladb/scylladb#21843Closesscylladb/scylladb#21846
The series contains small fixes to the gossiper one of which fixes#21930. Others I noticed while debugged the issue.
Fixes: scylladb/scylladb#21930Closesscylladb/scylladb#21956
* github.com:scylladb/scylladb:
gossiper: do not reset _just_removed_endpoints in non raft mode
gossiper: do not send echo message to yourself
gossiper: do not call apply for the node's old state
Fixes#20717
Enables abortable interface and propagates abort_source to all s3 objects used for reading the restore data.
Note: because restore is done on each shard, we have to maintain a per-shard abort source proxy for each, and do a background per-shard abort on abort call. This is synced at the end of "run()".
Abort source is added as an optional parameter to s3 storage and the s3 path in distributed loader.
There is no attempt to "clean up" an aborted restore. As we read on a mutation level from remote sstables, we should not cause incomplete sstables as such, even though we might end up of course with partial data restored.
Closesscylladb/scylladb#21567
* github.com:scylladb/scylladb:
test_backup: Add restore abort test case
sstables_loader: Make restore task abortable
distributed_loader: Add optional abort_source to get_sstables_from_object_store
s3_storage: Add optional abort_source to params/object
s3::client: Make "readable_file" abortable
When running clang-include-cleaner, the tool performs static analysis by
"compiling" specified source files. Previously, non-existent included headers
caused the tool to skip source files, reducing the effectiveness of unused
include detection.
Problem:
- Header files like 'rust/wasmtime_bindings.hh' were not pre-generated
- Compilation errors led to skipping source file analysis
```
/__w/scylladb/scylladb/lang/wasm.hh:15:10: fatal error: 'rust/wasmtime_bindings.hh' file not found
15 | #include "rust/wasmtime_bindings.hh"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
Skipping file /__w/scylladb/scylladb/lang/wasm.hh due to compiler errors. clang-include-cleaner expects to work on compilable source code.
1 error generated.
```
- This significantly reduced clang-include-cleaner's coverage
Solution:
- Build the `wasmtime_bindings` target to generate required header files
- Ensure all necessary headers are created before running static analysis
- Enable full source file checking for unused includes
By generating headers before analysis, we prevent skipping of source files
and improve the comprehensiveness of our include cleaner workflow.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21739
This test was added in PR #19789 but was disabled with xfail because of
the bug with way truncate saved the commit log replay positions. More
specifically, the replay positions for shards that had no mutations were
saved to system.truncated with shard_id == 0, regardless for which shard
it was actually saved for (see #21719).
The bug was fixed in #21722, so this change removes the xfail tag from
the test.
Closesscylladb/scylladb#21902
The data resolved has to apply all mutations from all replica to a
single mutation. In the extreme case, when all rows are dead, the
mutations can have around 10K rows in them. This is not a huge amount,
but it is enough to cause moderate stalls of <20ms.
To avoid this, use the gentle variant of apply(), which can yield in the
middle.
Fixes: scylladb/scylladb#21818Closesscylladb/scylladb#21884
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21836
"
This series converts repair, streaming and node_ops (and some parts of
alternator) to work on host ids instead of ips. This allows to remove
a lot of (but not all) functions that work on ips from effective
replication map.
CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/13830/
Refs: scylladb/scylladb#21777
"
* 'gleb/move-to-host-id-more' of github.com:scylladb/scylla-dev:
locator: topology: remove no longer use get_all_ips()
gossiper: change get_unreachable_nodes to host ids
locator: drop no longer used ip based functions from effective replication map and friends
test: move network_topology_strategy_test and token_metadata_test to use host id based APIs
replica/database: drop usage of ip in favor of host id in get_keyspace_local_ranges
replica/mutation_dump: use host ids instead of ips
alternator: move ttl to work with host ids instead of ips
storage_service: move node_ops code to use host ids instead of host ips
streaming: move streaming code to use host ids instead of host ips
repair: move repair code to use host ids instead of host ips
gossiper: add get_unreachable_host_ids() function
locator: topology: add more function that return host ids to effective replication map
locator: add more function that return host ids to effective replication map
This patch includes more tests (in Python) that I wrote while implementing the Alternator UpdateTable feature for adding a GSI to an existing table (https://github.com/scylladb/scylladb/issues/11567).
I explain each of these tests in the separate patches below, but basically they fall into two types:
1. Tests which pass with today's materialized views and Alternator GSI/LSI, and serve to ensure that whatever changes I do to the view update implementation, doesn't break corner cases that already worked.
2. Tests for the UpdateTable feature in Alternator which doesn't work today so xfail - and will need to work for #11567. We already had a few tests for this, but here I add more and improve coverage of various corner cases I discovered while implementing the featue.
I already have a working prototype for #11567 which passes all these tests. Many of these tests helped exposed various bugs in earlier versions of my code.
Closesscylladb/scylladb#21927
* github.com:scylladb/scylladb:
test/cqlpy: a few more functional tests for materialized views
test/alternator: more tests for UpdateTable create and delete GSI
test/alternator: make UpdateTable tests wait less
test/alternator: move UpdateTable tests to a separate file
test/alternator: add another test for elaborate GSI updates
test/alternator: test that DescribeTable returns IndexStatus for GSI
test/alternator: fix wrong test for UpdateTable metrics
test/alternator: add test for missing attribute in item in LSI
test/alternator: test that DescribeTable doesn't return IndexStatus for LSI
test/alternator: add tests for RBAC for create and delete GSI
When sending by ID we should check that we do not translate our old
address to our ID and sending locally. mark_alive should not be called
with node's old ip anyway.
This series attempts to get read of flakiness in `cache_algorithm_test` by solving two problems.
Problem 1:
The test needs to create some arbitrary partition keys of a given size. It intends to create keys of the form:
0x0000000000000000000000000000000000000000...
0x0100000000000000000000000000000000000000...
0x0200000000000000000000000000000000000000...
But instead, unintentionally, it creates partially initialized keys of the form: 0x0000000000000000garbagegarbagegarbagegar...
0x0100000000000000garbagegarbagegarbagegar...
0x0200000000000000garbagegarbagegarbagegar...
Each of these keys is created several times and -- for the test to pass -- the result must be the same each time.
By coincidence, this is usually the case, since the same allocator slots are used. But if some background task happens to overwrite the allocator slot during a preemption, the keys used during "SELECT" will be different than the keys used during "INSERT", and the test will fail due to extra cache misses.
Problem 2:
Cache stats are global, so there's no good way to reliably
verify that e.g. a given read causes 0 cache misses,
because something done by Scylla in a background can trigger a cache miss.
This can cause the test to fail spuriously.
With how the test framework and the cache are designed, there's probably
no good way to test this properly. It would require ensuring that cache
stats are per-read, or at least per-table, and that Scylla's background
activity doesn't cause enough memory pressure to evict the tested rows.
This patch tries to deal with the flakiness without deleting the test
altogether by letting it retry after a failure if it notices that it
can be explained by a read which wasn't done by the test.
(Though, if the test can't be written well, maybe it just shouldn't be written...)
Fixes#21536
Should be backported to prevent flaky failures in older branches.
Closesscylladb/scylladb#21948
* github.com:scylladb/scylladb:
cache_algorithm_test: harden against stats being confused by background activity
cache_algorithm_test: fix a use of an uninitialized variable
Explicitly disable tablets in a few tests that rely on features not yet supported with tablets.
Closesscylladb/scylladb#21070
* github.com:scylladb/scylladb:
test: disable tablets in test_raft_fix_broken_snapshot
test: disable tablets in test_raft_recovery_stuck
test: disable tablets in tet_raft_recovery_majority_lost
test: don't run test_raft_recovery_basic with tablets
test: fix test_writes_to_previous_cdc_generations work with tablets
test: fix topology_custom/test_mv_topology_change.py to work with tablets
test: correct replication factor in test_multidc.py
test: update test_view_build_status to work with tablets
test: fix test_change_rpc_address with tablets.
test: explicitly disable tablets in test_gropu0_schema_versioning
test: disable tablets in topology/test_mutation_schema_change.py
test: disable tablets in topology/test_mv.py
Cache stats are global, so there's no good way to reliably
verify that e.g. a given read causes 0 cache misses,
because something done by Scylla in a background can trigger a cache miss.
This can cause the test to fail spuriously.
With how the test framework and the cache are designed, there's probably
no good way to test this properly. It would require ensuring that cache
stats are per-read, or at least per-table, and that Scylla's background
activity doesn't cause enough memory pressure to evict the tested rows.
This patch tries to deal with the flakiness without deleting the test
altogether by letting it retry after a failure if it notices that it
can be explained by a read which wasn't done by the test.
(Though, if the test can't be written well, maybe it just shouldn't be written...)
The test needs to create some arbitrary partition keys of a given size.
It intends to create keys of the form:
0x0000000000000000000000000000000000000000...
0x0100000000000000000000000000000000000000...
0x0200000000000000000000000000000000000000...
But instead, unintentionally, it creates partially initialized keys of the form:
0x0000000000000000garbagegarbagegarbagegar...
0x0100000000000000garbagegarbagegarbagegar...
0x0200000000000000garbagegarbagegarbagegar...
Each of these keys is created several times and -- for the test to pass --
the result must be the same each time.
By coincidence, this is usually the case, since the same allocator slots are used.
But if some background task happens to overwrite the allocator slot during a
preemption, the keys used during "SELECT" will be different than the keys used
during "INSERT", and the test will fail due to extra cache misses.
This patch adds a few more functional tests for the CQL materialized
view feature in the cqlpy. The new tests pass, but helped me catch bugs (and
understand what are *not* bugs) while refactoring some view update code.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
We already have in test_gsi_updatetable.py several functional tests for
the Alternator feature of adding or deleting a GSI on an existing table,
through the UpdateTable operation.
This patch adds many more tests for various corner cases of this feature -
tests developed in parallel with actually implementing that feature.
All test in test_gsi_updatetable.py pass on Amazon DynamoDB but currently
xfail on Alternator, due to the following issues:
* #11567: Alternator: allow adding a GSI to a pre-existing table
* #9424: Alternator GSIs should exclude items with empty-string key components
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The UpdateTable tests for creating and deleting a GSI need to wait for
the asynchronous operation of the view's building and deletion, using
two utility functions wait_for_gsi() and wait_for_gsi_gone().
Because I originally wrote these tests for DynamoDB and its extremely
high latency for these operations, these functions waited a whole second
before checking for the end of the wait. This whole-second sleep is
absurd in Alternator where building a small view takes just a fraction of
a second. So let's lower the sleep time from 1 second to 0.1 seconds,
and allow these tests to pass much faster on Alternator (once this
feature is implemented in Alternator, of course - until then all these
tests still fail immediately on an unimplemented operation).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The source file test/alternator/test_gsi.py has already grown very
large, so this patch moves all the existing tests related to using
UpdateTable to add or delete a GSIs to a separate file:
test_gsi_updatetable.py.
We just move tests here - no new tests or functional changes to the
tests - but did use the opportunity for some small improvements in
the comments.
In the next patch we'll add more tests to this new file.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
We have a test, test/alternator/test_gsi.py::test_update_gsi_pk which
created a GSI whose *partition key* was a regular column in the base
table, and exercised various elaborate updates requiring adding,
updating and deleting of rows from the materialized view.
In this patch, we add another similar test case, just for a *clustering
key*.
Both these tests are important regression tests - when we later
reimplement GSI we'll want to verify that none of the complex update
scenarios got broken (and indeed, some broken code did break these
tests).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a test reproducing issue #11471 - where DescribeTable
on a table that as an already built GSI (creating with the table itself)
must return IndexStatus == "ACTIVE".
This test passes on DynamoDB, but xfails on Alternator because of
issue #11471.
We actually had this check earlier, but it was part of a bigger xfailing
tests that checked multiple features. It's better to have it as a
separate test just for this feature, as we'll soon fix this issue and
make this test pass.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The test we had for counting Alternator operations metrics ran the
UpdateTable request without any parameters, which isn't actually a
valid call - Amazon DynamoDB rejects such a call, saying one of the
different parameters must be present, and we'll want to do that
later too.
So let's fix the test to use a valid UpdateTable request, one that
does the silly BillingMode='PAY_PER_REQUEST'. This is already the
current setting, so nothing is really changed, but it's still counted
as an operation in the metric.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Test that when a table has an LSI, then if the indexed attribute is
missing, the item is added to the base table but not the index.
We already have exactly the same test for GSI in test_gsi.py, but forgot
to do write the same test for LSI. It's important to test this scenario
separately for GSIs and LSIs because in an upcoming GSI reimplementation
we plan to make the GSI and LSI implementation slightly different, and
they can have separate bugs (and in fact, we had such an LSI-specific
bug in one broken implementation).
We also have the same scenario that is tested here in the test
test_streams.py::test_streams_updateitem_old_image_lsi_missing_column
but that was a Alternator Streams test and we should have a more basic
test for this scenario in test_lsi.py.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Whereas GSIs have an IndexStatus when described by DescribeTable,
LSIs do not. The purpose of IndexStatus is to tell when the index is live,
and this is not needed for LSIs because they cannot be added to a base
table that already exists.
We already had a test for this, but it was hidden in an xfailing test
for many different DescribeTable attributes - so let's move it into it's
own, *passing*, test. The new tests passes on both Alternator and
Amazon DynamoDB.
This test is an important regression test for when we later add
IndexStatus support to GSI, and this test will ensure that we don't
accidentally introduce IndexStatus to LSIs as well - DynamoDB doesn't
generate it for LSIs so neither should Alternator.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In later patches we will implement (as requested in issue #11567) the
UpdateTable operation for creating a new GSI or removing a GSI on an
existing table. In this patch we add to test/alternator/test_cql_rbac.py
tests to exhaustively check that the new operations will behave as expected
in respect to role-based access control (RBAC):
1. UpdateTable requires the ALTER permissions on the affected table -
as was already the case before (and was documented in compatibility.md).
This should also be true for the newly-implemented UpdateTable
operations that create a GSI and delete a GSI, and we test that.
The above statement may sound counter-intuitive - why does creating
or deleting a GSI require ALTER permissions (on the base table), not
CREATE or DROP permissions? But this makes sense when you consider
that CREATE permissions should allow you create new independent tables,
not to change the behavior or performance of existing tables (which
adding a GSI does).
2. When a role has permissions to create a GSI, it should be able to
read the new GSI (SELECT permissions). This is known as "auto-grant".
3. When a GSI is deleted, whatever permissions was set on it is revoked,
so that if it's later recreated, the old permissions don't resurface.
This is known as "auto-revoke".
Because the UpdateTable feature for creating and deleting a GSI is not
yet enabled, the new tests are all marked "xfail".
The new tests, like all tests in the file test/alternator/test_cql_rbac.py
are Scylla-only and are skipped on Amazon DynamoDB - because they test
the Scylla-only CQL-based role-based access control API.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In tablets mode, it is not allowed to CREATE a table
if replication factor can be satisfied. E.g. if the keyspace
is defined to have replication_factor = 3 and there
are only 2 replicas, in vnodes mode one still can
CREATE the table and write to it, whereas in tablets
mode one gets an error.
The confusion is what 'replication_factor' means.
When NetworkTopologyStrategy is used, in multi-dc mode, each DC must
have at least 'replication_factor' replicas and stores
'replication_factor' copies of data.
The test author (as well as the author of this "fix", see
my confused report of gh-21166) assumed that 'replication_factor'
means the total number of replicas, not the number of replicas
per DC.
Correct the test to use only one replica per DC, as this is the
topology the test is working with. The test is not specific
to the number of replicas, so the change does not impact
the logic of the test.
With tablets, it's not allowed to create a table in a keyspace
which replication factor exceeds the actual number of nodes in the
cluster.
Pass the replication factor to random_tables fixture so that
a keyspace with a correct replication_factor is created.
The test file contains two test cases, which both test
materialized view tombstone gc settings. With tablets the default
is "repair" which is different from vnodes.
The tests are testing that the gc settings are not inherited. With
tablets, the gc settings are forced. This is indistinguishable from
inheriting, so the tests are failing when run with tablets.
this issue was identified by clang-20.
---
it's a cleanup, hence no need to backport.
Closesscylladb/scylladb#21835
* github.com:scylladb/scylladb:
locator: remove unused member variable
auth: remove unused member variable
db: remove unused member variable
When investigating issue #21724, the docstring for
`test_recover_stuck_raft_recovery` was found to be difficult to follow.
Restructured the docstring into an ordered list to:
1. Improve readability
2. Clearly outline the test steps
3. Make the test's logic and flow more immediately comprehensible
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21728
The /column_family/compaction_strategy has GET and POST implemented, the
latter changes the strategy on the table.
Unknown strategy name implicitly renders internal server error code by
catching exception from compaction_strategy::type() that tries to
convert strategy name string to strategy enum class type.
This is to finish validation of #21533
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#21569
This pull request is continuation of scylladb/scylladb#20688 - contents of the main commit are the same, the only change is the additional commit with a test.
Until this patch, the materialized view flow-control algorithm (https://www.scylladb.com/2018/12/04/worry-free-ingestion-flow-control/) used a constant delay_limit_us hard-coded to one second, which means that when the size of view-update backlog reached the maximum (10% of memory), we delay every request by an additional second - while smaller amounts of backlog will result in smaller delays.
This hard-coded one maximum second delay was considered *huge* - it will slow down a client with concurrency 1000 to just 1000 requests per second - but we already saw some workloads where it was not enough - such as a test workload running very slow reads at high concurrency on a slow machine, where a latency of over one second was expected for each read, so adding a one second latecy for writes wasn't having any noticable affect on slowing down the client.
So this patch replaces the hard-coded default with a live-updateable configuration parameter, `view_flow_control_delay_limit_in_ms`, which defaults to 1000ms as before.
Another useful way in which the new `view_flow_control_delay_limit_in_ms` can be used is to set it to 0. In that case, the view-update flow control always adds zero delay, and in effect - does absolutely nothing. This setting can be used in emergency situations where it is suspected that the MV flow control is not behaving properly, and the user wants to disable it.
The new parameter's help string mentions both these use cases of the parameter.
Fixes#18187
This is new functionality, no need to backport to any open source release.
Closesscylladb/scylladb#21647
* github.com:scylladb/scylladb:
materialized views: test for the MV delay configuration parameter
service: add injection for skipping view update backlog
materialized view: make flow-control maximum delay configurable
When we open a PR with conflicts, the PR owner gets a notification about the assignment but has no idea if this PR is with conflicts or not (in Scylla it's important since CI will not start on draft PR)
Let's add a comment to notify the user we have conflicts
Closesscylladb/scylladb#21939