Compare commits

..

390 Commits

Author SHA1 Message Date
Botond Dénes
5a67119dce Merge '[Backport 2025.1] Merge 'test/pylib: servers_add: support list of property_files' from Benny Halevy' from Dawid Mędrek
So that a multi-dc/multi-rack cluster can be populated in a single call.

Original PR: scylladb/scylladb#23341

Fixes https://github.com/scylladb/scylladb/issues/23551

(cherry picked from commit 0fdf2a2090)

Closes scylladb/scylladb#23549

* github.com:scylladb/scylladb:
  Merge 'test/pylib: servers_add: support list of property_files' from Benny Halevy
  test.py: Remove reuse cluster in cluster tests
  test.py apply prepare_3_nodes_cluster in topology
  test.py: introduce prepare_3_nodes_cluster marker
2025-05-27 14:30:07 +03:00
Anna Stuchlik
ee2e6b09fb doc: fix the product name for version 2025.1 (on branch-2025.1)
Starting with 2025.1, ScyllaDB versions are no longer called "Enterprise", but the OS support page still uses that label.
This commit fixes that by replacing "Enterprise" with "ScyllaDB".

This update is required since we've removed "Enterprise" from everywhere else, including the commands, so having it here is confusing.

The same update was added on master and on branch-2025.2: https://github.com/scylladb/scylladb/pull/24204
However, that PR cannot be backported to branch-2025.1 because versions 2025.2 and later
store OS support in a JSON file, whereas 2025.1 and earlier in a regular RST file.
See https://github.com/scylladb/scylladb/issues/24179#issuecomment-2888632722.

Fixes https://github.com/scylladb/scylladb/issues/24179

Closes scylladb/scylladb#24222
2025-05-26 10:28:57 +03:00
Łukasz Paszkowski
a1082de797 tools/scylla-nodetool: fix crash when rows_merged cells contain null
Any empty object of the json::json_list type has its internal
_set variable assigned to false which results in such objects
being skipped by the json::json_builder.

Hence, the json returned by the api GET//compaction_manager/compaction_history
does not contain the field `rows_merged` if a cell in the
system.compaction_history table is null or an empty list.

In such cases, executing the command `nodetool compactionhistory`
will result in a crash with the following error message:
`error running operation: rjson::error (JSON assert failed on condition 'false'`

The patch fixes it by checking if the json object contains the
`rows_merged` element before processing. If the element does
not exist, the nodetool will now produce an empty list.

Fixes https://github.com/scylladb/scylladb/issues/23540

Closes scylladb/scylladb#23514

(cherry picked from commit 113647550f)

Closes scylladb/scylladb#24113
2025-05-20 08:30:33 +03:00
Aleksandra Martyniuk
1db71eac3c test_tablet_repair_hosts_filter: change injected error
test_tablet_repair_hosts_filter checks whether the host filter
specfied for tablet repair is correctly persisted. To check this,
we need to ensure that the repair is still ongoing and its data
is kept. The test achieves that by failing the repair on replica
side - as the failed repair is going to be retried.

However, if the filter does not contain any host (included_host_count = 0),
the repair is started on no replica, so the request succeeds
and its data is deleted. The test fails if it checks the filter
after repair request data is removed.

Fail repair on topology coordinator side, so the request is ongoing
regardless of the specified hosts.

Fixes: #23986.

Closes scylladb/scylladb#24003

(cherry picked from commit 2549f5e16b)

Closes scylladb/scylladb#24075
2025-05-19 12:36:45 +03:00
Pavel Emelyanov
3296920fa3 Merge '[Backport 2025.1] logalloc_test: don't test performance in test background_reclaim' from Scylladb[bot]
The test is failing in CI sometimes due to performance reasons.

There are at least two problems:
1. The initial 500ms (wall time) sleep might be too short. If the reclaimer
   doesn't manage to evict enough memory during this time, the test will fail.
2. During the 100ms (thread CPU time) window given by the test to background
   reclaim, the `background_reclaim` scheduling group isn't actually
   guaranteed to get any CPU, regardless of shares. If the process is
   switched out inside the `background_reclaim` group, it might
   accumulate so much vruntime that it won't get any more CPU again
   for a long time.

We have seen both.

This kind of timing test can't be run reliably on overcommitted machines
without modifying the Seastar scheduler to support that (by e.g. using
thread clock instead of wall time clock in the scheduler), and that would
require an amount of effort disproportionate to the value of the test.

So for now, to unflake the test, this patch removes the performance test
part. (And the tradeoff is a weakening of the test). After the patch,
we only check that the background reclaim happens *eventually*.

Fixes https://github.com/scylladb/scylladb/issues/15677

Backporting this is optional. The test is flaky even in stable branches, but the failure is rare.

- (cherry picked from commit c47f438db3)

- (cherry picked from commit 1c1741cfbc)

Parent PR: #24030

Closes scylladb/scylladb#24092

* github.com:scylladb/scylladb:
  logalloc_test: don't test performance in test `background_reclaim`
  logalloc: make background_reclaimer::free_memory_threshold publicly visible
2025-05-19 12:35:39 +03:00
Aleksandra Martyniuk
b4e2773e63 streaming: use host_id in file streaming
Use host ids instead of ips in file-streaming.

Fixes: #22421.

Closes scylladb/scylladb#24055

(cherry picked from commit 2dcea5a27d)

Closes scylladb/scylladb#24118
2025-05-19 12:33:54 +03:00
Aleksandra Martyniuk
b0a7ca7c17 cql_test_env: main: move stream_manager initialization
Currently, stream_manager is initialized after storage_service and
so it is stopped before the storage_service is. In its stop method
storage_service accesses stream_manager which is uninitialized
at a time.

Move stream_manager initialization over the storage_service initialization.

Fixes: #23207.

Closes scylladb/scylladb#24008

(cherry picked from commit 9c03255fd2)

Closes scylladb/scylladb#24189
2025-05-19 12:30:01 +03:00
Michał Chojnowski
c339f464b6 logalloc_test: don't test performance in test background_reclaim
The test is failing in CI sometimes due to performance reasons.

There are at least two problems:
1. The initial 500ms (wall time) sleep might be too short. If the reclaimer
   doesn't manage to evict enough memory during this time, the test will fail.
2. During the 100ms (thread CPU time) window given by the test to background
   reclaim, the `background_reclaim` scheduling group isn't actually
   guaranteed to get any CPU, regardless of shares. If the process is
   switched out inside the `background_reclaim` group, it might
   accumulate so much vruntime that it won't get any more CPU again
   for a long time.

We have seen both.

This kind of timing test can't be run reliably on overcommitted machines
without modifying the Seastar scheduler to support that (by e.g. using
thread clock instead of wall time clock in the scheduler), and that would
require an amount of effort disproportionate to the value of the test.

So for now, to unflake the test, this patch removes the performance test
part. (And the tradeoff is a weakening of the test).

(cherry picked from commit 1c1741cfbc)
2025-05-16 11:49:18 +00:00
Michał Chojnowski
5f4927eaaf logalloc: make background_reclaimer::free_memory_threshold publicly visible
Wanted by the change to the background_reclaim test in the next patch.

(cherry picked from commit c47f438db3)
2025-05-16 11:49:18 +00:00
Wojciech Mitros
cadc3eeae8 mv: remove queue length limit from the view update read concurrency semaphore
Each view update is correlated to a write that generates it (aside from view
building which is throttled separately). These writes are limited by a throttling
mechanism, which effectively works by performing the writes with CL=ALL if
ongoing writes exceed some memory usage limit

When writes generate view updates, they usually also need to perform a read. This read
goes through a read concurrency semaphore where it can get delayed or killed. The
semaphore allows up to 100 concurrent reads and puts all remaining reads in a queue.
If the number of queued reads exceeds a specific limit, the view update will fail on
the replica, causing inconsistencies.

This limit is not necessary. When a read gets queued on the semaphore, the write that's
causing the view update is paused, so the write takes part in the regular write throttling.
If too many writes get stuck on view update reads, they will get throttled, so their
number is limited and the number of queued reads is also limited to the same amount.

In this patch we remove the specified queue length limit for the view update read concurrency
semaphore. Instead of this limit, the queue will be now limited indirectly, by the base write
throttling mechanism. This may allow the queue grow longer than with the previous limit, but
it shouldn't ever cause issues - we only perform up to 100 actual reads at once, and the
remaining ones that get queued use a tiny amount of memory, less than the writes that generated
them and which are getting limited directly.

Fixes https://github.com/scylladb/scylladb/issues/23319

Closes scylladb/scylladb#24112

(cherry picked from commit 5920647617)

Closes scylladb/scylladb#24168
2025-05-16 11:46:15 +03:00
Dawid Mędrek
6987a5e363 locator/production_snitch_base: Reduce log level when property file incomplete
We're reducing the log level in case the provided property file is incomplete.
The rationale behind this change is related to how CCM interacts with Scylla:

* The `GossipingPropertyFileSnitch` reloads the `cassandra-rackdc.properties`
  configuration every 60 seconds.
* When a new node is added to the cluster, CCM recreates the
  `cassandra-rackdc.properties` file for EVERY node.

If those two processes start happening at about the same time, it may lead
to Scylla trying to read a not-completely-recreated file, and an error will
be produced.

Although we would normally fix this issue and try to avoid the race, that
behavior will be no longer relevant as we're making the rack and DC values
immutable (cf. scylladb/scylladb#23278). What's more, trying to fix the problem
in the older versions of Scylla could bring a more serious regression. Having
that in mind, this commit is a compromise between making CI less flaky and
having minimal impact when backported.

We do the same for when the format of the file is invalid: the rationale
is the same.

We also do that for when there is a double declaration. Although it seems
impossible that this can stem from the same scenario the other two errors
can (since if the format of the file is valid, the error is justified;
if the format is invalid, it should be detected sooner than a doubled
declaration), let's stay consistent with the logging level.

Fixes scylladb/scylladb#20092

Closes scylladb/scylladb#23956

(cherry picked from commit 9ebd6df43a)

Closes scylladb/scylladb#24142
2025-05-16 11:45:48 +03:00
Anna Stuchlik
6aaf3354bc doc: remove the redundant pages
This commit removes two redundant pages and adds the related redirections.

- The Tutorials page is a duplicate and is not maintained anymore.
  Having it in the docs hurts the SEO of the up-to-date Tutorias page.
- The Contributing page is not helpful. Contributions-related information
  should be maintained in the project README file.

Fixes https://github.com/scylladb/scylladb/issues/17279
Fixes https://github.com/scylladb/scylladb/issues/24060

Closes scylladb/scylladb#24090

(cherry picked from commit eed8373b77)

Closes scylladb/scylladb#24140
2025-05-16 11:45:28 +03:00
Ernest Zaslavsky
4cd6f58111 database_test: Wait for the index to be created
Just call `wait_until_built` for the index in question

fix: https://github.com/scylladb/scylladb/issues/24059

Closes scylladb/scylladb#24117

(cherry picked from commit 4a7c847cba)

Closes scylladb/scylladb#24131
2025-05-16 11:45:04 +03:00
Piotr Smaron
90af439e47 cql: fix CREATE tablets KS warning msg
Materialized Views and Secondary Indexes are yet another features that
keyspaces with tablets do not support, but these were not listed in a
warning message returned to the user on CREATE KEYSPACE statement. This
commit adds the 2 missing features.

Fixes: #24006

Closes scylladb/scylladb#23902

(cherry picked from commit f740f9f0e1)

Closes scylladb/scylladb#24079
2025-05-16 11:44:24 +03:00
Piotr Dulikowski
89d24d813c topology_coordinator: silence ERROR messages on abort
When the topology coordinator is shut down while doing a long-running
operation, the current operation might throw a raft::request_aborted
exception. This is not a critical issue and should not be logged with
ERROR verbosity level.

Make sure that all the try..catch blocks in the topology coordinator
which:

- May try to acquire a new group0 guard in the `try` part
- Have a `catch (...)` block that print an ERROR-level message

...have a pass-through `catch (raft::request_aborted&)` block which does
not log the exception.

Fixes: scylladb/scylladb#22649

Closes scylladb/scylladb#23962

(cherry picked from commit 156ff8798b)

Closes scylladb/scylladb#24078
2025-05-16 11:43:50 +03:00
Botond Dénes
a0c01964cd tools/scylla-nodetool: status: handle negative load sizes
Negative load sizes don't make sense, but we've seen a case in
production, where a negative number was returned by ScyllaDB REST API,
so be prepared to handle these too.

Fixes: scylladb/scylladb#24134

Closes scylladb/scylladb#24135

(cherry picked from commit 700a5f86ed)

Closes scylladb/scylladb#24167
2025-05-15 17:39:36 +03:00
Jenkins Promoter
2034578d0e Update pgo profiles - aarch64 2025-05-15 04:22:14 +03:00
Jenkins Promoter
e97c883ef8 Update pgo profiles - x86_64 2025-05-15 04:07:11 +03:00
Tomasz Grabiec
e46e02e3f5 Merge '[Backport 2025.1] test/topology: use standard new_test_keyspace functions' from Scylladb[bot]
This PR improves and refactors the test.topology.util new_test_keyspace generator
and adds a corresponding create_new_test_keyspace function to be used by most if not
all topology unit tests in order to standardize the way the tests create keyspaces
and to mitigate the python driver create keyspace retry issue: https://github.com/scylladb/python-driver/issues/317

Fixes #22342
Fixes #21905
Refs https://github.com/scylladb/scylla-enterprise/issues/5060

Fixes #23699

- (cherry picked from commit 50ce0aaf1c)

- (cherry picked from commit 5d448f721e)

- (cherry picked from commit f946302369)

- (cherry picked from commit 0fd1b846fe)

- (cherry picked from commit a66ddb7c04)

- (cherry picked from commit df84097a4b)

- (cherry picked from commit 59687c25e0)

- (cherry picked from commit fdb339bf28)

- (cherry picked from commit 205ed113dd)

- (cherry picked from commit 57faab9ffa)

- (cherry picked from commit 4fefffe335)

- (cherry picked from commit 480a5837ab)

- (cherry picked from commit fed078a38a)

- (cherry picked from commit c6653e65ba)

- (cherry picked from commit 9c095b622b)

- (cherry picked from commit 0668c642a2)

- (cherry picked from commit 0e11aad9c5)

- (cherry picked from commit ef85c4b27e)

- (cherry picked from commit b13e48b648)

- (cherry picked from commit a82e734110)

- (cherry picked from commit 629ee3cb46)

- (cherry picked from commit 42a104038d)

- (cherry picked from commit d5e3c578f5)

- (cherry picked from commit c05794c156)

- (cherry picked from commit 966cf82dae)

- (cherry picked from commit 11005b10db)

- (cherry picked from commit ff9c8428df)

- (cherry picked from commit 55b35eb21c)

- (cherry picked from commit 5759a97eb4)

- (cherry picked from commit c68d2a471c)

- (cherry picked from commit e05372afa4)

- (cherry picked from commit 380c5e5ac8)

- (cherry picked from commit 3f35491264)

- (cherry picked from commit e72a9d3faa)

- (cherry picked from commit 47326d01b7)

- (cherry picked from commit 72bc4016e7)

- (cherry picked from commit 4fd6c2d24e)

- (cherry picked from commit 50a8f5c1c0)

- (cherry picked from commit 005ceb77d3)

- (cherry picked from commit 649e68c6db)

- (cherry picked from commit 0b88ea9798)

- (cherry picked from commit 6b37d04aa9)

- (cherry picked from commit e59aca66bf)

- (cherry picked from commit 5ff3153912)

- (cherry picked from commit 20f7eda16e)

- (cherry picked from commit f30e4c6917)

- (cherry picked from commit 96d327fb83)

- (cherry picked from commit 16ef78075c)

- (cherry picked from commit 2d4af01281)

- (cherry picked from commit b810791fbb)

- (cherry picked from commit 46b1850f0c)

- (cherry picked from commit 0564e95c51)

- (cherry picked from commit 12f85ce57c)

- (cherry picked from commit 9829b1594f)

- (cherry picked from commit cbe79b20f7)

- (cherry picked from commit cc281ff88d)

Parent PR: #22399

Closes scylladb/scylladb#23408

* github.com:scylladb/scylladb:
  test_tablet_repair_scheduler: prepare_multi_dc_repair: use create_new_test_keyspace
  test/repair: create_table_insert_data_for_repair: create keyspace with unique name
  topology_tasks/test_tablet_tasks: use new_test_keyspace
  topology_tasks/test_node_ops_tasks: use new_test_keyspace
  topology_custom/test_zero_token_nodes_no_replication: use create_new_test_keyspace
  topology_custom/test_zero_token_nodes_multidc: use create_new_test_keyspace
  topology_custom/test_view_build_status: use new_test_keyspace
  topology_custom/test_truncate_with_tablets: use new_test_keyspace
  topology_custom/test_topology_failure_recovery: use new_test_keyspace
  topology_custom/test_tablets_removenode: use create_new_test_keyspace
  topology_custom/test_tablets_migration: use new_test_keyspace
  topology_custom/test_tablets_merge: use new_test_keyspace
  topology_custom/test_tablets_intranode: use new_test_keyspace
  topology_custom/test_tablets_cql: use new_test_keyspace
  topology_custom/test_tablets2: use *new_test_keyspace
  topology_custom/test_tablets2: test_schema_change_during_cleanup: drop unused check function
  test/cluster/test_tablets.py: Fix test errorneous indentation
  topology_custom/test_tablets: use new_test_keyspace
  topology_custom/test_table_desc_read_barrier: use new_test_keyspace
  topology_custom/test_shutdown_hang: use new_test_keyspace
  topology_custom/test_select_from_mutation_fragments: use new_test_keyspace
  topology_custom/test_rpc_compression: use new_test_keyspace
  topology_custom/test_reversed_queries_during_simulated_upgrade_process: use new_test_keyspace
  topology_custom/test_raft_snapshot_truncation: use create_new_test_keyspace
  topology_custom/test_raft_no_quorum: use new_test_keyspace
  topology_custom/test_raft_fix_broken_snapshot: use new_test_keyspace
  topology_custom/test_query_rebounce: use new_test_keyspace
  topology_custom/test_not_enough_token_owners: use new_test_keyspace
  topology_custom/test_node_shutdown_waits_for_pending_requests: use new_test_keyspace
  topology_custom/test_node_isolation: use create_new_test_keyspace
  topology_custom/test_mv_topology_change: use new_test_keyspace
  topology_custom/test_mv_tablets_replace: use new_test_keyspace
  topology_custom/test_mv_tablets_empty_ip: use new_test_keyspace
  topology_custom/test_mv_tablets: use new_test_keyspace
  topology_custom/test_mv_read_concurrency: use new_test_keyspace
  topology_custom/test_mv_fail_building: use new_test_keyspace
  topology_custom/test_mv_delete_partitions: use new_test_keyspace
  topology_custom/test_mv_building: use new_test_keyspace
  topology_custom/test_mv_backlog: use new_test_keyspace
  topology_custom/test_mv_admission_control: use new_test_keyspace
  topology_custom/test_major_compaction: use new_test_keyspace
  topology_custom/test_maintenance_mode: use new_test_keyspace
  topology_custom/test_lwt_semaphore: use new_test_keyspace
  topology_custom/test_ip_mappings: use new_test_keyspace
  topology_custom/test_hints: use new_test_keyspace
  topology_custom/test_group0_schema_versioning: use new_test_keyspace
  topology_custom/test_data_resurrection_after_cleanup: use new_test_keyspace
  topology_custom/test_read_repair_with_conflicting_hash_keys: use new_test_keyspace
  topology_custom/test_read_repair: use new_test_keyspace
  topology_custom/test_compacting_reader_tombstone_gc_with_data_in_memtable: use new_test_keyspace
  topology_custom/test_commitlog_segment_data_resurrection: use new_test_keyspace
  topology_custom/test_change_replication_factor_1_to_0: use new_test_keyspace
  topology/test_tls: test_upgrade_to_ssl: use new_test_keyspace
  test/topology/util: new_test_keyspace: drop keyspace only on success
  test/topology/util: refactor new_test_keyspace
  test/topology/util: CREATE KEYSPACE IF NOT EXISTS
  test/topology/util: new_test_keyspace: accept ManagerClient
2025-05-13 11:29:44 +02:00
Benny Halevy
ee14a0dac1 test_tablet_repair_scheduler: prepare_multi_dc_repair: use create_new_test_keyspace
and return the keyspace unique name to the caller.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit cc281ff88d)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-05-12 13:58:19 +03:00
Benny Halevy
5eca0b6d16 test/repair: create_table_insert_data_for_repair: create keyspace with unique name
and return it to the caller

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit cbe79b20f7)
2025-05-12 13:58:19 +03:00
Benny Halevy
dc1ec6e6d5 topology_tasks/test_tablet_tasks: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 9829b1594f)
2025-05-12 13:58:19 +03:00
Benny Halevy
fda3a37026 topology_tasks/test_node_ops_tasks: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 12f85ce57c)
2025-05-12 13:58:18 +03:00
Benny Halevy
5b848ecf0a topology_custom/test_zero_token_nodes_no_replication: use create_new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0564e95c51)
2025-05-12 13:58:18 +03:00
Benny Halevy
0c73e4522f topology_custom/test_zero_token_nodes_multidc: use create_new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 46b1850f0c)
2025-05-12 13:58:18 +03:00
Benny Halevy
2fec75d557 topology_custom/test_view_build_status: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit b810791fbb)
2025-05-12 13:58:18 +03:00
Benny Halevy
80a5c121a3 topology_custom/test_truncate_with_tablets: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 2d4af01281)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-05-12 13:58:18 +03:00
Benny Halevy
77aaa3bfbb topology_custom/test_topology_failure_recovery: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 16ef78075c)
2025-05-12 13:58:18 +03:00
Benny Halevy
12191fca90 topology_custom/test_tablets_removenode: use create_new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 96d327fb83)
2025-05-12 13:58:18 +03:00
Benny Halevy
d687488471 topology_custom/test_tablets_migration: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit f30e4c6917)
2025-05-12 13:58:18 +03:00
Benny Halevy
2475e333ef topology_custom/test_tablets_merge: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 20f7eda16e)
2025-05-12 13:58:18 +03:00
Benny Halevy
98704d9033 topology_custom/test_tablets_intranode: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 5ff3153912)
2025-05-12 13:58:18 +03:00
Benny Halevy
aa6e421b84 topology_custom/test_tablets_cql: use new_test_keyspace
And create_new_test_keyspace when we need drop
to be explicit.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit e59aca66bf)
2025-05-12 13:58:18 +03:00
Benny Halevy
d01d121b64 topology_custom/test_tablets2: use *new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 6b37d04aa9)
2025-05-12 13:58:18 +03:00
Benny Halevy
2949cb2c50 topology_custom/test_tablets2: test_schema_change_during_cleanup: drop unused check function
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0b88ea9798)
2025-05-12 13:58:18 +03:00
Dawid Mędrek
6f65e0469a test/cluster/test_tablets.py: Fix test errorneous indentation
Some of the statements in the test are not indented properly
and, as a result, are never run. It's most likely a small mistake,
so let's fix it.

Closes scylladb/scylladb#23659

(cherry picked from commit 0ed21d9cc1)
2025-05-12 13:58:18 +03:00
Benny Halevy
2f2d392d54 topology_custom/test_tablets: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 649e68c6db)
2025-05-12 13:58:18 +03:00
Benny Halevy
871ef9c9b2 topology_custom/test_table_desc_read_barrier: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 005ceb77d3)
2025-05-12 13:58:18 +03:00
Benny Halevy
84ff0eefdc topology_custom/test_shutdown_hang: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 50a8f5c1c0)
2025-05-12 13:58:18 +03:00
Benny Halevy
a866a1ccf3 topology_custom/test_select_from_mutation_fragments: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4fd6c2d24e)
2025-05-12 13:58:18 +03:00
Benny Halevy
b20816ffd2 topology_custom/test_rpc_compression: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 72bc4016e7)
2025-05-12 13:58:18 +03:00
Benny Halevy
c87d12c416 topology_custom/test_reversed_queries_during_simulated_upgrade_process: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 47326d01b7)
2025-05-12 13:58:18 +03:00
Benny Halevy
23052b4df1 topology_custom/test_raft_snapshot_truncation: use create_new_test_keyspace
Using the new_test_keyspace fixture is awkward for this test
as it is written to explicitly drop the created keyspaces
at certain points.

Therefore, just use create_new_test_keyspace to standardize the
creation procedure.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit e72a9d3faa)
2025-05-12 13:58:18 +03:00
Benny Halevy
51cd55fb2e topology_custom/test_raft_no_quorum: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 3f35491264)
2025-05-12 13:58:18 +03:00
Benny Halevy
296fbbf1d6 topology_custom/test_raft_fix_broken_snapshot: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 380c5e5ac8)
2025-05-12 13:58:18 +03:00
Benny Halevy
9b602cf6c1 topology_custom/test_query_rebounce: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit e05372afa4)
2025-05-12 13:58:18 +03:00
Benny Halevy
4c96ff3e36 topology_custom/test_not_enough_token_owners: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit c68d2a471c)
2025-05-12 13:58:18 +03:00
Benny Halevy
0d6383f70d topology_custom/test_node_shutdown_waits_for_pending_requests: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 5759a97eb4)
2025-05-12 13:58:18 +03:00
Benny Halevy
2c8d6075b8 topology_custom/test_node_isolation: use create_new_test_keyspace
new_test_keyspace is problematic here since
the presence of the banned node can fail the automatic drop of
the test keyspace due to NoHostAvailable (in debug mode for
some reason)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 55b35eb21c)
2025-05-12 13:58:18 +03:00
Benny Halevy
7a8e3f7bb3 topology_custom/test_mv_topology_change: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit ff9c8428df)
2025-05-12 13:58:18 +03:00
Benny Halevy
8ece858560 topology_custom/test_mv_tablets_replace: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 11005b10db)
2025-05-12 13:58:18 +03:00
Benny Halevy
b29a01a453 topology_custom/test_mv_tablets_empty_ip: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 966cf82dae)
2025-05-12 13:58:18 +03:00
Benny Halevy
4b6d5c82ed topology_custom/test_mv_tablets: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit c05794c156)
2025-05-12 13:58:18 +03:00
Benny Halevy
33cf81e195 topology_custom/test_mv_read_concurrency: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit d5e3c578f5)
2025-05-12 13:58:18 +03:00
Benny Halevy
c2c0c0b4e1 topology_custom/test_mv_fail_building: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 42a104038d)
2025-05-12 13:58:18 +03:00
Benny Halevy
149eab295a topology_custom/test_mv_delete_partitions: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 629ee3cb46)
2025-05-12 13:58:18 +03:00
Benny Halevy
1c0b6aaa47 topology_custom/test_mv_building: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit a82e734110)
2025-05-12 13:58:18 +03:00
Benny Halevy
ec888b253a topology_custom/test_mv_backlog: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit b13e48b648)
2025-05-12 13:58:18 +03:00
Benny Halevy
5b418eebe7 topology_custom/test_mv_admission_control: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit ef85c4b27e)
2025-05-12 13:58:18 +03:00
Benny Halevy
3b4ac3cb2d topology_custom/test_major_compaction: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0e11aad9c5)
2025-05-12 13:58:18 +03:00
Benny Halevy
67c2ca55a6 topology_custom/test_maintenance_mode: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0668c642a2)
2025-05-12 13:58:18 +03:00
Benny Halevy
4e778b4875 topology_custom/test_lwt_semaphore: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 9c095b622b)
2025-05-12 13:58:18 +03:00
Benny Halevy
dbf42b8e75 topology_custom/test_ip_mappings: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit c6653e65ba)
2025-05-12 13:58:18 +03:00
Benny Halevy
9adb5a4e84 topology_custom/test_hints: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit fed078a38a)
2025-05-12 13:58:18 +03:00
Benny Halevy
ecd337763c topology_custom/test_group0_schema_versioning: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 480a5837ab)
2025-05-12 13:58:18 +03:00
Benny Halevy
9751bd3dbf topology_custom/test_data_resurrection_after_cleanup: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4fefffe335)
2025-05-12 13:58:18 +03:00
Benny Halevy
9179938036 topology_custom/test_read_repair_with_conflicting_hash_keys: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 57faab9ffa)
2025-05-12 13:58:17 +03:00
Benny Halevy
a9c8651722 topology_custom/test_read_repair: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 205ed113dd)
2025-05-12 13:58:17 +03:00
Benny Halevy
5a563503b4 topology_custom/test_compacting_reader_tombstone_gc_with_data_in_memtable: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit fdb339bf28)
2025-05-12 13:58:17 +03:00
Benny Halevy
1d3d48c79e topology_custom/test_commitlog_segment_data_resurrection: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 59687c25e0)
2025-05-12 13:58:17 +03:00
Benny Halevy
84adef0489 topology_custom/test_change_replication_factor_1_to_0: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit df84097a4b)
2025-05-12 13:58:17 +03:00
Benny Halevy
bffd3fa3b7 topology/test_tls: test_upgrade_to_ssl: use new_test_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit a66ddb7c04)
2025-05-12 13:58:17 +03:00
Benny Halevy
ab86683e7a test/topology/util: new_test_keyspace: drop keyspace only on success
When the test fails with exception, keep the keyspace
intact for post-mortem analysis.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0fd1b846fe)
2025-05-12 13:58:17 +03:00
Benny Halevy
09ffcb2d98 test/topology/util: refactor new_test_keyspace
Define create_new_test_keyspace that can be used in
cases we cannot automatically drop the newly created keyspace
due to e.g. loss of raft majority at the end of the test.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit f946302369)
2025-05-12 13:58:17 +03:00
Benny Halevy
ce0c6e7595 test/topology/util: CREATE KEYSPACE IF NOT EXISTS
Workaround spurious keyspace creation errors
due to retries caused by
https://github.com/scylladb/python-driver/issues/317.
This is safe since the function uses a unique_name for the
keyspace so it should never exist by mistake.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 5d448f721e)
2025-05-12 13:58:17 +03:00
Benny Halevy
6d88937dcd test/topology/util: new_test_keyspace: accept ManagerClient
Following patch will convert topology tests to use
new_test_keyspace and friends.

Some tests restart server and reset the driver connection
so we cannot use the original cql Session for
dropping the created keyspace in the `finally` block.

Pass the ManagerClient instead to get a new cql
session for dropping the keyspace.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 50ce0aaf1c)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-05-12 13:58:16 +03:00
Patryk Jędrzejczak
654c097121 Merge '[Backport 2025.1] Correctly skip updating node's own ip address due to oudated gossiper data ' from Scylladb[bot]
Used host id to check if the update is for the node itself. Using IP is unreliable since if a node is restarted with different IP a gossiper message with previous IP can be misinterpreted as belonging to a different node.

Fixes: #22777

Backport to 2025.1 since this fixes a crash. Older version do not have the code.

- (cherry picked from commit a2178b7c31)

- (cherry picked from commit ecd14753c0)

- (cherry picked from commit 7403de241c)

Parent PR: #24000

Closes scylladb/scylladb#24088

* https://github.com/scylladb/scylladb:
  test: add reproducer for #22777
  storage_service: use id to check for local node
2025-05-12 09:37:49 +02:00
Gleb Natapov
6a2259a9d1 test: add reproducer for #22777
Add sleep before starting gossiper to increase a chance of getting old
gossiper entry about yourself before updating local gossiper info with
new IP address.

(cherry picked from commit 7403de241c)
2025-05-11 11:38:31 +03:00
Gleb Natapov
64a7a5ed0f storage_service: use id to check for local node
IP may change and an old gossiper message with previous IP may be
processed when it shouldn't.

Fixes: #22777
(cherry picked from commit a2178b7c31)
2025-05-09 12:55:44 +00:00
Michał Chojnowski
94606edff6 test/boost/mvcc_test: fix an overly-strong assertion in test_snapshot_cursor_is_consistent_with_merging
The test checks that merging the partition versions on-the-fly using the
cursor gives the same results as merging them destructively with apply_monotonically.

In particular, it tests that the continuity of both results is equal.
However, there's a subtlety which makes this not true.
The cursor puts empty dummy rows (i.e. dummies shadowed by the partition
tombstone) in the output.
But the destructive merge is allowed (as an expection to the general
rule, for optimization reasons), to remove those dummies and thus reduce
the continuity.

So after this patch we instead check that the output of the cursor
has continuity equal to the merged continuities of version.
(Rather than to the continuity of merged versions, which can be
smaller as described above).

Refs https://github.com/scylladb/scylladb/pull/21459, a patch which did
the same in a different test.
Fixes https://github.com/scylladb/scylladb/issues/13642

Closes scylladb/scylladb#24044

(cherry picked from commit 746ec1d4e4)

Closes scylladb/scylladb#24077
2025-05-09 13:08:59 +02:00
David Garcia
65d57819c9 docs: fix md redirections for multiversion support
This change resolves an issue where selecting a version from the multiversion dropdown on Markdown pages (e.g. https://docs.scylladb.com/manual/stable/alternator/getting-started.html) incorrectly redirected users to the main page instead of the corresponding versioned page.

The underlying cause was that the `multiversion` extension relies on `source_suffix` to identify available pages for URL mapping. Without this configuration, proper redirection fails for `.md` files.

This fix should be backported to `2025.1` to ensure correct behavior. Otherwise, the fix will only take effect in future releases.

Testing locally is non-trivial: clone the repository, apply the changes to each relevant branch, set `smv_remote_whitelist` to "", then run `make multiversionpreview`. Afterward, switch between versions in the dropdown to verify behavior. I've tested it locally, so the best next step is to merge and confirm that it works as expected in the live environment.

Closes scylladb/scylladb#23957

(cherry picked from commit 4ba7182515)

Closes scylladb/scylladb#24010
2025-05-08 11:18:56 +03:00
Botond Dénes
c341668371 Merge '[Backport 2025.1] replica: skip flush of dropped table' from Scylladb[bot]
Currently, flush throws no_such_column_family if a table is dropped. Skip the flush of dropped table instead.

Fixes: #16095.

Needs backport to 2025.1 and 6.2 as they contain the bug

- (cherry picked from commit 91b57e79f3)

- (cherry picked from commit c1618c7de5)

Parent PR: #23876

Closes scylladb/scylladb#23905

* github.com:scylladb/scylladb:
  test: test table drop during flush
  replica: skip flush of dropped table
2025-05-08 11:18:17 +03:00
Aleksandra Martyniuk
880f11fdba streaming: skip dropped tables
Currently, stream_session::prepare throws when a table in requests
or summaries is dropped. However, we do not want to fail streaming
if the table is dropped.

Delete table checks from stream_session::prepare. Further streaming
steps can handle the dropped table and finish the streaming successfully.

Fixes: #15257.

Closes scylladb/scylladb#23915

(cherry picked from commit 20c2d6210e)

Closes scylladb/scylladb#24052
2025-05-08 11:17:18 +03:00
Pavel Emelyanov
d19d02b543 sstable_directory: Print ks.cf when moving unshared remove sstables
When an sstable is identified by sstable_directory as remote-unshared,
it will at some point be moved to the target shard. When it happens a
log-message appears:

    sstable_directory - Moving 1 unshared SSTables to shard 1

Processing of tables by sstable_directory often happens in parallel, and
messages from sstable_directory are intermixed. Having a message like
above is not very informative, as it tells nothing about sstables that
are being moved.

Equip the message with ks:cf pair to make it more informative.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23912

(cherry picked from commit d40d6801b0)

Closes scylladb/scylladb#24015
2025-05-08 11:14:40 +03:00
Anna Stuchlik
db7de7f434 doc: add a link to the previous Enterprise documentation
This commit adds a link to the docs for previous Enterprise versions
at https://enterprise.docs.scylladb.com/ to the left menu.

As we still support versions 2024.1 and 2024.2, we need to ensure
easier access to those docs sets.

Fixes https://github.com/scylladb/scylladb/issues/23870

Closes scylladb/scylladb#23945

(cherry picked from commit 851a433663)

Closes scylladb/scylladb#24017
2025-05-08 10:58:05 +03:00
Botond Dénes
6201531b39 replica/database: memtable_list: save ref to memtable_table_shared_data
This is passed by reference to the constructor, but a copy is saved into
the _table_shared_data member. A reference to this member is passed down
to all memtable readers. Because of the copy, the memtable readers save
a reference to the memtable_list's member, which goes away together with
the memtable_list when the storage_group is destroyed.
This causes use-after-free when a storage group is destroyed while a
memtable read is still ongoing. The memtable reader keeps the memtable
alive, but its reference to the memtable_table_shared_data becomes
stale.
Fix by saving a reference in the memtable_list too, so memtable readers
receive a reference pointing to the original replica::table member,
which is stable accross tablet migrations and merges.
The copy was introduced by 2a76065e3d.
There was a copy even before this commit, but in the previous vnode-only
world this was fine -- there was one memtable_list per table and it was
around until the table itself was. In the tablet world, this is no
longer given, but the above commit didn't account for this.

A test is included, which reproduces the use-after-free on memtable
migration. The test is somewhat artificial in that the use-after-free
would be prevented by holding on to an ERM, but this is done
intentionaly to keep the test simple. Migration -- unlike merge where
this use-after-free was originally observed -- is easy to trigger from
unit tests.

Fixes: #23762

Closes scylladb/scylladb#23984

(cherry picked from commit 0a9ca52cfd)

Closes scylladb/scylladb#24033
2025-05-07 19:21:03 +03:00
Piotr Dulikowski
7591e131a2 test: mv: skip test_view_building_scheduling_group in debug
The test populates a table with 50k rows, creates a view on that table
and then compares the time spent in streaming vs. gossip scheduling
groups. It only takes 10s in dev mode on my machine, but is much slower
in debug mode in CI - building the view doesn't finish within 2 minutes.

The bigger the view to build, the more accurrate the measurement;
moreover, the test scenario isn't interesting enough to be worth running
it in debug mode as this should be covered by other tests. Therefore,
just skip this test in debug mode.

Fixes: scylladb/scylladb#23862

Closes scylladb/scylladb#23866

(cherry picked from commit 3d73c79a72)

Closes scylladb/scylladb#23881
2025-05-06 10:16:57 +02:00
Aleksandra Martyniuk
bfdf7c944b test: test table drop during flush
(cherry picked from commit c1618c7de5)
2025-05-06 09:52:42 +02:00
Aleksandra Martyniuk
238cf27471 replica: skip flush of dropped table
(cherry picked from commit 91b57e79f3)
2025-05-06 09:52:30 +02:00
Benny Halevy
fc21a0f8a1 loading_cache_test: test_loading_cache_reload_during_eviction: use manual_clock
Rather than lowres_clock, as since
32b7cab917,
loading_cache_for_test uses manual_clock for timing
and relying on lowres_clock to time the test might
run out of memory on fast test machines.

Fixes #23497

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#23498

(cherry picked from commit 5f2ce0b022)

Closes scylladb/scylladb#23983
2025-05-01 14:07:25 +03:00
Piotr Dulikowski
b77db75280 utils::loading_cache: gracefully skip timer if gate closed
The loading_cache has a periodic timer which acquires the
_timer_reads_gate. The stop() method first closes the gate and then
cancels the timer - this order is necessary because the timer is
re-armed under the gate. However, the timer callback does not check
whether the gate was closed but tries to acquire it, which might result
in unhandled exception which is logged with ERROR severity.

Fix the timer callback by acquiring access to the gate at the beginning
and gracefully returning if the gate is closed. Even though the gate
used to be entered in the middle of the callback, it does not make sense
to execute the timer's logic at all if the cache is being stopped.

Fixes: scylladb/scylladb#23951

Closes scylladb/scylladb#23952

(cherry picked from commit 8ffe4b0308)

Closes scylladb/scylladb#23981
2025-05-01 08:28:46 +03:00
Aleksandra Martyniuk
aaaeb6dcee test_tablet_tasks: use injection to revoke resize
Currently, test_tablet_resize_revoked tries to trigger split revoke
by deleting some rows. This method isn't deterministic and so a test
is flaky.

Use error injection to trigger resize revoke.

Fixes: #22570.

Closes scylladb/scylladb#23966

(cherry picked from commit 1f4edd8683)

Closes scylladb/scylladb#23974
2025-05-01 08:28:09 +03:00
Botond Dénes
f76cf6118a Merge '[Backport 2025.1] topology coordinator: do not proceed further on invalid boostrap tokens' from Scylladb[bot]
In case when dht::boot_strapper::get_boostrap_tokens fail to parse the
tokens, the topology coordinator handles the exception and schedules a
rollback. However, the current code tries to continue with the topology
coordinator logic even if an exception occurs, leaving boostrap_tokens
empty. This does not make sense and can actually cause issues,
specifically in prepare_and_broadcast_cdc_generation_data which
implicitly expect that the bootstrap_tokens of the first node in the
cluster will not be empty.

Fix this by adding the missing break.

Fixes: scylladb/scylladb#23897

From the code inspection alone it looks like 2025.1 and 6.2 have this problem, so marking for backport to both of them.

- (cherry picked from commit 66acaa1bf8)

- (cherry picked from commit 845cedea7f)

- (cherry picked from commit 670a69007e)

Parent PR: #23914

Closes scylladb/scylladb#23949

* github.com:scylladb/scylladb:
  test: cluster: add test_bad_initial_token
  topology coordinator: do not proceed further on invalid boostrap tokens
  cdc: add sanity check for generating an empty generation
2025-05-01 08:23:06 +03:00
Botond Dénes
b8a8f288fb Merge '[Backport 2025.1] tasks: check whether a node is alive before rpc' from Scylladb[bot]
Check whether a node is alive before making an rpc that gathers children
infos from the whole cluster in virtual_task::impl::get_children.

Fixes: https://github.com/scylladb/scylladb/issues/22514.

Needs backport to 2025.1 and 6.2 as they contain the bug.

- (cherry picked from commit 53e0f79947)

- (cherry picked from commit e178bd7847)

Parent PR: #23787

Closes scylladb/scylladb#23943

* github.com:scylladb/scylladb:
  test: add test for getting tasks children
  tasks: check whether a node is alive before rpc
2025-05-01 08:21:49 +03:00
Botond Dénes
503748b1ae Merge '[Backport 2025.1] topology_coordinator: stop: await all background_action_holder:s' from Scylladb[bot]
Add missing awaits for the rebuild_repair and repair background actions.
Although the background actions hold the _async_gate
which is closed in topology_coordinator::run(),
stop() still needs to await all background action futures
and handle any errors they may have left behind.

Fixes #23755

* The issue exists since 6.2

- (cherry picked from commit d624795fda)

- (cherry picked from commit 6de79d0dd3)

- (cherry picked from commit 7a0f5e0a54)

Parent PR: #17712

Closes scylladb/scylladb#23799

* github.com:scylladb/scylladb:
  topology_coordinator: stop: await all background_action_holder:s
  topology_coordinator: stop: improve error messages
  topology_coordinator: stop: define stop_background_action helper
2025-05-01 08:20:03 +03:00
Jenkins Promoter
15c88b6453 Update pgo profiles - aarch64 2025-05-01 04:20:37 +03:00
Jenkins Promoter
0bdd7bd755 Update pgo profiles - x86_64 2025-05-01 04:06:29 +03:00
Aleksandra Martyniuk
6d9bf63e06 test: add test for getting tasks children
Add test that checks whether the children of a virtual task will be
properly gathered if a node is down.

(cherry picked from commit e178bd7847)
2025-04-30 10:25:09 +02:00
Aleksandra Martyniuk
9385dc1d4e tasks: check whether a node is alive before rpc
Check whether a node is alive before making an rpc that gathers children
infos from the whole cluster in virtual_task::impl::get_children.

(cherry picked from commit 53e0f79947)
2025-04-30 10:14:58 +02:00
Patryk Jędrzejczak
bf688890ad Merge '[Backport 2025.1] Ensure raft group0 RPCs use the gossip scheduling group.' from Scylladb[bot]
Scylla operations use concurrency semaphores to limit the number of concurrent operations and prevent resource exhaustion. The semaphore is selected based on the current scheduling group.

For RAFT group operations, it is essential to use a system semaphore to avoid queuing behind user operations. This patch ensures that RAFT operations use the `gossip` scheduling group to leverage the system semaphore.

Fixes scylladb/scylladb#21637

Backport: 6.2 and 6.1

- (cherry picked from commit 60f1053087)

- (cherry picked from commit e05c082002)

Parent PR: #22779

Closes scylladb/scylladb#23844

* https://github.com/scylladb/scylladb:
  Ensure raft group0 RPCs use the gossip scheduling group
  Move RAFT operations verbs to GOSSIP group.
2025-04-29 16:02:47 +02:00
Piotr Dulikowski
791b4a9fe0 test: cluster: add test_bad_initial_token
Adds a test which checks that rollback works properly in case when a bad
value of the initial_token function is provided.

(cherry picked from commit 670a69007e)
2025-04-28 17:07:43 +00:00
Piotr Dulikowski
d25dd24a7e topology coordinator: do not proceed further on invalid boostrap tokens
In case when dht::boot_strapper::get_boostrap_tokens fail to parse the
tokens, the topology coordinator handles the exception and schedules a
rollback. However, the current code tries to continue with the topology
coordinator logic even if an exception occurs, leaving boostrap_tokens
empty. This does not make sense and can actually cause issues,
specifically in prepare_and_broadcast_cdc_generation_data which
implicitly expect that the bootstrap_tokens of the first node in the
cluster will not be empty.

Fix this by adding the missing break.

Fixes: scylladb/scylladb#23897
(cherry picked from commit 845cedea7f)
2025-04-28 17:07:43 +00:00
Piotr Dulikowski
d8380e5d5e cdc: add sanity check for generating an empty generation
It doesn't make sense to create an empty CDC generation because it does
not make sense to have a cluster with no tokens. Add a sanity check to
cdc::make_new_generation_description which fails if somebody attempts to
do that (i.e. when the set of current tokens + optionally bootstrapping
node's tokens is empty).

The function does not work correctly if it is misused, as we saw in
scylladb/scylladb#23897. While the function should not be misused in the
first place, it's better to throw an exception rather than crash -
especially that this crash could happen on the topology coordinator.

(cherry picked from commit 66acaa1bf8)
2025-04-28 17:07:43 +00:00
Jenkins Promoter
964468d901 Update ScyllaDB version to: 2025.1.3 2025-04-28 16:10:06 +03:00
Tomasz Grabiec
962fa5a369 Merge '[Backport 2025.1] tablets: Equalize per-table balance when allocating tablets for a new table' from Scylladb[bot]
Fixes the following scenario:

1. Scale out adds new nodes to each rack
2. Table is created - all tablets are allocated to new nodes because they have low load
3. Rebalancing moves tablets from old nodes to new nodes - table balance for the new table is not fixed

We're wrong to try to equalize global load when allocating tablets,
and we should equalize per-table load instead, and let background load
balancing fix it in a fair way. It will add to the allocated storage
imbalance, but:

1. The table is initially empty, so doesn't impact actual storage imbalance.
2. It's more important to avoid overloading CPU on the nodes - imbalance hurts this aspect immediately.
3. If the table was created before imbalance was formed, we would end up in the same situation as in the problematic scenario after the patch.
4. It's the job of the load balancing to keep up with storage growing, and if it's not, scale out should kick in.

Before we have CPU-aware tablet allocation, and thus can prove we have
CPU capacity on the small nodes, we should respect per-table balance
as this is the way in which we achieve full CPU utilization.

Fixes #23631

Backport to 2025.1 because load imbalance is a serious problem in production.

- (cherry picked from commit d493a8d736)

- (cherry picked from commit 2597a7e980)

- (cherry picked from commit 1e407ab4d2)

Parent PR: #23708

Closes scylladb/scylladb#23873

* github.com:scylladb/scylladb:
  tablets: Equalize per-table balance when allocating tablets for a new table
  load_sketch: Tolerate missing tablet_map when selecting for a given table
  tests: tablets: Simplify tests by moving common code to topology_builder
2025-04-28 13:23:25 +02:00
Tomasz Grabiec
885924067f Merge '[Backport 2025.1] Simplify loading_cache_test and use manual_clock' from Scylladb[bot]
This series exposes a Clock template parameter for loading_cache so that the test could use
the manual_clock rather than the lowres_clock, since relying on the latter is flaky.

In addition, the test load function is simplified to sleep some small random time and co_return the expected string,
rather than reading it from a real file, since the latter's timing might also be flaky, and it out-of-scope for this test.

Fixes #20322

* The test was flaky forever, so backport is required for all live versions.

- (cherry picked from commit b509644972)

- (cherry picked from commit 934a9d3fd6)

- (cherry picked from commit d68829243f)

- (cherry picked from commit b258f8cc69)

- (cherry picked from commit 0841483d68)

- (cherry picked from commit 32b7cab917)

Parent PR: #22064

Closes scylladb/scylladb#23926

* github.com:scylladb/scylladb:
  tests: loading_cache_test: use manual_clock
  utils: loading_cache: make clock_type a template parameter
  test: loading_cache_test: use function-scope loader
  test: loading_cache_test: simlute loader using sleep
  test: lib: eventually: add sleep function param
  test: lib: eventually: make *EVENTUALLY_EQUAL inline functions
2025-04-28 10:55:14 +02:00
Benny Halevy
bf3b09313c tests: loading_cache_test: use manual_clock
Relying on a real-time clock like lowres_clock
can be flaky (in particular in debug mode).
Use manual_clock instead to harden the test against
timing issues.

Fixes #20322

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 32b7cab917)
2025-04-27 08:49:05 +00:00
Benny Halevy
49bc81415b utils: loading_cache: make clock_type a template parameter
So the unit test can use manual_clock rather than lowres_clock
which can be flaky (in particular in debug mode).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0841483d68)
2025-04-27 08:49:05 +00:00
Benny Halevy
325cdcbd58 test: loading_cache_test: use function-scope loader
Rather than a global function, accessing a thread-local `load_count`.
The thread-local load_count cannot be used when multiple test
cases run in parallel.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit b258f8cc69)
2025-04-27 08:49:04 +00:00
Benny Halevy
a4584e205c test: loading_cache_test: simlute loader using sleep
This test isn't about reading values from file,
but rather it's about the loading_cache.
Reading from the file can sometimes take longer than
the expected refresh times, causing flakiness (see #20322).

Rather than reading a string from a real file, just
sleep a random, short time, and co_return the string.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit d68829243f)
2025-04-27 08:49:04 +00:00
Benny Halevy
52c82527f9 test: lib: eventually: add sleep function param
To allow support for manual_clock instead of seastar::sleep.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 934a9d3fd6)
2025-04-27 08:49:04 +00:00
Benny Halevy
3792810030 test: lib: eventually: make *EVENTUALLY_EQUAL inline functions
rather then macros.

This is a first cleanup step before adding a sleep function
parameter to support also manual_clock.

Also, add a call to BOOST_REQUIRE_EQUAL/BOOST_CHECK_EQUAL,
respectively, to make an error more visible in the test log
since those entry points print the offending values
when not equal.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit b509644972)
2025-04-27 08:49:04 +00:00
Tomasz Grabiec
ba3d53be55 tablets: Equalize per-table balance when allocating tablets for a new table
Fixes the following scenario:

1. Scale out adds new nodes to each rack
2. Table is created - all tablets are allocated to new nodes because they have low load
3. Rebalancing moves tablets from old nodes to new nodes - table balance for the new table is not fixed

We're wrong to try to equalize global load when allocating tablets,
and we should equalize per-table load instead, and let background load
balancing fix it in a fair way. It will add to the allocated storage
imbalance, but:

1. The table is initially empty, so doesn't impact actual storage imbalance.
2. It's more important to avoid overloading CPU on the nodes - imbalance hurts this aspect immediately.
3. If the table was created before imbalance was formed, we would end up in the same situation in the problematic scenario after the patch.
4. It's the job of the load balancing to keep up with storage growing, and if it's not, scale out should kick in.

Before we have CPU-aware tablet allocation, and thus can prove we have
CPU capacity on the small nodes, we should respect per-table balance
as this is the way in which we achieve full CPU utilization.

Fixes #23631

(cherry picked from commit 1e407ab4d2)
2025-04-25 18:29:36 +02:00
Tomasz Grabiec
e73954da80 load_sketch: Tolerate missing tablet_map when selecting for a given table
To simplify future usage in
network_topology_strategy::add_tablets_in_dc() which invokes
populate() for a given table, which may be both new and preexisitng.

(cherry picked from commit 2597a7e980)
2025-04-25 18:29:36 +02:00
Tomasz Grabiec
93dab31007 tests: tablets: Simplify tests by moving common code to topology_builder
Reduces code duplication.

(cherry picked from commit d493a8d736)
2025-04-25 18:29:36 +02:00
Benny Halevy
923944bf21 test_tablets_cql: test_alter_dropped_tablets_keyspace: extend expected error
The query may fail also on a no_such_keyspace
exception, which generates the following cql error:
```
Error from server: code=2200 [Invalid query] message="Can\'t find a keyspace test_1745198244144_qoohq"
```
Extend the pytest.raises match expression to include
this error as well.

Fixes #23812

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#23875

(cherry picked from commit f279625f59)

Closes scylladb/scylladb#23887
2025-04-25 11:34:27 +02:00
Michael Litvak
f71eff37bb test: test_mv_topology_change: increase timeout for remove_node
The test `test_mv_write_to_dead_node` currently uses a timeout of 60
seconds for remove_node, after it was increased from 30 seconds to fix
scylladb/scylladb#22953. Apparently it is still too low, and it was
observed to fail in debug mode.

Normally remove_node uses a default timeout of TOPOLOGY_TIMEOUT = 1000
seconds, but the test requires a timeout which is shorter than 5
minutes, because it is a regression test for an issue where MV updates
hold topology changes for more than 5 minutes, and we want to verify in
the test that the topology change completes in less than 5 minutes.

To resolve the issue, we set the test to skip in debug mode, because the
remove node operation is unpredictably slow, and we increase the timeout
to 180 seconds which is hopefully enough time for remove_node in
non-debug modes, and still sufficient to satisfy the test requirements.

Fixes scylladb/scylladb#22530

Closes scylladb/scylladb#23833

(cherry picked from commit 5c1d24f983)

Closes scylladb/scylladb#23874
2025-04-24 17:43:16 +02:00
Botond Dénes
e5ddc0c5ce Merge 'test/pylib: servers_add: support list of property_files' from Benny Halevy
So that a multi-dc/multi-rack cluster can be populated
in a single call.

* Enhancement, no backport required

Closes scylladb/scylladb#23341

* github.com:scylladb/scylladb:
  test/pylib: servers_add: add auto_rack_dc parameter
  test/pylib: servers_add: support list of property_files

(cherry picked from commit 0fdf2a2090)
2025-04-22 19:50:03 +02:00
Andrei Chekun
4ba475eb22 test.py: Remove reuse cluster in cluster tests
Pool is not aware of the cluster configuration, so it can return cluster
to the test that is not suitable for it. Removing reuse will remove such
possibility, so there will be less flaky tests.

Closes scylladb/scylladb#23277

(cherry picked from commit d68e54c26d)
2025-04-22 19:10:27 +02:00
Artsiom Mishuta
b4568850f9 test.py apply prepare_3_nodes_cluster in topology
apply prepare_3_nodes_cluster for all tests in the topology folder
via applying mark at the test module level using pytestmark
https://docs.pytest.org/en/stable/example/markers.html#marking-whole-classes-or-modules

set initial initial_size for topology folder to 0

(cherry picked from commit cf48444e3b)
2025-04-22 19:09:18 +02:00
Artsiom Mishuta
72adae2c16 test.py: introduce prepare_3_nodes_cluster marker
prepare_3_nodes_cluster marker will allow preparing non-dirty 3 nodes cluster
that can be reused between tests

(cherry picked from commit 20777d7fc6)
2025-04-22 19:08:31 +02:00
Sergey Zolotukhin
d34061818c Ensure raft group0 RPCs use the gossip scheduling group
Scylla operations use concurrency semaphores to limit the number
of concurrent operations and prevent resource exhaustion. The
semaphore is selected based on the current scheduling group.
For Raft group operations, it is essential to use a system semaphore to
avoid queuing behind user operations.
This commit adds a check to ensure that the raft group0 RPCs are
executed with the `gossiper` scheduling group.

(cherry picked from commit e05c082002)
2025-04-22 07:59:01 +00:00
Sergey Zolotukhin
b0fe705c80 Move RAFT operations verbs to GOSSIP group.
In order for RAFT operations to use the gossip system semaphore, moving RAFT
verbs to the gossip group in `do_get_rpc_client_idx`,  messaging_service.

Fixes scylladb/scylladb21637

(cherry picked from commit 60f1053087)
2025-04-22 07:59:01 +00:00
Yaron Kaikov
502c62d91d install-dependencies.sh: update node_exporter to 1.9.0
Update node_exporter to 1.9.0 to resolve the following CVE's
https://github.com/advisories/GHSA-49gw-vxvf-fc2g
https://github.com/advisories/GHSA-8xfx-rj4p-23jm
https://github.com/advisories/GHSA-crqm-pwhx-j97f
https://github.com/advisories/GHSA-j7vj-rw65-4v26

Fixes: https://github.com/scylladb/scylladb/issues/22884

regenerate frozen toolchain with optimized clang from
* https://devpkg.scylladb.com/clang/clang-18.1.8-Fedora-40-aarch64.tar.gz
* https://devpkg.scylladb.com/clang/clang-18.1.8-Fedora-40-x86_64.tar.gz

Closes scylladb/scylladb#22987

(cherry picked from commit e6227f9a25)

Closes scylladb/scylladb#23021
2025-04-21 23:12:28 +03:00
Avi Kivity
8ea6f5824a Update seastar submodule
* seastar 6d8fccf14c...ed31c1ce82 (1):
  > Merge 'Share IO queues between mountpoints' from Pavel Emelyanov

Changes in io-queue call for scylla-gdb update as well -- now the
reactor map of device to io-queue uses seastar::shared_ptr, not
std::unique_ptr.

Closes scylladb/scylladb#23733

Ref 70ac5828a8

Fixes #23820
2025-04-20 13:31:17 +03:00
Benny Halevy
e9b57a149d topology_coordinator: stop: await all background_action_holder:s
Add missing awaits for the rebuild_repair and repair background actions.
Although the background actions hold the _async_gate
which is closed in topology_coordinator::run(),
stop() still needs to await all background action futures
and handle any errors they may have left behind.

Fixes #23755

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 7a0f5e0a54)
2025-04-20 07:46:23 +00:00
Benny Halevy
7bdb93aa1c topology_coordinator: stop: improve error messages
"when cleanup" is ill-formed. Use "when XYZ"
to "during XYZ" instead.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 6de79d0dd3)
2025-04-20 07:46:22 +00:00
Benny Halevy
985492018b topology_coordinator: stop: define stop_background_action helper
Refactor the code to use a helper to await background_action_holder
and handle any errors by printing a warning.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit d624795fda)
2025-04-20 07:46:22 +00:00
Avi Kivity
6a8b033510 Merge '[Backport 2025.1] managed_bytes: in the copy constructor, respect the target preferred allocation size' from Scylladb[bot]
Commit 14bf09f447 added a single-chunk layout to `managed_bytes`, which makes the overhead of `managed_bytes` smaller in the common case of a small buffer.

But there was a bug in it. In the copy constructor of `managed_bytes`, a copy of a single-chunk `managed_bytes` is made single-chunk too.

But this is wrong, because the source of the copy and the target of the copy might have different preferred max contiguous allocation sizes.

In particular, if a `managed_bytes` of size between 13 kiB and 128 kiB is copied from the standard allocator into LSA, the resulting `managed_bytes` is a single chunk which violates LSA's preferred allocation size. (And therefore is placed by LSA in the standard allocator).

In other words, since Scylla 6.0, cache and memtable cells between 13 kiB and 128 kiB are getting allocated in the standard allocator rather than inside LSA segments.

Consequences of the bug:

1. Effective memory consumption of an affected cell is rounded up to the nearest power of 2.

2. With a pathological-enough allocation pattern (for example, one which somehow ends up placing a single 16 kiB memtable-owned allocation in every aligned 128 kiB span), memtable flushing could theoretically deadlock, because the allocator might be too fragmented to let the memtable grow by another 128 kiB segment, while keeping the sum of all allocations small enough to avoid triggering a flush. (Such an allocation pattern probably wouldn't happen in practice though).

3. It triggers a bug in reclaim which results in spurious allocation failures despite ample evictable memory.

   There is a path in the reclaimer procedure where we check whether reclamation succeeded by checking that the number of free LSA segments grew.

   But in the presence of evictable non-LSA allocations, this is wrong because the reclaim might have met its target by evicting the non-LSA allocations, in which case memory is returned directly to the standard allocator, rather than to the pool of free segments.

   If that happens, the reclaimer wrongly returns `reclaimed_nothing` to Seastar, which fails the allocation.

Refs (possibly fixes) https://github.com/scylladb/scylladb/issues/21072
Fixes https://github.com/scylladb/scylladb/issues/22941
Fixes https://github.com/scylladb/scylladb/issues/22389
Fixes https://github.com/scylladb/scylladb/issues/23781

This is a regression fix, should be backported to all affected releases.

- (cherry picked from commit 4e2f62143b)

- (cherry picked from commit 6c1889f65c)

Parent PR: #23782

Closes scylladb/scylladb#23810

* github.com:scylladb/scylladb:
  managed_bytes_test: add a reproducer for #23781
  managed_bytes: in the copy constructor, respect the target preferred allocation size
2025-04-19 18:42:45 +03:00
Botond Dénes
e19ab12f5e Merge '[Backport 2025.1] service/storage_proxy: schedule_repair(): materialize the range into a vector' from Scylladb[bot]
Said method passes down its diff input to mutate_internal(), after some std::ranges massaging. Said massaging is destructive -- it moves items from the diff. If the output range is iterated-over multiple times, only the first time will see the actual output, further iterations will get an empty range.

When trace-level logging is enabled, this is exactly what happens: mutate_internal() iterates over the range multiple times, first to log its content, then to pass it down the stack. This ends up resulting in an empty range being pased down and write handlers being created with nullopt optionals.

Fixes: scylladb/scylladb#21907
Fixes: scylladb/scylladb#21714

A follow-up stability fix for the test is also included.

Fixes: https://github.com/scylladb/scylladb/issues/23513
Fixes: https://github.com/scylladb/scylladb/issues/23512

Based on code-inspection, all versions are vulnerable, although <=6.2 use boost::ranges, not std::ranges.

- (cherry picked from commit 7150442f6a)

Parent PR: #21910

Closes scylladb/scylladb#23791

* github.com:scylladb/scylladb:
  test/cluster/test_read_repair.py: increase read request timeout
  service/storage_proxy: schedule_repair(): materialize the range into a vector
2025-04-18 14:04:23 +03:00
Anna Stuchlik
f8615b8c53 doc: add info about Scylla Doctor Automation to the docs
Fixes https://github.com/scylladb/scylladb/issues/23642

Closes scylladb/scylladb#23745

(cherry picked from commit 0b4740f3d7)

Closes scylladb/scylladb#23776
2025-04-18 14:03:54 +03:00
Botond Dénes
34ea9af232 Merge '[Backport 2025.1] tablets: rebuild: use repair for tablet rebuild' from Scylladb[bot]
Currently, when we rebuild a tablet, we stream data from all
replicas. This creates a lot of redundancy, wastes bandwidth
and CPU resources.

In this series, we split the streaming stage of tablet rebuild into
two phases: first we stream tablet's data from only one replica
and then repair the tablet.

Fixes: https://github.com/scylladb/scylladb/issues/17174.

Needs backport to 2025.1 to prevent out of space during streaming

- (cherry picked from commit b80e957a40)

- (cherry picked from commit ed7b8bb787)

- (cherry picked from commit 5d6041617b)

- (cherry picked from commit 4a847df55c)

- (cherry picked from commit eb17af6143)

- (cherry picked from commit acd32b24d3)

- (cherry picked from commit 372b562f5e)

Parent PR: #23187

Closes scylladb/scylladb#23682

* github.com:scylladb/scylladb:
  test: add test for rebuild with repair
  locator: service: move to rebuild_v2 transition if cluster is upgraded
  locator: service: add transition to rebuild_repair stage for rebuild_v2
  locator: service: add rebuild_repair tablet transition stage
  locator: add maybe_get_primary_replica
  locator: service: add rebuild_v2 tablet transition kind
  gms: add REPAIR_BASED_TABLET_REBUILD cluster feature
2025-04-18 14:03:25 +03:00
Michał Chojnowski
e6a2d67be4 managed_bytes_test: add a reproducer for #23781
(cherry picked from commit 6c1889f65c)
2025-04-18 07:55:46 +00:00
Michał Chojnowski
d1d21d97e1 managed_bytes: in the copy constructor, respect the target preferred allocation size
Commit 14bf09f447 added a single-chunk
layout to `managed_bytes`, which makes the overhead of `managed_bytes`
smaller in the common case of a small buffer.

But there was a bug in it. In the copy constructor of `managed_bytes`,
a copy of a single-chunk `managed_bytes` is made single-chunk too.

But this is wrong, because the source of the copy and the target
of the copy might have different preferred max contiguous allocation
sizes.

In particular, if a `managed_bytes` of size between 13 kiB and 128 kiB
is copied from the standard allocator into LSA, the resulting
`managed_bytes` is a single chunk which violates LSA's preferred
allocation size. (And therefore is placed by LSA in the standard
allocator).

In other words, since Scylla 6.0, cache and memtable cells
between 13 kiB and 128 kiB are getting allocated in the standard allocator
rather than inside LSA segments.

Consequences of the bug:

1. Effective memory consumption of an affected cell is rounded up to the nearest
   power of 2.

2. With a pathological-enough allocation pattern
   (for example, one which somehow ends up placing a single 16 kiB
   memtable-owned allocation in every aligned 128 kiB span),
   memtable flushing could theoretically deadlock,
   because the allocator might be too fragmented to let the memtable
   grow by another 128 kiB segment, while keeping the sum of all
   allocations small enough to avoid triggering a flush.
   (Such an allocation pattern probably wouldn't happen in practice though).

3. It triggers a bug in reclaim which results in spurious
   allocation failures despite ample evictable memory.

   There is a path in the reclaimer procedure where we check whether
   reclamation succeeded by checking that the number of free LSA
   segments grew.

   But in the presence of evictable non-LSA allocations, this is wrong
   because the reclaim might have met its target by evicting the non-LSA
   allocations, in which case memory is returned directly to the
   standard allocator, rather than to the pool of free segments.

   If that happens, the reclaimer wrongly returns `reclaimed_nothing`
   to Seastar, which fails the allocation.

Refs (possibly fixes) https://github.com/scylladb/scylladb/issues/21072
Fixes https://github.com/scylladb/scylladb/issues/22941
Fixes https://github.com/scylladb/scylladb/issues/22389
Fixes https://github.com/scylladb/scylladb/issues/23781

(cherry picked from commit 4e2f62143b)
2025-04-18 07:55:46 +00:00
Botond Dénes
440141a4ff test/cluster/test_read_repair.py: increase read request timeout
This test enables trace-level logging for the mutation_data logger,
which seems to be too much in debug mode and the test read times out.
Increase timeout to 1minute to avoid this.

Fixes: #23513

Closes scylladb/scylladb#23558

(cherry picked from commit 7bbfa5293f)
2025-04-18 06:31:42 +03:00
Nadav Har'El
6b35eea1a9 Merge '[Backport 2025.1] Alternator batch rcu' from Scylladb[bot]
This series adds support for reporting consumed capacity in BatchGetItem operations in Alternator.
It includes changes to the RCU accounting logic, exposing internal functionality to support batch-specific behavior, and adds corresponding tests for both simple and complex use cases involving multiple tables and consistency modes.

Need backporting to 2025.1, as RCU and WCU are not fully supported

Fixes #23690

- (cherry picked from commit 0eabf8b388)

- (cherry picked from commit 88095919d0)

- (cherry picked from commit 3acde5f904)

Parent PR: #23691

Closes scylladb/scylladb#23790

* github.com:scylladb/scylladb:
  test_returnconsumedcapacity.py: test RCU for batch get item
  alternator/executor: Add RCU support for batch get items
  alternator/consumed_capacity: make functionality public
2025-04-17 21:39:58 +03:00
Botond Dénes
b14ae92f4f service/storage_proxy: schedule_repair(): materialize the range into a vector
Said method passes down its `diff` input to `mutate_internal()`, after
some std::ranges massaging. Said massaging is destructive -- it moves
items from the diff. If the output range is iterated-over multiple
times, only the first time will see the actual output, further
iterations will get an empty range.
When trace-level logging is enabled, this is exactly what happens:
`mutate_internal()` iterates over the range multiple times, first to log
its content, then to pass it down the stack. This ends up resulting in
a range with moved-from elements being pased down and consequently write
handlers being created with nullopt mutations.

Make the range re-entrant by materializing it into a vector before
passing it to `mutate_internal()`.

Fixes: scylladb/scylladb#21907
Fixes: scylladb/scylladb#21714

Closes scylladb/scylladb#21910

(cherry picked from commit 7150442f6a)
2025-04-17 11:15:19 +00:00
Benny Halevy
fd6c7c53b8 token_group_based_splitting_mutation_writer: maybe_switch_to_new_writer: prevent double close
Currently, maybe_switch_to_new_writer resets _current_writer
only in a continuation after closing the current writer.
This leaves a window of vulnerability if close() yields,
and token_group_based_splitting_mutation_writer::close()
is called. Seeing the engaged _current_writer, close()
will call _current_writer->close() - which must be called
exactly once.

Solve this when switching to a new writer by resetting
_current_writer before closing it and potentially yielding.

Fixes #22715

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#22922

(cherry picked from commit 29b795709b)

Closes scylladb/scylladb#22965
2025-04-17 12:59:55 +02:00
Amnon Heiman
9434bd81b3 test_returnconsumedcapacity.py: test RCU for batch get item
This patch adds tests for consumed capacity in batch get item.  It tests
both the simple case and the multi-item, multi-table case that combines
consistent and non-consistent reads.

(cherry picked from commit 3acde5f904)
2025-04-17 10:30:18 +00:00
Amnon Heiman
0761eacf68 alternator/executor: Add RCU support for batch get items
This patch adds RCU support for batch get items.  With batch requests,
multiple objects are read from multiple tables. While the criterion for
adding the units is per the batch request, the units are calculated per
table—and so is the read consistency.

(cherry picked from commit 88095919d0)
2025-04-17 10:30:18 +00:00
Amnon Heiman
8bb4ee49da alternator/consumed_capacity: make functionality public
The consumed_capacity_counter is not completely applicable for batch
operations.  This patch makes some of its functionality public so that
batch get item can use the components to decide if it needs to send
consumed capacity in the reply, to get the half units used by the
metrics and returned result, and to allow an empty constructor for the
RCU counter.

(cherry picked from commit 0eabf8b388)
2025-04-17 10:30:18 +00:00
Avi Kivity
a89cdfc253 scylla-gdb: small-objects: fix for very small objects
Because of rounding and alignment, there are multiple pools for small
sizes (e.g. 4 for size 32). Because the pool selection algorithm
ignores alignment, different pools can be chosen for different object
sizes. For example, an object size of 29 will choose the first pool
of size 32, while an object size of 32 will choose the fourth pool of
size 32.

The small-objects command doesn't know about this and always considers
just the first pool for a given size. This causes it to miss out on
sister pools.

While it's possible to adjust pool selection to always choose one of the
pools, it may eat a precious cycle. So instead let's compensate in the
small-objects command. Instead of finding one pool for a given size,
find all of them, and iterate over all those pools.

Fixes #23603

Closes scylladb/scylladb#23604

(cherry picked from commit b4d4e48381)

Closes scylladb/scylladb#23749
2025-04-16 14:37:43 +03:00
Botond Dénes
998bfe908f Merge '[Backport 2025.1] Fix EAR not applied on write to S3 (but on read).' from Scylladb[bot]
Fixes #23225
Fixes #23185

Adds a "wrap_sink" (with default implementation) to sstables::file_io_extension, and moves
extension wrapping of file and sink objects to storage level.
(Wrapping/handling on sstable level would be problematic, because for file storage we typically re-use the sstable file objects for sinks, whereas for S3 we do not).

This ensures we apply encryption on both read and write, whereas we previously only did so on read -> fail.
Adds io wrapper objects for adapting file/sink for default implementation, as well as a proper encrypted sink implementation for EAR.

Unit tests for io objects and a macro test for S3 encrypted storage included.

- (cherry picked from commit 98a6d0f79c)

- (cherry picked from commit e100af5280)

- (cherry picked from commit d46dcbb769)

- (cherry picked from commit e02be77af7)

- (cherry picked from commit 9ac9813c62)

- (cherry picked from commit 5c6337b887)

Parent PR: #23261

Closes scylladb/scylladb#23424

* github.com:scylladb/scylladb:
  encryption: Add "wrap_sink" to encryption sstable extension
  encrypted_file_impl: Add encrypted_data_sink
  sstables::storage: Move wrapping sstable components to storage provider
  sstables::file_io_extension: Add a "wrap_sink" method.
  sstables::file_io_extension: Make sstable argument to "wrap" const
  utils: Add "io-wrappers", useful IO helper types
2025-04-16 09:32:23 +03:00
Calle Wilund
0eed7f8f29 encryption: Add "wrap_sink" to encryption sstable extension
Creates a more efficient data_sink wrapper for encrypted output
stream (S3).

(cherry picked from commit 5c6337b887)
2025-04-15 11:00:22 +00:00
Calle Wilund
f174b419a4 encrypted_file_impl: Add encrypted_data_sink
Adds a sibling type to encrypted file, a data_sink, that
will write a data stream in the same block format as a file
object would. Including end padding.

For making encrypted data sink writing less cumbersome.

(cherry picked from commit 9ac9813c62)
2025-04-15 11:00:22 +00:00
Calle Wilund
ac4c7a7ad2 sstables::storage: Move wrapping sstable components to storage provider
Fixes #23225
Fixes #23185

Moved wrapping component files/sinks to storage provider. Also ensures
to wrap data_sinks as well as actual files. This ensures that we actually
write encryption if active.

(cherry picked from commit e02be77af7)
2025-04-15 11:00:22 +00:00
Calle Wilund
6feb95ffad sstables::file_io_extension: Add a "wrap_sink" method.
Similar to wrap file, should wrap a data_sink (used for
sstable writers), in obvious write-only, simple stream
mode.

Default impl will detect if we wrap files for this component,
and if so, generate a file wrapper for the input sink, wrap
this, and the wrap it in a file_data_sink_impl.

This is obviously not efficient, so extensions used in actual
non-test code should implement the method.

(cherry picked from commit d46dcbb769)
2025-04-15 11:00:22 +00:00
Calle Wilund
b6ec0961ca sstables::file_io_extension: Make sstable argument to "wrap" const
This matches the signature of call sites. Since the only "real"
extension to actually make a marker in the sstable will do so in
the scylla component, which is writable even in a const sstable,
this is ok.

(cherry picked from commit e100af5280)
2025-04-15 10:36:47 +00:00
Calle Wilund
9a10458500 utils: Add "io-wrappers", useful IO helper types
Mainly to add a somewhat functional file-impl wrapping
a data_sink. This can implement a rudimentary, write-only,
file based on any output sink.

For testing, and because they fit there, place memory
sink and source types there as well.

(cherry picked from commit 98a6d0f79c)
2025-04-15 10:36:47 +00:00
Pavel Emelyanov
263416201c Merge '[Backport 2025.1] audit: add semaphore to audit_syslog_storage_helper' from Scylladb[bot]
audit_syslog_storage_helper::syslog_send_helper uses Seastar's
net::datagram_channel to write to syslog device (usually /dev/log).
However, datagram_channel.send() is not fiber-safe (ref seastar#2690),
so unserialized use of send() results in packets overwriting its state.
This, in turn, causes a corruption of audit logs, as well as assertion
failures.

To workaround the problem, a new semaphore is introduced in
audit_syslog_storage_helper. As storage_helper is a member of sharded
audit service, the semaphore allows for one datagram_channel.send() on
each shard. Each audit_syslog_storage_helper stores its own
datagram_channel, therefore concurrent sends to datagram_channel are
eliminated.

This change:
 - Moved syslog_send_helper to audit_syslog_storage_helper
 - Corutinize audit_syslog_storage_helper
 - Introduce semaphore with count=1 in audit_syslog_storage_helper.

See https://github.com/scylladb/scylla-dtest/pull/5749 for releated dtest
Fixes: scylladb/scylladb#22973

Backport to 2025.1 should be considered, as https://github.com/scylladb/scylladb/issues/22973 is known to cause crashes of 2025.1.

- (cherry picked from commit dbd2acd2be)

- (cherry picked from commit 889fd5bc9f)

- (cherry picked from commit c12f976389)

Parent PR: #23464

Closes scylladb/scylladb#23674

* github.com:scylladb/scylladb:
  audit: add semaphore to audit_syslog_storage_helper
  audit: corutinize audit_syslog_storage_helper
  audit: moved syslog_send_helper to audit_syslog_storage_helper
2025-04-15 12:37:48 +03:00
Jenkins Promoter
42db149393 Update ScyllaDB version to: 2025.1.2 2025-04-15 12:13:36 +03:00
Pavel Emelyanov
4c382fbe7e cql: Remove unused "initial_tablets" mention from guardrails
All tablets configuration was moved into its own "with tablets" section,
this option name cannot be met among replication factors.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23555

(cherry picked from commit d4f3a3ee4f)

Closes scylladb/scylladb#23676
2025-04-15 11:01:50 +03:00
David Garcia
7588789b02 fix: openapi not rendering in docs.scylladb.com/manual
Closes scylladb/scylladb#23686

(cherry picked from commit cf11d5eb69)

Closes scylladb/scylladb#23710
2025-04-15 10:58:59 +03:00
Jenkins Promoter
a0faf0bde0 Update pgo profiles - aarch64 2025-04-15 04:33:44 +03:00
Jenkins Promoter
a503e74bf5 Update pgo profiles - x86_64 2025-04-15 04:10:13 +03:00
Aleksandra Martyniuk
6702849f32 test: add test for rebuild with repair
(cherry picked from commit 372b562f5e)
2025-04-14 12:01:58 +02:00
Aleksandra Martyniuk
5c683449b3 locator: service: move to rebuild_v2 transition if cluster is upgraded
If cluster is upgraded to version containing rebuild_v2 transition
kind, move to this transition kind instead of rebuild.

(cherry picked from commit acd32b24d3)
2025-04-14 11:55:28 +02:00
Aleksandra Martyniuk
2436c24db7 locator: service: add transition to rebuild_repair stage for rebuild_v2
Modify write_both_read_old and streaming stages in rebuild_v2 transition
kind: write_both_read_old moves to rebuild_repair stage and streaming stage
streams data only from one replica.

(cherry picked from commit eb17af6143)
2025-04-14 11:55:28 +02:00
Aleksandra Martyniuk
6a251df136 locator: service: add rebuild_repair tablet transition stage
Currently, in the streaming stage of rebuild tablet transition,
we stream tablet data from all replicas.
This patch series splits the streaming stage into two phases:
- repair phase, where we repair the tablet;
- streaming phase, where we stream tablet data from one replica.

rebuild_repair is a stage that will be used to perform the repair
phase. It executes the tablet repair on tablet_info::replicas.
A primary replica out of migration_streraming_info::read_from is
the repair master. If the repair succeeds, we move to streaming
tablet transition stage, and to cleanup_target - if it fails.

The repair bypasses the tablet repair scheduler and it does not update
the repair_time.

A transition to the rebuild_repair stage will be added in the following
patches.

(cherry picked from commit 4a847df55c)
2025-04-14 11:55:28 +02:00
Aleksandra Martyniuk
0e152e9f2e locator: add maybe_get_primary_replica
Add maybe_get_primary_replica to choose a primary replica out of
custom replica set.

(cherry picked from commit 5d6041617b)
2025-04-14 11:55:28 +02:00
Aleksandra Martyniuk
5a42418a19 locator: service: add rebuild_v2 tablet transition kind
Currently, in the streaming stage of rebuild tablet transition,
we stream tablet data from all replicas.
This patch series splits the streaming stage into two phases:
- repair phase, where we repair the tablet;
- streaming phase, where we stream tablet data from one replica.

To differentiate the two streaming methods, a new tablet transition
kind - rebuild_v2 - is added.

The transtions and stages for rebuild_v2 transition kind will be
added in the following patches.

(cherry picked from commit ed7b8bb787)
2025-04-14 11:55:28 +02:00
Aleksandra Martyniuk
4df0ddb11f gms: add REPAIR_BASED_TABLET_REBUILD cluster feature
(cherry picked from commit b80e957a40)
2025-04-14 11:55:28 +02:00
Botond Dénes
c1dce79847 Merge '[Backport 2025.1] Finalize tablet splits earlier' from Scylladb[bot]
Resize finalization is executed in a separate topology transition state,
`tablet_resize_finalization`, to ensure it does not overlap with tablet
transitions. The topology transitions into the
`tablet_resize_finalization` state only when no tablet migrations are
scheduled or being executed. If there is a large load-balancing backlog,
split finalization might be delayed indefinitely, leaving the tables
with large tablets.

This PR fixes the issue by updating the load balancer to no schedule any
migrations and to not make any repair plans when there a resize
finalization is pending in any table.

Also added a testcase to verify the fix.

Fixes #21762

- (cherry picked from commit 8cabc66f07)

- (cherry picked from commit 5b47d84399)

- (cherry picked from commit dccce670c1)

Parent PR: #22148

Closes scylladb/scylladb#23633

* github.com:scylladb/scylladb:
  topology_coordinator: fix indentation in generate_migration_updates
  topology_coordinator: do not schedule migrations when there are pending resize finalizations
  load_balancer: make repair plans only when there is no pending resize finalization
2025-04-14 06:44:57 +03:00
Botond Dénes
251db77fcb mutation/frozen_mutation: frozen_mutation_consumer_adaptor: fix end-of-partition handling
This adaptor adapts a mutation reader pausable consumer to the frozen
mutation visitor interface. The pausable consumer protocol allows the
consumer to skip the remaining parts of the partition and resume the
consumption with the next one. To do this, the consumer just has to
return stop_iteration::yes from one of the consume() overloads for
clustering elements, then return stop_iteration::no from
consume_end_of_partition(). Due to a bug in the adaptor, this sequence
leads to terminating the consumption completely -- so any remaining
partitions are also skipped.

This protocol implementation bug has user-visible effects, when the
only user of the adaptor -- read repair -- happens during a query which
has limitations on the amount of content in each partition.
There are two such queries: select distinct ... and select ... with
partition limit. When converting the repaired mutation to to query
result, these queries will trigger the skip sequence in the consumer and
due to the above described bug, will skip the remaining partitions in
the results, omitting these from the final query result.

This patch fixes the protocol bug, the return value of the underlying
consumer's consume_end_of_partition() is now respected.

A unit test is also added which reproduces the problem both with select
distinct ... and select ... per partition limit.

Follow-up work:
* frozen_mutation_consumer_adaptor::on_end_of_partition() calls the
  underlying consumer's on_end_of_stream(), so when consuming multiple
  frozen mutations, the underlying's on_end_of_stream() is called for
  each partition. This is incorrect but benign.
* Improve documentation of mutation_reader::consume_pausable().

Fixes: #20084

Closes scylladb/scylladb#23657

(cherry picked from commit d67202972a)

Closes scylladb/scylladb#23694
2025-04-11 10:53:31 +03:00
Botond Dénes
f7761729cc Merge '[Backport 2025.1] nodetool: cluster repair: add a command to repair tablet keyspaces' from Scylladb[bot]
Add a new nodetool cluster super-command. Add nodetool
cluster repair command to repair tablet keyspaces.
It uses the new /storage_service/tablets/repair API.

The nodetool cluster repair command allows you to specify
the keyspace and tables to be repaired. A cluster repair of many
tables will request /storage_service/tablets/repair and wait for
the result synchronously for each table.

The nodetool repair command, which was previously used to repair
keyspaces of any type, now repairs only vnode keyspaces.

Fixes: https://github.com/scylladb/scylladb/issues/22409.

Needs backport to 2025.1 that introduces the new tablet repair API

- (cherry picked from commit cbde835792)

- (cherry picked from commit b81c81c7f4)

- (cherry picked from commit aa3973c850)

- (cherry picked from commit 8bbc5e8923)

- (cherry picked from commit 02fb71da42)

- (cherry picked from commit 9769d7a564)

Parent PR: #22905

Closes scylladb/scylladb#23672

* github.com:scylladb/scylladb:
  docs: nodetool: update repair and add tablet-repair docs
  test: nodetool: add tests for cluster repair command
  nodetool: add cluster repair command
  nodetool: repair: extract getting hosts and dcs to functions
  nodetool: repair: warn about repairing tablet keyspaces
  nodetool: repair: move keyspace_uses_tablets function
2025-04-11 10:53:03 +03:00
Raphael S. Carvalho
75cd8e9492 replica: Fix truncate and drop table after tablet migration happens
When running those operations after a tablet replica is migrated away from
a shard, an assert can fail resulting in a crash.

Status quo (around the assert in truncate procedure):

1) Highest RP seen by table is saved in low_mark, and the current time in
low_mark_at.
2) Then compaction is disabled in order to not mix data written before truncate,
and data written later.
3) Then memtable is flushed in order for the data written before truncate to be
available in sstables and then removed.
4) Now, current time is saved in truncated_at, which is supposedly the time of
truncate to decide which sstables to remove.

Note: truncated_at is likely above low_mark_at due to steps 2 and 3.

The interesting part of the assert is:
    (truncated_at <= low_mark_at ? rp <= low_mark : low_mark <= rp)

Note: RP in the assert above is the highest RP among all sstables generated
before truncated_at. RP is retrieved by table::discard_sstables().

If truncated_at > low_mark_at, maybe newer data was written during steps 2 and
3, and memtable's RP becomes greater than low_mark, resulting in a SSTable with
RP > low_mark.
So assert's 2nd condition is there to defend against the scenario above.

truncated_at and low_mark_at uses millisecond granularity, so even if
truncated_at == low_mark_at, data could have been written in steps 2 and 3
(during same MS window), failing the assert. This is fragile.

Reproducer:

To reproduce the problem, truncated_at must be > low_mark_at, which can easily
happen with both drop table and truncate due to steps 2 and 3.

If a shard has 2 or more tablets, the table's highest RP refer to just one
tablet in that shard.
If the tablet with the highest RP is migrated away, then the sstables in that
shard will have lower RP than the recorded highest RP (it's a table wide state,
which makes sense since CL is shared among tablets).

So when either drop table or truncate runs, low_mark will be potentially bigger
than highest RP retrieved from sstables.

Proposed solution:

The current assert is hacked to not fail if writes sneak in, during steps 2 and
3, but it's still fragile and seems not to serve its real purpose, since it's
allowing for RP > low_mark.

We should be able to say that low_mark >= RP, as a way of asserting we're not
leaving data targeted by truncate behind (or that we're not removing the wrong
data).

But the problem is that we're saving low_mark in step 1, before preparation
steps (2 and 3). When truncated_at is recorded in step 4, it's a way of saying
all data written so far is targeted for removal. But as of today, low_mark
refers to all data written up to step 1. So low_mark is now only one set
before issuing flush, and also accounts for all potentially flushed data.

Fixes #18059.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#23560

(cherry picked from commit 0f59deffaa)
(cherry picked from commit 7554d4bbe09967f9b7a55575b5dfdde4f6616862)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#23649
2025-04-11 10:52:37 +03:00
Raphael S. Carvalho
7007dabdf9 storage_service: Don't retry split when table is dropped
The split monitor wasn't handling the scenario where the table being
split is dropped. The monitor would be unable to find the tablet map
of such a table, and the error would be treated as a retryable one
causing the monitor to fall into an endless retry loop, with sleeps
in between. And that would block further splits, since the monitor
would be busy with the retries. The fix is about detecting table
was dropped and skipping to the next candidate, if any.

Fixes #21859.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#22933

(cherry picked from commit 4d8a333a7f)

Closes scylladb/scylladb#23480
2025-04-11 10:52:05 +03:00
Aleksandra Martyniuk
636ec802c3 service: tasks: hold token_metadata_ptr in tablet_virtual_task
Hold token_metadata_ptr in tablet_virtual_task methods that iterate
over tablets, to keep the tablet_map alive.

Fixes: https://github.com/scylladb/scylladb/issues/22316.

Closes scylladb/scylladb#22740

(cherry picked from commit f8e4198e72)

Closes scylladb/scylladb#22937
2025-04-11 10:51:07 +03:00
Avi Kivity
3335557075 Merge '[Backport 2025.1] row_cache: don't garbage-collect tombstones which cover data in memtables' from Scylladb[bot]
The row cache can garbage-collect tombstones in two places:
1) When populating the cache - the underlying reader pipeline has a `compacting_reader` in it;
2) During reads - reads now compact data including garbage collection;

In both cases, garbage collection has to do overlap checks against memtables, to avoid collecting tombstones which cover data in the memtables.
This PR includes fixes for (2), which were not handled at all currently.
(1) was already supposed to be fixed, see https://github.com/scylladb/scylladb/issues/20916. But the test added in this PR showed that the test is incomplete: https://github.com/scylladb/scylladb/issues/23291. A fix for this issue is also included.

Fixes: https://github.com/scylladb/scylladb/issues/23291
Fixes: https://github.com/scylladb/scylladb/issues/23252

The fix will need backport to all live release.

- (cherry picked from commit c2518cdf1a)

- (cherry picked from commit 6b5b563ef7)

- (cherry picked from commit 7e600a0747)

- (cherry picked from commit d126ea09ba)

- (cherry picked from commit cb76cafb60)

- (cherry picked from commit df09b3f970)

- (cherry picked from commit e5afd9b5fb)

- (cherry picked from commit 34b18d7ef4)

- (cherry picked from commit f7938e3f8b)

- (cherry picked from commit 6c1f6427b3)

- (cherry picked from commit 0d39091df2)

Parent PR: #23255

Closes scylladb/scylladb#23673

* github.com:scylladb/scylladb:
  test/boost/row_cache_test: add memtable overlap check tests
  replica/table: add error injection to memtable post-flush phase
  utils/error_injection: add a way to set parameters from error injection points
  test/cluster: add test_data_resurrection_in_memtable.py
  test/pylib/utils: wait_for_cql_and_get_hosts(): sort hosts
  replica/mutation_dump: don't assume cells are live
  replica/database: do_apply() add error injection point
  replica: improve memtable overlap checks for the cache
  replica/memtable: add is_merging_to_cache()
  db/row_cache: add overlap-check for cache tombstone garbage collection
  mutation/mutation_compactor: copy key passed-in to consume_new_partition()
2025-04-10 21:42:28 +03:00
Avi Kivity
6ff7927d67 sstables: store features early in write path
sstable features indicate that an sstable has some extension, or that
some bug was fixed. They allow us to know if we can rely on certain
properties in a read sstables.

Currently, sstable features are set early in the read path (when we
read the scylla metadata file) and very late in the write path
(when we write the scylla metadata file just before sealing the sstable).

However, we happen to read features before we set them in the write path -
when we resize the bloom filter for a newly written sstable we instantiate
an index reader, and that depends on some features. As a result,
we read a disengaged optional (for the scylla metadata component) as if
it was engaged. This somehow worked so far, but fails with libstdc++
hash table implementation.

Fix it by moving storage of the features to the sstable itself, and
setting it early in the write path.

Fixes #23484

Closes scylladb/scylladb#23485

(cherry picked from commit 73e4a3c581)

Closes scylladb/scylladb#23504
2025-04-10 21:41:09 +03:00
Pavel Emelyanov
1021a3d126 Merge '[Backport 2025.1] Allow abort during join_cluster' from Scylladb[bot]
Bootstrap or replace can take a long time, but
since feef7d3fa1,
the stop_signal is checked only in checkpoints,
and in particular, abort isn't requested during
join_cluster.

Fixes #23222

* requires backport on top of https://github.com/scylladb/scylladb/pull/23184

- (cherry picked from commit 0fc196991a)

- (cherry picked from commit f269480f53)

- (cherry picked from commit 41f02c521d)

Parent PR: #23306

Closes scylladb/scylladb#23461

* github.com:scylladb/scylladb:
  main: allow abort during join_cluster
  main: add checkpoint before joining cluster
  storage_service: add start_sys_dist_ks
2025-04-10 19:03:46 +03:00
Avi Kivity
5d8bb068fa Merge '[Backport 2025.1] streaming: fix the way a reason of streaming failure is determined' from Scylladb[bot]
During streaming receiving node gets and processes mutation fragments.
If this operation fails, receiver responds with -1 status code, unless
it failed due to no_such_column_family in which case streaming of this
table should be skipped.

However, when the table was dropped, an exception handler on receiver
side may get not only data_dictionary::no_such_column_family, but also
seastar::nested_exception of two no_such_column_family.

Encountered example:
```
ERROR 2025-02-12 15:20:51,508 [shard 0:strm] stream_session - [Stream #f1cd6830-e954-11ef-afd9-b022e40bf72d] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=ks, cf=cf, peer=756dd3fe-2bf0-4dcd-afbc-cfd5202669a0: seastar::nested_exception: data_dictionary::no_such_column_family (Can't find a column family with UUID ef9b1ee0-e954-11ef-ba4a-faf17acf4e14) (while cleaning up after data_dictionary::no_such_column_family (Can't find a column family with UUID ef9b1ee0-e954-11ef-ba4a-faf17acf4e14))
```

In this case, the exception does not match the try_catch<data_dictionary::no_such_column_family>
clause and gets handled the same as any other exception type.

Replace try_catch clause with table_sync_and_check that synchronizes
the schema and check if the table exists.

Fixes: https://github.com/scylladb/scylladb/issues/22834.

Needs backport to all live version, as they all contain the bug

- (cherry picked from commit 876cf32e9d)

- (cherry picked from commit faf3aa13db)

- (cherry picked from commit 44748d624d)

- (cherry picked from commit 35bc1fe276)

Parent PR: #22868

Closes scylladb/scylladb#23290

* github.com:scylladb/scylladb:
  streaming: fix the way a reason of streaming failure is determined
  streaming: save a continuation lambda
  streaming: use streaming namespace in table_check.{cc,hh}
  repair: streaming: move table_check.{cc,hh} to streaming
2025-04-10 18:22:16 +03:00
Lakshmi Narayanan Sreethar
fb069f0fbf topology_coordinator: fix indentation in generate_migration_updates
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit dccce670c1)
2025-04-10 18:39:10 +05:30
Lakshmi Narayanan Sreethar
48077b160d topology_coordinator: do not schedule migrations when there are pending resize finalizations
Resize finalization is executed in a separate topology transition state,
`tablet_resize_finalization`, to ensure it does not overlap with tablet
transitions. The topology transitions into the
`tablet_resize_finalization` state only when no tablet migrations are
scheduled or being executed. If there is a large load-balancing backlog,
split finalization might be delayed indefinitely, leaving the tables
with large tablets.

To fix this, do not schedule tablet migrations on any tables when there
are pending resize finalizations. This ensures that migrations from the
same table and other unrelated tables do not block resize finalization.

Also added a testcase to verify the fix.

Fixes #21762

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit 5b47d84399)
2025-04-10 18:39:10 +05:30
Lakshmi Narayanan Sreethar
c286fc231a load_balancer: make repair plans only when there is no pending resize finalization
Do not make repair plans if any table has pending resize finalization.
This is to ensure that the finalization doesn't get delayed by reapir
tasks.

Refs #21762

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
(cherry picked from commit 8cabc66f07)
2025-04-10 18:20:00 +05:30
Botond Dénes
df4872b82a test/boost/row_cache_test: add memtable overlap check tests
Similar to test/cluster/test_data_resurrection_in_memtable.py but works
on a single node and uses more low-level mechanism. These tests can also
reproduce more advanced scenarios, like concurrent reads, with some
reading from flushed memtables.

(cherry picked from commit 0d39091df2)
2025-04-10 06:52:18 -04:00
Botond Dénes
7943db9844 replica/table: add error injection to memtable post-flush phase
After the memtable was flushed to disk, but before it is merged to
cache. The injection point will only active for the table specified in
the "table_name" injection parameter.

(cherry picked from commit 6c1f6427b3)
2025-04-10 06:52:18 -04:00
Botond Dénes
bd8c584a01 utils/error_injection: add a way to set parameters from error injection points
With this, now it is possible to have two-way communication between
the error injection point and its enabler. The test can enable the error
injection point, then wait until it is hit, before proceedin.

(cherry picked from commit f7938e3f8b)
2025-04-10 06:52:18 -04:00
Botond Dénes
50c05abd14 test/cluster: add test_data_resurrection_in_memtable.py
Reproducers for #23252 and #23291 -- cache garbage
collecting tombstones resurrecting data in the memtable.

(cherry picked from commit 34b18d7ef4)
2025-04-10 06:52:18 -04:00
Aleksandra Martyniuk
3a49808707 streaming: fix the way a reason of streaming failure is determined
During streaming receiving node gets and processes mutation fragments.
If this operation fails, receiver responds with -1 status code, unless
it failed due to no_such_column_family in which case streaming of this
table should be skipped.

However, when the table was dropped, an exception handler on receiver
side may get not only data_dictionary::no_such_column_family, but also
seastar::nested_exception of two no_such_column_family.

Encountered example:
```
ERROR 2025-02-12 15:20:51,508 [shard 0:strm] stream_session - [Stream #f1cd6830-e954-11ef-afd9-b022e40bf72d] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=ks, cf=cf, peer=756dd3fe-2bf0-4dcd-afbc-cfd5202669a0: seastar::nested_exception: data_dictionary::no_such_column_family (Can't find a column family with UUID ef9b1ee0-e954-11ef-ba4a-faf17acf4e14) (while cleaning up after data_dictionary::no_such_column_family (Can't find a column family with UUID ef9b1ee0-e954-11ef-ba4a-faf17acf4e14))
```

In this case, the exception does not match the try_catch<data_dictionary::no_such_column_family>
clause and gets handled the same as any other exception type.

Replace try_catch clause with table_sync_and_check that synchronizes
the schema and check if the table exists.

Fixes: https://github.com/scylladb/scylladb/issues/22834.
(cherry picked from commit 35bc1fe276)
2025-04-10 09:35:56 +02:00
Aleksandra Martyniuk
b57774dea6 streaming: save a continuation lambda
In the following patches, an additional preemption point will be
added to the coroutine lambda in register_stream_mutation_fragments.

Assign a lambda to a variable to prolong the captures lifetime.

(cherry picked from commit 44748d624d)
2025-04-10 09:35:55 +02:00
Aleksandra Martyniuk
67b0ea99a0 streaming: use streaming namespace in table_check.{cc,hh}
(cherry picked from commit faf3aa13db)
2025-04-10 09:35:54 +02:00
Aleksandra Martyniuk
7fa0e041eb repair: streaming: move table_check.{cc,hh} to streaming
(cherry picked from commit 876cf32e9d)
2025-04-10 09:34:23 +02:00
Botond Dénes
de1d8372fa test/pylib/utils: wait_for_cql_and_get_hosts(): sort hosts
Such that a given index in the return hosts refers to the same
underlying Scylla instance, as the same index in the passed-in nodes
list. This is what users of this method intuitively expect, but
currently the returned hosts list is unordered (has random order).

(cherry picked from commit e5afd9b5fb)
2025-04-10 03:17:27 -04:00
Botond Dénes
dcc3604e02 replica/mutation_dump: don't assume cells are live
Currently the dumper unconditionally extracts the value of atomic cells,
assuming they are live. This doesn't always hold of course and
attempting to get the value of a dead cell will lead to marshalling
errors. Fix by checking is_live() before attempting to get the cell
value. Fix for both regular and collection cells.

(cherry picked from commit df09b3f970)
2025-04-10 03:17:27 -04:00
Botond Dénes
39ca3463b3 replica/database: do_apply() add error injection point
So writes (to user tables) can be failed on a replica, via error
injection. Should simplify tests which want to create differences in
what writes different replicas receive.

(cherry picked from commit cb76cafb60)
2025-04-10 03:17:27 -04:00
Botond Dénes
1c7a6ba140 replica: improve memtable overlap checks for the cache
The current memtable overlap check that is used by the cache
-- table::get_max_purgeable_fn_for_cache_underlying_reader() -- only
checks the active memtable, so memtables which are either being flushed
or are already flushed and also have active reads against them do not
participate in the overlap check.
This can result in temporary data resurrection, where a cache read can
garbage-collect a tombstone which still covers data in a flushing or
flushed memtable, which still have active read against it.

To prevent this, extend the overlap check to also consider all of the
memtable list. Furthermore, memtable_list::erase() now places the removed
(flushed) memtable in an intrusive list. These entries are alive only as
long as there are readers still keeping an `lw_shared_ptr<memtable>`
alive. This list is now also consulted on overlap checks.

(cherry picked from commit d126ea09ba)
2025-04-10 03:17:27 -04:00
Botond Dénes
4febf2a938 replica/memtable: add is_merging_to_cache()
And set it when the memtable is merged to cache.

(cherry picked from commit 7e600a0747)
2025-04-10 03:17:27 -04:00
Botond Dénes
b43d024ffb db/row_cache: add overlap-check for cache tombstone garbage collection
The cache should not garbage-collect tombstone which cover data in the
memtable. Add overlap checks (get_max_purgeable) to garbage collection
to detect tombstones which cover data in the memtable and to prevent
their garbage collection.

(cherry picked from commit 6b5b563ef7)
2025-04-10 03:17:27 -04:00
Botond Dénes
4bb1969a7f mutation/mutation_compactor: copy key passed-in to consume_new_partition()
This doesn't introduce additional work for single-partition queries: the
key is copied anyway on consume_end_of_stream().
Multi-partition reads and compaction are not that sensitive to
additional copy added.

This change fixes a bug in the compacting_reader: currently the reader
passes _last_uncompacted_partition_start.key() to the compactor's
consume_new_partition(). When the compactor emits enough content for this
partition, _last_uncompacted_partition_start is moved from to emit the
partition start, this makes the key reference passed to the compaction
corrupt (refer to moved-from value). This in turn means that subsequent
GC checks done by the compactor will be done with a corrupt key and
therefore can result in tombstone being garbage-collected while they
still cover data elsewhere (data resurrection).

The compacting reader is violating the API contract and normally the bug
should be fixed there. We make an exception here because doing the fix
in the mutation compactor better aligns with our future plans:
* The fix simplifies the compactor (gets rid of _last_dk).
* Prepares the way to get rid of the consume API used by the compactor.

(cherry picked from commit c2518cdf1a)
2025-04-10 03:17:27 -04:00
Anna Stuchlik
6bcf513f11 doc: add enabling consistent topology updates to the 2025.1 upgrade guide-from-2024
This commit adds the procedure to enable consistent topology updates for upgrades
from 2024.1 to 2025.1 (or from 2024.2 to 2025.1 if the feature wasn't enabled
after upgrading from 2024.1 to 2024.2).

Fixes https://github.com/scylladb/scylladb/issues/23650

Closes scylladb/scylladb#23651

(cherry picked from commit 93a7b3ac1d)

Closes scylladb/scylladb#23670
2025-04-10 10:09:23 +03:00
Botond Dénes
b1a995b571 Merge '[Backport 2025.1] tablets: Make tablet allocation equalize per-shard load ' from Scylladb[bot]
Before, it was equalizing per-node load (tablet count), which is wrong
in heterogeneous clusters. Nodes with fewer shards will end up with
overloaded shards.

Fixes #23378

- (cherry picked from commit d6232a4f5f)

- (cherry picked from commit 6bff596fce)

Parent PR: #23478

Closes scylladb/scylladb#23635

* github.com:scylladb/scylladb:
  tablets: Make tablet allocation equalize per-shard load
  tablets: load_balancer: Fix reporting of total load per node
2025-04-10 10:08:38 +03:00
Botond Dénes
ec7da3d785 tools/scylla-nodetool: s/GetInt()/GetInt64()/
GetInt() was observed to fail when the integer JSON value overflows the
int32_t type, which `GetInt()` uses for storage. When this happens,
rapidjson will assign a distinct 64 bit integer type to the value, and
attempting to access it as 32 bit integer triggers the wrong-type error,
resulting in assert failure. This was hit on the field where invoking
nodetool netstats resulted in nodetool crashing when the streamed bytes
amounts were higher than maxint.

To avoid such bugs in the future, replace all usage of GetInt() in
nodetool of GetInt64(), just to be sure.

A reproducer is added to the nodetool netstats crash.

Fixes: scylladb/scylladb#23394

Closes scylladb/scylladb#23395

(cherry picked from commit bd8973a025)

Closes scylladb/scylladb#23476
2025-04-10 10:05:18 +03:00
Botond Dénes
02d89435a9 Merge '[Backport 2025.1] Ignore wrapped exceptions gate_closed_exception and rpc::closed_error when node shuts down.' from Scylladb[bot]
Normally, when a node is shutting down, `gate_closed_exception` and `rpc::closed_error`
in `send_to_live_endpoints` should be ignored. However, if these exceptions are wrapped
in a `nested_exception`, an error message is printed, causing tests to fail.

This commit adds handling for nested exceptions in this case to prevent unnecessary
error messages.

Fixes scylladb/scylladb#23325
Fixes scylladb/scylladb#23305
Fixes scylladb/scylladb#21815

Backport: looks like this is quite a frequent issue, therefore backport to 2025.1.

- (cherry picked from commit 6abfed9817)

- (cherry picked from commit b1e89246d4)

- (cherry picked from commit 0d9d0fe60e)

- (cherry picked from commit d448f3de77)

Parent PR: #23336

Closes scylladb/scylladb#23470

* github.com:scylladb/scylladb:
  database: Pass schema_ptr as const ref in `wrap_commitlog_add_error`
  database: Unify exception handling in `do_apply` and `apply_with_commitlog`
  storage_proxy: Ignore wrapped `gate_closed_exception` and `rpc::closed_error` when node shuts down.
  exceptions: Add `try_catch_nested` to universally handle nested exceptions of the same type.
2025-04-10 10:04:50 +03:00
Kefu Chai
4e500bc806 gms: Fix fmt formatter for gossip_digest_sync
In commit 4812a57f, the fmt-based formatter for gossip_digest_syn had
formatting code for cluster_id, partitioner, and group0_id
accidentally commented out, preventing these fields from being included
in the output. This commit restores the formatting by uncommenting the
code, ensuring full visibility of all fields in the gossip_digest_syn
message when logging permits.

This fixes a regression introduced in 4812a57f, which obscured these
fields and reduced debugging insight. Backporting is recommended for
improved observability.

Fixes #23142
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23155

(cherry picked from commit 2a9966a20e)

Closes scylladb/scylladb#23199
2025-04-10 10:00:37 +03:00
Botond Dénes
0a86511359 Merge '[Backport 2025.1] reader_concurrency_semaphore: register_inactive_read(): handle aborted permit' from Scylladb[bot]
It is possible that the permit handed in to register_inactive_read() is already aborted (currently only possible if permit timed out). If the permit also happens to have wait for memory, the current code will attempt to call promise<>::set_exception() on the permit's promise to abort its waiters. But if the permit was already aborted via timeout, this promise will already have an exception and this will trigger an assert. Add a separate case for checking if the permit is aborted already. If so, treat it as immediate eviction: close the reader and clean up.

Fixes: scylladb/scylladb#22919

Bug is present in all live versions, backports are required.

- (cherry picked from commit 4d8eb02b8d)

- (cherry picked from commit 7ba29ec46c)

Parent PR: #23044

Closes scylladb/scylladb#23145

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: register_inactive_read(): handle aborted permit
  test/boost/reader_concurrency_semaphore_test: move away from db::timeout_clock::now()
2025-04-10 09:58:19 +03:00
Asias He
a7ab9149e8 repair: Fix return type for storage_service/tablets/repair API
The API returns the repair task UUID. For example:

{"tablet_task_id":"3597e990-dc4f-11ef-b961-95d5ead302a7"}

Fixes #23032

Closes scylladb/scylladb#23050

(cherry picked from commit 3f59a89e85)

Closes scylladb/scylladb#23090
2025-04-10 09:57:45 +03:00
Piotr Dulikowski
a866dada1d test: test_mv_topology_change: increase timeout for removenode
The test `test_mv_topology_change` is a regression test for
scylladb/scylladb#19529. The problem was that CL=ANY writes issued when
all replicas were down would be kept in memory until the timeout. In
particular, MV updates are CL=ANY writes and have a 5 minute timeout.
When doing topology operations for vnodes or when migrating tablet
replicas, the cluster goes through stages where the replica sets for
writes undergo changes, and the writes started with the old replica set
need to be drained first.

Because of the aforementioned MV updates, the removenode operation could
be delayed by 5 minutes or more. Therefore, the
`test_mv_topology_change` test uses a short timeout for the removenode
operation, i.e. 30s. Apparently, this is too low for the debug mode and
the test has been observed to time out even though the removenode
operation is progressing fine.

Increase the timeout to 60s. This is the lowest timeout for the
removenode operation that we currently use among the in-repo tests, and
is lower than 5 minutes so the test will still serve its purpose.

Fixes: scylladb/scylladb#22953

Closes scylladb/scylladb#22958

(cherry picked from commit 43ae3ab703)

Closes scylladb/scylladb#23053
2025-04-10 09:56:38 +03:00
Lakshmi Narayanan Sreethar
4e51f37c76 topology_coordinator: handle_table_migration: do not continue after executing metadata barrier
Return after executing the global metadata barrier to allow the topology
handler to handle any transitions that might have started by a
concurrect transaction.

Fixes #22792

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#22793

(cherry picked from commit 0f7d08d41d)

Closes scylladb/scylladb#23019
2025-04-10 09:55:53 +03:00
Botond Dénes
c44362451c replica/database: setup_scylla_memory_diagnostics_producer() un-static semaphore dump lambda
The lambda which dumps the diagnostics for each semaphore, is static.
Considering that said lambda captures a local (writeln) by reference, this
is wrong on two levels:
* The writeln captured on the shard which happens to initialize this
  static, will be used on all shards.
* The writeln captured on the first dump, will be used on later dumps,
  possibly triggering a segfault.

Drop the `static` to make the lambda local and resolve this problem.

Fixes: scylladb/scylladb#22756

Closes scylladb/scylladb#22776

(cherry picked from commit 820f196a49)

Closes scylladb/scylladb#22938
2025-04-10 09:54:37 +03:00
Calle Wilund
7b351682ac network_topology_strategy/alter ks: Remove dc:s from options once rf=0
Fixes #22688

If we set a dc rf to zero, the options map will still retain a dc=0 entry.
If this dc is decommissioned, any further alters of keyspace will fail,
because the union of new/old options will now contained an unknown keyword.

Change alter ks options processing to simply remove any dc with rf=0 on
alter, and treat this as an implicit dc=0 in nw-topo strategy.
This means we change the reallocate_tablets routine to not rely on
the strategy objects dc mapping, but the full replica topology info
for dc:s to consider for reallocation. Since we verify the input
on attribute processing, the amount of rf/tablets moved should still
be legal.

v2:
* Update docs as well.
v3:
* Simplify dc processing
* Reintroduce options empty check, but do early in ks_prop_defs
* Clean up unit test some

Closes scylladb/scylladb#22693

(cherry picked from commit 342df0b1a8)

Closes scylladb/scylladb#22877
2025-04-10 09:53:48 +03:00
Benny Halevy
eaf67dd227 main: allow abort during join_cluster
Bootstrap or replace can take a long time, but
since feef7d3fa1,
the stop_signal is checked only in checkpoints,
and in particular, abort isn't requested during
join_cluster.

Fixes #23222

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 41f02c521d)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-10 09:02:09 +03:00
Benny Halevy
fa92b6787c main: add checkpoint before joining cluster
(cherry picked from commit f269480f53)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-10 09:01:21 +03:00
Benny Halevy
a86e7ff286 storage_service: add start_sys_dist_ks
Currently, there's a call to
`supervisor::notify("starting system distributed keyspace")`
which is misleading as it is identical to a similar
message in main() when starting the sharded service.

Change that to a storage_service log messages
and be more specific that the sys_dist_ks shards are started.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0fc196991a)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-10 09:01:20 +03:00
Andrzej Jackowski
6da78533ed audit: add semaphore to audit_syslog_storage_helper
audit_syslog_storage_helper::syslog_send_helper uses Seastar's
net::datagram_channel to write to syslog device (usually /dev/log).
However, datagram_channel.send() is not fiber-safe (ref seastar#2690),
so unserialized use of send() results in packets overwriting its state.
This, in turn, causes a corruption of audit logs, as well as assertion
failures.

To workaround the problem, a new semaphore is introduced in
audit_syslog_storage_helper. As storage_helper is a member of sharded
audit service, the semaphore allows for one datagram_channel.send() on
each shard. Each audit_syslog_storage_helper stores its own
datagram_channel, therefore concurrent sends to datagram_channel are
eliminated.

This change:
 - Introduce semaphore with count=1 in audit_syslog_storage_helper.
 - Added 1 hour timeout to the semaphore, so semaphore stalls are
   failed just as all other syslog auditing failures.

Fixes: scylladb#22973
(cherry picked from commit c12f976389)
2025-04-09 14:13:24 +00:00
Andrzej Jackowski
1bb35952d7 audit: corutinize audit_syslog_storage_helper
This change:
 - Corutinize audit_syslog_storage_helper::syslog_send_helper
 - Corutinize audit_syslog_storage_helper::start
 - Corutinize audit_syslog_storage_helper::write

(cherry picked from commit 889fd5bc9f)
2025-04-09 14:13:24 +00:00
Andrzej Jackowski
efb99f29bc audit: moved syslog_send_helper to audit_syslog_storage_helper
This change:
 - Make syslog_send_helper() a method of audit_syslog_storage_helper, so
   syslog_send_helper() can access private members of
   audit_syslog_storage_helper in the next commits.
 - Remove unneeded syslog_send_helper() arguments that now are class
   members.

(cherry picked from commit dbd2acd2be)
2025-04-09 14:13:24 +00:00
Aleksandra Martyniuk
c7f1e1814c docs: nodetool: update repair and add tablet-repair docs
(cherry picked from commit 9769d7a564)
2025-04-09 14:03:29 +00:00
Aleksandra Martyniuk
7bbffb53dd test: nodetool: add tests for cluster repair command
(cherry picked from commit 02fb71da42)
2025-04-09 14:03:28 +00:00
Aleksandra Martyniuk
c5c631f175 nodetool: add cluster repair command
Add a new nodetool cluster repair command that repairs tablet keyspaces.

Users may specify keyspace and tables that they want to repair.
If the keyspace and tables are not specified, all tablet keyspaces
are repaired.

The command calls the new tablet repair API /storage_service/tablets/repair.

(cherry picked from commit 8bbc5e8923)
2025-04-09 14:03:28 +00:00
Aleksandra Martyniuk
8453d4f987 nodetool: repair: extract getting hosts and dcs to functions
(cherry picked from commit aa3973c850)
2025-04-09 14:03:28 +00:00
Aleksandra Martyniuk
a1b8ae57d9 nodetool: repair: warn about repairing tablet keyspaces
Warn about an attempt to repair tablet keysapce with nodetool repair.

A nodetool cluster repair command to repair tablet keyspaces will
be added in the following patches.

(cherry picked from commit b81c81c7f4)
2025-04-09 14:03:28 +00:00
Aleksandra Martyniuk
b500fa498d nodetool: repair: move keyspace_uses_tablets function
(cherry picked from commit cbde835792)
2025-04-09 14:03:28 +00:00
Nadav Har'El
c94d8e2471 Merge '[Backport 2025.1] transport/server.cc: set default timestamp info in EXECUTE and BATCH tracing' from Scylladb[bot]
A default timestamp (not to confuse with the timestamp passed via 'USING TIMESTAMP' query clause) can be set using 0x20 flag and the <timestamp> field in the binary CQL frame payload of QUERY, EXECUTE and BATCH ops. It also happens to be a default of a Java CQL Driver.

However, we were only setting the corresponding info in the CQL Tracing context of a QUERY operation. For an unknown reason we were not setting this for an EXECUTE and for a BATCH traces (I guess I simply forgot to set it back then).

This patch fixes this.

Fixes #23173

The issue fixed by this PR is not critical but the fix is simple and safe enough so we should backport it to all live releases.

- (cherry picked from commit ca6bddef35)

- (cherry picked from commit f7e1695068)

Parent PR: #23174

Closes scylladb/scylladb#23524

* github.com:scylladb/scylladb:
  CQL Tracing: set common query parameters in a single function
  transport/server.cc: set default timestamp info in EXECUTE and BATCH tracing
2025-04-09 14:59:13 +03:00
Kefu Chai
d7265a1bc2 storage_proxy: Prevent integer overflow in abstract_read_executor::execute
Fix UBSan abort caused by integer overflow when calculating time difference
between read and write operations. The issue occurs when:
1. The queried partition on replicas is not purgeable (has no recorded
   modified time)
2. Digests don't match across replicas
3. The system attempts to calculate timespan using missing/negative
   last_modified timestamps

This change skips cross-DC repair optimization when write timestamp is
negative or missing, as this optimization is only relevant for reads
occurring within write_timeout of a write.

Error details:
```
service/storage_proxy.cc:5532:80: runtime error: signed integer overflow: -9223372036854775808 - 1741940132787203 cannot be represented in type 'int64_t' (aka 'long')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior service/storage_proxy.cc:5532:80
Aborting on shard 1, in scheduling group sl:default
```

Related to previous fix 39325cf which handled negative read_timestamp cases.

Fixes #23314
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23359

(cherry picked from commit ebf9125728)

Closes scylladb/scylladb#23387
2025-04-09 14:56:10 +03:00
Nadav Har'El
7f19a27f4f Merge '[Backport 2025.1] main: safely check stop_signal in-between starting services' from Scylladb[bot]
To simplify aborting scylla while starting the services,
add a _ready state to stop_signal, so that until
main is ready to be stopped by the abort_source,
just register that the signal is caught, and
let a check() method poll that and request abort
and throw respective exception only then, in controlled
points that are in-between starting of services
after the service started successfully and a deferred
stop action was installed.

This patch prevents gate_closed_exception to escape handling
when start-up is aborted early with the stop signal,
causing https://github.com/scylladb/scylladb/issues/23153
The regression is apparently due to a25c3eaa1c

Fixes https://github.com/scylladb/scylladb/issues/23153

* Requires backport to 2025.1 due to a25c3eaa1c

- (cherry picked from commit 23433f593c)

- (cherry picked from commit 282ff344db)

- (cherry picked from commit feef7d3fa1)

- (cherry picked from commit b6705ad48b)

Parent PR: #23103

Closes scylladb/scylladb#23184

* github.com:scylladb/scylladb:
  main: add checkpoints
  main: safely check stop_signal in-between starting services
  main: move prometheus start message
  main: move per-shard database start message
2025-04-09 14:54:19 +03:00
Nadav Har'El
c6825920a6 alternator: in GetRecords, enforce Limit to be <= 1000
Alternator Streams' "GetRecords" operation has a "Limit" parameter on
how many records to return. The DynamoDB documentations says that the
upper limit on this Limit parameter is 1000 - but Alternator didn't
enforce this. In this patch we begin enforcing this highest Limit, and
also add a test for verifying this enforcement. As usual, the new test
passes on DynamoDB, and after this patch - also on Alternator.

The reason why it's useful to have *some* upper limit on Limit is that
the existing executor::get_records() implementation does not really have
preemption points in all the necessary places. In particular, we have a
loop on all returned records without preemption points. We also store
the returned records in a RapidJson vector, which requires a contiguous
allocation.

Even before this patch, GetRecords had a hard limit of 1 MB of results.
But still, in some cases 1 MB of results may be a lot of results, and we
can see stalls in the aforementioned places being O(number of results).

Fixes #23534

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23547

(cherry picked from commit 84fd52315f)

Closes scylladb/scylladb#23643
2025-04-09 12:46:30 +03:00
Botond Dénes
bff75aa812 Merge '[Backport 2025.1] Add tablet enforcing option' from Scylladb[bot]
This series add a new config option: `tablets_mode_for_new_keyspaces` that replaces the existing
`enable_tablets` option. It can be set to the following values:
    disabled: New keyspaces use vnodes by default, unless enabled by the tablets={'enabled':true} option
    enabled:  New keyspaces use tablets by default, unless disabled by the tablets={'disabled':true} option
    enforced: New keyspaces must use tablets. Tablets cannot be disabled using the CREATE KEYSPACE option

`tablets_mode_for_new_keyspaces=disabled` or `tablets_mode_for_new_keyspaces=enabled` control whether
tablets are disabled or enabled by default for new keyspaces, respectively.
In either cases, tablets can be opted-in or out using the `tablets={'enabled':...}`
keyspace option, when the keyspace is created.

`tablets_mode_for_new_keyspaces=enforced` enables tablets by default for new keyspaces,
like `tablets_mode_for_new_keyspaces=enabled`.
However, it does not allow to opt-out when creating
new keyspaces by setting `tablets = {'enabled': false}`

Fixes scylladb/scylla-enterprise#4355

[Edit: changed `Refs` above to `Fixes` to apeace the backport bot gods]

* Requires backport to 2025.1

- (cherry picked from commit c62865df90)

- (cherry picked from commit 62aeba759b)

- (cherry picked from commit 9fac0045d1)

Parent PR: #22273

Closes scylladb/scylladb#23602

* github.com:scylladb/scylladb:
  boost/tablets_test: verify failure to create keyspace with tablets and non network replication strategy
  tablets: enforce tablets using tablets_mode_for_new_keyspaces=enforced config option
  db/config: add tablets_mode_for_new_keyspaces option
2025-04-09 08:47:10 +03:00
Michał Chojnowski
2a74426084 table: fix a race in table::take_storage_snapshot()
`safe_foreach_sstable` doesn't do its job correctly.

It iterates over an sstable set under the sstable deletion
lock in an attempt to ensure that SSTables aren't deleted during the iteration.

The thing is, it takes the deletion lock after the SSTable set is
already obtained, so SSTables might get unlinked *before* we take the lock.

Remove this function and fix its usages to obtain the set and iterate
over it under the lock.

Closes scylladb/scylladb#23397

(cherry picked from commit e23fdc0799)

Closes scylladb/scylladb#23628
2025-04-08 19:07:22 +03:00
Lakshmi Narayanan Sreethar
b7e72b3167 replica/table::do_apply : do not check for async gate's closure
The `table::do_apply()` method verifies if the compaction group's async
gate is open to determine if the compaction group is active. Closing
this async gate prevents any new operations but waits for existing
holders to exit, allowing their operations to complete. When holding a
gate, holders will observe the gate as closed when it is being closed,
but this is irrelevant as they are already inside the gate and are
allowed to complete. All the callers of `table::do_apply()` already
enter the gate before calling the method. So, the async gate check
inside `table::do_apply()` will erroneously throw an exception when the
compaction group is closing despite holding the gate. This commit
removes the check to prevent this from happening.

Fixes #23348

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#23579

(cherry picked from commit 750f4baf44)

Closes scylladb/scylladb#23645
2025-04-08 18:59:22 +03:00
Yaron Kaikov
98359dbfb1 .github: Make "make-pr-ready-for-review" workflow run in base repo
in 57683c1a50 we fixed the `token` error,
but removed the checkout part which causing now the following error
```
failed to run git: fatal: not a git repository (or any of the parent directories): .git
```
Adding the repo checkout stage to avoid such error

Fixes: https://github.com/scylladb/scylladb/issues/22765

Closes scylladb/scylladb#23641

(cherry picked from commit 2dc7ea366b)

Closes scylladb/scylladb#23654
2025-04-08 13:49:27 +03:00
Benny Halevy
27ca0d1812 boost/tablets_test: verify failure to create keyspace with tablets and non network replication strategy
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 9fac0045d1)
2025-04-08 08:35:26 +03:00
Benny Halevy
736f89b31a tablets: enforce tablets using tablets_mode_for_new_keyspaces=enforced config option
`tablets_mode_for_new_keyspaces=enforced` enables tablets by default for
new keyspaces, like `tablets_mode_for_new_keyspaces=enabled`.
However, it does not allow to opt-out when creating
new keyspaces by setting `tablets = {'enabled': false}`.

Refs scylladb/scylla-enterprise#4355

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 62aeba759b)
2025-04-08 08:35:14 +03:00
Benny Halevy
a49e27ac8f db/config: add tablets_mode_for_new_keyspaces option
The new option deprecates the existing `enable_tablets` option.
It will be extended in the next patch with a 3rd value: "enforced"
while will enable tablets by default for new keyspace but
without the posibility to opt out using the `tablets = {'enabled':
false}` keyspace schema option.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit c62865df90)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-08 08:08:47 +03:00
Tomasz Grabiec
4f4c884d5d tablets: Make tablet allocation equalize per-shard load
Before, it was equalizing per-node load (tablet count), which is wrong
in heterogenous clusters. Nodes with fewer shards will end up with
overloaded shards.

Refs #23378

(cherry picked from commit 6bff596fce)
2025-04-07 18:14:11 +02:00
Tomasz Grabiec
55bfbe8ea3 tablets: load_balancer: Fix reporting of total load per node
Load is now utilization, not count, so we should report average
per-shard load, which is equivalent to node's utilization.

(cherry picked from commit d6232a4f5f)
2025-04-07 15:51:08 +00:00
Botond Dénes
1a896169dc Merge '[Backport 2025.1] repair: release erm in repair_writer_impl::create_writer when possible' from Scylladb[bot]
Currently, repair_writer_impl::create_writer keeps erm to ensure that a sharder is valid. If we repair a tablet, erm blocks the state machine and no operation on any tablet of this table might be performed.

Use auto_refreshing_sharder and topology_guard to ensure that the operation is safe and that tablet operations on the whole table aren't blocked.

Fixes: #23453.

Needs backport to 2025.1 that introduces the tablet repair scheduler.

- (cherry picked from commit 1dc29ddc86)

- (cherry picked from commit bae6711809)

Parent PR: #23455

Closes scylladb/scylladb#23580

* github.com:scylladb/scylladb:
  \test: add test to check concurrent migration and repair of two different tablets
  repair: release erm in repair_writer_impl::create_writer when possible
2025-04-07 10:10:20 +03:00
Kefu Chai
9ccad33e59 .github: Make "make-pr-ready-for-review" workflow run in base repo
The "make-pr-ready-for-review" workflow was failing with an "Input
required and not supplied: token" error.  This was due to GitHub Actions
security restrictions preventing access to the token when the workflow
is triggered in a fork:
```
    Error: Input required and not supplied: token
```

This commit addresses the issue by:

- Running the workflow in the base repository instead of the fork. This
  grants the workflow access to the required token with write permissions.
- Simplifying the workflow by using a job-level `if` condition to
  controlexecution, as recommended in the GitHub Actions documentation
  (https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/using-conditions-to-control-job-execution).
  This is cleaner than conditional steps.
- Removing the repository checkout step, as the source code is not required for this workflow.

This change resolves the token error and ensures the
"make-pr-ready-for-review" workflow functions correctly.

Fixes scylladb/scylladb#22765

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22766

(cherry picked from commit ca832dc4fb)

Closes scylladb/scylladb#23561
2025-04-07 08:10:10 +03:00
Piotr Smaron
a17dd4d4c9 [Backport 2025.1] auth: forbid modifying system ks by non-superusers
Before this patch, granting a user MODIFY permissions on ALL KEYSPACES allowed the user to write to system tables, where the user could also set himself to "superuser" granting him all other permissions. After this patch, MODIFY permissions on ALL KEYSPACES is limited only to non-system keyspaces.

Fixes: scylladb/scylladb#23218
(cherry picked from commit fee50f287c)

Parent PR: #23219

Closes scylladb/scylladb#23594
2025-04-06 15:10:06 +03:00
Nadav Har'El
a2a4c6e4b2 test/alternator: increase timeout in Alternator RBAC test
On our testing infrastructure, tests often run a hundred times (!)
slower than usual, for various reasons that we can't always avoid.
This is why all our test frameworks drastically increase the default
timeouts.

We forgot to increase the timeout in one place - where Alternator tests
use CQL. This is needed for the Alternator role-based access control
(RBAC) tests, which is configured via CQL and therefore the Alternator
test unusually uses CQL.

So in this patch we increase the timeout of CQL driver used by
Alternator tests to the same high timeouts (60-120 seconds) used by
the regular CQL tests. As the famous saying goes, these timeouts should
be enough for anyone.

Fixes #23569.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23578

(cherry picked from commit a9a6f9eecc)

Closes scylladb/scylladb#23601
2025-04-06 11:49:46 +03:00
Avi Kivity
64182d9df6 Update seastar submodule (prefaulter leaving zombie threads)
* seastar a350b5d70e...6d8fccf14c (1):
  > smp: prefaulter: don't leave zombie worker threads

Fixes #23316
2025-04-05 22:28:53 +03:00
Pavel Emelyanov
8e85ef90d2 sstables_loader: Do not stop sharded<progress_monitor> unconditionally
The member in question is unconditionally .stop()-ed in task's
release_resources() method, however, it may happen that the thing wasn't
.start()-ed in the first place. Start happens in the middle of the
task's .run() method and there can be several reasons why it can be
skipped -- e.g. the task is aborted early, or collecting sstables from
S3 throws.

fixes: #23231

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23483

(cherry picked from commit 832d83ae4b)

Closes scylladb/scylladb#23557
2025-04-04 17:46:20 +03:00
Aleksandra Martyniuk
b5b2ffa5df \test: add test to check concurrent migration and repair of two different tablets
(cherry picked from commit bae6711809)
2025-04-04 10:14:51 +02:00
Andrzej Jackowski
b7f067ce33 audit: fix empty query string in BATCH query
Function modification_statement::add_raw() is never called, which
makes query string in audit_info of batch queries empty. In enterprise
branch, add_raw is called in Cql.g and those changes were never merged
to master.

This changes:
 - Add missing call of add_raw() to Cql.g
 - Include other related changes (from PR#3228 in scylla-enterprise)

Fixes scylladb#23311

Closes scylladb/scylladb#23315

(cherry picked from commit b8adbcbc84)

Closes scylladb/scylladb#23495
2025-04-03 16:46:33 +03:00
Aleksandra Martyniuk
307f00a398 repair: release erm in repair_writer_impl::create_writer when possible
Currently, repair_writer_impl::create_writer keeps erm to ensure
that a sharder is valid. If we repair a tablet, erm blocks the state
machine and no operation on any tablet of this table might be performed.

Use auto_refreshing_sharder and topology_guard to ensure that the
operation is safe and that tablet operations on the whole table
aren't blocked.

Fixes: #23453.
(cherry picked from commit 1dc29ddc86)
2025-04-03 13:19:40 +00:00
Dawid Mędrek
c56e47f72f db/hints: Cancel draining when stopping node
Draining hints may occur in one of the two scenarios:

* a node leaves the cluster and the local node drains all of the hints
  saved for that node,
* the local node is being decommissioned.

Draining may take some time and the hint manager won't stop until it
finishes. It's not a problem when decommissioning a node, especially
because we want the cluster to retain the data stored in the hints.
However, it may become a problem when the local node started draining
hints saved for another node and now it's being shut down.

There are two reasons for that:

* Generally, in situations like that, we'd like to be able to shut down
  nodes as fast as possible. The data stored in the hints won't
  disappear from the cluster yet since we can restart the local node.
* Draining hints may introduce flakiness in tests. Replaying hints doesn't
  have the highest priority and it's reflected in the scheduling groups we
  use as well as the explicitly enforced throughput. If there are a large
  number of hints to be replayed, it might affect our tests.
  It's already happened, see: scylladb/scylladb#21949.

To solve those problems, we change the semantics of draining. It will behave
as before when the local node is being decommissioned. However, when the
local node is only being stopped, we will immediately cancel all ongoing
draining processes and stop the hint manager. To amend for that, when we
start a node and it initializes a hint endpoint manager corresponding to
a node that's already left the cluster, we will begin the draining process
of that endpoint manager right away.

That should ensure all data is retained, while possibly speeding up
the shutdown process.

There's a small trade-off to it, though. If we stop a node, we can then
remove it. It won't have a chance to replay hints it might've before
these changes, but that's an edge case. We expect this commit to bring
more benefit than harm.

We also provide tests verifying that the implementation works as intended.

Fixes scylladb/scylladb#21949

Closes scylladb/scylladb#22811

(cherry picked from commit 0a6137218a)

Closes scylladb/scylladb#23370
2025-04-03 09:09:05 +02:00
Tomasz Grabiec
51ee15f02d Merge '[Backport 2025.1] tablets: Make load balancing capacity-aware' from Tomasz Grabiec
Before this patch, the load balancer was equalizing tablet count per
shard, so it achieved balance assuming that:
 1) tablets have the same size
 2) shards have the same capacity

That can cause imbalance of utilization if shards have different
capacity, which can happen in heterogeneous clusters with different
instance types. One of the causes for capacity difference is that
larger instances run with fewer shards due to vCPUs being dedicated to
IRQ handling. This makes those shards have more disk capacity, and
more CPU power.

After this patch, the load balancer equalizes shard's storage
utilization, so it no longer assumes that shards have the same
capacity. It still assumes that each tablet has equal size. So it's a
middle step towards full size-aware balancing.

One consequence is that to be able to balance, the load balancer need
to know about every node's capacity, which is collected with the same
RPC which collects load_stats for average tablet size. This is not a
significant set back because migrations cannot proceed anyway if nodes
are down due to barriers. We could make intra-node migration
scheduling work without capacity information, but it's pointless due
to above, so not implemented.

Also, per-shard goal for tablet count is still the same for all nodes in the cluster,
so nodes with less capacity will be below limit and nodes with more capacity will
be slightly above limit. This shouldn't be a significant problem in practice, we could
compensate for this by increasing the limit.

Fixes #23042

* github.com:scylladb/scylladb:
  tablets: Make load balancing capacity-aware
  topology_coordinator: Fix confusing log message
  topology_coordinator: Refresh load stats after adding a new node
  topology_coordinator: Allow capacity stats to be refreshed with some nodes down
  topology_coordinator: Refactor load status refreshing so that it can be triggered from multiple places
  test: boost: tablets_test: Always provide capacity in load_stats
  test: perf_load_balancing: Set node capacity
  test: perf_load_balancing: Convert to topology_builder
  config, disk_space_monitor: Allow overriding capacity via config
  storage_service, tablets: Collect per-node capacity in load_stats
  test: tablets_test: Add support for auto-split mode
  test: cql_test_env: Expose db config

Closes scylladb/scylladb#23443

* github.com:scylladb/scylladb:
  Merge 'tablets: Make load balancing capacity-aware' from Tomasz Grabiec
  test: tablets_test: Add support for auto-split mode
  test: cql_test_env: Expose db config
2025-04-01 20:31:05 +02:00
Vlad Zolotarov
feadb781f2 CQL Tracing: set common query parameters in a single function
Each query-type (QUERY, EXECUTE, BATCH) CQL opcode has a number of parameters
in their payload which we always want to record in the Tracing object.
Today it's a Consistency Level, Serial Consistency Level and a Default Timestamp.

Setting each of them individually can lead to a human error when one (or more) of
them would not be set. Let's eliminate such a possibility by defining
a single function that sets them all.

This also allows an easy addition of such parameters to this function in
the future.

(cherry picked from commit f7e1695068)
2025-04-01 11:45:54 +00:00
Vlad Zolotarov
6b71d6b9ba transport/server.cc: set default timestamp info in EXECUTE and BATCH tracing
A default timestamp (not to confuse with the timestamp passed via 'USING TIMESTAMP' query clause)
can be set using 0x20 flag and the <timestamp> field in the binary CQL frame payload of
QUERY, EXECUTE and BATCH ops. It also happens to be a default of a Java CQL Driver.

However, we were only setting the corresponding info in the CQL Tracing context of a QUERY operation.
For an unknown reason we were not setting this for an EXECUTE and for a BATCH traces (I guess I simply forgot to
set it back then).

This patch fixes this.

Fixes #23173

(cherry picked from commit ca6bddef35)
2025-04-01 11:45:54 +00:00
Jenkins Promoter
36bb089663 Update ScyllaDB version to: 2025.1.1 2025-04-01 14:18:18 +03:00
yangpeiyu2_yewu
1661b35050 mutation_writer/multishard_writer.cc: wrap writer into futurize_invoke
wrapped writer in seastar::futurize_invoke to make sure that the close() for the mutation_reader can be executed before destruction.

Fixes scylladb/scylladb#22790

Closes scylladb/scylladb#22812

(cherry picked from commit 0de232934a)

Closes scylladb/scylladb#22943
2025-04-01 13:46:27 +03:00
Asias He
8c93a331f7 repair: Enable small table optimization for system_replicated_keys
This enterprise-only system table is replicated and small. It should be
included for small table optimization.

Fixes scylladb/scylla-enterprise#5256

Closes scylladb/scylladb#23135

Closes scylladb/scylladb#23147
2025-04-01 13:36:51 +03:00
Calle Wilund
85c161b9f1 generic_server: Update conditions for is_broken_pipe_or_connection_reset
Refs scylla-enterprise#5185
Fixes #22901

If a tls socket gets EPIPE the error is not translated to a specific
gnutls error code, but only a generic ERROR_PULL/PUSH. Since we treat
EPIPE as ignorable for plain sockets, we need to unwind nested exception
here to detect that the error was in fact due to this, so we can suppress
log output for this.

Closes scylladb/scylladb#22888

(cherry picked from commit e49f2046e5)

Closes scylladb/scylladb#23045
2025-04-01 13:06:29 +03:00
Patryk Jędrzejczak
d088cc8a2d Merge '[Backport 2025.1] Fix a regression that sometimes causes an internal error and demote barrier_and_drain rpc error log to a warning ' from Scylladb[bot]
The series fixes a regression and demotes a barrier_and_drain logging error to a warning since this particular condition may happen during normal operation.

We want to backport both since one is a bug fix and another is trivial and reduces CI flakiness.

- (cherry picked from commit 1da7d6bf02)

- (cherry picked from commit fe45ea505b)

Parent PR: #22650

Closes scylladb/scylladb#22923

* https://github.com/scylladb/scylladb:
  topology_coordinator: demote barrier_and_drain rpc failure to warning
  topology_coordinator: read peers table only once during topology state application
2025-04-01 11:54:56 +02:00
Patryk Jędrzejczak
39c20144e5 Merge '[Backport 2025.1] raft topology: Add support for raft topology init to happen before group0 initialization' from Scylladb[bot]
In the current scenario, the problem discovered is that there is a time
gap between group0 creation and raft_initialize_discovery_leader call.
Because of that, the group0 snapshot/apply entry enters wrong values
from the disk(null) and updates the in-memory variables to wrong values.
During the above time gap, the in-memory variables have wrong values and
perform absurd actions.

This PR removes the variable `_manage_topology_change_kind_from_group0`
which was used earlier as a work around for correctly handling
`topology_change_kind` variable, it was brittle and had some bugs
(causing issues like scylladb/scylladb#21114). The reason for this bug
that _manage_topology_change_kind used to block reading from disk and
was enabled after group0 initialization and starting raft server for the
restart case. Similarly, it was hard to manage `topology_change_kind`
using `_manage_topology_change_kind_from_group0` correctly in bug free
anner.

Post `_manage_topology_change_kind_from_group0` removal, careful
management of `topology_change_kind` variable was needed for maintaining
correct `topology_change_kind` in all scenarios. So this PR also performs
a refactoring to populate all init data to system tables even before
group0 creation(via `raft_initialize_discovery_leader` function). Now
because `raft_initialize_discovery_leader` happens before the group 0
creation, we write mutations directly to system tables instead of a
group 0 command. Hence, post group0 creation, the node can read the
correct values from system tables and correct values are maintained
throughout.

Added a new function `initialize_done_topology_upgrade_state` which
takes care of updating the correct upgrade state to system tables before
starting group0 server. This ensures that the node can read the correct
values from system tables and correct values are maintained throughout.

By moving `raft_initialize_discovery_leader` logic to happen before
starting group0 server, and not as group0 command post server start, we
also get rid of the potential problem of init group0 command not being
the 1st command on the server. Hence ensuring full integrity as expected
by programmer.

This PR fixes a bug. Hence we need to backport it.

Fixes: scylladb/scylladb#21114

- (cherry picked from commit 4748125a48)

- (cherry picked from commit e491950c47)

- (cherry picked from commit 623e01344b)

- (cherry picked from commit d7884cf651)

Parent PR: #22484

Closes scylladb/scylladb#22966

* https://github.com/scylladb/scylladb:
  storage_service: Remove the variable _manage_topology_change_kind_from_group0
  storage_service: fix indentation after the previous commit
  raft topology: Add support for raft topology system tables initialization to happen before group0 initialization
  service/raft: Refactor mutation writing helper functions.
2025-04-01 11:46:15 +02:00
Jenkins Promoter
f1e7cee7a5 Update pgo profiles - aarch64 2025-04-01 04:20:56 +03:00
Jenkins Promoter
023b27312d Update pgo profiles - x86_64 2025-04-01 04:08:00 +03:00
Anna Stuchlik
2ffbc81e19 doc: remove the outdated info on seeds-info
This commit removes the outdated information about seed nodes.
We no longer need it in the docs, as a) the documentation is versioned,
and b) the ScyllaDB Open Source 4.3 and ScyllaDB Enterprise 2021.1 versions
mentioned in the docs are no longer supported.

In addition, some clarification has been added to the existing sections.

Fixes https://github.com/scylladb/scylladb/issues/22400

Closes scylladb/scylladb#23282

(cherry picked from commit dbbf9e19e4)

Closes scylladb/scylladb#23327
2025-03-31 12:33:59 +02:00
Yaron Kaikov
88e548ed72 .github: add action to make PR ready for review when conflicts label was removed
Moving a PR out of draft is only allowed to users with write access,
adding a github action to switch PR to `ready for review` once the
`conflicts` label was removed

Closes scylladb/scylladb#22446

(cherry picked from commit ed4bfad5c3)

Closes scylladb/scylladb#23023
2025-03-31 13:22:04 +03:00
Tomasz Grabiec
975882a489 test: tablets: Fix flakiness due to ungraceful shutdown
The test fails sporadically with:

cassandra.ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for test3.test2 - received 1 responses and 1 failures from 2 CL=QUORUM." info={'consistency': 'QUORUM', 'required_responses': 2, 'received_responses': 1, 'failures': 1}

That's becase a server is stopped in the middle of the workload.

The server is stopped ungracefully which will cause some requests to
time out. We should stop it gracefully to allow in-flight requests to
finish.

Fixes #20492

Closes scylladb/scylladb#23451

(cherry picked from commit 8e506c5a8f)

Closes scylladb/scylladb#23469
2025-03-28 14:56:02 +01:00
Sergey Zolotukhin
bfb242b735 database: Pass schema_ptr as const ref in wrap_commitlog_add_error
(cherry picked from commit d448f3de77)
2025-03-27 21:28:13 +00:00
Sergey Zolotukhin
fe94b5a475 database: Unify exception handling in do_apply and apply_with_commitlog
Move exception wrapping logic from `do_apply` and `apply_with_commitlog`
to `wrap_commitlog_add_error` to ensure consistent error handling.

(cherry picked from commit 0d9d0fe60e)
2025-03-27 21:28:13 +00:00
Sergey Zolotukhin
f7b7a47404 storage_proxy: Ignore wrapped gate_closed_exception and rpc::closed_error when node shuts down.
Normally, when a node is shutting down, `gate_closed_exception` and `rpc::closed_error`
in `send_to_live_endpoints` should be ignored. However, if these exceptions are wrapped
in a `nested_exception`, an error message is printed, causing tests to fail.

This commit adds handling for nested exceptions in this case to prevent unnecessary
error messages.

Fixes scylladb/scylladb#23325

(cherry picked from commit b1e89246d4)
2025-03-27 21:28:13 +00:00
Sergey Zolotukhin
0e0d5241db exceptions: Add try_catch_nested to universally handle nested exceptions of the same type.
(cherry picked from commit 6abfed9817)
2025-03-27 21:28:12 +00:00
Evgeniy Naydanov
3653662099 test.py: random_failures: deselect topology ops for some injections
After recent changes #18640 and #19151 started to reproduce for
stop_after_sending_join_node_request and
stop_after_bootstrapping_initial_raft_configuration error injections too.

The solution is the same: deselect the tests.

Fixes #23302

Closes scylladb/scylladb#23405

(cherry picked from commit 574c81eac6)

Closes scylladb/scylladb#23460
2025-03-27 13:19:59 +02:00
Anna Stuchlik
7336bb38fa doc: fix product names in the 2025.1 upgrage guides
This commit fixes the product names in the upgrade 2025.1 guides so that:

- 6.2 is preceded with "ScyllaDB Open Source"
- 2024.x is preceded with "ScyllaDB Enterprise"
- 2025.1 is preceded with "ScyllaDB"

Fixes https://github.com/scylladb/scylladb/issues/23154

Closes scylladb/scylladb#23223

(cherry picked from commit cd61f60549)

Closes scylladb/scylladb#23328
2025-03-27 11:58:01 +02:00
Avi Kivity
cff90755d8 Merge 'tablets: Make load balancing capacity-aware' from Tomasz Grabiec
Before this patch, the load balancer was equalizing tablet count per
shard, so it achieved balance assuming that:
 1) tablets have the same size
 2) shards have the same capacity

That can cause imbalance of utilization if shards have different
capacity, which can happen in heterogeneous clusters with different
instance types. One of the causes for capacity difference is that
larger instances run with fewer shards due to vCPUs being dedicated to
IRQ handling. This makes those shards have more disk capacity, and
more CPU power.

After this patch, the load balancer equalizes shard's storage
utilization, so it no longer assumes that shards have the same
capacity. It still assumes that each tablet has equal size. So it's a
middle step towards full size-aware balancing.

One consequence is that to be able to balance, the load balancer need
to know about every node's capacity, which is collected with the same
RPC which collects load_stats for average tablet size. This is not a
significant set back because migrations cannot proceed anyway if nodes
are down due to barriers. We could make intra-node migration
scheduling work without capacity information, but it's pointless due
to above, so not implemented.

Also, per-shard goal for tablet count is still the same for all nodes in the cluster,
so nodes with less capacity will be below limit and nodes with more capacity will
be slightly above limit. This shouldn't be a significant problem in practice, we could
compensate for this by increasing the limit.

Refs #23042

Closes scylladb/scylladb#23079

* github.com:scylladb/scylladb:
  tablets: Make load balancing capacity-aware
  topology_coordinator: Fix confusing log message
  topology_coordinator: Refresh load stats after adding a new node
  topology_coordinator: Allow capacity stats to be refreshed with some nodes down
  topology_coordinator: Refactor load status refreshing so that it can be triggered from multiple places
  test: boost: tablets_test: Always provide capacity in load_stats
  test: perf_load_balancing: Set node capacity
  test: perf_load_balancing: Convert to topology_builder
  config, disk_space_monitor: Allow overriding capacity via config
  storage_service, tablets: Collect per-node capacity in load_stats

(cherry picked from commit b1d9f80d85)
2025-03-25 23:16:35 +01:00
Tomasz Grabiec
3be469da29 test: tablets_test: Add support for auto-split mode
rebalance_tablets() was performing migrations and merges automatically
but not splits, because splits need to be acked by replicas via
load_stats. It's inconvenient in tests which want to rebalance to the
equilibrium point. This patch changes rebalance_tablets() to split
automatically by default, can be disabled for tests which expect
differently.

shared_load_stats was introduced to provide a stable holder of
load_stats which can be reused across rebalance_tablets() calls.

(cherry picked from commit 5e471c6f1b)
2025-03-25 18:23:22 +01:00
Tomasz Grabiec
1895724465 test: cql_test_env: Expose db config
(cherry picked from commit f3b63bfeff)
2025-03-25 18:22:32 +01:00
Jenkins Promoter
9dca28d2b8 Update ScyllaDB version to: 2025.1.0 2025-03-25 09:19:12 +02:00
Avi Kivity
bc98301783 Merge '[Backport 2025.1] repair: allow concurrent repair and migration of two different tablets' from Aleksandra Martyniuk
Do not hold erm during repair of a tablet that is started with tablet
repair scheduler. This way two different tablets can be repaired
and migrated concurrently. The same tablet won't be migrated while
being repaired as it is provided by topology coordinator.

Use topology_guard to maintain safety.

Fixes: https://github.com/scylladb/scylladb/issues/22408.

Needs backport to 2025.1 that introduces the tablet repair scheduler.

Closes scylladb/scylladb#23362

* github.com:scylladb/scylladb:
  test: add test to check concurrent tablets migration and repair
  repair: do not hold erm for repair scheduled by scheduler
  repair: get total rf based on current erm
  repair: make shard_repair_task_impl::erm private
  repair: do not pass erm to put_row_diff_with_rpc_stream when unnecessary
  repair: do not pass erm to flush_rows_in_working_row_buf when unnecessary
  repair: pass session_id to repair_writer_impl::create_writer
  repair: keep materialized topology guard in shard_repair_task_impl
  repair: pass session_id to repair_meta
2025-03-23 20:14:53 +02:00
Avi Kivity
220bbcf329 Merge '[Backport 2025.1] cql3: Introduce RF-rack-valid keyspaces' from Scylladb[bot]
This PR is an introductory step towards enforcing
RF-rack-valid keyspaces in Scylla.

The scope of changes:
* defining RF-rack-valid keyspaces,
* introducing a configuration option enforcing RF-rack-valid
  keyspaces,
* restricting the CREATE and ALTER KEYSPACE statements
  so that they never lead to RF-rack invalid keyspaces,
* during the initialization of a node, it verifies that all existing
  keyspaces are RF-rack-valid. If not, the initialization fails.

We provide tests verifying that the changes behave as intended.

---

Note that there are a number of things that still need to be implemented.
That includes, for instance, restricting topology operations too.

---

Implementation strategy (going beyond the scope of this PR):

1. Introduce the new configuration option `rf_rack_valid_keyspaces`.
2. Start enforcing RF-rack-validity in keyspaces if the option is enabled.
3. Adjust the tests: in the tree and out of it. Explicitly enable the option in all tests.
4. Once the tests have been adjusted, change the default value of the option to enabled.
5. Stop explicitly enabling the option in tests.
6. Get rid of the option.

---

Fixes scylladb/scylladb#20356
Fixes scylladb/scylladb#23276
Fixes scylladb/scylladb#23300

---

Backport: this is part of the requirements for releasing 2025.1.

- (cherry picked from commit 32879ec0d5)

- (cherry picked from commit 41f862d7ba)

- (cherry picked from commit 0e04a6f3eb)

Parent PR: #23138

Closes scylladb/scylladb#23398

* github.com:scylladb/scylladb:
  main: Refuse to start node when RF-rack-invalid keyspace exists
  cql3: Ensure that CREATE and ALTER never lead to RF-rack-invalid keyspaces
  db/config: Introduce RF-rack-valid keyspaces
2025-03-23 16:16:29 +02:00
Dawid Mędrek
ecdefe801c main: Refuse to start node when RF-rack-invalid keyspace exists
When a node is started with the option `rf_rack_valid_keyspaces`
enabled, the initialization will fail if there is an RF-rack-invalid
keyspace. We want to force the user to adjust their existing
keyspaces when upgrading to 2025.* so that the invariant that
every keyspace is RF-rack-valid is always satisfied.

Fixes scylladb/scylladb#23300

(cherry picked from commit 0e04a6f3eb)
2025-03-21 12:27:04 +00:00
Dawid Mędrek
af2215c2d2 cql3: Ensure that CREATE and ALTER never lead to RF-rack-invalid keyspaces
In this commit, we refuse to create or alter a keyspace when that operation
would make it RF-rack-invalid if the option `rf_rack_valid_keyspaces` is
enabled.

We provide two tests verifying that the changes work as intended.

Fixes scylladb/scylladb#23276

(cherry picked from commit 41f862d7ba)
2025-03-21 12:27:04 +00:00
Dawid Mędrek
864528eb9b db/config: Introduce RF-rack-valid keyspaces
We introduce a new term in the glossary: RF-rack-valid keyspace.

We also highlight in our user documentation that all keyspaces
must remain RF-rack-valid throughout their lifetime, and failing
to guarantee that may result in data inconsistencies or other
issues. We base that information on our experience with materialized
views in keyspaces using tablets, even though they remain
an experimental feature.

Along with the new term, we introduce a new configuration option
called `rf_rack_valid_keyspaces`, which, when enabled, will enforce
preserving all keyspaces RF-rack-valid. That functionality will be
implemented in upcoming commits. For now, we materialize the
restriction in form of a named requirement: a function verifying
that the passed keyspace is RF-rack-valid.

The option is disabled by default. That will change once we adjust
the existing tests to the new semantics. Once that is done, the option
will first be enabled by default, and then it will be removed.

Fixes scylladb/scylladb#20356

(cherry picked from commit 32879ec0d5)
2025-03-21 12:27:04 +00:00
Aleksandra Martyniuk
5153b91514 test: add test to check concurrent tablets migration and repair
Add a test to check whether a tablet can be migrated while another
tablet is repaired.

(cherry picked from commit 20f9d7b6eb)
2025-03-19 10:15:19 +01:00
Aleksandra Martyniuk
0a0347cb4e repair: do not hold erm for repair scheduled by scheduler
Do not hold erm	for tablet repair scheduled by scheduler. Thanks to
that one tablet repair won't exclude migration of other tablets.

Concurrent repair and migration of the same tablet isn't possible,
since a tablet can be in one type of transition only at the time.
Hence the change is safe.

Refs: https://github.com/scylladb/scylladb/issues/22408.
(cherry picked from commit 5b792bdc98)
2025-03-19 10:09:51 +01:00
Aleksandra Martyniuk
da64c02b92 repair: get total rf based on current erm
Get total rf based on erm. Currently, it does not change anything
because erm stays the same during the whole repair.

(cherry picked from commit a1375896df)
2025-03-19 10:09:30 +01:00
Aleksandra Martyniuk
39aabe5191 repair: make shard_repair_task_impl::erm private
Make shard_repair_task_impl::erm private. Access it with getter.

(cherry picked from commit 34cd485553)
2025-03-19 10:09:11 +01:00
Aleksandra Martyniuk
9eeff8573b repair: do not pass erm to put_row_diff_with_rpc_stream when unnecessary
When small_table_optimization isn't enabled, put_row_diff_with_rpc_stream
does not access erm. Pass small_table_optimization_params containing erm
only when small_table_optimization is enabled.

This is safe as erm is kept by shard_repair_task_impl.

(cherry picked from commit 444c7eab90)
2025-03-19 10:08:22 +01:00
Aleksandra Martyniuk
4115f6f367 repair: do not pass erm to flush_rows_in_working_row_buf when unnecessary
When small_table_optimization isn't enabled, flush_rows_in_working_row_buf
does not access erm. Add small_table_optimization_params containing erm and
pass it only when small_table_optimization is enabled.

This is safe as erm is kept by shard_repair_task_impl.

(cherry picked from commit e56bb5b6e2)
2025-03-19 10:07:45 +01:00
Aleksandra Martyniuk
fb2c46dfbe repair: pass session_id to repair_writer_impl::create_writer
(cherry picked from commit 09c74aa294)
2025-03-19 10:07:00 +01:00
Aleksandra Martyniuk
b4e37600d6 repair: keep materialized topology guard in shard_repair_task_impl
Keep materialized topology guard in shard_repair_task_impl and check
it in check_in_abort_or_shutdown and before each range repair.

(cherry picked from commit 47bb9dcf78)
2025-03-19 10:04:17 +01:00
Aleksandra Martyniuk
6bbf20a440 repair: pass session_id to repair_meta
Pass session_id of tablet repair down the stack from the repair request
to repair_meta.

The session_id will be utiziled in the following patches.

(cherry picked from commit 928f92c780)
2025-03-19 10:02:24 +01:00
Botond Dénes
b8797551eb Merge '[Backport 2025.1] Rack aware tablet merge colocation migration ' from Tomasz Grabiec
service: Introduce rack-aware co-location migrations for tablet merge

Merge co-location can emit migrations across racks even when RF=#racks,
reducing availability and affecting consistency of base-view pairing.

Given replica set of sibling tablets T0 and T1 below:
[T0: (rack1,rack3,rack2)]
[T1: (rack2,rack1,rack3)]

Merge will co-locate T1:rack2 into T0:rack1, T1 will be temporarily only at
only a subset of racks, reducing availability.

This is the main problem fixed by this patch.

It also lays the ground for consistent base-view replica pairing,
which is rack-based. For tables on which views can be created we plan
to enforce the constraint that replicas don't move across racks and
that all tablets use the same set of racks (RF=#racks). This patch
avoids moving replicas across racks unless it's necessary, so if the
constraint is satisfied before merge, there will be no co-locating
migrations across racks. This constraint of RF=#racks is not enforced
yet, it requires more extensive changes.

Fixes #22994.
Refs #17265.

This patch is based on Raphael's work done in PR #23081. The main differences are:

1) Instead of sorting replicas by rack, we try to find
    replicas in sibling tablets which belong to the same rack.
    This is similar to how we match replicas within the same host.
    It reduces number of across-rack migrations even if RF!=#racks,
    which the original patch didn't handle.
    Unlike the original patch, it also avoids rack-overloaded in case
    RF!=#racks

2) We emit across-rack co-locating migrations if we have no other choice
   in order to finalize the merge

   This is ok, since views are not supported with tablets yet. Later,
   we will disallow this for tables which have views, and we will
   allow creating views in the first place only when no such migrations
   can happen (RF=#racks).

3) Added boost unit test which checks that rack overload is avoided during merge
   in case RF<#racks

4) Moved logging of across-rack migration to debug level

5) Exposed metric for across-rack co-locating migrations

(cherry picked from commit af949f3b6a)

Also backports dependent patches:
  - locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes
  - locator: network_topology_startegy: Ignore leaving nodes when computing capacity for new tables
  - Merge 'test: tablets_test: Create proper schema in load balancer tests' from Tomasz Grabiec

Closes scylladb/scylladb#22657
Closes scylladb/scylladb#22652

Closes scylladb/scylladb#23297

* github.com:scylladb/scylladb:
  service: Introduce rack-aware co-location migrations for tablet merge
  Merge 'test: tablets_test: Create proper schema in load balancer tests' from Tomasz Grabiec
  locator: network_topology_startegy: Ignore leaving nodes when computing capacity for new tables
  locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes
2025-03-18 16:22:29 +02:00
Nadav Har'El
b1cf1890a9 alternator: document the state of tablet support in Alternator
In commit c24bc3b we decided that creating a new table in Alternator
will by default use vnodes - not tablets - because of all the missing
features in our tablets implementation that are important for
Alternator, namely - LWT, CDC and Alternator TTL.

We never documented this, or the fact that we support a tag
`experimental:initial_tablets` which allows to override this decision
and create an Alternator table using tablets. We also never documented
what exactly doesn't work when Alternator uses tablet.

This patch adds the missing documentation in docs/alternator/new-apis.md
(which is a good place for describing the `experimental:initial_tablets`
tag). The patch also adds a new test file, test_tablets.py, which
includes tests for all the statements made in the document regarding
how `experimental:initial_tablets` works and what works or doesn't
work when tablets are enabled.

Two existing tests - for TTL and Streams non-support with tablets -
are moved to the new test file.

When the tablets feature will finally be completed, both the document
and the tests will need to be modified (some of the tests should be
outright deleted). But it seems this will not happen for at least
several months, and that is too long to wait without accurate
documentation.

Fixes #21629

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22462

(cherry picked from commit c0821842de)

Closes scylladb/scylladb#23298
2025-03-16 18:25:21 +02:00
Jenkins Promoter
2f0ebe9f49 Update pgo profiles - aarch64 2025-03-15 04:21:14 +02:00
Jenkins Promoter
3633fb9ff8 Update pgo profiles - x86_64 2025-03-15 04:13:25 +02:00
Raphael S. Carvalho
33b5f27057 service: Introduce rack-aware co-location migrations for tablet merge
Merge co-location can emit migrations across racks even when RF=#racks,
reducing availability and affecting consistency of base-view pairing.

Given replica set of sibling tablets T0 and T1 below:
[T0: (rack1,rack3,rack2)]
[T1: (rack2,rack1,rack3)]

Merge will co-locate T1:rack2 into T0:rack1, T1 will be temporarily only at
only a subset of racks, reducing availability.

This is the main problem fixed by this patch.

It also lays the ground for consistent base-view replica pairing,
which is rack-based. For tables on which views can be created we plan
to enforce the constraint that replicas don't move across racks and
that all tablets use the same set of racks (RF=#racks). This patch
avoids moving replicas across racks unless it's necessary, so if the
constraint is satisfied before merge, there will be no co-locating
migrations across racks. This constraint of RF=#racks is not enforced
yet, it requires more extensive changes.

Fixes #22994.
Refs #17265.

This patch is based on Raphael's work done in PR #23081. The main differences are:

1) Instead of sorting replicas by rack, we try to find
    replicas in sibling tablets which belong to the same rack.
    This is similar to how we match replicas within the same host.
    It reduces number of across-rack migrations even if RF!=#racks,
    which the original patch didn't handle.
    Unlike the original patch, it also avoids rack-overloaded in case
    RF!=#racks

2) We emit across-rack co-locating migrations if we have no other choice
   in order to finalize the merge

   This is ok, since views are not supported with tablets yet. Later,
   we will disallow this for tables which have views, and we will
   allow creating views in the first place only when no such migrations
   can happen (RF=#racks).

3) Added boost unit test which checks that rack overload is avoided during merge
   in case RF<#racks

4) Moved logging of across-rack migration to debug level

5) Exposed metric for across-rack co-locating migrations

(cherry picked from commit af949f3b6a)

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>
2025-03-14 20:02:33 +01:00
Anna Stuchlik
11ecc886c3 doc: Remove "experimental" from ALTER KEYSPACE with Tablets
Altering a keyspace with tablets is no longer experimental.
This commit removes the "Experimental" label from the feature.

Fixes https://github.com/scylladb/scylladb/issues/23166

Closes scylladb/scylladb#23183

(cherry picked from commit 562b5db5b8)

Closes scylladb/scylladb#23274
2025-03-14 13:57:55 +01:00
Botond Dénes
eb147ec564 Merge 'test: tablets_test: Create proper schema in load balancer tests' from Tomasz Grabiec
This PR converts boost load balancer tests in preparation for load balancer changes
which add per-table tablet hints. After those changes, load balancer consults with the replication
strategy in the database, so we need to create proper schema in the
database. To do that, we need proper topology for replication
strategies which use RF > 1, otherwise keyspace creation will fail.

Topology is created in tests via group0 commands, which is abstracted by
the new `topology_builder` class.

Tests cannot modify token_metadata only in memory now as it needs to be
consistent with the schema and on-disk metadata. That's why modifications to
tablet metadata are now made under group0 guard and save back metadata to disk.

Closes scylladb/scylladb#22648

* github.com:scylladb/scylladb:
  test: tablets: Drop keyspace after do_test_load_balancing_merge_colocation() scenario
  tests: tablets: Set initial tablets to 1 to exit growing mode
  test: tablets_test: Create proper schema in load balancer tests
  test: lib: Introduce topology_builder
  test: cql_test_env: Expose topology_state_machine
  topology_state_machine: Introduce lock transition

(cherry picked from commit 51a273401c)
2025-03-13 14:08:30 +01:00
Tomasz Grabiec
637e5fc9b5 locator: network_topology_startegy: Ignore leaving nodes when computing capacity for new tables
For example, nodes which are being decommissioned should not be
consider as available capacity for new tables. We don't allocate
tablets on such nodes.

Would result in higher per-shard load then planned.

Closes scylladb/scylladb#22657

(cherry picked from commit 3bb19e9ac9)
2025-03-13 14:08:27 +01:00
Tomasz Grabiec
0d77754c63 locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes
In that case, new_racks will be used, but when we discover no
candidates, we try to pop from existing_racks.

Fixes #22625

Closes scylladb/scylladb#22652

(cherry picked from commit e22e3b21b1)
2025-03-13 14:00:48 +01:00
Benny Halevy
5481c9aedd docs: document the views-with-tablets experimental feature
Refs scylladb/scylladb#22217

Fixes scylladb/scylladb#22893

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#22896

(cherry picked from commit 55dbf5493c)

Closes scylladb/scylladb#23024
2025-03-10 13:26:36 +01:00
Botond Dénes
59db708cba Merge '[Backport 2025.1] tablets: repair: fix hosts and dcs filters behavior for tablet repair' from Scylladb[bot]
If hosts and/or dcs filters are specified for tablet repair and
some replicas match these filters, choose the replica that will
be the repair master according to round-robin principle
(currently it's always the first replica).

If hosts and/or dcs filters are specified for tablet repair and
no replica matches these filters, the repair succeeds and
the repair request is removed (currently an exception is thrown
and tablet repair scheduler reschedules the repair forever).

Fixes: https://github.com/scylladb/scylladb/issues/23100.

Needs backport to 2025.1 that introduces hosts and dcs filters for tablet repair

- (cherry picked from commit 9bce40d917)

- (cherry picked from commit fe4e99d7b3)

- (cherry picked from commit 2b538d228c)

- (cherry picked from commit c40eaa0577)

- (cherry picked from commit c7c6d820d7)

Parent PR: #23101

Closes scylladb/scylladb#23109

* github.com:scylladb/scylladb:
  test: add new cases to tablet_repair tests
  test: extract repiar check to function
  locator: add round-robin selection of filtered replicas
  locator: add tablet_task_info::selected_by_filters
  service: finish repair successfully if no matching replica found
2025-03-10 12:49:01 +02:00
Botond Dénes
28690f8203 Merge '[Backport 2025.1] repair: Introduce Host and DC filter support' from Scylladb[bot]
Currently, the tablet repair scheduler repairs all replicas of a tablet. It does not support hosts or DCs selection. It should be enough for most cases. However, users might still want to limit the repair to certain hosts or DCs in production. https://github.com/scylladb/scylladb/pull/21985 added the preparation work to add the config options for the selection. This patch adds the hosts or DCs selection support.

Fixes https://github.com/scylladb/scylladb/issues/22417

New feature. No backport is needed.

- (cherry picked from commit 4c75701756)

- (cherry picked from commit 5545289bfa)

- (cherry picked from commit 1c8a41e2dd)

- (cherry picked from commit e499f7c971)

Parent PR: #22621

Closes scylladb/scylladb#23080

* github.com:scylladb/scylladb:
  test: add test to check dcs and hosts repair filter
  test: add repair dc selection to test_tablet_metadata_persistence
  repair: Introduce Host and DC filter support
  docs: locator: update the docs and formatter of tablet_task_info
2025-03-10 12:48:49 +02:00
Anna Stuchlik
235c859b98 doc: zero-token nodes and Arbiter DC
This commit adds documentation for zero-token nodes and an explanation
of how to use them to set up an arbiter DC to prevent a quorum loss
in multi-DC deployments.

The commit adds two documents:
- The one in Architecture describes zero-token nodes.
- The other in Cluster Management explains how to use them.

We need separate documents because zero-token nodes may be used
for other purposes in the future.

In addition, the documents are cross-linked, and the link is added
to the Create a ScyllaDB Cluster - Multi Data Centers (DC) document.

Refs https://github.com/scylladb/scylladb/pull/19684

Fixes https://github.com/scylladb/scylladb/issues/20294

Closes scylladb/scylladb#21348

(cherry picked from commit 9ac0aa7bba)

Closes scylladb/scylladb#23201
2025-03-10 10:59:07 +01:00
Benny Halevy
64248f2635 main: add checkpoints
Before starting significant services that didn't
have a corresponding call to supervisor::notify
before them.

Fixes #23153

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit b6705ad48b)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-03-10 08:52:16 +02:00
Benny Halevy
b139b09129 main: safely check stop_signal in-between starting services
To simplify aborting scylla while starting the services,
Add a _ready state to stop_signal, so that until
main is ready to be stopped by the abort_source,
just register that the signal is caught, and
let a check() method poll that and request abort
and throw respective exception only then, in controlled
points that are in-between starting of services
after the service started successfully and a deferred
stop action was installed.

Refs #23153

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit feef7d3fa1)
2025-03-10 08:42:30 +02:00
Benny Halevy
ca161900cd main: move prometheus start message
The `prometheus_server` is started only conditionally
but the notification message is sent and logged
unconditionally.
Move it inside the condtional code block.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 282ff344db)
2025-03-10 08:16:37 +02:00
Benny Halevy
5417d4d517 main: move per-shard database start message
It is now logged out of place, so move it to right before
calling `start` on every database shard.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 23433f593c)
2025-03-10 08:16:37 +02:00
Anna Stuchlik
5453e85f39 doc: remove the reference to the 6.2 version
This commit removes the OSS version name, which is irrelevant
and confusing for 2025.1 and later users.
Also, it updates the warning to avoid specifying the release
when the deprecated feature will be removed.

Fixes https://github.com/scylladb/scylladb/issues/22839

Closes scylladb/scylladb#22936

(cherry picked from commit d0a48c5661)

Closes scylladb/scylladb#23022
2025-03-07 12:53:42 +02:00
Anna Stuchlik
7a6bcb3a3f doc: remove references to Enterprise
This commit removes the redundant references to Enterprise,
which are no longer valid.

Fixes https://github.com/scylladb/scylladb/issues/22927

Closes scylladb/scylladb#22930

(cherry picked from commit a28bbc22bd)

Closes scylladb/scylladb#22963
2025-03-07 12:53:22 +02:00
Anna Stuchlik
8b2a382eb6 doc: add support for Ubuntu 24.04 in 2024.1
Fixes https://github.com/scylladb/scylladb/issues/22841

Refs https://github.com/scylladb/scylla-enterprise/issues/4550

Closes scylladb/scylladb#22843

(cherry picked from commit 439463dbbf)

Closes scylladb/scylladb#23092
2025-03-07 12:51:13 +02:00
Dusan Malusev
cdd51d8b7a docs: add instruction for installing cassandra-stress
Signed-off-by: Dusan Malusev <dusan.malusev@scylladb.com>

Closes scylladb/scylladb#21723

(cherry picked from commit 4e6ea232d2)

Closes scylladb/scylladb#22947
2025-03-07 11:48:46 +02:00
Anna Stuchlik
88a8d140b3 doc: add information about tablets limitation to the CQL page
This commit adds a link to the Limitations section on the Tablets page
to the CQL pag, the tablets option.
This is actually the place where the user will need the information:
when creating a keyspace.

In addition, I've reorganized the section for better readability
(otherwise, the section about limitations was easy to miss)
and moved the section up on the page.

Note that I've removed the updated content from the  `_common` folder
(which I deleted) to the .rst page - we no longer split OSS and Enterprise,
so there's no need to keep using the `scylladb_include_flag` directive
to include OSS- and Ent-specific content.

Fixes https://github.com/scylladb/scylladb/issues/22892

Fixes https://github.com/scylladb/scylladb/issues/22940

Closes scylladb/scylladb#22939

(cherry picked from commit 0999fad279)

Closes scylladb/scylladb#23091
2025-03-07 11:48:07 +02:00
Aleksandra Martyniuk
1957dac2b4 test: add new cases to tablet_repair tests
Add tests for tablet repair with host and dc filters that select
one or no replica.

(cherry picked from commit c7c6d820d7)
2025-03-05 10:59:00 +01:00
Aleksandra Martyniuk
1091ef89e1 test: extract repiar check to function
(cherry picked from commit c40eaa0577)
2025-03-05 10:59:00 +01:00
Aleksandra Martyniuk
b081e07ffa locator: add round-robin selection of filtered replicas
(cherry picked from commit 2b538d228c)
2025-03-05 10:58:59 +01:00
Aleksandra Martyniuk
1f102ca2f7 locator: add tablet_task_info::selected_by_filters
Extract dcs and hosts filters check to a method.

(cherry picked from commit fe4e99d7b3)
2025-03-05 10:36:51 +01:00
Aleksandra Martyniuk
8a98f0d5b6 service: finish repair successfully if no matching replica found
If hosts and/or dcs filters are specified for tablet repair and
no replica matches these filters, an exception is thrown. The repair
fails and tablet repair scheduler reschedules it forever.

Such a repair should actually succeed (as all specified relpicas were
repaired) and the repair request should be removed.

Treat the repair as successful if the filters were specified and
selected no replica.

(cherry picked from commit 9bce40d917)
2025-03-05 10:36:50 +01:00
Botond Dénes
01485b2158 reader_concurrency_semaphore: register_inactive_read(): handle aborted permit
It is possible that the permit handed in to register_inactive_read() is
already aborted (currently only possible if permit timed out).
If the permit also happens to have wait for memory, the current code
will attempt to call promise<>::set_exception() on the permit's promise
to abort its waiters. But if the permit was already aborted via timeout,
this promise will already have an exception and this will trigger an
assert. Add a separate case for checking if the permit is aborted
already. If so, treat it as immediate eviction: close the reader and
clean up.

Fixes: scylladb/scylladb#22919
(cherry picked from commit 7ba29ec46c)
2025-03-04 18:46:55 +00:00
Botond Dénes
953c7cd71a test/boost/reader_concurrency_semaphore_test: move away from db::timeout_clock::now()
Unless the test in question actually wants to test timeouts. Timeouts
will have more pronounced consequences soon and thus using
db::timeout_clock::now() becomes a sure way to make tests flaky.
To avoid this, use db::no_timeout in the tests that don't care about
timeouts.

(cherry picked from commit 4d8eb02b8d)
2025-03-04 18:46:55 +00:00
Anna Stuchlik
cdae92065b doc: add the 2025.1 upgrade guides and reorganize the upgrade section
This commit adds the upgrade guides relevant in version 2025.1:
- From 6.2 to 2025.1
- From 2024.x to 2025.1

It also removes the upgrade guides that are not relevant in 2025.1 source available:
- Open Source upgrade guides
- From Open Source to Enterprise upgrade guides
- Links to the Enterprise upgrade guides

Also, as part of this PR, the remaining relevant content has been moved to
the new About Upgrade page.

WHAT NEEDS TO BE REVIEWED
- Review the instructions in the 6.2-to-2025.1 guide
- Review the instructions in the 2024.x-to-2025.1 guide
- Verify that there are no references to Open Source and Enterprise.

The scope of this PR does not have to include metrics - the info can be added
in a follow-up PR.

Fixes https://github.com/scylladb/scylladb/issues/22208
Fixes https://github.com/scylladb/scylladb/issues/22209
Fixes https://github.com/scylladb/scylladb/issues/23072
Fixes https://github.com/scylladb/scylladb/issues/22346

Closes scylladb/scylladb#22352

(cherry picked from commit 850aec58e0)

Closes scylladb/scylladb#23106
2025-03-04 08:15:08 +02:00
Jenkins Promoter
4813c48d64 Update pgo profiles - aarch64 2025-03-01 04:23:19 +02:00
Jenkins Promoter
b623b108c3 Update pgo profiles - x86_64 2025-03-01 04:05:24 +02:00
Aleksandra Martyniuk
7fdc7bdc4b test: add test to check dcs and hosts repair filter
(cherry picked from commit e499f7c971)
2025-02-27 12:14:47 +01:00
Aleksandra Martyniuk
c2e926850d test: add repair dc selection to test_tablet_metadata_persistence
(cherry picked from commit 1c8a41e2dd)
2025-02-27 12:14:47 +01:00
Asias He
6d5b029812 repair: Introduce Host and DC filter support
Currently, the tablet repair scheduler repairs all replicas of a tablet.
It does not support hosts or DCs selection. It should be enough for most
cases. However, users might still want to limit the repair to certain
hosts or DCs in production. #21985 added the preparation work to add the
config options for the selection. This patch adds the hosts or DCs
selection support.

Fixes #22417

(cherry picked from commit 5545289bfa)
2025-02-27 12:14:44 +01:00
Aleksandra Martyniuk
ffeb55cf77 docs: locator: update the docs and formatter of tablet_task_info
(cherry picked from commit 4c75701756)
2025-02-26 23:49:50 +00:00
Jenkins Promoter
37aa7c216c Update ScyllaDB version to: 2025.1.0-rc4 2025-02-25 21:33:18 +02:00
Gleb Natapov
0b0e9f0c32 treewide: include build_mode.hh for SCYLLA_BUILD_MODE_RELEASE where it is missing
Fixes: #22914

Closes scylladb/scylladb#22915

(cherry picked from commit 914c9f1711)

Closes scylladb/scylladb#22962
2025-02-25 18:12:54 +03:00
Evgeniy Naydanov
871fabd60a test.py: test_random_failures: improve handling of hung node
In some cases the paused/unpaused node can hang not after 30s timeout.
This make the test flaky.  Change the condition to always check the
coordinator's log if there is a hung node.

Add `stop_after_streaming` to the list of error injections which can
cause a node's hang.

Also add a wait for a new coordinator election in cluster events
which cause such elections.

Closes scylladb/scylladb#22825

(cherry picked from commit 99be9ac8d8)

Closes scylladb/scylladb#23007
2025-02-25 14:31:51 +03:00
Abhi
67b7ea12a2 storage_service: Remove the variable _manage_topology_change_kind_from_group0
This commit removes the variable _manage_topology_change_kind_from_group0
which was used earlier as a work around for correctly handling
topology_change_kind variable, it was brittle and had some bugs. Earlier commits
made some modifications to deal with handling topology_change_kind variable
post _manage_topology_change_kind_from_group0 removal

(cherry picked from commit d7884cf651)
2025-02-20 21:21:31 +00:00
Abhi
d74bb95f54 storage_service: fix indentation after the previous commit
(cherry picked from commit 623e01344b)
2025-02-20 21:21:31 +00:00
Abhinav Jha
98977e9465 raft topology: Add support for raft topology system tables initialization to happen before group0 initialization
In the current scenario, topology_change_kind variable, was been handled using
 _manage_topology_change_kind_from_group0 variable. This method was brittle
and had some bugs(e.g. for restart case, it led to a time gap between group0
server start and topology_change_kind being managed via group0)

Post _manage_topology_change_kind_from_group0 removal, careful management of
topology_change_kind variable was needed for maintaining correct
topology_change_kind in all scenarios. So this PR also performs a refactoring
to populate all init data to system tables even before group0 creation(via
raft_initialize_discovery_leader function). Now because raft_initialize_discovery_leader
happens before the group 0 creation, we write mutations directly to system
tables instead of a group 0 command. Hence, post group0 creation, the node
can read the correct values from system tables and correct values are
maintained throughout.

Added a new function initialize_done_topology_upgrade_state which takes
care of updating the correct upgrade state to system tables before starting
group0 server. This ensures that the node can read the correct values from
system tables and correct values are maintained throughout.

By moving raft_initialize_discovery_leader logic to happen before starting
group0 server, and not as group0 command post server start, we also get rid
of the potential problem of init group0 command not being the 1st command on
the server. Hence ensuring full integrity as expected by programmer.

Fixes: scylladb/scylladb#21114
(cherry picked from commit e491950c47)
2025-02-20 21:21:31 +00:00
Abhi
e84376c9dc service/raft: Refactor mutation writing helper functions.
We use these changes in following commit.

(cherry picked from commit 4748125a48)
2025-02-20 21:21:31 +00:00
Gleb Natapov
79556be7a7 topology_coordinator: demote barrier_and_drain rpc failure to warning
The failure may happen during normal operation as well (for instance if
leader changes).

Fixes: scylladb/scylladb#22364
(cherry picked from commit fe45ea505b)
2025-02-19 08:59:53 +00:00
Gleb Natapov
fe0740ff56 topology_coordinator: read peers table only once during topology state application
During topology state application peers table may be updated with the
new ip->id mapping. The update is not atomic: it adds new mapping and
then removes the old one. If we call get_host_id_to_ip_map while this is
happening it may trigger an internal error there. This is a regression
since ef929c5def. Before that commit the
code read the peers table only once before starting the update loop.
This patch restores the behaviour.

Fixes: scylladb/scylladb#22578
(cherry picked from commit 1da7d6bf02)
2025-02-19 08:59:53 +00:00
Pavel Emelyanov
aa5cb15166 Merge 'Alternator: implement UpdateTable operation to add or delete GSI' from Nadav Har'El
In this series we implement the UpdateTable operation to add a GSI to an existing table, or remove a GSI from a table. As the individual commit messages will explained, this required changing how Alternator stores materialized view keys - instead of insisting that these key must be real columns (that is **not** the case when adding a GSI to an existing table), the materialized view can now take as its key any Alternator attribute serialized inside the ":attrs" map holding all non-key attributes. Fixes #11567.

We also fix the IndexStatus and Backfilling attributes returned by DescribeTable - as DynamoDB API users use this API to discover when a newly added GSI completed its "backfilling" (what we call "view building") stage. Fixes #11471.

This series should not be backported lightly - it's a new feature and required fairly large and intrusive changes that can introduce bugs to use cases that don't even use Alternator or its UpdateTable operations - every user of CQL materialized views or secondary indexes, as well as Alternator GSI or LSI, will use modified code. **It should be backported to 2025.1**, though - this version was actually branched long after this PR was sent, and it provides a feature that was promised for 2025.1.

Closes scylladb/scylladb#21989

* github.com:scylladb/scylladb:
  alternator: fix view build on oversized GSI key attribute
  mv: clean up do_delete_old_entry
  test/alternator: unflake test for IndexStatus
  test/alternator: work around unrelated bug causing test flakiness
  docs/alternator: adding a GSI is no longer an unimplemented feature
  test/alternator: remove xfail from all tests for issue 11567
  alternator: overhaul implementation of GSIs and support UpdateTable
  mv: support regular_column_transformation key columns in view
  alternator: add new materialized-view computed column for item in map
  build: in cmake build, schema needs alternator
  build: build tests with Alternator
  alternator: add function serialized_value_if_type()
  mv: introduce regular_column_transformation, a new type of computed column
  alternator: add IndexStatus/Backfilling in DescribeTable
  alternator: add "LimitExceededException" error type
  docs/alternator: document two more unimplemented Alternator features

(cherry picked from commit 529ff3efa5)

Closes scylladb/scylladb#22826
2025-02-18 19:05:21 +02:00
Jenkins Promoter
13d79ba990 Update ScyllaDB version to: 2025.1.0-rc3 2025-02-18 15:06:57 +02:00
Nadav Har'El
35b410326b test/topology_custom: fix very slow test test_localnodes_broadcast_rpc_address
The test
topology_custom/test_alternator::test_localnodes_broadcast_rpc_address
sets up nodes with a silly "broadcast rpc address" and checks that
Alternator's "/localnodes" requests returns it correctly.

The problem is that although we don't use CQL in this test, the test
framework does open a CQL connection when the test starts, and closes
it when it ends. It turns out that when we set a silly "broadcast RPC
address", the driver tends to try to connect to it when shutting down,
I'm not even sure why. But the choice of the silly address was 1.2.3.4
is unfortunate, because this IP address is actually routable - and
the driver hangs until it times out (in practice, in a bit over two
minutes). This trivial patch changes 1.2.3.4 to 127.0.0.0 - and equally
silly address but one to which connections fail immediately.

Before this patch, the test often takes more than 2 minutes to finish
on my laptop, after this patch, it always finishes in 4-5 seconds.

Fixes #22744

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#22746

(cherry picked from commit f89235517d)

Closes scylladb/scylladb#22875
2025-02-18 10:33:21 +02:00
Botond Dénes
12a3fcceae Merge '[Backport 2025.1] sstable_loader: fix cross-shard resource cleanup in download_task_impl ' from Scylladb[bot]
This PR addresses two related issues in our task system:

1. Prepares for asynchronous resource cleanup by converting release_resources() to a coroutine. This refactoring enables future improvements in how we handle resource cleanup.

2. Fixes a cross-shard resource cleanup issue in the SSTable loader where destruction of per-shard progress elements could trigger "shared_ptr accessed on non-owner cpu" errors in multi-shard environments. The fix uses coroutines to ensure resources are released on their owner shards.

Fixes #22759

---

this change addresses a regression introduced by d815d7013c, which is contained by 2025.1 and master branches. so it should be backported to 2025.1 branch.

- (cherry picked from commit 4c1f1baab4)

- (cherry picked from commit b448fea260)

Parent PR: #22791

Closes scylladb/scylladb#22871

* github.com:scylladb/scylladb:
  sstable_loader: fix cross-shard resource cleanup in download_task_impl
  tasks: make release_resources() a coroutine
2025-02-18 10:32:48 +02:00
Gleb Natapov
040c59674a api: initialize token metadata API after starting the gossiper
Token metadata API now depend on gossiper to do ip to host id mappings,
so initialized it after the gossiper is initialized and de-initialized
it before gossiper is stopped.

Fixes: scylladb/scylladb#22743

Closes scylladb/scylladb#22760

(cherry picked from commit d288d79d78)

Closes scylladb/scylladb#22854
2025-02-18 10:32:24 +02:00
Asias He
b50a6657e8 repair: Add await_completion option for tablet_repair api
Set true to wait for the repair to complete. Set false to skip waiting
for the repair to complete. When the option is not provided, it defaults
to false.

It is useful for management tool that wants the api to be async.

Fixes #22418

Closes scylladb/scylladb#22436

(cherry picked from commit fb318d0c81)

Closes scylladb/scylladb#22851
2025-02-18 10:31:53 +02:00
Botond Dénes
93479ffcf9 Merge '[Backport 2025.1] raft/group0_state_machine: load current RPC compression dict on startup' from Michał Chojnowski
We are supposed to be loading the most recent RPC compression dictionary on startup, but we forgot to port the relevant piece of logic during the source-available port. This causes a restarted node not to use the dictionary for RPC compression until the next dictionary update.

Fix that.

Fixes #22738

This is more of a bugfix than an improvement, so it should be backported to 2025.1.

* (cherry picked from commit [dd82b40](dd82b40186))

* (cherry picked from commit [8fb2ea6](8fb2ea61ba))

Additionally cherry picked https://github.com/scylladb/scylladb/pull/22836 to fix the timeout.

Parent PR: #22739

Closes scylladb/scylladb#22837

* github.com:scylladb/scylladb:
  test_rpc_compression.py: fix an overly-short timeout
  test_rpc_compression.py: test the dictionaries are loaded on startup
  raft/group0_state_machine: load current RPC compression dict on startup
2025-02-18 10:31:23 +02:00
Botond Dénes
38bd74b2d4 tools/scylla-nodetool: netstats: don't assume both senders and receivers
The code currently assumes that a session has both sender and receiver
streams, but it is possible to have just one or the other.
Change the test to include this scenario and remove this assumption from
the code.

Fixes: #22770

Closes scylladb/scylladb#22771

(cherry picked from commit 87e8e00de6)

Closes scylladb/scylladb#22874
2025-02-17 14:34:36 +02:00
Takuya ASADA
6ee1779578 dist: fix upgrade error from 2024.1
We need to allow replacing nodetool from scylla-enterprise-tools < 2024.2,
just like we did for scylla-tools < 5.5.
This is required to make packages able to upgrade from 2024.1.

Fixes #22820

Closes scylladb/scylladb#22821

(cherry picked from commit b5e306047f)

Closes scylladb/scylladb#22867
2025-02-16 14:47:48 +02:00
Kefu Chai
9fe2301647 sstable_loader: fix cross-shard resource cleanup in download_task_impl
Previously, download_task_impl's destructor would destroy per-shard progress
elements on whatever shard the task was destroyed on. In multi-shard
environments, this caused "shared_ptr accessed on non-owner cpu" errors when
attempting to free memory allocated on a different shard.

Fix by:
- Convert progress_per_shard into a sharded service
- Stop the service on owner shards during cleanup using coroutines
- Add operator+= to stream_progress to leverage seastar's built-in adder
  instead of a custom adder struct

Alternative approaches considered:

1. Using foreign_ptr: Rejected as it would require interface changes
   that complicate stream delegation. foreign_ptr manages the underlying
   pointee with another smart pointer but does not expose the smart
   pointer instance in its APIs, making it impossible to use
   shared_ptr<stream_progress> in the interface.
2. Using vector<stream_progress>: Rejected for similar interface
   compatibility reasons.

This solution maintains the existing interfaces while ensuring proper
cross-shard cleanup.

Fixes scylladb/scylladb#22759
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit b448fea260)
2025-02-15 22:46:43 +00:00
Kefu Chai
6b27459de3 tasks: make release_resources() a coroutine
Convert tasks::task_manager::task::impl::release_resources() to a coroutine
to prepare for upcoming changes that will implement asynchronous resource
release.

This is a preparatory refactoring that enables future coroutine-based
implementation of resource cleanup logic.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 4c1f1baab4)
2025-02-15 22:46:43 +00:00
Jenkins Promoter
48130ca2e9 Update pgo profiles - aarch64 2025-02-15 04:20:15 +02:00
Jenkins Promoter
5054087f0b Update pgo profiles - x86_64 2025-02-15 04:05:06 +02:00
Botond Dénes
889fb9c18b Update tools/java submodule
* tools/java 807e991d...6dfe728a (1):
  > dist: support smooth upgrade from enterprise to source availalbe

Fixes: scylladb/scylladb#22820
2025-02-14 11:14:07 +02:00
Botond Dénes
c627aff5f7 Merge '[Backport 2025.1] reader_concurrency_semaphore: set_notify_handler(): disable timeout ' from Scylladb[bot]
`set_notify_handler()` is called after a querier was inserted into the querier cache. It has two purposes: set a callback for eviction and set a TTL for the cache entry. This latter was not disabling the pre-existing timeout of the permit (if any) and this would lead to premature eviction of the cache entry if the timeout was shorter than TTL (which his typical).
Disable the timeout before setting the TTL to prevent premature eviction.

Fixes: https://github.com/scylladb/scylladb/issues/22629

Backport required to all active releases, they are all affected.

- (cherry picked from commit a3ae0c7cee)

- (cherry picked from commit 9174f27cc8)

Parent PR: #22701

Closes scylladb/scylladb#22752

* github.com:scylladb/scylladb:
  reader_concurrency_semaphore: set_notify_handler(): disable timeout
  reader_permit: mark check_abort() as const
2025-02-13 15:24:54 +02:00
Michał Chojnowski
ffca4a9f85 test_rpc_compression.py: fix an overly-short timeout
The timeout of 10 seconds is too small for CI.
I didn't mean to make it so short, it was an accident.

Fix that by changing the timeout to 10 minutes.
2025-02-13 10:03:13 +01:00
Michał Chojnowski
2c0ffdce31 pgo: disable tablets for training with secondary index, lwt and counters
As of right now, materialized views (and consequently secondary
indexes), lwt and counters are unsupported or experimental with tablets.
Since by defaults tablets are enabled, training cases using those
features are currently broken.

The right thing to do here is to disable tablets in those cases.

Fixes https://github.com/scylladb/scylladb/issues/22638

Closes scylladb/scylladb#22661

(cherry picked from commit bea434f417)

Closes scylladb/scylladb#22808
2025-02-13 09:42:09 +02:00
Botond Dénes
ff7e93ddd5 db/config: reader_concurrency_semaphore_cpu_concurrency: bump default to 2
This config item controls how many CPU-bound reads are allowed to run in
parallel. The effective concurrency of a single CPU core is 1, so
allowing more than one CPU-bound reads to run concurrently will just
result in time-sharing and both reads having higher latency.
However, restricting concurrency to 1 means that a CPU bound read that
takes a lot of time to complete can block other quick reads while it is
running. Increase this default setting to 2 as a compromise between not
over-using time-sharing, while not allowing such slow reads to block the
queue behind them.

Fixes: #22450

Closes scylladb/scylladb#22679

(cherry picked from commit 3d12451d1f)

Closes scylladb/scylladb#22722
2025-02-13 09:40:25 +02:00
Botond Dénes
1998733228 service: query_pager: fix last-position for filtering queries
On short-pages, cut short because of a tombstone prefix.
When page-results are filtered and the filter drops some rows, the
last-position is taken from the page visitor, which does the filtering.
This means that last partition and row position will be that of the last
row the filter saw. This will not match the last position of the
replica, when the replica cut the page due to tombstones.
When fetching the next page, this means that all the tombstone suffix of
the last page, will be re-fetched. Worse still: the last position of the
next page will not match that of the saved reader left on the replica, so
the saved reader will be dropped and a new one created from scratch.
This wasted work will show up as elevated tail latencies.
Fix by always taking the last position from raw query results.

Fixes: #22620

Closes scylladb/scylladb#22622

(cherry picked from commit 7ce932ce01)

Closes scylladb/scylladb#22719
2025-02-13 09:40:05 +02:00
Botond Dénes
e79ee2ddb0 reader_concurrency_semaphore: foreach_permit(): include _inactive_reads
So inactive reads show up in semaphore diagnostics dumps (currently the
only non-test user of this method).

Fixes: #22574

Closes scylladb/scylladb#22575

(cherry picked from commit e1b1a2068a)

Closes scylladb/scylladb#22611
2025-02-13 09:39:39 +02:00
Aleksandra Martyniuk
4c39943b3f replica: mark registry entry as synch after the table is added
When a replica get a write request it performs get_schema_for_write,
which waits until the schema is synced. However, database::add_column_family
marks a schema as synced before the table is added. Hence, the write may
see the schema as synced, but hit no_such_column_family as the table
hasn't been added yet.

Mark schema as synced after the table is added to database::_tables_metadata.

Fixes: #22347.

Closes scylladb/scylladb#22348

(cherry picked from commit 328818a50f)

Closes scylladb/scylladb#22604
2025-02-13 09:39:13 +02:00
Calle Wilund
17c86f8b57 encryption: Fix encrypted components mask check in describe
Fixes #22401

In the fix for scylladb/scylla-enterprise#892, the extraction and check for sstable component encryption mask was copied
to a subroutine for description purposes, but a very important 1 << <value> shift was somehow
left on the floor.

Without this, the check for whether we actually contain a component encrypted can be wholly
broken for some components.

Closes scylladb/scylladb#22398

(cherry picked from commit 7db14420b7)

Closes scylladb/scylladb#22599
2025-02-13 09:38:41 +02:00
Botond Dénes
d05b3897a2 Merge '[Backport 2025.1] api: task_manager: do not unregister finish task when its status is queried' from Scylladb[bot]
Currently, when the status of a task is queried and the task is already finished,
it gets unregistered. Getting the status shouldn't be a one-time operation.

Stop removing the task after its status is queried. Adjust tests not to rely
on this behavior. Add task_manager/drain API and nodetool tasks drain
command to remove finished tasks in the module.

Fixes: https://github.com/scylladb/scylladb/issues/21388.

It's a fix to task_manager API, should be backported to all branches

- (cherry picked from commit e37d1bcb98)

- (cherry picked from commit 18cc79176a)

Parent PR: #22310

Closes scylladb/scylladb#22598

* github.com:scylladb/scylladb:
  api: task_manager: do not unregister tasks on get_status
  api: task_manager: add /task_manager/drain
2025-02-13 09:38:12 +02:00
Botond Dénes
9116fc635e Merge '[Backport 2025.1] split: run set_split_mode() on all storage groups during all_storage_groups_split()' from Scylladb[bot]
`tablet_storage_group_manager::all_storage_groups_split()` calls `set_split_mode()` for each of its storage groups to create split ready compaction groups. It does this by iterating through storage groups using `std::ranges::all_of()` which is not guaranteed to iterate through the entire range, and will stop iterating on the first occurrence of the predicate (`set_split_mode()`) returning false. `set_split_mode()` creates the split compaction groups and returns false if the storage group's main compaction group or merging groups are not empty. This means that in cases where the tablet storage group manager has non-empty storage groups, we could have a situation where split compaction groups are not created for all storage groups.

The missing split compaction groups are later created in `tablet_storage_group_manager::split_all_storage_groups()` which also calls `set_split_mode()`, and that is the reason why split completes successfully. The problem is that
`tablet_storage_group_manager::all_storage_groups_split()` runs under a group0 guard, but
`tablet_storage_group_manager::split_all_storage_groups()` does not. This can cause problems with operations which should exclude with compaction group creation. i.e. DROP TABLE/DROP KEYSPACE

Fixes #22431

This is a bugfix and should be back ported to versions with tablets: 6.1 6.2 and 2025.1

- (cherry picked from commit 24e8d2a55c)

- (cherry picked from commit 8bff7786a8)

Parent PR: #22330

Closes scylladb/scylladb#22560

* github.com:scylladb/scylladb:
  test: add reproducer and test for fix to split ready CG creation
  table: run set_split_mode() on all storage groups during all_storage_groups_split()
2025-02-13 09:36:23 +02:00
Raphael S. Carvalho
5f74b5fdff test: Use linux-aio backend again on seastar-based tests
Since mid December, tests started failing with ENOMEM while
submitting I/O requests.

Logs of failed tests show IO uring was used as backend, but we
never deliberately switched to IO uring. Investigation pointed
to it happening accidentaly in commit 1bac6b75dc,
which turned on IO uring for allowing native tool in production,
and picked linux-aio backend explicitly when initializing Scylla.
But it missed that seastar-based tests would pick the default
backend, which is io_uring once enabled.

There's a reason we never made io_uring the default, which is
that it's not stable enough, and turns out we made the right
choice back then and it apparently continue to be unstable
causing flakiness in the tests.

Let's undo that accidental change in tests by explicitly
picking the linux-aio backend for seastar-based tests.
This should hopefully bring back stability.

Refs #21968.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#22695

(cherry picked from commit ce65164315)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#22800
2025-02-12 20:50:51 +02:00
Michał Chojnowski
a746fd2bb8 test_rpc_compression.py: test the dictionaries are loaded on startup
Reproduces scylladb/scylladb#22738

(cherry picked from commit 8fb2ea61ba)
2025-02-11 15:52:34 +00:00
Michał Chojnowski
89a5889bed raft/group0_state_machine: load current RPC compression dict on startup
We are supposed to be loading the most recent RPC compression dictionary
on startup, but we forgot to port the relevant piece of logic during
the source-available port.

(cherry picked from commit dd82b40186)
2025-02-11 15:52:33 +00:00
Michael Litvak
8d1f6df818 test/test_view_build_status: fix flaky asserts
In few test cases of test_view_build_status we create a view, wait for
it and then query the view_build_status table and expect it to have all
rows for each node and view.

But it may fail because it could happen that the wait_for_view query and
the following queries are done on different nodes, and some of the nodes
didn't apply all the table updates yet, so they have missing rows.

To fix it, we change the assert to work in the eventual consistency
sense, retrying until the number of rows is as expectd.

Fixes scylladb/scylladb#22644

Closes scylladb/scylladb#22654

(cherry picked from commit c098e9a327)

Closes scylladb/scylladb#22780
2025-02-11 10:21:54 +01:00
Avi Kivity
75320c9a13 Update tools/cqlsh submodule (driver update, upgradability)
* tools/cqlsh 52c6130...02ec7c5 (18):
  > chore(deps): update dependency scylla-driver to v3.28.2
  > dist: support smooth upgrade from enterprise to source availalbe
  > github action: fix downloading of artifacts
  > chore(deps): update docker/setup-buildx-action action to v3
  > chore(deps): update docker/login-action action to v3
  > chore(deps): update docker/build-push-action action to v6
  > chore(deps): update docker/setup-qemu-action action to v3
  > chore(deps): update peter-evans/dockerhub-description action to v4
  > upload actions: update the usage for multiple artifacts
  > chore(deps): update actions/download-artifact action to v4.1.8
  > chore(deps): update dependency scylla-driver to v3.28.0
  > chore(deps): update pypa/cibuildwheel action to v2.22.0
  > chore(deps): update actions/checkout action to v4
  > chore(deps): update python docker tag to v3.13
  > chore(deps): update actions/upload-artifact action to v4
  > github actions: update it to work
  > add option to output driver debug
  > Add renovate.json (#107)

Fixes: https://github.com/scylladb/scylladb/issues/22420
2025-02-09 18:07:55 +02:00
Yaron Kaikov
359af0ae9c dist: support smooth upgrade from enterprise to source availalbe
When upgrading for example from `2024.1` to `2025.1` the package name is
not identical casuing the upgrade command to fail:
```
Command: 'sudo DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade scylla -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"'
Exit code: 100
Stdout:
Selecting previously unselected package scylla.
Preparing to unpack .../6-scylla_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb ...
Unpacking scylla (2025.1.0~dev-0.20250118.1ef2d9d07692-1) ...
Errors were encountered while processing:
/tmp/apt-dpkg-install-JbOMav/0-scylla-conf_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/1-scylla-python3_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/2-scylla-server_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/3-scylla-kernel-conf_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/4-scylla-node-exporter_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/5-scylla-cqlsh_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
Stderr:
E: Sub-process /usr/bin/dpkg returned an error code (1)
```

Adding `Obsoletes` (for rpm) and `Replaces` (for deb)

Fixes: https://github.com/scylladb/scylladb/issues/22420

Closes scylladb/scylladb#22457

(cherry picked from commit 93f53f4eb8)

Closes scylladb/scylladb#22753
2025-02-09 18:06:52 +02:00
Avi Kivity
7f350558c2 Update tools/python3 (smooth upgrade from enterprise)
* tools/python3 8415caf...91c9531 (1):
  > dist: support smooth upgrade from enterprise to source availalbe

Ref #22420
2025-02-09 14:22:33 +02:00
Botond Dénes
fa9b1800b6 reader_concurrency_semaphore: set_notify_handler(): disable timeout
set_notify_handler() is called after a querier was inserted into the
querier cache. It has two purposes: set a callback for eviction and set
a TTL for the cache entry. This latter was not disabling the
pre-existing timeout of the permit (if any) and this would lead to
premature eviction of the cache entry if the timeout was shorter than
TTL (which his typical).
Disable the timeout before setting the TTL to prevent premature
eviction.

Fixes: #scylladb/scylladb#22629
(cherry picked from commit 9174f27cc8)
2025-02-09 00:32:38 +00:00
Botond Dénes
c25d447b9c reader_permit: mark check_abort() as const
All it does is read one field, making it const makes using it easier.

(cherry picked from commit a3ae0c7cee)
2025-02-09 00:32:38 +00:00
Ferenc Szili
cf147d8f85 truncate: create session during request handling
Currently, the session ID under which the truncate for tablets request is
running is created during the request creation and queuing. This is a problem
because this could overwrite the session ID of any ongoing operation on
system.topology#session

This change moves the creation of the session ID for truncate from the request
creation to the request handling.

Fixes #22613

Closes scylladb/scylladb#22615

(cherry picked from commit a59618e83d)

Closes scylladb/scylladb#22705
2025-02-06 10:09:00 +02:00
Botond Dénes
319626e941 reader_concurrency_semaphore: with_permit(): proper clean-up after queue overload
with_permit() creates a permit, with a self-reference, to avoid
attaching a continuation to the permit's run function. This
self-reference is used to keep the permit alive, until the execution
loop processes it. This self reference has to be carefully cleared on
error-paths, otherwise the permit will become a zombie, effectively
leaking memory.
Instead of trying to handle all loose ends, get rid of this
self-reference altogether: ask caller to provide a place to save the
permit, where it will survive until the end of the call. This makes the
call-site a little bit less nice, but it gets rid of a whole class of
possible bugs.

Fixes: #22588

Closes scylladb/scylladb#22624

(cherry picked from commit f2d5819645)

Closes scylladb/scylladb#22704
2025-02-06 10:08:19 +02:00
Aleksandra Martyniuk
cca2d974b6 service: use read barrier in tablet_virtual_task::contains
Currently, when the tablet repair is started, info regarding
the operation is kept in the system.tablets. The new tablet states
are reflected in memory after load_topology_state is called.
Before that, the data in the table and the memory aren't consistent.

To check the supported operations, tablet_virtual_task uses in-memory
tablet_metadata. Hence, it may not see the operation, even though
its info is already kept in system.tablets table.

Run read barrier in tablet_virtual_task::contains to ensure it will
see the latest data. Add a test to check it.

Fixes: #21975.

Closes scylladb/scylladb#21995

(cherry picked from commit 610a761ca2)

Closes scylladb/scylladb#22694
2025-02-06 10:07:51 +02:00
Aleksandra Martyniuk
43f2e5f86b nodetool: tasks: print empty string for start_time/end_time if unspecified
If start_time/end_time is unspecified for a task, task_manager API
returns epoch. Nodetool prints the value in task status.

Fix nodetool tasks commands to print empty string for start_time/end_time
if it isn't specified.

Modify nodetool tasks status docs to show empty end_time.

Fixes: #22373.

Closes scylladb/scylladb#22370

(cherry picked from commit 477ad98b72)

Closes scylladb/scylladb#22601
2025-02-06 10:05:07 +02:00
Takuya ASADA
ad81d49923 dist: Support FIPS mode
- To make Scylla able to run in FIPS-compliant system, add .hmac files for
  crypto libraries on relocatable/rpm/deb packages.
- Currently we just write hmac value on *.hmac files, but there is new
  .hmac file format something like this:

  ```
  [global]
  format-version = 1
  [lib.xxx.so.yy]
  path = /lib64/libxxx.so.yy
  hmac = <hmac>
  ```
  Seems like GnuTLS rejects fips selftest on .libgnutls.so.30.hmac when
  file format is older one.
  Since we need to absolute path on "path" directive, we need to generate
  .libgnutls.so.30.hmac in older format on create-relocatable-script.py,

Fixes scylladb/scylladb#22573

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes scylladb/scylladb#22384

(cherry picked from commit fb4c7dc3d8)

Closes scylladb/scylladb#22587
2025-02-06 10:01:12 +02:00
Wojciech Mitros
138c68d80e mv: forbid views with tablets by default
Materialized views with tablets are not stable yet, but we want
them available as an experimental feature, mainly for teseting.

The feature was added in scylladb/scylladb#21833,
but currently it has no effect. All tests have been updated to use the
feature, so we should finally make it work.
This patch prevents users from creating materialized views in keyspaces
using tablets when the VIEWS_WITH_TABLETS feature is not enabled - such
requests will now get rejected.

Fixes scylladb/scylladb#21832

Closes scylladb/scylladb#22217

(cherry picked from commit 677f9962cf)

Closes scylladb/scylladb#22659
2025-02-04 08:06:23 +01:00
Avi Kivity
e0fb727f18 Update seastar submodule (hwloc failure on some AWS instances)
* seastar 1822136684...a350b5d70e (1):
  > resource: fallback to sysconf when failed to detect memory size from hwloc

Fixes #22382.
2025-02-03 22:47:39 +02:00
Jenkins Promoter
440833ae59 Update ScyllaDB version to: 2025.1.0-rc2 2025-02-03 13:23:18 +02:00
Michael Litvak
246635c426 test/test_view_build_status: fix wrong assert in test
The test expects and asserts that after wait_for_view is completed we
read the view_build_status table and get a row for each node and view.
But this is wrong because wait_for_view may have read the table on one
node, and then we query the table on a different node that didn't insert
all the rows yet, so the assert could fail.

To fix it we change the test to retry and check that eventually all
expected rows are found and then eventually removed on the same host.

Fixes scylladb/scylladb#22547

Closes scylladb/scylladb#22585

(cherry picked from commit 44c06ddfbb)

Closes scylladb/scylladb#22608
2025-02-03 09:24:17 +01:00
Michael Litvak
58eda6670f view_builder: fix loop in view builder when tokens are moved
The view builder builds a view by going over the entire token ring,
consuming the base table partitions, and generating view updates for
each partition.

A view is considered as built when we complete a full cycle of the
token ring. Suppose we start to build a view at a token F. We will
consume all partitions with tokens starting at F until the maximum
token, then go back to the minimum token and consume all partitions
until F, and then we detect that we pass F and complete building the
view. This happens in the view builder consumer in
`check_for_built_views`.

The problem is that we check if we pass the first token F with the
condition `_step.current_token() >= it->first_token` whenever we consume
a new partition or the current_token goes back to the minimum token.
But suppose that we don't have any partitions with a token greater than
or equal to the first token (this could happen if the partition with
token F was moved to another node for example), then this condition will never be
satisfied, and we don't detect correctly when we pass F. Instead, we
go back to the minimum token, building the same token ranges again,
in a possibly infinite loop.

To fix this we add another step when reaching the end of the reader's
stream. When this happens it means we don't have any more fragments to
consume until the end of the range, so we advance the current_token to
the end of the range, simulating a partition, and check for built views
in that range.

Fixes scylladb/scylladb#21829

Closes scylladb/scylladb#22493

(cherry picked from commit 6d34125eb7)

Closes scylladb/scylladb#22607
2025-02-02 22:29:52 +02:00
Jenkins Promoter
28b8896680 Update pgo profiles - aarch64 2025-02-01 04:30:11 +02:00
Jenkins Promoter
e9cae4be17 Update pgo profiles - x86_64 2025-02-01 04:05:22 +02:00
Avi Kivity
daf1c96ad3 seatar: point submodule at scylla-seastar.git
This allows backporting commits to seastar.
2025-01-31 19:47:30 +02:00
Botond Dénes
1a1893078a Merge '[Backport 2025.1] encrypted_file_impl: Check for reads on or past actual file length in transform' from Scylladb[bot]
Fixes #22236

If reading a file and not stopping on block bounds returned by `size()`, we could allow reading from (_file_size+&lt;1-15&gt;) (if crossing block boundary) and try to decrypt this buffer (last one).

Simplest example:
Actual data size: 4095
Physical file size: 4095 + key block size (typically 16)
Read from 4096: -> 15 bytes (padding) -> transform return `_file_size` - `read offset` -> wraparound -> rather larger number than we expected (not to mention the data in question is junk/zero).

Check on last block in `transform` would wrap around size due to us being >= file size (l).
Just do an early bounds check and return zero if we're past the actual data limit.

- (cherry picked from commit e96cc52668)

- (cherry picked from commit 2fb95e4e2f)

Parent PR: #22395

Closes scylladb/scylladb#22583

* github.com:scylladb/scylladb:
  encrypted_file_test: Test reads beyond decrypted file length
  encrypted_file_impl: Check for reads on or past actual file length in transform
2025-01-31 11:38:50 +02:00
Aleksandra Martyniuk
8cc5566a3c api: task_manager: do not unregister tasks on get_status
Currently, /task_manager/task_status_recursive/{task_id} and
/task_manager/task_status/{task_id} unregister queries task if it
has already finished.

The status should not disappear after being queried. Do not unregister
finished task when its status or recursive status is queried.

(cherry picked from commit 18cc79176a)
2025-01-31 08:21:03 +00:00
Aleksandra Martyniuk
1f52ced2ff api: task_manager: add /task_manager/drain
In the following patches, get_status won't be unregistering finished
tasks. However, tests need a functionality to drop a task, so that
they could manipulate only with the tasks for operations that were
invoked by these tests.

Add /task_manager/drain/{module} to unregister all finished tasks
from the module. Add respective nodetool command.

(cherry picked from commit e37d1bcb98)
2025-01-31 08:21:03 +00:00
Avi Kivity
d7e3ab2226 Merge '[Backport 2025.1] truncate: trigger truncate logic from a transition state instead of global topology request' from Ferenc Szili
This is a manual backport of #22452

Truncate table for tablets is implemented as a global topology operation. However, it does not have a transition state associated with it, and performs the truncate logic in topology_coordinator::handle_global_request() while topology::tstate remains empty. This creates problems because topology::is_busy() uses transition_state to determine if the topology state machine is busy, and will return false even though a truncate operation is ongoing.

This change introduces a new topology transition topology::transition_state::truncate_table and moves the truncate logic to a new method topology_coordinator::handle_truncate_table(). This method is now called as a handler of the truncate_table transition state instead of a handler of the truncate_table global topology request.

Fixes #22552

Closes scylladb/scylladb#22557

* github.com:scylladb/scylladb:
  truncate: trigger truncate logic from transition state instead of global request handler
  truncate: add truncate_table transition state
2025-01-30 22:49:17 +02:00
Anna Stuchlik
cf589222a0 doc: update the Web Installer docs to remove OSS
Fixes https://github.com/scylladb/scylladb/issues/22292

Closes scylladb/scylladb#22433

(cherry picked from commit 2a6445343c)

Closes scylladb/scylladb#22581
2025-01-30 13:04:16 +02:00
Anna Stuchlik
156800a3dd doc: add SStable support in 2025.1
This commit adds the information about SStable version support in 2025.1
by replacing "2022.2" with "2022.2 and above".

In addition, this commit removes information about versions that are
no longer supported.

Fixes https://github.com/scylladb/scylladb/issues/22485

Closes scylladb/scylladb#22486

(cherry picked from commit caf598b118)

Closes scylladb/scylladb#22580
2025-01-30 13:03:47 +02:00
Nikos Dragazis
d1e8b02260 encrypted_file_test: Test reads beyond decrypted file length
Add a test to reproduce a bug in the read DMA API of
`encrypted_file_impl` (the file implementation for Encryption-at-Rest).

The test creates an encrypted file that contains padding, and then
attempts to read from an offset within the padding area. Although this
offset is invalid on the decrypted file, the `encrypted_file_impl` makes
no checks and proceeds with the decryption of padding data, which
eventually leads to bogus results.

Refs #22236.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
(cherry picked from commit 8f936b2cbc)
(cherry picked from commit 2fb95e4e2f)
2025-01-30 09:17:31 +00:00
Calle Wilund
a51888694e encrypted_file_impl: Check for reads on or past actual file length in transform
Fixes #22236

If reading a file and not stopping on block bounds returned by `size()`, we could
allow reading from (_file_size+1-15) (block boundary) and try to decrypt this
buffer (last one).
Check on last block in `transform` would wrap around size due to us being >=
file size (l).

Simplest example:
Actual data size: 4095
Physical file size: 4095 + key block size (typically 16)
Read from 4096: -> 15 bytes (padding) -> transform return _file_size - read offset
-> wraparound -> rather larger number than we expected
(not to mention the data in question is junk/zero).

Just do an early bounds check and return zero if we're past the actual data limit.

v2:
* Moved check to a min expression instead
* Added lengthy comment
* Added unit test

v3:
* Fixed read_dma_bulk handling of short, unaligned read
* Added test for unaligned read

v4:
* Added another unaligned test case

(cherry picked from commit e96cc52668)
2025-01-30 09:17:31 +00:00
Botond Dénes
68f134ee23 Merge '[Backport 2025.1] Do not update topology on address change' from Scylladb[bot]
Since now topology does not contain ip addresses there is no need to
create topology on an ip address change. Only peers table has to be
updated. The series factors out peers table update code from
sync_raft_topology_nodes() and calls it on topology and ip address
updates. As a side effect it fixes #22293 since now topology loading
does not require IP do be present, so the assert that is triggered in
this bug is removed.

Fixes: scylladb/scylladb#22293

- (cherry picked from commit ef929c5def)

- (cherry picked from commit fbfef6b28a)

Parent PR: #22519

Closes scylladb/scylladb#22543

* github.com:scylladb/scylladb:
  topology coordinator: do not update topology on address change
  topology coordinator: split out the peer table update functionality from raft state application
2025-01-30 11:14:19 +02:00
Jenkins Promoter
b623c237bc Update ScyllaDB version to: 2025.1.0-rc1 2025-01-30 01:25:18 +02:00
Calle Wilund
8379d545c5 docs: Remove configuration_encryptor
Fixes #21993

Removes configuration_encryptor mention from docs.
The tool itself (java) is not included in the main branch
java tools, thus need not remove from there. Only the words.

Closes scylladb/scylladb#22427

(cherry picked from commit bae5b44b97)

Closes scylladb/scylladb#22556
2025-01-29 20:17:36 +02:00
Michael Litvak
58d13d0daf cdc: fix handling of new generation during raft upgrade
During raft upgrade, a node may gossip about a new CDC generation that
was propagated through raft. The node that receives the generation by
gossip may have not applied the raft update yet, and it will not find
the generation in the system tables. We should consider this error
non-fatal and retry to read until it succeeds or becomes obsolete.

Another issue is when we fail with a "fatal" exception and not retrying
to read, the cdc metadata is left in an inconsistent state that causes
further attempts to insert this CDC generation to fail.

What happens is we complete preparing the new generation by calling `prepare`,
we insert an empty entry for the generation's timestamp, and then we fail. The
next time we try to insert the generation, we skip inserting it because we see
that it already has an entry in the metadata and we determine that
there's nothing to do. But this is wrong, because the entry is empty,
and we should continue to insert the generation.

To fix it, we change `prepare` to return `true` when the entry already
exists but it's empty, indicating we should continue to insert the
generation.

Fixes scylladb/scylladb#21227

Closes scylladb/scylladb#22093

(cherry picked from commit 4f5550d7f2)

Closes scylladb/scylladb#22546
2025-01-29 20:06:18 +02:00
Anna Stuchlik
4def507b1b doc: add OS support for 2025.1 and reorganize the page
This commit adds the OS support information for version 2025.1.
In addition, the OS support page is reorganized so that:
- The content is moved from the include page _common/os-support-info.rst
  to the regular os-support.rst page. The include page was necessary
  to document different support for OSS and Enterprise versions, so
  we don't need it anymore.
- I skipped the entries for versions that won't be supported when 2025.1
  is released: 6.1 and 2023.1.
- I moved the definition of "supported" to the end of the page for better
  readability.
- I've renamed the index entry to "OS Support" to be shorter on the left menu.

Fixes https://github.com/scylladb/scylladb/issues/22474

Closes scylladb/scylladb#22476

(cherry picked from commit 61c822715c)

Closes scylladb/scylladb#22538
2025-01-29 19:48:32 +02:00
Anna Stuchlik
69ad9350cc doc: remove Enterprise labels and directives
This PR removes the now redundant Enterprise labels and directives
from the ScyllDB documentation.

Fixes https://github.com/scylladb/scylladb/issues/22432

Closes scylladb/scylladb#22434

(cherry picked from commit b2a718547f)

Closes scylladb/scylladb#22539
2025-01-29 19:48:11 +02:00
Anna Stuchlik
29e5f5f54d doc: enable the FIPS note in the ScyllaDB docs
This commit removes the information about FIPS out of the '.. only:: enterprise' directive.
As a result, the information will now show in the doc in the ScyllaDB repo
(previously, the directive included the note in the Entrprise docs only).

Refs https://github.com/scylladb/scylla-enterprise/issues/5020

Closes scylladb/scylladb#22374

(cherry picked from commit 1d5ef3dddb)

Closes scylladb/scylladb#22550
2025-01-29 19:47:37 +02:00
Avi Kivity
379b3fa46c Merge '[Backport 2025.1] repair: handle no_such_keyspace in repair preparation phase' from null
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.

Skip the keyspace repair if no_such_keyspace is thrown during preparations.

Fixes: #22073.

Requires backport to 6.1 and 6.2 as they contain the bug

- (cherry picked from commit bfb1704afa)

- (cherry picked from commit 54e7f2819c)

Parent PR: #22473

Closes scylladb/scylladb#22542

* github.com:scylladb/scylladb:
  test: add test to check if repair handles no_such_keyspace
  repair: handle keyspace dropped
2025-01-29 14:09:23 +02:00
Ferenc Szili
fe869fd902 test: add reproducer and test for fix to split ready CG creation
This adds a reproducer for #22431

In cases where a tablet storage group manager had more than one storage
group, it was possible to create compaction groups outside the group0
guard, which could create problems with operations which should exclude
with compaction group creation.

(cherry picked from commit 8bff7786a8)
2025-01-29 10:10:28 +00:00
Ferenc Szili
dc55a566fa table: run set_split_mode() on all storage groups during all_storage_groups_split()
tablet_storage_group_manager::all_storage_groups_split() calls set_split_mode()
for each of its storage groups to create split ready compaction groups. It does
this by iterating through storage groups using std::ranges::all_of() which is
not guaranteed to iterate through the entire range, and will stop iterating on
the first occurance of the predicate (set_split_mode()) returning false.
set_split_mode() creates the split compaction groups and returns false if the
storage group's main compaction group or merging groups are not empty. This
means that in cases where the tablet storage group manager has non-empty
storage groups, we could have a situation where split compaction groups are not
created for all storage groups.

The missing split compaction groups are later created in
tablet_storage_group_manager::split_all_storage_groups() which also calls
set_split_mode(), and that is the reason why split completes successfully. The
problem is that tablet_storage_group_manager::all_storage_groups_split() runs
under a group0 guard, and tablet_storage_group_manager::split_all_storage_groups()
does not. This can cause problems with operations which should exclude with
compaction group creation. i.e. DROP TABLE/DROP KEYSPACE

(cherry picked from commit 24e8d2a55c)
2025-01-29 10:10:28 +00:00
Ferenc Szili
3bb8039359 truncate: trigger truncate logic from transition state instead of global
request handler

Before this change, the logic of truncate for tablets was triggered from
topology_coordinator::handle_global_request(). This was done without
using a topology transition state which remained empty throughout the
truncate handler's execution.

This change moves the truncate logic to a new method
topology_coordinator::handle_truncate_table(). This method is now called
as a handler of the truncate_table topology transition state instead of
a handler of the trunacate_table global topology request.
2025-01-29 10:48:34 +01:00
Ferenc Szili
9f3838e614 truncate: add truncate_table transition state
Truncate table for tablets is implemented as a global topology operation.
However, it does not have a transition state associated with it, and
performs the truncate logic in handle_global_request() while
topology::tstate remains empty. This creates problems because
topology::is_busy() uses transition_state to determine if the topology
state machine is busy, and will return false even though a truncate
operation is ongoing.

This change adds a new transition state: truncate_table
2025-01-29 10:47:15 +01:00
Gleb Natapov
366212f997 topology coordinator: do not update topology on address change
Since now topology does not contain ip addresses there is no need to
create topology on an ip address change. Only peers table has to be
updated, so call a function that does peers table update only.

(cherry picked from commit fbfef6b28a)
2025-01-28 21:51:11 +00:00
Gleb Natapov
c0637aff81 topology coordinator: split out the peer table update functionality from raft state application
Raft topology state application does two things: re-creates token metadata
and updates peers table if needed. The code for both task is intermixed
now. The patch separates it into separate functions. Will be needed in
the next patch.

(cherry picked from commit ef929c5def)
2025-01-28 21:51:11 +00:00
Aleksandra Martyniuk
dcf436eb84 test: add test to check if repair handles no_such_keyspace
(cherry picked from commit 54e7f2819c)
2025-01-28 21:50:35 +00:00
Aleksandra Martyniuk
8e754e9d41 repair: handle keyspace dropped
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.

Skip the keyspace repair if no_such_keyspace is thrown during preparations.

(cherry picked from commit bfb1704afa)
2025-01-28 21:50:35 +00:00
Yaron Kaikov
f407799f25 Update ScyllaDB version to: 2025.1.0-rc0 2025-01-27 11:29:45 +02:00
2272 changed files with 44339 additions and 155950 deletions

15
.github/CODEOWNERS vendored
View File

@@ -1,5 +1,5 @@
# AUTH
auth/* @nuivall @ptrsmrn
auth/* @nuivall @ptrsmrn @KrzaQ
# CACHE
row_cache* @tgrabiec
@@ -25,15 +25,15 @@ compaction/* @raphaelsc
transport/*
# CQL QUERY LANGUAGE
cql3/* @tgrabiec @nuivall @ptrsmrn
cql3/* @tgrabiec @nuivall @ptrsmrn @KrzaQ
# COUNTERS
counters* @nuivall @ptrsmrn
tests/counter_test* @nuivall @ptrsmrn
counters* @nuivall @ptrsmrn @KrzaQ
tests/counter_test* @nuivall @ptrsmrn @KrzaQ
# DOCS
docs/* @annastuchlik @tzach
docs/alternator @annastuchlik @tzach @nyh
docs/alternator @annastuchlik @tzach @nyh @nuivall @ptrsmrn @KrzaQ
# GOSSIP
gms/* @tgrabiec @asias @kbr-scylla
@@ -57,6 +57,7 @@ repair/* @tgrabiec @asias
# SCHEMA MANAGEMENT
db/schema_tables* @tgrabiec
db/legacy_schema_migrator* @tgrabiec
service/migration* @tgrabiec
schema* @tgrabiec
@@ -73,8 +74,8 @@ streaming/* @tgrabiec @asias
service/storage_service.* @tgrabiec @asias
# ALTERNATOR
alternator/* @nyh
test/alternator/* @nyh
alternator/* @nyh @nuivall @ptrsmrn @KrzaQ
test/alternator/* @nyh @nuivall @ptrsmrn @KrzaQ
# HINTED HANDOFF
db/hints/* @piodul @vladzcloudius @eliransin

View File

@@ -1,86 +1,15 @@
name: "Report a bug"
description: "File a bug report."
title: "[Bug]: "
type: "bug"
labels: bug
body:
- type: checkboxes
id: terms
attributes:
label: Code of Conduct
description: "This is Scylla's bug tracker, to be used for reporting bugs only.
This is Scylla's bug tracker, to be used for reporting bugs only.
If you have a question about Scylla, and not a bug, please ask it in
our forum at https://forum.scylladb.com/ or in our slack channel https://slack.scylladb.com/ "
options:
- label: I have read the disclaimer above and am reporting a suspected malfunction in Scylla.
required: true
our mailing-list at scylladb-dev@googlegroups.com or in our slack channel.
- type: input
id: product-version
attributes:
label: product version
description: Scylla version (or git commit hash)
placeholder: ex. scylla-6.1.1
validations:
required: true
- type: input
id: cluster-size
attributes:
label: Cluster Size
validations:
required: true
- type: input
id: os
attributes:
label: OS
placeholder: RHEL/CentOS/Ubuntu/AWS AMI
validations:
required: true
- type: textarea
id: additional-data
attributes:
label: Additional Environmental Data
#description:
placeholder: Add additional data
value: "Platform (physical/VM/cloud instance type/docker):\n
Hardware: sockets= cores= hyperthreading= memory=\n
Disks: (SSD/HDD, count)"
validations:
required: false
- type: textarea
id: reproducer-steps
attributes:
label: Reproduction Steps
placeholder: Describe how to reproduce the problem
value: "The steps to reproduce the problem are:"
validations:
required: true
- type: textarea
id: the-problem
attributes:
label: What is the problem?
placeholder: Describe the problem you found
value: "The problem is that"
validations:
required: true
- type: textarea
id: what-happened
attributes:
label: Expected behavior?
placeholder: Describe what should have happened
value: "I expected that "
validations:
required: true
- type: textarea
id: logs
attributes:
label: Relevant log output
description: Please copy and paste any relevant log output. This will be automatically formatted into code, so no need for backticks.
render: shell
- [] I have read the disclaimer above, and I am reporting a suspected malfunction in Scylla.
*Installation details*
Scylla version (or git commit hash):
Cluster size:
OS (RHEL/CentOS/Ubuntu/AWS AMI):
*Hardware details (for performance issues)* Delete if unneeded
Platform (physical/VM/cloud instance type/docker):
Hardware: sockets= cores= hyperthreading= memory=
Disks: (SSD/HDD, count)

View File

@@ -1,86 +0,0 @@
# ScyllaDB Development Instructions
## Project Context
High-performance distributed NoSQL database. Core values: performance, correctness, readability.
## Build System
### Modern Build (configure.py + ninja)
```bash
# Configure (run once per mode, or when switching modes)
./configure.py --mode=<mode> # mode: dev, debug, release, sanitize
# Build everything
ninja <mode>-build # e.g., ninja dev-build
# Build Scylla binary only (sufficient for Python integration tests)
ninja build/<mode>/scylla
# Build specific test
ninja build/<mode>/test/boost/<test_name>
```
## Running Tests
### C++ Unit Tests
```bash
# Run all tests in a file
./test.py --mode=<mode> test/<suite>/<test_name>.cc
# Run a single test case from a file
./test.py --mode=<mode> test/<suite>/<test_name>.cc::<test_case_name>
# Examples
./test.py --mode=dev test/boost/memtable_test.cc
./test.py --mode=dev test/raft/raft_server_test.cc::test_check_abort_on_client_api
```
**Important:**
- Use full path with `.cc` extension (e.g., `test/boost/test_name.cc`, not `boost/test_name`)
- To run a single test case, append `::<test_case_name>` to the file path
- If you encounter permission issues with cgroup metric gathering, add `--no-gather-metrics` flag
**Rebuilding Tests:**
- test.py does NOT automatically rebuild when test source files are modified
- Many tests are part of composite binaries (e.g., `combined_tests` in test/boost contains multiple test files)
- To find which binary contains a test, check `configure.py` in the repository root (primary source) or `test/<suite>/CMakeLists.txt`
- To rebuild a specific test binary: `ninja build/<mode>/test/<suite>/<binary_name>`
- Examples:
- `ninja build/dev/test/boost/combined_tests` (contains group0_voter_calculator_test.cc and others)
- `ninja build/dev/test/raft/replication_test` (standalone Raft test)
### Python Integration Tests
```bash
# Only requires Scylla binary (full build usually not needed)
ninja build/<mode>/scylla
# Run all tests in a file
./test.py --mode=<mode> <test_path>
# Run a single test case from a file
./test.py --mode=<mode> <test_path>::<test_function_name>
# Examples
./test.py --mode=dev alternator/
./test.py --mode=dev cluster/test_raft_voters::test_raft_limited_voters_retain_coordinator
# Optional flags
./test.py --mode=dev cluster/test_raft_no_quorum -v # Verbose output
./test.py --mode=dev cluster/test_raft_no_quorum --repeat 5 # Repeat test 5 times
```
**Important:**
- Use path without `.py` extension (e.g., `cluster/test_raft_no_quorum`, not `cluster/test_raft_no_quorum.py`)
- To run a single test case, append `::<test_function_name>` to the file path
- Add `-v` for verbose output
- Add `--repeat <num>` to repeat a test multiple times
- After modifying C++ source files, only rebuild the Scylla binary for Python tests - building the entire repository is unnecessary
## Code Philosophy
- Performance matters in hot paths (data read/write, inner loops)
- Self-documenting code through clear naming
- Comments explain "why", not "what"
- Prefer standard library over custom implementations
- Strive for simplicity and clarity, add complexity only when clearly justified
- Question requests: don't blindly implement requests - evaluate trade-offs, identify issues, and suggest better alternatives when appropriate
- Consider different approaches, weigh pros and cons, and recommend the best fit for the specific context

View File

@@ -1,115 +0,0 @@
---
applyTo: "**/*.{cc,hh}"
---
# C++ Guidelines
**Important:** Always match the style and conventions of existing code in the file and directory.
## Memory Management
- Prefer stack allocation whenever possible
- Use `std::unique_ptr` by default for dynamic allocations
- `new`/`delete` are forbidden (use RAII)
- Use `seastar::lw_shared_ptr` or `seastar::shared_ptr` for shared ownership within same shard
- Use `seastar::foreign_ptr` for cross-shard sharing
- Avoid `std::shared_ptr` except when interfacing with external C++ APIs
- Avoid raw pointers except for non-owning references or C API interop
## Seastar Asynchronous Programming
- Use `seastar::future<T>` for all async operations
- Prefer coroutines (`co_await`, `co_return`) over `.then()` chains for readability
- Coroutines are preferred over `seastar::do_with()` for managing temporary state
- In hot paths where futures are ready, continuations may be more efficient than coroutines
- Chain futures with `.then()`, don't block with `.get()` (unless in `seastar::thread` context)
- All I/O must be asynchronous (no blocking calls)
- Use `seastar::gate` for shutdown coordination
- Use `seastar::semaphore` for resource limiting (not `std::mutex`)
- Break long loops with `maybe_yield()` to avoid reactor stalls
## Coroutines
```cpp
seastar::future<T> func() {
auto result = co_await async_operation();
co_return result;
}
```
## Error Handling
- Throw exceptions for errors (futures propagate them automatically)
- In data path: avoid exceptions, use `std::expected` (or `boost::outcome`) instead
- Use standard exceptions (`std::runtime_error`, `std::invalid_argument`)
- Database-specific: throw appropriate schema/query exceptions
## Performance
- Pass large objects by `const&` or `&&` (move semantics)
- Use `std::string_view` for non-owning string references
- Avoid copies: prefer move semantics
- Use `utils::chunked_vector` instead of `std::vector` for large allocations (>128KB)
- Minimize dynamic allocations in hot paths
## Database-Specific Types
- Use `schema_ptr` for schema references
- Use `mutation` and `mutation_partition` for data modifications
- Use `partition_key` and `clustering_key` for keys
- Use `api::timestamp_type` for database timestamps
- Use `gc_clock` for garbage collection timing
## Style
- C++23 standard (prefer modern features, especially coroutines)
- Use `auto` when type is obvious from RHS
- Avoid `auto` when it obscures the type
- Use range-based for loops: `for (const auto& item : container)`
- Use standard algorithms when they clearly simplify code (e.g., replacing 10-line loops)
- Avoid chaining multiple algorithms if a straightforward loop is clearer
- Mark functions and variables `const` whenever possible
- Use scoped enums: `enum class` (not unscoped `enum`)
## Headers
- Use `#pragma once`
- Include order: own header, C++ std, Seastar, Boost, project headers
- Forward declare when possible
- Never `using namespace` in headers (exception: `using namespace seastar` is globally available via `seastarx.hh`)
## Documentation
- Public APIs require clear documentation
- Implementation details should be self-evident from code
- Use `///` or Doxygen `/** */` for public documentation, `//` for implementation notes - follow the existing style
## Naming
- `snake_case` for most identifiers (classes, functions, variables, namespaces)
- Template parameters: `CamelCase` (e.g., `template<typename ValueType>`)
- Member variables: prefix with `_` (e.g., `int _count;`)
- Structs (value-only): no `_` prefix on members
- Constants and `constexpr`: `snake_case` (e.g., `static constexpr int max_size = 100;`)
- Files: `.hh` for headers, `.cc` for source
## Formatting
- 4 spaces indentation, never tabs
- Opening braces on same line as control structure (except namespaces)
- Space after keywords: `if (`, `while (`, `return `
- Whitespace around operators matches precedence: `*a + *b` not `* a+* b`
- Line length: keep reasonable (<160 chars), use continuation lines with double indent if needed
- Brace all nested scopes, even single statements
- Minimal patches: only format code you modify, never reformat entire files
## Logging
- Use structured logging with appropriate levels: DEBUG, INFO, WARN, ERROR
- Include context in log messages (e.g., request IDs)
- Never log sensitive data (credentials, PII)
## Forbidden
- `malloc`/`free`
- `printf` family (use logging or fmt)
- Raw pointers for ownership
- `using namespace` in headers
- Blocking operations: `std::sleep`, `std::read`, `std::mutex` (use Seastar equivalents)
- `std::atomic` (reserved for very special circumstances only)
- Macros (use `inline`, `constexpr`, or templates instead)
## Testing
When modifying existing code, follow TDD: create/update test first, then implement.
- Examine existing tests for style and structure
- Use Boost.Test framework
- Use `SEASTAR_THREAD_TEST_CASE` for Seastar asynchronous tests
- Aim for high code coverage, especially for new features and bug fixes
- Maintain bisectability: all tests must pass in every commit. Mark failing tests with `BOOST_FAIL()` or similar, then fix in subsequent commit

View File

@@ -1,51 +0,0 @@
---
applyTo: "**/*.py"
---
# Python Guidelines
**Important:** Match existing code style. Some directories (like `test/cqlpy` and `test/alternator`) prefer simplicity over type hints and docstrings.
## Style
- Follow PEP 8
- Use type hints for function signatures (unless directory style omits them)
- Use f-strings for formatting
- Line length: 160 characters max
- 4 spaces for indentation
## Imports
Order: standard library, third-party, local imports
```python
import os
import sys
import pytest
from cassandra.cluster import Cluster
from test.utils import setup_keyspace
```
Never use `from module import *`
## Documentation
All public functions/classes need docstrings (unless the current directory conventions omit them):
```python
def my_function(arg1: str, arg2: int) -> bool:
"""
Brief summary of function purpose.
Args:
arg1: Description of first argument.
arg2: Description of second argument.
Returns:
Description of return value.
"""
pass
```
## Testing Best Practices
- Maintain bisectability: all tests must pass in every commit
- Mark currently-failing tests with `@pytest.mark.xfail`, unmark when fixed
- Use descriptive names that convey intent
- Docstrings/comments should explain what the test verifies and why, and if it reproduces a specific issue or how it fits into the larger test suite

View File

@@ -29,11 +29,10 @@ def parse_args():
parser.add_argument('--commits', default=None, type=str, help='Range of promoted commits.')
parser.add_argument('--pull-request', type=int, help='Pull request number to be backported')
parser.add_argument('--head-commit', type=str, required=is_pull_request(), help='The HEAD of target branch after the pull request specified by --pull-request is merged')
parser.add_argument('--github-event', type=str, help='Get GitHub event type')
return parser.parse_args()
def create_pull_request(repo, new_branch_name, base_branch_name, pr, backport_pr_title, commits, is_draft, is_collaborator):
def create_pull_request(repo, new_branch_name, base_branch_name, pr, backport_pr_title, commits, is_draft=False):
pr_body = f'{pr.body}\n\n'
for commit in commits:
pr_body += f'- (cherry picked from commit {commit})\n\n'
@@ -47,29 +46,12 @@ def create_pull_request(repo, new_branch_name, base_branch_name, pr, backport_pr
draft=is_draft
)
logging.info(f"Pull request created: {backport_pr.html_url}")
labels_to_add = []
priority_labels = {"P0", "P1"}
parent_pr_labels = [label.name for label in pr.labels]
for label in priority_labels:
if label in parent_pr_labels:
labels_to_add.append(label)
labels_to_add.append("force_on_cloud")
logging.info(f"Adding {label} and force_on_cloud labels from parent PR to backport PR")
break # Only apply the highest priority label
if is_collaborator:
backport_pr.add_to_assignees(pr.user)
backport_pr.add_to_assignees(pr.user)
if is_draft:
labels_to_add.append("conflicts")
backport_pr.add_to_labels("conflicts")
pr_comment = f"@{pr.user.login} - This PR was marked as draft because it has conflicts\n"
pr_comment += "Please resolve them and mark this PR as ready for review"
backport_pr.create_issue_comment(pr_comment)
# Apply all labels at once if we have any
if labels_to_add:
backport_pr.add_to_labels(*labels_to_add)
logging.info(f"Added labels to backport PR: {labels_to_add}")
logging.info(f"Assigned PR to original author: {pr.user}")
return backport_pr
except GithubException as e:
@@ -110,7 +92,18 @@ def get_pr_commits(repo, pr, stable_branch, start_commit=None):
return commits
def backport(repo, pr, version, commits, backport_base_branch, is_collaborator):
def create_pr_comment_and_remove_label(pr, comment_body):
labels = pr.get_labels()
pattern = re.compile(r"backport/\d+\.\d+$")
for label in labels:
if pattern.match(label.name):
print(f"Removing label: {label.name}")
comment_body += f'- {label.name}\n'
pr.remove_from_labels(label)
pr.create_issue_comment(comment_body)
def backport(repo, pr, version, commits, backport_base_branch):
new_branch_name = f'backport/{pr.number}/to-{version}'
backport_pr_title = f'[Backport {version}] {pr.title}'
repo_url = f'https://scylladbbot:{github_token}@github.com/{repo.full_name}.git'
@@ -128,45 +121,27 @@ def backport(repo, pr, version, commits, backport_base_branch, is_collaborator):
is_draft = True
repo_local.git.add(A=True)
repo_local.git.cherry_pick('--continue')
# Check if the branch already exists in the remote fork
remote_refs = repo_local.git.ls_remote('--heads', fork_repo, new_branch_name)
if not remote_refs:
# Branch does not exist, create it with a regular push
repo_local.git.push(fork_repo, new_branch_name)
create_pull_request(repo, new_branch_name, backport_base_branch, pr, backport_pr_title, commits,
is_draft, is_collaborator)
else:
logging.info(f"Remote branch {new_branch_name} already exists in fork. Skipping push.")
repo_local.git.push(fork_repo, new_branch_name, force=True)
create_pull_request(repo, new_branch_name, backport_base_branch, pr, backport_pr_title, commits,
is_draft=is_draft)
except GitCommandError as e:
logging.warning(f"GitCommandError: {e}")
def with_github_keyword_prefix(repo, pr):
# GitHub issue pattern: #123, scylladb/scylladb#123, or full GitHub URLs
github_pattern = rf"(?:fix(?:|es|ed))\s*:?\s*(?:(?:(?:{repo.full_name})?#)|https://github\.com/{repo.full_name}/issues/)(\d+)"
# JIRA issue pattern: PKG-92 or https://scylladb.atlassian.net/browse/PKG-92
jira_pattern = r"(?:fix(?:|es|ed))\s*:?\s*(?:(?:https://scylladb\.atlassian\.net/browse/)?([A-Z]+-\d+))"
# Check PR body for GitHub issues
github_match = re.findall(github_pattern, pr.body, re.IGNORECASE)
# Check PR body for JIRA issues
jira_match = re.findall(jira_pattern, pr.body, re.IGNORECASE)
match = github_match or jira_match
if match:
pattern = rf"(?:fix(?:|es|ed))\s*:?\s*(?:(?:(?:{repo.full_name})?#)|https://github\.com/{repo.full_name}/issues/)(\d+)"
match = re.findall(pattern, pr.body, re.IGNORECASE)
if not match:
print(f'No valid close reference for {pr.number}')
comment = f':warning: @{pr.user.login} PR body does not contain a Fixes reference to an issue '
comment += ' and can not be backported\n\n'
comment += 'The following labels were removed:\n'
create_pr_comment_and_remove_label(pr, comment)
return False
else:
return True
for commit in pr.get_commits():
github_match = re.findall(github_pattern, commit.commit.message, re.IGNORECASE)
jira_match = re.findall(jira_pattern, commit.commit.message, re.IGNORECASE)
if github_match or jira_match:
print(f'{pr.number} has a valid close reference in commit message {commit.sha}')
return True
print(f'No valid close reference for {pr.number}')
return False
def main():
args = parse_args()
@@ -187,7 +162,6 @@ def main():
scylladbbot_repo = g.get_repo(fork_repo_name)
closed_prs = []
start_commit = None
is_collaborator = True
if args.commits:
start_commit, end_commit = args.commits.split('..')
@@ -212,33 +186,21 @@ def main():
if not backport_labels:
print(f'no backport label: {pr.number}')
continue
if not with_github_keyword_prefix(repo, pr) and args.github_event != 'unlabeled':
comment = f''':warning: @{pr.user.login} PR body or PR commits do not contain a Fixes reference to an issue and can not be backported
please update PR body with a valid ref to an issue. Then remove `scylladbbot/backport_error` label to re-trigger the backport process
'''
pr.create_issue_comment(comment)
pr.add_to_labels("scylladbbot/backport_error")
if args.commits and not with_github_keyword_prefix(repo, pr):
continue
if not repo.private and not scylladbbot_repo.has_in_collaborators(pr.user.login):
logging.info(f"Sending an invite to {pr.user.login} to become a collaborator to {scylladbbot_repo.full_name} ")
scylladbbot_repo.add_to_collaborators(pr.user.login)
comment = f''':warning: @{pr.user.login} you have been added as collaborator to scylladbbot fork
Please check your inbox and approve the invitation, otherwise you will not be able to edit PR branch when needed
'''
# When a pull request is pending for backport but its author is not yet a collaborator of "scylladbbot",
# we attach a "scylladbbot/backport_error" label to the PR.
# This prevents the workflow from proceeding with the backport process
# until the author has been granted proper permissions
# the author should remove the label manually to re-trigger the backport workflow.
pr.add_to_labels("scylladbbot/backport_error")
pr.create_issue_comment(comment)
is_collaborator = False
comment = f':warning: @{pr.user.login} you have been added as collaborator to scylladbbot fork '
comment += f'Please check your inbox and approve the invitation, once it is done, please add the backport labels again\n'
create_pr_comment_and_remove_label(pr, comment)
continue
commits = get_pr_commits(repo, pr, stable_branch, start_commit)
logging.info(f"Found PR #{pr.number} with commit {commits} and the following labels: {backport_labels}")
for backport_label in backport_labels:
version = backport_label.replace('backport/', '')
backport_base_branch = backport_label.replace('backport/', backport_branch)
backport(repo, pr, version, commits, backport_base_branch, is_collaborator)
backport(repo, pr, version, commits, backport_base_branch)
if __name__ == "__main__":

View File

@@ -1,81 +0,0 @@
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
#
# Copyright (C) 2024-present ScyllaDB
#
#
# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
#
import argparse
import sys
from pathlib import Path
from typing import Set
def parse_args() -> argparse.Namespace:
"""Parses command-line arguments."""
parser = argparse.ArgumentParser(description='Check license headers in files')
parser.add_argument('--files', required=True, nargs="+", type=Path,
help='List of files to check')
parser.add_argument('--license', required=True,
help='License to check for')
parser.add_argument('--check-lines', type=int, default=10,
help='Number of lines to check (default: %(default)s)')
parser.add_argument('--extensions', required=True, nargs="+",
help='List of file extensions to check')
parser.add_argument('--verbose', action='store_true',
help='Print verbose output (default: %(default)s)')
return parser.parse_args()
def should_check_file(file_path: Path, allowed_extensions: Set[str]) -> bool:
return file_path.suffix in allowed_extensions
def check_license_header(file_path: Path, license_header: str, check_lines: int) -> bool:
try:
with open(file_path, 'r', encoding='utf-8') as f:
for _ in range(check_lines):
line = f.readline()
if license_header in line:
return True
return False
except (UnicodeDecodeError, StopIteration):
# Handle files that can't be read as text or have fewer lines
return False
def main() -> int:
args = parse_args()
if not args.files:
print("No files to check")
return 0
num_errors = 0
for file_path in args.files:
# Skip non-existent files
if not file_path.exists():
continue
# Skip files with non-matching extensions
if not should_check_file(file_path, args.extensions):
print(f" Skipping file with unchecked extension: {file_path}")
continue
# Check license header
if check_license_header(file_path, args.license, args.check_lines):
if args.verbose:
print(f"✅ License header found in: {file_path}")
else:
print(f"❌ Missing license header in: {file_path}")
num_errors += 1
if num_errors > 0:
sys.exit(1)
if __name__ == '__main__':
main()

View File

@@ -30,13 +30,8 @@ def copy_labels_from_linked_issues(repo, pr_number):
try:
issue = repo.get_issue(int(issue_number))
for label in issue.labels:
# Copy ALL labels from issues to PR when PR is opened
pr.add_to_labels(label.name)
print(f"Copied label '{label.name}' from issue #{issue_number} to PR #{pr_number}")
if label.name in ['P0', 'P1']:
pr.add_to_labels('force_on_cloud')
print(f"Added force_on_cloud label to PR #{pr_number} due to {label.name} label")
print(f"All labels from issue #{issue_number} copied to PR #{pr_number}")
print(f"Labels from issue #{issue_number} copied to PR #{pr_number}")
except Exception as e:
print(f"Error processing issue #{issue_number}: {e}")
@@ -79,22 +74,9 @@ def sync_labels(repo, number, label, action, is_issue=False):
target = repo.get_issue(int(pr_or_issue_number))
if action == 'labeled':
target.add_to_labels(label)
if label in ['P0', 'P1'] and is_issue:
# Only add force_on_cloud to PRs when P0/P1 is added to an issue
target.add_to_labels('force_on_cloud')
print(f"Added 'force_on_cloud' label to PR #{pr_or_issue_number} due to {label} label")
print(f"Label '{label}' successfully added.")
elif action == 'unlabeled':
target.remove_from_labels(label)
if label in ['P0', 'P1'] and is_issue:
# Check if any other P0/P1 labels remain before removing force_on_cloud
remaining_priority_labels = [l.name for l in target.labels if l.name in ['P0', 'P1']]
if not remaining_priority_labels:
try:
target.remove_from_labels('force_on_cloud')
print(f"Removed 'force_on_cloud' label from PR #{pr_or_issue_number} as no P0/P1 labels remain")
except Exception as e:
print(f"Warning: Could not remove force_on_cloud label: {e}")
print(f"Label '{label}' successfully removed.")
elif action == 'opened':
copy_labels_from_linked_issues(repo, number)

View File

@@ -1,16 +0,0 @@
{
"problemMatcher": [
{
"owner": "seastar-bad-include",
"severity": "error",
"pattern": [
{
"regexp": "^(.+):(\\d+):(.+)$",
"file": 1,
"line": 2,
"message": 3
}
]
}
]
}

View File

@@ -7,7 +7,7 @@ on:
- branch-*.*
- enterprise
pull_request_target:
types: [labeled, unlabeled]
types: [labeled]
branches: [master, next, enterprise]
jobs:
@@ -53,31 +53,19 @@ jobs:
env:
GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --commits ${{ github.event.before }}..${{ github.sha }}
- name: Check if a valid backport label exists and no backport_error
env:
LABELS_JSON: ${{ toJson(github.event.pull_request.labels) }}
- name: Check if label starts with 'backport/' and contains digits
id: check_label
run: |
labels_json="$LABELS_JSON"
echo "Checking labels:"
echo "$labels_json" | jq -r '.[].name'
# Check if a valid backport label exists
if echo "$labels_json" | jq -e 'any(.[] | .name; test("backport/[0-9]+\\.[0-9]+$"))' > /dev/null; then
# Ensure scylladbbot/backport_error is NOT present
if ! echo "$labels_json" | jq -e '.[] | select(.name == "scylladbbot/backport_error")' > /dev/null; then
echo "A matching backport label was found and no backport_error label exists."
echo "ready_for_backport=true" >> "$GITHUB_OUTPUT"
exit 0
else
echo "The label 'scylladbbot/backport_error' is present, invalidating backport."
fi
label_name="${{ github.event.label.name }}"
if [[ "$label_name" =~ ^backport/[0-9]+\.[0-9]+$ ]]; then
echo "Label matches backport/X.X pattern."
echo "backport_label=true" >> $GITHUB_OUTPUT
else
echo "No matching backport label found."
echo "Label does not match the required pattern."
echo "backport_label=false" >> $GITHUB_OUTPUT
fi
echo "ready_for_backport=false" >> "$GITHUB_OUTPUT"
- name: Run auto-backport.py when PR is closed
if: ${{ github.event_name == 'pull_request_target' && steps.check_label.outputs.ready_for_backport == 'true' && github.event.pull_request.state == 'closed' }}
- name: Run auto-backport.py when label was added
if: ${{ github.event_name == 'pull_request_target' && steps.check_label.outputs.backport_label == 'true' && github.event.pull_request.state == 'closed' }}
env:
GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}
run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --pull-request ${{ github.event.pull_request.number }} --head-commit ${{ github.event.pull_request.base.sha }} --github-event ${{ github.event.action }}
run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --pull-request ${{ github.event.pull_request.number }} --head-commit ${{ github.event.pull_request.base.sha }}

View File

@@ -1,12 +0,0 @@
name: Call Jira Status In Progress
on:
pull_request_target:
types: [opened]
jobs:
call-jira-status-in-progress:
uses: scylladb/github-automation/.github/workflows/main_update_jira_status_to_in_progress.yml@main
secrets:
caller_jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

View File

@@ -1,12 +0,0 @@
name: Call Jira Status In Review
on:
pull_request_target:
types: [ready_for_review, review_requested]
jobs:
call-jira-status-in-review:
uses: scylladb/github-automation/.github/workflows/main_update_jira_status_to_in_review.yml@main
secrets:
caller_jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

View File

@@ -1,12 +0,0 @@
name: Call Jira Status Ready For Merge
on:
pull_request_target:
types: [labeled]
jobs:
call-jira-status-update:
uses: scylladb/github-automation/.github/workflows/main_update_jira_status_to_ready_for_merge.yml@main
secrets:
caller_jira_auth: ${{ secrets.USER_AND_KEY_FOR_JIRA_AUTOMATION }}

View File

@@ -1,52 +0,0 @@
name: License Header Check
on:
pull_request:
types: [opened, synchronize, reopened]
branches: [master]
env:
HEADER_CHECK_LINES: 10
LICENSE: "LicenseRef-ScyllaDB-Source-Available-1.0"
CHECKED_EXTENSIONS: ".cc .hh .py"
jobs:
check-license-headers:
name: Check License Headers
runs-on: ubuntu-latest
permissions:
pull-requests: write
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
fetch-depth: 0
- name: Get changed files
id: changed-files
run: |
# Get list of added files comparing with base branch
echo "files=$(git diff --name-only --diff-filter=A ${{ github.event.pull_request.base.sha }} ${{ github.sha }} | tr '\n' ' ')" >> $GITHUB_OUTPUT
- name: Check license headers
if: steps.changed-files.outputs.files != ''
run: |
.github/scripts/check-license.py \
--files ${{ steps.changed-files.outputs.files }} \
--license "${{ env.LICENSE }}" \
--check-lines "${{ env.HEADER_CHECK_LINES }}" \
--extensions ${{ env.CHECKED_EXTENSIONS }}
- name: Comment on PR if check fails
if: failure()
uses: actions/github-script@v7
with:
script: |
const license = '${{ env.LICENSE }}';
await github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: `❌ License header check failed. Please ensure all new files include the header within the first ${{ env.HEADER_CHECK_LINES }} lines:\n\`\`\`\n${license}\n\`\`\`\nSee action logs for details.`
});

View File

@@ -7,7 +7,7 @@ on:
env:
# use the development branch explicitly
CLANG_VERSION: 21
CLANG_VERSION: 20
BUILD_DIR: build
permissions: {}

View File

@@ -34,7 +34,6 @@ jobs:
name: Run clang-tidy
needs:
- read-toolchain
if: "${{ needs.read-toolchain.result == 'success' }}"
runs-on: ubuntu-latest
container: ${{ needs.read-toolchain.outputs.image }}
steps:

View File

@@ -1,16 +1,9 @@
name: Notify PR Authors of Conflicts
permissions:
issues: write
pull-requests: write
on:
push:
branches:
- 'master'
- 'branch-*'
schedule:
- cron: '0 10 * * 1' # Runs every Monday at 10:00am
- cron: '0 10 * * 1,4' # Runs every Monday and Thursday at 10:00am
workflow_dispatch: # Manual trigger for testing
jobs:
notify_conflict_prs:
@@ -21,134 +14,32 @@ jobs:
uses: actions/github-script@v7
with:
script: |
console.log("Starting conflict reminder script...");
// Print trigger event
if (process.env.GITHUB_EVENT_NAME) {
console.log(`Workflow triggered by: ${process.env.GITHUB_EVENT_NAME}`);
} else {
console.log("Could not determine workflow trigger event.");
}
const isPushEvent = process.env.GITHUB_EVENT_NAME === 'push';
console.log(`isPushEvent: ${isPushEvent}`);
const twoMonthsAgo = new Date();
twoMonthsAgo.setMonth(twoMonthsAgo.getMonth() - 2);
const prs = await github.paginate(github.rest.pulls.list, {
owner: context.repo.owner,
repo: context.repo.repo,
state: 'open',
per_page: 100
});
console.log(`Fetched ${prs.length} open PRs`);
const recentPrs = prs.filter(pr => new Date(pr.created_at) >= twoMonthsAgo);
const validBaseBranches = ['master'];
const branchPrefix = 'branch-';
const oneWeekAgo = new Date();
const conflictLabel = 'conflicts';
oneWeekAgo.setDate(oneWeekAgo.getDate() - 7);
console.log(`One week ago: ${oneWeekAgo.toISOString()}`);
for (const pr of recentPrs) {
console.log(`Checking PR #${pr.number} on base branch '${pr.base.ref}'`);
const isBranchX = pr.base.ref.startsWith(branchPrefix);
const isMaster = validBaseBranches.includes(pr.base.ref);
if (!(isBranchX || isMaster)) {
console.log(`PR #${pr.number} skipped: base branch is not 'master' or does not start with '${branchPrefix}'`);
continue;
}
const threeDaysAgo = new Date();
const conflictLabel = 'conflicts';
threeDaysAgo.setDate(threeDaysAgo.getDate() - 3);
for (const pr of prs) {
if (!pr.base.ref.startsWith(branchPrefix)) continue;
const hasConflictLabel = pr.labels.some(label => label.name === conflictLabel);
if (!hasConflictLabel) continue;
const updatedDate = new Date(pr.updated_at);
console.log(`PR #${pr.number} last updated at: ${updatedDate.toISOString()}`);
if (!isPushEvent && updatedDate >= oneWeekAgo) {
console.log(`PR #${pr.number} skipped: updated within last week`);
continue;
}
if (pr.assignee === null) {
console.log(`PR #${pr.number} skipped: no assignee`);
continue;
}
// Fetch PR details to check mergeability
let { data: prDetails } = await github.rest.pulls.get({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: pr.number,
});
console.log(`PR #${pr.number} mergeable: ${prDetails.mergeable}`);
// Wait and re-fetch if mergeable is null
if (prDetails.mergeable === null) {
console.log(`PR #${pr.number} mergeable is null, waiting 2 seconds and retrying...`);
await new Promise(resolve => setTimeout(resolve, 2000)); // wait 2 seconds
prDetails = (await github.rest.pulls.get({
owner: context.repo.owner,
repo: context.repo.repo,
pull_number: pr.number,
})).data;
console.log(`PR #${pr.number} mergeable after retry: ${prDetails.mergeable}`);
}
if (prDetails.mergeable === false) {
const hasConflictLabel = pr.labels.some(label => label.name === conflictLabel);
console.log(`PR #${pr.number} has conflict label: ${hasConflictLabel}`);
// Fetch comments to check for existing notifications
const comments = await github.paginate(github.rest.issues.listComments, {
if (updatedDate >= threeDaysAgo) continue;
if (pr.assignee === null) continue;
const assignee = pr.assignee.login;
if (assignee) {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: pr.number,
per_page: 100,
body: `@${assignee}, this PR has been open with conflicts. Please resolve the conflicts so we can merge it.`,
});
// Find last notification comment from the bot
const notificationPrefix = `@${pr.assignee.login}, this PR has merge conflicts with the base branch.`;
const lastNotification = comments
.filter(c =>
c.user.type === "Bot" &&
c.body.startsWith(notificationPrefix)
)
.sort((a, b) => new Date(b.created_at) - new Date(a.created_at))[0];
// Check if we should skip notification based on recent notification
let shouldSkipNotification = false;
if (lastNotification) {
const lastNotified = new Date(lastNotification.created_at);
if (lastNotified >= oneWeekAgo) {
console.log(`PR #${pr.number} skipped: last notification was less than 1 week ago`);
shouldSkipNotification = true;
}
}
// Additional check for push events on draft PRs with conflict labels
if (
isPushEvent &&
pr.draft === true &&
hasConflictLabel &&
shouldSkipNotification
) {
continue;
}
if (!hasConflictLabel) {
await github.rest.issues.addLabels({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: pr.number,
labels: [conflictLabel],
});
console.log(`Added 'conflicts' label to PR #${pr.number}`);
}
const assignee = pr.assignee.login;
if (assignee && !shouldSkipNotification) {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: pr.number,
body: `@${assignee}, this PR has merge conflicts with the base branch. Please resolve the conflicts so we can merge it.`,
});
console.log(`Notified @${assignee} for PR #${pr.number}`);
}
} else {
console.log(`PR #${pr.number} is mergeable, no action needed.`);
}
console.log(`Notified @${assignee} for PR #${pr.number}`);
}
}
console.log(`Total PRs checked: ${prs.length}`);

View File

@@ -1,34 +0,0 @@
name: Docs / Validate metrics
on:
pull_request:
branches:
- master
- enterprise
paths:
- '**/*.cc'
- 'scripts/metrics-config.yml'
- 'scripts/get_description.py'
- 'docs/_ext/scylladb_metrics.py'
jobs:
validate-metrics:
runs-on: ubuntu-latest
name: Check metrics documentation coverage
steps:
- name: Checkout code
uses: actions/checkout@v4
with:
submodules: true
- name: Set up Python
uses: actions/setup-python@v6
with:
python-version: '3.10'
- name: Install dependencies
run: pip install PyYAML
- name: Validate metrics
run: python3 scripts/get_description.py --validate -c scripts/metrics-config.yml

View File

@@ -11,8 +11,7 @@ env:
CLEANER_OUTPUT_PATH: build/clang-include-cleaner.log
# the "idl" subdirectory does not contain C++ source code. the .hh files in it are
# supposed to be processed by idl-compiler.py, so we don't check them using the cleaner
CLEANER_DIRS: test/unit exceptions alternator api auth cdc compaction db dht gms index lang message mutation mutation_writer node_ops raft redis replica service
SEASTAR_BAD_INCLUDE_OUTPUT_PATH: build/seastar-bad-include.log
CLEANER_DIRS: test/unit exceptions alternator api auth cdc compaction db dht gms index lang message mutation mutation_writer node_ops redis replica
permissions: {}
@@ -81,24 +80,7 @@ jobs:
done
- run: |
echo "::remove-matcher owner=clang-include-cleaner::"
- run: |
echo "::add-matcher::.github/seastar-bad-include.json"
- name: check for seastar includes
run: |
git -c safe.directory="$PWD" \
grep -nE '#include +"seastar/' \
| tee "$SEASTAR_BAD_INCLUDE_OUTPUT_PATH"
- run: |
echo "::remove-matcher owner=seastar-bad-include::"
- uses: actions/upload-artifact@v4
with:
name: Logs
path: |
${{ env.CLEANER_OUTPUT_PATH }}
${{ env.SEASTAR_BAD_INCLUDE_OUTPUT_PATH }}
- name: fail if seastar headers are included as an internal library
run: |
if [ -s "$SEASTAR_BAD_INCLUDE_OUTPUT_PATH" ]; then
echo "::error::Found #include \"seastar/ in the source code. Use angle brackets instead."
exit 1
fi
name: Logs (clang-include-cleaner)
path: "./${{ env.CLEANER_OUTPUT_PATH }}"

View File

@@ -12,8 +12,6 @@ jobs:
mark-ready:
if: github.event.label.name == 'conflicts'
runs-on: ubuntu-latest
permissions:
pull-requests: write
steps:
- name: Checkout repository

View File

@@ -13,12 +13,10 @@ jobs:
issues: write
pull-requests: write
steps:
- name: Wait for label to be added
run: sleep 1m
- uses: mheap/github-action-required-labels@v5
with:
mode: minimum
count: 1
labels: "backport/none\nbackport/\\d{4}\\.\\d+\nbackport/\\d+\\.\\d+"
labels: "backport/none\nbackport/\\d.\\d"
use_regex: true
add_comment: false

View File

@@ -37,13 +37,13 @@ jobs:
run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.number }} --action ${{ github.event.action }}
- name: Pull request labeled or unlabeled event
if: github.event_name == 'pull_request_target' && (startsWith(github.event.label.name, 'backport/') || github.event.label.name == 'P0' || github.event.label.name == 'P1')
if: github.event_name == 'pull_request_target' && startsWith(github.event.label.name, 'backport/')
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.number }} --action ${{ github.event.action }} --label ${{ github.event.label.name }}
- name: Issue labeled or unlabeled event
if: github.event_name == 'issues' && (startsWith(github.event.label.name, 'backport/') || github.event.label.name == 'P0' || github.event.label.name == 'P1')
if: github.event_name == 'issues' && startsWith(github.event.label.name, 'backport/')
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.issue.number }} --action ${{ github.event.action }} --is_issue --label ${{ github.event.label.name }}

View File

@@ -1,21 +0,0 @@
name: Trigger Scylla CI Route
on:
issue_comment:
types: [created]
jobs:
trigger-jenkins:
if: github.event.comment.user.login != 'scylladbbot' && contains(github.event.comment.body, '@scylladbbot') && contains(github.event.comment.body, 'trigger-ci')
runs-on: ubuntu-latest
steps:
- name: Trigger Scylla-CI-Route Jenkins Job
env:
JENKINS_USER: ${{ secrets.JENKINS_USERNAME }}
JENKINS_API_TOKEN: ${{ secrets.JENKINS_TOKEN }}
JENKINS_URL: "https://jenkins.scylladb.com"
run: |
PR_NUMBER=${{ github.event.issue.number }}
PR_REPO_NAME=${{ github.event.repository.full_name }}
curl -X POST "$JENKINS_URL/job/releng/job/Scylla-CI-Route/buildWithParameters?PR_NUMBER=$PR_NUMBER&PR_REPO_NAME=$PR_REPO_NAME" \
--user "$JENKINS_USER:$JENKINS_API_TOKEN" --fail -i -v

View File

@@ -1,242 +0,0 @@
name: Trigger next gating
on:
pull_request_target:
types: [opened, reopened, synchronize]
issue_comment:
types: [created]
jobs:
trigger-ci:
runs-on: ubuntu-latest
steps:
- name: Dump GitHub context
env:
GITHUB_CONTEXT: ${{ toJson(github) }}
run: echo "$GITHUB_CONTEXT"
- name: Checkout PR code
uses: actions/checkout@v3
with:
fetch-depth: 0 # Needed to access full history
ref: ${{ github.event.pull_request.head.ref }}
- name: Fetch before commit if needed
run: |
if ! git cat-file -e ${{ github.event.before }} 2>/dev/null; then
echo "Fetching before commit ${{ github.event.before }}"
git fetch --depth=1 origin ${{ github.event.before }}
fi
- name: Compare commits for file changes
if: github.action == 'synchronize'
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
echo "Base: ${{ github.event.before }}"
echo "Head: ${{ github.event.after }}"
TREE_BEFORE=$(git show -s --format=%T ${{ github.event.before }})
TREE_AFTER=$(git show -s --format=%T ${{ github.event.after }})
echo "TREE_BEFORE=$TREE_BEFORE" >> $GITHUB_ENV
echo "TREE_AFTER=$TREE_AFTER" >> $GITHUB_ENV
- name: Check if last push has file changes
run: |
if [[ "${{ env.TREE_BEFORE }}" == "${{ env.TREE_AFTER }}" ]]; then
echo "No file changes detected in the last push, only commit message edit."
echo "has_file_changes=false" >> $GITHUB_ENV
else
echo "File changes detected in the last push."
echo "has_file_changes=true" >> $GITHUB_ENV
fi
- name: Rule 1 - Check PR draft or conflict status
run: |
# Check if PR is in draft mode
IS_DRAFT="${{ github.event.pull_request.draft }}"
# Check if PR has 'conflict' label
HAS_CONFLICT_LABEL="false"
LABELS='${{ toJson(github.event.pull_request.labels) }}'
if echo "$LABELS" | jq -r '.[].name' | grep -q "^conflict$"; then
HAS_CONFLICT_LABEL="true"
fi
# Set draft_or_conflict variable
if [[ "$IS_DRAFT" == "true" || "$HAS_CONFLICT_LABEL" == "true" ]]; then
echo "draft_or_conflict=true" >> $GITHUB_ENV
echo "✅ Rule 1: PR is in draft mode or has conflict label - setting draft_or_conflict=true"
else
echo "draft_or_conflict=false" >> $GITHUB_ENV
echo "✅ Rule 1: PR is ready and has no conflict label - setting draft_or_conflict=false"
fi
echo "Draft status: $IS_DRAFT"
echo "Has conflict label: $HAS_CONFLICT_LABEL"
echo "Result: draft_or_conflict = $draft_or_conflict"
- name: Rule 2 - Check labels
run: |
# Check if PR has P0 or P1 labels
HAS_P0_P1_LABEL="false"
LABELS='${{ toJson(github.event.pull_request.labels) }}'
if echo "$LABELS" | jq -r '.[].name' | grep -E "^(P0|P1)$" > /dev/null; then
HAS_P0_P1_LABEL="true"
fi
# Check if PR already has force_on_cloud label
echo "HAS_FORCE_ON_CLOUD_LABEL=false" >> $GITHUB_ENV
if echo "$LABELS" | jq -r '.[].name' | grep -q "^force_on_cloud$"; then
HAS_FORCE_ON_CLOUD_LABEL="true"
echo "HAS_FORCE_ON_CLOUD_LABEL=true" >> $GITHUB_ENV
fi
echo "Has P0/P1 label: $HAS_P0_P1_LABEL"
echo "Has force_on_cloud label: $HAS_FORCE_ON_CLOUD_LABEL"
# Add force_on_cloud label if PR has P0/P1 and doesn't already have force_on_cloud
if [[ "$HAS_P0_P1_LABEL" == "true" && "$HAS_FORCE_ON_CLOUD_LABEL" == "false" ]]; then
echo "✅ Rule 2: PR has P0 or P1 label - adding force_on_cloud label"
curl -X POST \
-H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" \
-H "Accept: application/vnd.github.v3+json" \
"https://api.github.com/repos/${{ github.repository }}/issues/${{ github.event.pull_request.number }}/labels" \
-d '{"labels":["force_on_cloud"]}'
elif [[ "$HAS_P0_P1_LABEL" == "true" && "$HAS_FORCE_ON_CLOUD_LABEL" == "true" ]]; then
echo "✅ Rule 2: PR has P0 or P1 label and already has force_on_cloud label - no action needed"
else
echo "✅ Rule 2: PR does not have P0 or P1 label - no force_on_cloud label needed"
fi
SKIP_UNIT_TEST_CUSTOM="false"
if echo "$LABELS" | jq -r '.[].name' | grep -q "^ci/skip_unit-tests_custom$"; then
SKIP_UNIT_TEST_CUSTOM="true"
fi
echo "SKIP_UNIT_TEST_CUSTOM=$SKIP_UNIT_TEST_CUSTOM" >> $GITHUB_ENV
- name: Rule 3 - Analyze changed files and set build requirements
run: |
# Get list of changed files
CHANGED_FILES=$(git diff --name-only ${{ github.event.pull_request.base.sha }} ${{ github.event.pull_request.head.sha }})
echo "Changed files:"
echo "$CHANGED_FILES"
echo ""
# Initialize all requirements to false
REQUIRE_BUILD="false"
REQUIRE_DTEST="false"
REQUIRE_UNITTEST="false"
REQUIRE_ARTIFACTS="false"
REQUIRE_SCYLLA_GDB="false"
# Check each file against patterns
while IFS= read -r file; do
if [[ -n "$file" ]]; then
echo "Checking file: $file"
# Build pattern: ^(?!scripts\/pull_github_pr.sh).*$
# Everything except scripts/pull_github_pr.sh
if [[ "$file" != "scripts/pull_github_pr.sh" ]]; then
REQUIRE_BUILD="true"
echo " ✓ Matches build pattern"
fi
# Dtest pattern: ^(?!test(.py|\/)|dist\/docker\/|dist\/common\/scripts\/).*$
# Everything except test files, dist/docker/, dist/common/scripts/
if [[ ! "$file" =~ ^test\.(py|/).*$ ]] && [[ ! "$file" =~ ^dist/docker/.*$ ]] && [[ ! "$file" =~ ^dist/common/scripts/.*$ ]]; then
REQUIRE_DTEST="true"
echo " ✓ Matches dtest pattern"
fi
# Unittest pattern: ^(?!dist\/docker\/|dist\/common\/scripts).*$
# Everything except dist/docker/, dist/common/scripts/
if [[ ! "$file" =~ ^dist/docker/.*$ ]] && [[ ! "$file" =~ ^dist/common/scripts.*$ ]]; then
REQUIRE_UNITTEST="true"
echo " ✓ Matches unittest pattern"
fi
# Artifacts pattern: ^(?:dist|tools\/toolchain).*$
# Files starting with dist or tools/toolchain
if [[ "$file" =~ ^dist.*$ ]] || [[ "$file" =~ ^tools/toolchain.*$ ]]; then
REQUIRE_ARTIFACTS="true"
echo " ✓ Matches artifacts pattern"
fi
# Scylla GDB pattern: ^(scylla-gdb.py).*$
# Files starting with scylla-gdb.py
if [[ "$file" =~ ^scylla-gdb\.py.*$ ]]; then
REQUIRE_SCYLLA_GDB="true"
echo " ✓ Matches scylla_gdb pattern"
fi
fi
done <<< "$CHANGED_FILES"
# Set environment variables
echo "requireBuild=$REQUIRE_BUILD" >> $GITHUB_ENV
echo "requireDtest=$REQUIRE_DTEST" >> $GITHUB_ENV
echo "requireUnittest=$REQUIRE_UNITTEST" >> $GITHUB_ENV
echo "requireArtifacts=$REQUIRE_ARTIFACTS" >> $GITHUB_ENV
echo "requireScyllaGdb=$REQUIRE_SCYLLA_GDB" >> $GITHUB_ENV
echo ""
echo "✅ Rule 3: File analysis complete"
echo "Build required: $REQUIRE_BUILD"
echo "Dtest required: $REQUIRE_DTEST"
echo "Unittest required: $REQUIRE_UNITTEST"
echo "Artifacts required: $REQUIRE_ARTIFACTS"
echo "Scylla GDB required: $REQUIRE_SCYLLA_GDB"
- name: Determine Jenkins Job Name
run: |
if [[ "${{ github.ref_name }}" == "next" ]]; then
FOLDER_NAME="scylla-master"
elif [[ "${{ github.ref_name }}" == "next-enterprise" ]]; then
FOLDER_NAME="scylla-enterprise"
else
VERSION=$(echo "${{ github.ref_name }}" | awk -F'-' '{print $2}')
if [[ "$VERSION" =~ ^202[0-4]\.[0-9]+$ ]]; then
FOLDER_NAME="enterprise-$VERSION"
elif [[ "$VERSION" =~ ^[0-9]+\.[0-9]+$ ]]; then
FOLDER_NAME="scylla-$VERSION"
fi
fi
echo "JOB_NAME=${FOLDER_NAME}/job/scylla-ci" >> $GITHUB_ENV
- name: Trigger Jenkins Job
if: env.draft_or_conflict == 'false' && env.has_file_changes == 'true' && github.action == 'opened' || github.action == 'reopened'
env:
JENKINS_USER: ${{ secrets.JENKINS_USERNAME }}
JENKINS_API_TOKEN: ${{ secrets.JENKINS_TOKEN }}
JENKINS_URL: "https://jenkins.scylladb.com"
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
run: |
PR_NUMBER=${{ github.event.issue.number }}
PR_REPO_NAME=${{ github.event.repository.full_name }}
echo "Triggering Jenkins Job: $JOB_NAME"
curl -X POST \
"$JENKINS_URL/job/$JOB_NAME/buildWithParameters? \
PR_NUMBER=$PR_NUMBER& \
RUN_DTEST=$REQUIRE_DTEST& \
RUN_ONLY_SCYLLA_GDB=$REQUIRE_SCYLLA_GDB& \
RUN_UNIT_TEST=$REQUIRE_UNITTEST& \
FORCE_ON_CLOUD=$HAS_FORCE_ON_CLOUD_LABEL& \
SKIP_UNIT_TEST_CUSTOM=$SKIP_UNIT_TEST_CUSTOM& \
RUN_ARTIFACT_TESTS=$REQUIRE_ARTIFACTS" \
--fail \
--user "$JENKINS_USER:$JENKINS_API_TOKEN" \
-i -v
trigger-ci-via-comment:
if: github.event.comment.user.login != 'scylladbbot' && contains(github.event.comment.body, '@scylladbbot') && contains(github.event.comment.body, 'trigger-ci')
runs-on: ubuntu-latest
steps:
- name: Trigger Scylla-CI Jenkins Job
env:
JENKINS_USER: ${{ secrets.JENKINS_USERNAME }}
JENKINS_API_TOKEN: ${{ secrets.JENKINS_TOKEN }}
JENKINS_URL: "https://jenkins.scylladb.com"
run: |
PR_NUMBER=${{ github.event.issue.number }}
PR_REPO_NAME=${{ github.event.repository.full_name }}
curl -X POST "$JENKINS_URL/job/$JOB_NAME/buildWithParameters?PR_NUMBER=$PR_NUMBER" \
--user "$JENKINS_USER:$JENKINS_API_TOKEN" --fail -i -v

View File

@@ -1,50 +0,0 @@
name: Trigger next gating
on:
push:
branches:
- next**
jobs:
trigger-jenkins:
runs-on: ubuntu-latest
steps:
- name: Determine Jenkins Job Name
run: |
if [[ "${{ github.ref_name }}" == "next" ]]; then
FOLDER_NAME="scylla-master"
elif [[ "${{ github.ref_name }}" == "next-enterprise" ]]; then
FOLDER_NAME="scylla-enterprise"
else
VERSION=$(echo "${{ github.ref_name }}" | awk -F'-' '{print $2}')
if [[ "$VERSION" =~ ^202[0-4]\.[0-9]+$ ]]; then
FOLDER_NAME="enterprise-$VERSION"
elif [[ "$VERSION" =~ ^[0-9]+\.[0-9]+$ ]]; then
FOLDER_NAME="scylla-$VERSION"
fi
fi
echo "JOB_NAME=${FOLDER_NAME}/job/next" >> $GITHUB_ENV
- name: Trigger Jenkins Job
env:
JENKINS_USER: ${{ secrets.JENKINS_USERNAME }}
JENKINS_API_TOKEN: ${{ secrets.JENKINS_TOKEN }}
JENKINS_URL: "https://jenkins.scylladb.com"
SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
run: |
echo "Triggering Jenkins Job: $JOB_NAME"
if ! curl -X POST "$JENKINS_URL/job/$JOB_NAME/buildWithParameters" --fail --user "$JENKINS_USER:$JENKINS_API_TOKEN" -i -v; then
echo "Error: Jenkins job trigger failed"
# Send Slack message
curl -X POST -H 'Content-type: application/json' \
-H "Authorization: Bearer $SLACK_BOT_TOKEN" \
--data '{
"channel": "#releng-team",
"text": "🚨 @here '$JOB_NAME' failed to be triggered, please check https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }} for more details",
"icon_emoji": ":warning:"
}' \
https://slack.com/api/chat.postMessage
exit 1
fi

View File

@@ -2,7 +2,7 @@ name: Urgent Issue Reminder
on:
schedule:
- cron: '10 8 * * *' # Runs daily at 8 AM
- cron: '10 8 * * 1' # Runs every Monday at 8 AM
jobs:
reminder:

2
.gitignore vendored
View File

@@ -35,5 +35,3 @@ compile_commands.json
.envrc
clang_build
.idea/
nuke
rust/target

5
.gitmodules vendored
View File

@@ -1,6 +1,6 @@
[submodule "seastar"]
path = seastar
url = ../seastar
url = ../scylla-seastar
ignore = dirty
[submodule "swagger-ui"]
path = swagger-ui
@@ -9,6 +9,9 @@
[submodule "abseil"]
path = abseil
url = ../abseil-cpp
[submodule "scylla-tools"]
path = tools/java
url = ../scylla-tools-java
[submodule "scylla-python3"]
path = tools/python3
url = ../scylla-python3

View File

@@ -49,7 +49,7 @@ include(limit_jobs)
set(CMAKE_CXX_STANDARD "23" CACHE INTERNAL "")
set(CMAKE_CXX_EXTENSIONS ON CACHE INTERNAL "")
set(CMAKE_CXX_SCAN_FOR_MODULES OFF CACHE INTERNAL "")
set(CMAKE_VISIBILITY_INLINES_HIDDEN ON)
set(CMAKE_CXX_VISIBILITY_PRESET hidden)
if(is_multi_config)
find_package(Seastar)
@@ -66,37 +66,36 @@ if(is_multi_config)
# establishing proper dependencies between them
include(ExternalProject)
# should be consistent with configure_seastar() in configure.py
set(seastar_build_dir "${CMAKE_BINARY_DIR}/$<CONFIG>/seastar")
ExternalProject_Add(Seastar
SOURCE_DIR "${PROJECT_SOURCE_DIR}/seastar"
BINARY_DIR "${CMAKE_BINARY_DIR}/$<CONFIG>/seastar"
CONFIGURE_COMMAND ""
BUILD_COMMAND ${CMAKE_COMMAND} --build "${seastar_build_dir}"
BUILD_COMMAND ${CMAKE_COMMAND} --build <BINARY_DIR>
--target seastar
--target seastar_testing
--target seastar_perf_testing
--target app_iotune
BUILD_ALWAYS ON
BUILD_BYPRODUCTS
${seastar_build_dir}/libseastar.$<IF:$<CONFIG:Debug,Dev>,so,a>
${seastar_build_dir}/libseastar_testing.$<IF:$<CONFIG:Debug,Dev>,so,a>
${seastar_build_dir}/libseastar_perf_testing.$<IF:$<CONFIG:Debug,Dev>,so,a>
${seastar_build_dir}/apps/iotune/iotune
${seastar_build_dir}/gen/include/seastar/http/chunk_parsers.hh
${seastar_build_dir}/gen/include/seastar/http/request_parser.hh
${seastar_build_dir}/gen/include/seastar/http/response_parser.hh
<BINARY_DIR>/libseastar.$<IF:$<CONFIG:Debug,Dev>,so,a>
<BINARY_DIR>/libseastar_testing.$<IF:$<CONFIG:Debug,Dev>,so,a>
<BINARY_DIR>/libseastar_perf_testing.$<IF:$<CONFIG:Debug,Dev>,so,a>
<BINARY_DIR>/apps/iotune/iotune
<BINARY_DIR>/gen/include/seastar/http/chunk_parsers.hh
<BINARY_DIR>/gen/include/seastar/http/request_parser.hh
<BINARY_DIR>/gen/include/seastar/http/response_parser.hh
INSTALL_COMMAND "")
add_dependencies(Seastar::seastar Seastar)
add_dependencies(Seastar::seastar_testing Seastar)
else()
set(Seastar_TESTING ON CACHE BOOL "" FORCE)
set(Seastar_API_LEVEL 9 CACHE STRING "" FORCE)
set(Seastar_API_LEVEL 7 CACHE STRING "" FORCE)
set(Seastar_DEPRECATED_OSTREAM_FORMATTERS OFF CACHE BOOL "" FORCE)
set(Seastar_APPS ON CACHE BOOL "" FORCE)
set(Seastar_EXCLUDE_APPS_FROM_ALL ON CACHE BOOL "" FORCE)
set(Seastar_EXCLUDE_TESTS_FROM_ALL ON CACHE BOOL "" FORCE)
set(Seastar_IO_URING ON CACHE BOOL "" FORCE)
set(Seastar_SCHEDULING_GROUPS_COUNT 21 CACHE STRING "" FORCE)
set(Seastar_SCHEDULING_GROUPS_COUNT 19 CACHE STRING "" FORCE)
set(Seastar_UNUSED_RESULT_ERROR ON CACHE BOOL "" FORCE)
add_subdirectory(seastar)
target_compile_definitions (seastar
@@ -116,7 +115,6 @@ list(APPEND absl_cxx_flags
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
list(APPEND ABSL_GCC_FLAGS ${absl_cxx_flags})
elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")
list(APPEND absl_cxx_flags "-Wno-deprecated-builtins")
list(APPEND ABSL_LLVM_FLAGS ${absl_cxx_flags})
endif()
set(ABSL_DEFAULT_LINKOPTS
@@ -164,68 +162,48 @@ file(MAKE_DIRECTORY "${scylla_gen_build_dir}")
include(add_version_library)
generate_scylla_version()
option(Scylla_USE_PRECOMPILED_HEADER "Use precompiled header for Scylla" ON)
add_library(scylla-precompiled-header STATIC exported_templates.cc)
target_link_libraries(scylla-precompiled-header PRIVATE
absl::headers
absl::btree
absl::hash
absl::raw_hash_set
add_library(scylla-zstd STATIC
zstd.cc)
target_link_libraries(scylla-zstd
PRIVATE
db
Seastar::seastar
Snappy::snappy
systemd
ZLIB::ZLIB
lz4::lz4_static
zstd::zstd_static)
if (Scylla_USE_PRECOMPILED_HEADER)
set(Scylla_USE_PRECOMPILED_HEADER_USE ON)
find_program(DISTCC_EXEC NAMES distcc OPTIONAL)
if (DISTCC_EXEC)
if(DEFINED ENV{DISTCC_HOSTS})
set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
message(STATUS "Disabling precompiled header usage because distcc exists and DISTCC_HOSTS is set, assuming you're using distributed compilation.")
else()
file(REAL_PATH "~/.distcc/hosts" DIST_CC_HOSTS_PATH EXPAND_TILDE)
if (EXISTS ${DIST_CC_HOSTS_PATH})
set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
message(STATUS "Disabling precompiled header usage because distcc and ~/.distcc/hosts exists, assuming you're using distributed compilation.")
endif()
endif()
endif()
if (Scylla_USE_PRECOMPILED_HEADER_USE)
message(STATUS "Using precompiled header for Scylla - remember to add `sloppiness = pch_defines,time_macros` to ccache.conf, if you're using ccache.")
target_precompile_headers(scylla-precompiled-header PRIVATE "stdafx.hh")
target_compile_definitions(scylla-precompiled-header PRIVATE SCYLLA_USE_PRECOMPILED_HEADER)
endif()
else()
set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
endif()
zstd::libzstd)
add_library(scylla-main STATIC)
target_sources(scylla-main
PRIVATE
absl-flat_hash_map.cc
bytes.cc
client_data.cc
clocks-impl.cc
sstable_dict_autotrainer.cc
collection_mutation.cc
compress.cc
converting_mutation_partition_applier.cc
counters.cc
direct_failure_detector/failure_detector.cc
duration.cc
exceptions/exceptions.cc
frozen_schema.cc
generic_server.cc
debug.cc
init.cc
keys/keys.cc
keys.cc
multishard_mutation_query.cc
mutation_query.cc
node_ops/task_manager_module.cc
partition_slice_builder.cc
query/query.cc
querier.cc
query.cc
query_ranges_to_vnodes.cc
query/query-result-set.cc
query-result-set.cc
tombstone_gc_options.cc
tombstone_gc.cc
reader_concurrency_semaphore.cc
reader_concurrency_semaphore_group.cc
row_cache.cc
schema_mutations.cc
serializer.cc
service/direct_failure_detector/failure_detector.cc
sstables_loader.cc
table_helper.cc
tasks/task_handler.cc
@@ -236,6 +214,7 @@ target_sources(scylla-main
vint-serialization.cc)
target_link_libraries(scylla-main
PRIVATE
"$<LINK_LIBRARY:WHOLE_ARCHIVE,scylla-zstd>"
db
absl::headers
absl::btree
@@ -247,7 +226,6 @@ target_link_libraries(scylla-main
ZLIB::ZLIB
lz4::lz4_static
zstd::zstd_static
scylla-precompiled-header
)
option(Scylla_CHECK_HEADERS
@@ -302,6 +280,7 @@ add_subdirectory(mutation)
add_subdirectory(mutation_writer)
add_subdirectory(node_ops)
add_subdirectory(readers)
add_subdirectory(redis)
add_subdirectory(replica)
add_subdirectory(raft)
add_subdirectory(repair)
@@ -316,7 +295,6 @@ add_subdirectory(tracing)
add_subdirectory(transport)
add_subdirectory(types)
add_subdirectory(utils)
add_subdirectory(vector_search)
add_version_library(scylla_version
release.cc)
@@ -346,6 +324,7 @@ set(scylla_libs
mutation_writer
raft
readers
redis
repair
replica
schema
@@ -358,8 +337,7 @@ set(scylla_libs
tracing
transport
types
utils
vector_search)
utils)
target_link_libraries(scylla PRIVATE
${scylla_libs})
@@ -393,6 +371,3 @@ endif()
if(Scylla_BUILD_INSTRUMENTED)
add_subdirectory(pgo)
endif()
add_executable(patchelf
tools/patchelf.cc)

View File

@@ -12,7 +12,7 @@ Please use the [issue tracker](https://github.com/scylladb/scylla/issues/) to re
## Contributing code to Scylla
Before you can contribute code to Scylla for the first time, you should sign the [Contributor License Agreement](https://www.scylladb.com/open-source/contributor-agreement/) and send the signed form to cla@scylladb.com. You can then submit your changes as patches to the [scylladb-dev mailing list](https://groups.google.com/forum/#!forum/scylladb-dev) or as a pull request to the [Scylla project on github](https://github.com/scylladb/scylla).
Before you can contribute code to Scylla for the first time, you should sign the [Contributor License Agreement](https://www.scylladb.com/open-source/contributor-agreement/) and send the signed form cla@scylladb.com. You can then submit your changes as patches to the to the [scylladb-dev mailing list](https://groups.google.com/forum/#!forum/scylladb-dev) or as a pull request to the [Scylla project on github](https://github.com/scylladb/scylla).
If you need help formatting or sending patches, [check out these instructions](https://github.com/scylladb/scylla/wiki/Formatting-and-sending-patches).
The Scylla C++ source code uses the [Seastar coding style](https://github.com/scylladb/seastar/blob/master/coding-style.md) so please adhere to that in your patches. Note that Scylla code is written with `using namespace seastar`, so should not explicitly add the `seastar::` prefix to Seastar symbols. You will usually not need to add `using namespace seastar` to new source files, because most Scylla header files have `#include "seastarx.hh"`, which does this.

View File

@@ -43,7 +43,7 @@ $ ./tools/toolchain/dbuild ninja build/release/scylla
$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1
```
Note: do not mix environments - either perform all your work with dbuild, or natively on the host.
Note: do not mix environemtns - either perform all your work with dbuild, or natively on the host.
Note2: you can get to an interactive shell within dbuild by running it without any parameters:
```bash
$ ./tools/toolchain/dbuild
@@ -91,7 +91,7 @@ You can also specify a single mode. For example
$ ninja-build release
```
Will build everything in release mode. The valid modes are
Will build everytihng in release mode. The valid modes are
* Debug: Enables [AddressSanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer)
and other sanity checks. It has no optimizations, which allows for debugging with tools like
@@ -220,9 +220,28 @@ On a development machine, one might run Scylla as
$ SCYLLA_HOME=$HOME/scylla build/release/scylla --overprovisioned --developer-mode=yes
```
To interact with scylla it is recommended to build our version of
cqlsh. It is available at
https://github.com/scylladb/scylla-cqlsh and is available as a submodule.
To interact with scylla it is recommended to build our versions of
cqlsh and nodetool. They are available at
https://github.com/scylladb/scylla-tools-java and can be built with
```bash
$ sudo ./install-dependencies.sh
$ ant jar
```
cqlsh should work out of the box, but nodetool depends on a running
scylla-jmx (https://github.com/scylladb/scylla-jmx). It can be build
with
```bash
$ mvn package
```
and must be started with
```bash
$ ./scripts/scylla-jmx
```
### Branches and tags
@@ -261,45 +280,21 @@ Once the patch set is ready to be reviewed, push the branch to the public remote
### Development environment and source code navigation
Scylla includes a [CMake](https://cmake.org/) file, `CMakeLists.txt` that can be used with development environments so
that they can properly analyze the source code. However, building with CMake is not yet officially supported.
Scylla includes a [CMake](https://cmake.org/) file, `CMakeLists.txt`, for use only with development environments (not for building) so that they can properly analyze the source code.
Good IDEs that have support for CMake build toolchain are [CLion](https://www.jetbrains.com/clion/),
[KDevelop](https://www.kdevelop.org/) and [QtCreator](https://wiki.qt.io/Qt_Creator).
[CLion](https://www.jetbrains.com/clion/) is a commercial IDE offers reasonably good source code navigation and advice for code hygiene, though its C++ parser sometimes makes errors and flags false issues.
[Eclipse](https://eclipse.org/cdt/) is another open-source option. It doesn't natively work with CMake projects and its
C++ parser has many issues.
Other good options that directly parse CMake files are [KDevelop](https://www.kdevelop.org/) and [QtCreator](https://wiki.qt.io/Qt_Creator).
#### CLion
To use the `CMakeLists.txt` file with these programs, define the `FOR_IDE` CMake variable or shell environmental variable.
[CLion](https://www.jetbrains.com/clion/) is a commercial IDE offers reasonably good source code navigation and advice
for code hygiene, though its C++ parser sometimes makes errors and flags false issues. In order to enable proper code
analysis in CLion, the following steps are needed:
1. Get the ScyllaDB source code by following the [Getting the source code](#getting-the-source-code).
2. Follow the steps in [Dependencies](#dependencies) in order to install the required tools natively into your system.
**Don't** follow the *frozen toolchain* part described there, since CMake checks for the build dependencies installed
in the system, not in the container image provided by the toolchain.
3. In CLion, select `File``Open` and select the main ScyllaDB directory in order to open the CMake project there. The
project should open and fail to process the `CMakeLists.txt`. That's expected.
4. In CLion, open `File``Settings`.
5. Find and click on `Toolchains` (type *toolchains* into search box).
6. Select the toolchain you will use, for instance the `Default` one.
7. Type in the following system-installed tools to be used:
- `CMake`: *cmake*
- `Build Tool`: *ninja*
- `C Compiler`: *clang*
- `C++ Compiler`: *clang*
8. On the `CMake` panel/tab, click on `Reload CMake Project`
After that, CLion should successfully initialize the CMake project (marked by `[Finished]` in the console) and the
source code editor should provide code analysis support normally from now on.
[Eclipse](https://eclipse.org/cdt/) is another open-source option. It doesn't natively work with CMake projects, and its C++ parser has many similar issues as CLion.
### Distributed compilation: `distcc` and `ccache`
Scylla's compilations times can be long. Two tools help somewhat:
- [ccache](https://ccache.samba.org/) caches compiled object files on disk and reuses them when possible
- [ccache](https://ccache.samba.org/) caches compiled object files on disk and re-uses them when possible
- [distcc](https://github.com/distcc/distcc) distributes compilation jobs to remote machines
A reasonably-powered laptop acts as the coordinator for compilation. A second, more powerful, machine acts as a passive compilation server.
@@ -361,7 +356,7 @@ avoid that the gold linker can be told to create an index with
More info at https://gcc.gnu.org/wiki/DebugFission.
Both options can be enabled by passing `--split-dwarf` to configure.py.
Both options can be enable by passing `--split-dwarf` to configure.py.
Note that distcc is *not* compatible with it, but icecream
(https://github.com/icecc/icecream) is.
@@ -370,7 +365,7 @@ Note that distcc is *not* compatible with it, but icecream
Sometimes Scylla development is closely tied with a feature being developed in Seastar. It can be useful to compile Scylla with a particular check-out of Seastar.
One way to do this is to create a local remote for the Seastar submodule in the Scylla repository:
One way to do this it to create a local remote for the Seastar submodule in the Scylla repository:
```bash
$ cd $HOME/src/scylla

View File

@@ -49,7 +49,7 @@ The terms "**You**" or "**Licensee**" refer to any individual accessing or using
* **Ownership:** Licensor retains sole and exclusive ownership of all rights, interests and title in the Software and any scripts, processes, techniques, methodologies, inventions, know-how, concepts, formatting, arrangements, visual attributes, ideas, database rights, copyrights, patents, trade secrets, and other intellectual property related thereto, and all derivatives, enhancements, modifications and improvements thereof. Except for the limited license rights granted herein, Licensee has no rights in or to the Software and/ or Licensors trademarks, logo, or branding and You acknowledge that such Software, trademarks, logo, or branding is the sole property of Licensor.
* **Feedback:** Licensee is not required to provide any suggestions, enhancement requests, recommendations or other feedback regarding the Software ("Feedback"). If, notwithstanding this policy, Licensee submits Feedback, Licensee understands and acknowledges that such Feedback is not submitted in confidence and Licensor assumes no obligation, expressed or implied, by considering it. All right in any trademark or logo of Licensor or its affiliates and You shall make no claim of right to the Software or any part thereof to be supplied by Licensor hereunder and acknowledges that as between Licensor and You, such Software is the sole proprietary, title and interest in and to Licensor.such Feedback shall be assigned to, and shall become the sole and exclusive property of, Licensor upon its creation.
* Except for the rights expressly granted to You under this Agreement, You are not granted any other licenses or rights in the Software or otherwise. This Agreement constitutes the entire agreement between You and the Licensor with respect to the subject matter hereof and supersedes all prior or contemporaneous communications, representations, or agreements, whether oral or written.
* Except for the rights expressly granted to You under this Agreement, You are not granted any other licenses or rights in the Software or otherwise. This Agreement constitutes the entire agreement between the You and the Licensor with respect to the subject matter hereof and supersedes all prior or contemporaneous communications, representations, or agreements, whether oral or written.
* **Third-Party Software:** Customer acknowledges that the Software may contain open and closed source components (“OSS Components”) that are governed separately by certain licenses, in each case as further provided by Company upon request. Any applicable OSS Component license is solely between Licensee and the applicable licensor of the OSS Component and Licensee shall comply with the applicable OSS Component license.
* If any provision of this Agreement is held to be invalid or unenforceable, such provision shall be struck and the remaining provisions shall remain in full force and effect.

View File

@@ -1,6 +1,9 @@
This project includes code developed by the Apache Software Foundation (http://www.apache.org/),
especially Apache Cassandra.
It includes files from https://github.com/antonblanchard/crc32-vpmsum (author Anton Blanchard <anton@au.ibm.com>, IBM).
These files are located in utils/arch/powerpc/crc32-vpmsum. Their license may be found in licenses/LICENSE-crc32-vpmsum.TXT.
It includes modified code from https://gitbox.apache.org/repos/asf?p=cassandra-dtest.git (owned by The Apache Software Foundation)
It includes modified tests from https://github.com/etcd-io/etcd.git (owned by The etcd Authors)

View File

@@ -18,7 +18,7 @@ Scylla is fairly fussy about its build environment, requiring very recent
versions of the C++23 compiler and of many libraries to build. The document
[HACKING.md](HACKING.md) includes detailed information on building and
developing Scylla, but to get Scylla building quickly on (almost) any build
machine, Scylla offers a [frozen toolchain](tools/toolchain/README.md).
machine, Scylla offers a [frozen toolchain](tools/toolchain/README.md),
This is a pre-configured Docker image which includes recent versions of all
the required compilers, libraries and build tools. Using the frozen toolchain
allows you to avoid changing anything in your build machine to meet Scylla's
@@ -102,7 +102,7 @@ If you are a developer working on Scylla, please read the [developer guidelines]
## Contact
* The [community forum] and [Slack channel] are for users to discuss configuration, management, and operations of ScyllaDB.
* The [community forum] and [Slack channel] are for users to discuss configuration, management, and operations of the ScyllaDB open source.
* The [developers mailing list] is for developers and people interested in following the development of ScyllaDB to discuss technical topics.
[Community forum]: https://forum.scylladb.com/

View File

@@ -78,7 +78,7 @@ fi
# Default scylla product/version tags
PRODUCT=scylla
VERSION=2026.1.0-dev
VERSION=2025.1.3
if test -f version
then

View File

@@ -17,7 +17,6 @@ target_sources(alternator
streams.cc
consumed_capacity.cc
ttl.cc
parsed_expression_cache.cc
${cql_grammar_srcs})
target_include_directories(alternator
PUBLIC
@@ -34,8 +33,5 @@ target_link_libraries(alternator
idl
absl::headers)
if (Scylla_USE_PRECOMPILED_HEADER_USE)
target_precompile_headers(alternator REUSE_FROM scylla-precompiled-header)
endif()
check_headers(check-headers alternator
GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)

View File

@@ -11,6 +11,7 @@
#include "utils/log.hh"
#include <string>
#include <string_view>
#include "bytes.hh"
#include "alternator/auth.hh"
#include <fmt/format.h>
#include "auth/password_authenticator.hh"

View File

@@ -74,10 +74,6 @@ uint64_t wcu_consumed_capacity_counter::get_half_units() const noexcept {
return calculate_half_units(WCU_BLOCK_SIZE_LENGTH, _total_bytes, true);
}
uint64_t wcu_consumed_capacity_counter::get_units(uint64_t total_bytes) noexcept {
return calculate_half_units(WCU_BLOCK_SIZE_LENGTH, total_bytes, true) * HALF_UNIT_MULTIPLIER;
}
wcu_consumed_capacity_counter::wcu_consumed_capacity_counter(const rjson::value& request) :
consumed_capacity_counter(should_add_capacity(request)) {
}

View File

@@ -60,7 +60,6 @@ class wcu_consumed_capacity_counter : public consumed_capacity_counter {
virtual uint64_t get_half_units() const noexcept;
public:
wcu_consumed_capacity_counter(const rjson::value& request);
static uint64_t get_units(uint64_t total_bytes) noexcept;
};
}

View File

@@ -136,8 +136,6 @@ future<> controller::start_server() {
[this, addr, alternator_port, alternator_https_port, creds = std::move(creds)] (server& server) mutable {
return server.init(addr, alternator_port, alternator_https_port, creds,
_config.alternator_enforce_authorization,
_config.alternator_warn_authorization,
_config.alternator_max_users_query_size_in_trace_output,
&_memory_limiter.local().get_semaphore(),
_config.max_concurrent_requests_per_shard);
}).handle_exception([this, addr, alternator_port, alternator_https_port] (std::exception_ptr ep) {
@@ -169,8 +167,4 @@ future<> controller::request_stop_server() {
});
}
future<utils::chunked_vector<client_data>> controller::get_client_data() {
return _server.local().get_client_data();
}
}

View File

@@ -11,7 +11,7 @@
#include <seastar/core/sharded.hh>
#include <seastar/core/smp.hh>
#include "transport/protocol_server.hh"
#include "protocol_server.hh"
namespace service {
class storage_proxy;
@@ -90,10 +90,6 @@ public:
virtual future<> start_server() override;
virtual future<> stop_server() override;
virtual future<> request_stop_server() override;
// This virtual function is called (on each shard separately) when the
// virtual table "system.clients" is read. It is expected to generate a
// list of clients connected to this server (on this shard).
virtual future<utils::chunked_vector<client_data>> get_client_data() override;
};
}

View File

@@ -94,9 +94,6 @@ public:
static api_error internal(std::string msg) {
return api_error("InternalServerError", std::move(msg), http::reply::status_type::internal_server_error);
}
static api_error payload_too_large(std::string msg) {
return api_error("PayloadTooLarge", std::move(msg), status_type::payload_too_large);
}
// Provide the "std::exception" interface, to make it easier to print this
// exception in log messages. Note that this function is *not* used to

File diff suppressed because it is too large Load Diff

View File

@@ -10,8 +10,8 @@
#include <seastar/core/future.hh>
#include "seastarx.hh"
#include <seastar/json/json_elements.hh>
#include <seastar/core/sharded.hh>
#include <seastar/util/noncopyable_function.hh>
#include "service/migration_manager.hh"
#include "service/client_state.hh"
@@ -58,6 +58,33 @@ namespace alternator {
class rmw_operation;
struct make_jsonable : public json::jsonable {
rjson::value _value;
public:
explicit make_jsonable(rjson::value&& value);
std::string to_json() const override;
};
/**
* Make return type for serializing the object "streamed",
* i.e. direct to HTTP output stream. Note: only useful for
* (very) large objects as there are overhead issues with this
* as well, but for massive lists of return objects this can
* help avoid large allocations/many re-allocs
*/
json::json_return_type make_streamed(rjson::value&&);
struct json_string : public json::jsonable {
std::string _value;
public:
explicit json_string(std::string&& value);
std::string to_json() const override;
};
namespace parsed {
class path;
};
schema_ptr get_table(service::storage_proxy& proxy, const rjson::value& request);
bool is_alternator_keyspace(const sstring& ks_name);
// Wraps the db::get_tags_of_table and throws if the table is missing the tags extension.
@@ -128,9 +155,6 @@ using attrs_to_get_node = attribute_path_map_node<std::monostate>;
// optional means we should get all attributes, not specific ones.
using attrs_to_get = attribute_path_map<std::monostate>;
namespace parsed {
class expression_cache;
}
class executor : public peering_sharded_service<executor> {
gms::gossiper& _gossiper;
@@ -139,32 +163,14 @@ class executor : public peering_sharded_service<executor> {
db::system_distributed_keyspace& _sdks;
cdc::metadata& _cdc_metadata;
utils::updateable_value<bool> _enforce_authorization;
utils::updateable_value<bool> _warn_authorization;
// An smp_service_group to be used for limiting the concurrency when
// forwarding Alternator request between shards - if necessary for LWT.
smp_service_group _ssg;
std::unique_ptr<parsed::expression_cache> _parsed_expression_cache;
public:
using client_state = service::client_state;
// request_return_type is the return type of the executor methods, which
// can be one of:
// 1. A string, which is the response body for the request.
// 2. A body_writer, an asynchronous function (returning future<>) that
// takes an output_stream and writes the response body into it.
// 3. An api_error, which is an error response that should be returned to
// the client.
// The body_writer is used for streaming responses, where the response body
// is written in chunks to the output_stream. This allows for efficient
// handling of large responses without needing to allocate a large buffer
// in memory.
using body_writer = noncopyable_function<future<>(output_stream<char>&&)>;
using request_return_type = std::variant<std::string, body_writer, api_error>;
using request_return_type = std::variant<json::json_return_type, api_error>;
stats _stats;
// The metric_groups object holds this stat object's metrics registered
// as long as the stats object is alive.
seastar::metrics::metric_groups _metrics;
static constexpr auto ATTRS_COLUMN_NAME = ":attrs";
static constexpr auto KEYSPACE_NAME_PREFIX = "alternator_";
static constexpr std::string_view INTERNAL_TABLE_PREFIX = ".scylla.alternator.";
@@ -176,7 +182,6 @@ public:
cdc::metadata& cdc_metadata,
smp_service_group ssg,
utils::updateable_value<uint32_t> default_timeout_in_ms);
~executor();
future<request_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
future<request_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);
@@ -204,23 +209,26 @@ public:
future<request_return_type> describe_continuous_backups(client_state& client_state, service_permit permit, rjson::value request);
future<> start();
future<> stop();
future<> stop() {
// disconnect from the value source, but keep the value unchanged.
s_default_timeout_in_ms = utils::updateable_value<uint32_t>{s_default_timeout_in_ms()};
return make_ready_future<>();
}
static sstring table_name(const schema&);
static db::timeout_clock::time_point default_timeout();
private:
static thread_local utils::updateable_value<uint32_t> s_default_timeout_in_ms;
public:
static schema_ptr find_table(service::storage_proxy&, std::string_view table_name);
static schema_ptr find_table(service::storage_proxy&, const rjson::value& request);
private:
friend class rmw_operation;
static void describe_key_schema(rjson::value& parent, const schema&, std::unordered_map<std::string,std::string> * = nullptr, const std::map<sstring, sstring> *tags = nullptr);
static void describe_key_schema(rjson::value& parent, const schema&, std::unordered_map<std::string,std::string> * = nullptr);
public:
static void describe_key_schema(rjson::value& parent, const schema& schema, std::unordered_map<std::string,std::string>&, const std::map<sstring, sstring> *tags = nullptr);
static void describe_key_schema(rjson::value& parent, const schema& schema, std::unordered_map<std::string,std::string>&);
static std::optional<rjson::value> describe_single_item(schema_ptr,
const query::partition_slice&,
@@ -229,15 +237,12 @@ public:
const std::optional<attrs_to_get>&,
uint64_t* = nullptr);
// Converts a multi-row selection result to JSON compatible with DynamoDB.
// For each row, this method calls item_callback, which takes the size of
// the item as the parameter.
static future<std::vector<rjson::value>> describe_multi_item(schema_ptr schema,
const query::partition_slice&& slice,
shared_ptr<cql3::selection::selection> selection,
foreign_ptr<lw_shared_ptr<query::result>> query_result,
shared_ptr<const std::optional<attrs_to_get>> attrs_to_get,
noncopyable_function<void(uint64_t)> item_callback = {});
uint64_t& rcu_half_units);
static void describe_single_item(const cql3::selection::selection&,
const std::vector<managed_bytes_opt>&,
@@ -246,7 +251,7 @@ public:
uint64_t* item_length_in_bytes = nullptr,
bool = false);
static bool add_stream_options(const rjson::value& stream_spec, schema_builder&, service::storage_proxy& sp);
static void add_stream_options(const rjson::value& stream_spec, schema_builder&, service::storage_proxy& sp);
static void supplement_table_info(rjson::value& descr, const schema& schema, service::storage_proxy& sp);
static void supplement_table_stream_info(rjson::value& descr, const schema& schema, const service::storage_proxy& sp);
};
@@ -265,15 +270,6 @@ bool is_big(const rjson::value& val, int big_size = 100'000);
// Check CQL's Role-Based Access Control (RBAC) permission (MODIFY,
// SELECT, DROP, etc.) on the given table. When permission is denied an
// appropriate user-readable api_error::access_denied is thrown.
future<> verify_permission(bool enforce_authorization, bool warn_authorization, const service::client_state&, const schema_ptr&, auth::permission, alternator::stats& stats);
/**
* Make return type for serializing the object "streamed",
* i.e. direct to HTTP output stream. Note: only useful for
* (very) large objects as there are overhead issues with this
* as well, but for massive lists of return objects this can
* help avoid large allocations/many re-allocs
*/
executor::body_writer make_streamed(rjson::value&&);
future<> verify_permission(bool enforce_authorization, const service::client_state&, const schema_ptr&, auth::permission);
}

View File

@@ -165,9 +165,7 @@ static std::optional<std::string> resolve_path_component(const std::string& colu
fmt::format("ExpressionAttributeNames missing entry '{}' required by expression", column_name));
}
used_attribute_names.emplace(column_name);
auto result = std::string(rjson::to_string_view(*value));
validate_attr_name_length("", result.size(), false, "ExpressionAttributeNames contains invalid value: ");
return result;
return std::string(rjson::to_string_view(*value));
}
return std::nullopt;
}
@@ -739,26 +737,6 @@ rjson::value calculate_value(const parsed::set_rhs& rhs,
return rjson::null_value();
}
void validate_attr_name_length(std::string_view supplementary_context, size_t attr_name_length, bool is_key, std::string_view error_msg_prefix) {
constexpr const size_t DYNAMODB_KEY_ATTR_NAME_SIZE_MAX = 255;
constexpr const size_t DYNAMODB_NONKEY_ATTR_NAME_SIZE_MAX = 65535;
const size_t max_length = is_key ? DYNAMODB_KEY_ATTR_NAME_SIZE_MAX : DYNAMODB_NONKEY_ATTR_NAME_SIZE_MAX;
if (attr_name_length > max_length) {
std::string error_msg;
if (!error_msg_prefix.empty()) {
error_msg += error_msg_prefix;
}
if (!supplementary_context.empty()) {
error_msg += "in ";
error_msg += supplementary_context;
error_msg += " - ";
}
error_msg += fmt::format("Attribute name is too large, must be less than {} bytes", std::to_string(max_length + 1));
throw api_error::validation(error_msg);
}
}
} // namespace alternator
auto fmt::formatter<alternator::parsed::path>::format(const alternator::parsed::path& p, fmt::format_context& ctx) const

View File

@@ -91,18 +91,6 @@ options {
throw expressions_syntax_error(format("{} at char {}", err,
ex->get_charPositionInLine()));
}
// ANTLR3 tries to recover missing tokens - it tries to finish parsing
// and create valid objects, as if the missing token was there.
// But it has a bug and leaks these tokens.
// We override offending method and handle abandoned pointers.
std::vector<std::unique_ptr<TokenType>> _missing_tokens;
TokenType* getMissingSymbol(IntStreamType* istream, ExceptionBaseType* e,
ANTLR_UINT32 expectedTokenType, BitsetListType* follow) {
auto token = BaseType::getMissingSymbol(istream, e, expectedTokenType, follow);
_missing_tokens.emplace_back(token);
return token;
}
}
@lexer::context {
void displayRecognitionError(ANTLR_UINT8** token_names, ExceptionBaseType* ex) {
@@ -196,13 +184,7 @@ path_component: NAME | NAMEREF;
path returns [parsed::path p]:
root=path_component { $p.set_root($root.text); }
( '.' name=path_component { $p.add_dot($name.text); }
| '[' INTEGER ']' {
try {
$p.add_index(std::stoi($INTEGER.text));
} catch(std::out_of_range&) {
throw expressions_syntax_error("list index out of integer range");
}
}
| '[' INTEGER ']' { $p.add_index(std::stoi($INTEGER.text)); }
)*;
/* See comment above why the "depth" counter was needed here */
@@ -248,7 +230,7 @@ update_expression_clause returns [parsed::update_expression e]:
// Note the "EOF" token at the end of the update expression. We want to the
// parser to match the entire string given to it - not just its beginning!
update_expression returns [parsed::update_expression e]:
(update_expression_clause { e.append($update_expression_clause.e); })+ EOF;
(update_expression_clause { e.append($update_expression_clause.e); })* EOF;
projection_expression returns [std::vector<parsed::path> v]:
p=path { $v.push_back(std::move($p.p)); }
@@ -275,13 +257,6 @@ primitive_condition returns [parsed::primitive_condition c]:
(',' v=value[0] { $c.add_value(std::move($v.v)); })*
')'
)?
{
// Post-parse check to reject non-function single values
if ($c._op == parsed::primitive_condition::type::VALUE &&
!$c._values.front().is_func()) {
throw expressions_syntax_error("Single value must be a function");
}
}
;
// The following rules for parsing boolean expressions are verbose and

View File

@@ -18,8 +18,6 @@
#include "expressions_types.hh"
#include "utils/rjson.hh"
#include "utils/updateable_value.hh"
#include "stats.hh"
namespace alternator {
@@ -28,26 +26,6 @@ public:
using runtime_error::runtime_error;
};
namespace parsed {
class expression_cache_impl;
class expression_cache {
std::unique_ptr<expression_cache_impl> _impl;
public:
struct config {
utils::updateable_value<uint32_t> max_cache_entries;
};
expression_cache(config cfg, stats& stats);
~expression_cache();
// stop background tasks, if any
future<> stop();
update_expression parse_update_expression(std::string_view query);
std::vector<path> parse_projection_expression(std::string_view query);
condition_expression parse_condition_expression(std::string_view query, const char* caller);
};
} // namespace parsed
// Preferably use parsed::expression_cache instance instead of this free functions.
parsed::update_expression parse_update_expression(std::string_view query);
std::vector<parsed::path> parse_projection_expression(std::string_view query);
parsed::condition_expression parse_condition_expression(std::string_view query, const char* caller);
@@ -113,7 +91,5 @@ rjson::value calculate_value(const parsed::value& v,
rjson::value calculate_value(const parsed::set_rhs& rhs,
const rjson::value* previous_item);
void validate_attr_name_length(std::string_view supplementary_context, size_t attr_name_length, bool is_key, std::string_view error_msg_prefix = {});
} /* namespace alternator */

View File

@@ -209,7 +209,9 @@ public:
// function is supported).
// 2. Ternary operator - v1 BETWEEN v2 and v3 (means v1 >= v2 AND v1 <= v3).
// 3. N-ary operator - v1 IN ( v2, v3, ... )
// 4. A single function call (attribute_exists etc.).
// 4. A single function call (attribute_exists etc.). The parser actually
// accepts a more general "value" here but later stages reject a value
// which is not a function call (because DynamoDB does it too).
class primitive_condition {
public:
enum class type {

View File

@@ -13,7 +13,7 @@
#include "utils/rjson.hh"
#include "serialization.hh"
#include "schema/column_computation.hh"
#include "column_computation.hh"
#include "db/view/regular_column_transformation.hh"
namespace alternator {

View File

@@ -1,109 +0,0 @@
/*
* Copyright 2025-present ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include "expressions.hh"
#include "utils/log.hh"
#include "utils/lru_string_map.hh"
#include <variant>
static logging::logger logger_("parsed-expression-cache");
namespace alternator::parsed {
struct expression_cache_impl {
stats& _stats;
using cached_expressions_types = std::variant<
update_expression,
condition_expression,
std::vector<path>
>;
sized_lru_string_map<cached_expressions_types> _cached_entries;
utils::observable<uint32_t>::observer _max_cache_entries_observer;
expression_cache_impl(expression_cache::config cfg, stats& stats);
// to define the specialized return type of `get_or_create()`
template <typename Func, typename... Args>
using ParseResult = std::invoke_result_t<Func, std::string_view, Args...>;
// Caching layer for parsed expressions
// The expression type is determined by the type of the parsing function passed as a parameter,
// and the return type is exactly the same as the return type of this parsing function.
// StatsType is used only to update appropriate statistics - currently it is aligned with the expression type,
// but it could be extended in the future if needed, e.g. split per operation.
template <stats::expression_types StatsType, typename Func, typename... Args>
ParseResult<Func, Args...> get_or_create(std::string_view query, Func&& parse_func, Args&&... other_args) {
if (_cached_entries.disabled()) {
return parse_func(query, std::forward<Args>(other_args)...);
}
if (!_cached_entries.sanity_check()) {
_stats.expression_cache.requests[StatsType].misses++;
return parse_func(query, std::forward<Args>(other_args)...);
}
auto value = _cached_entries.find(query);
if (value) {
logger_.trace("Cache hit for query: {}", query);
_stats.expression_cache.requests[StatsType].hits++;
try {
return std::get<ParseResult<Func, Args...>>(value->get());
} catch (const std::bad_variant_access&) {
// User can reach this code, by sending the same query string as a different expression type.
// In practice valid queries are different enough to not collide.
// Entries in cache are only valid queries.
// This request will fail at parsing below.
// If, by any chance this is a valid query, it will be updated below with the new value.
logger_.trace("Cache hit for '{}', but type mismatch.", query);
_stats.expression_cache.requests[StatsType].hits--;
}
} else {
logger_.trace("Cache miss for query: {}", query);
}
ParseResult<Func, Args...> expr = parse_func(query, std::forward<Args>(other_args)...);
// Invalid query will throw here ^
_stats.expression_cache.requests[StatsType].misses++;
if (value) [[unlikely]] {
value->get() = cached_expressions_types{expr};
} else {
_cached_entries.insert(query, cached_expressions_types{expr});
}
return expr;
}
};
expression_cache_impl::expression_cache_impl(expression_cache::config cfg, stats& stats) :
_stats(stats), _cached_entries(logger_, _stats.expression_cache.evictions),
_max_cache_entries_observer(cfg.max_cache_entries.observe([this] (uint32_t max_value) {
_cached_entries.set_max_size(max_value);
})) {
_cached_entries.set_max_size(cfg.max_cache_entries());
}
expression_cache::expression_cache(expression_cache::config cfg, stats& stats) :
_impl(std::make_unique<expression_cache_impl>(std::move(cfg), stats)) {
}
expression_cache::~expression_cache() = default;
future<> expression_cache::stop() {
return _impl->_cached_entries.stop();
}
update_expression expression_cache::parse_update_expression(std::string_view query) {
return _impl->get_or_create<stats::expression_types::UPDATE_EXPRESSION>(query, alternator::parse_update_expression);
}
std::vector<path> expression_cache::parse_projection_expression(std::string_view query) {
return _impl->get_or_create<stats::expression_types::PROJECTION_EXPRESSION>(query, alternator::parse_projection_expression);
}
condition_expression expression_cache::parse_condition_expression(std::string_view query, const char* caller) {
return _impl->get_or_create<stats::expression_types::CONDITION_EXPRESSION>(query, alternator::parse_condition_expression, caller);
}
} // namespace alternator::parsed

View File

@@ -8,16 +8,13 @@
#pragma once
#include "cdc/cdc_options.hh"
#include "cdc/log.hh"
#include "seastarx.hh"
#include "service/paxos/cas_request.hh"
#include "service/cas_shard.hh"
#include "utils/rjson.hh"
#include "consumed_capacity.hh"
#include "executor.hh"
#include "tracing/trace_state.hh"
#include "keys/keys.hh"
#include "keys.hh"
namespace alternator {
@@ -58,7 +55,7 @@ public:
static write_isolation get_write_isolation_for_schema(schema_ptr schema);
static write_isolation default_write_isolation;
public:
static void set_default_write_isolation(std::string_view mode);
protected:
@@ -109,27 +106,21 @@ public:
// violating this). We mark apply() "const" to let the compiler validate
// this for us. The output-only field _return_attributes is marked
// "mutable" above so that apply() can still write to it.
virtual std::optional<mutation> apply(std::unique_ptr<rjson::value> previous_item, api::timestamp_type ts, cdc::per_request_options& cdc_opts) const = 0;
virtual std::optional<mutation> apply(std::unique_ptr<rjson::value> previous_item, api::timestamp_type ts) const = 0;
// Convert the above apply() into the signature needed by cas_request:
virtual std::optional<mutation> apply(foreign_ptr<lw_shared_ptr<query::result>> qr, const query::partition_slice& slice, api::timestamp_type ts, cdc::per_request_options& cdc_opts) override;
virtual std::optional<mutation> apply(foreign_ptr<lw_shared_ptr<query::result>> qr, const query::partition_slice& slice, api::timestamp_type ts) override;
virtual ~rmw_operation() = default;
const wcu_consumed_capacity_counter& consumed_capacity() const noexcept { return _consumed_capacity; }
schema_ptr schema() const { return _schema; }
const rjson::value& request() const { return _request; }
rjson::value&& move_request() && { return std::move(_request); }
future<executor::request_return_type> execute(service::storage_proxy& proxy,
std::optional<service::cas_shard> cas_shard,
service::client_state& client_state,
tracing::trace_state_ptr trace_state,
service_permit permit,
bool needs_read_before_write,
stats& global_stats,
stats& per_table_stats,
stats& stats,
uint64_t& wcu_total);
std::optional<service::cas_shard> shard_for_execute(bool needs_read_before_write);
private:
inline bool should_fill_preimage() const { return _schema->cdc_options().enabled(); }
std::optional<shard_id> shard_for_execute(bool needs_read_before_write);
};
} // namespace alternator

View File

@@ -11,8 +11,8 @@
#include "utils/log.hh"
#include "serialization.hh"
#include "error.hh"
#include "types/concrete_types.hh"
#include "types/json_utils.hh"
#include "concrete_types.hh"
#include "cql3/type_json.hh"
#include "mutation/position_in_partition.hh"
static logging::logger slogger("alternator-serialization");
@@ -282,23 +282,15 @@ std::string type_to_string(data_type type) {
return it->second;
}
std::optional<bytes> try_get_key_column_value(const rjson::value& item, const column_definition& column) {
bytes get_key_column_value(const rjson::value& item, const column_definition& column) {
std::string column_name = column.name_as_text();
const rjson::value* key_typed_value = rjson::find(item, column_name);
if (!key_typed_value) {
return std::nullopt;
throw api_error::validation(fmt::format("Key column {} not found", column_name));
}
return get_key_from_typed_value(*key_typed_value, column);
}
bytes get_key_column_value(const rjson::value& item, const column_definition& column) {
auto value = try_get_key_column_value(item, column);
if (!value) {
throw api_error::validation(fmt::format("Key column {} not found", column.name_as_text()));
}
return std::move(*value);
}
// Parses the JSON encoding for a key value, which is a map with a single
// entry whose key is the type and the value is the encoded value.
// If this type does not match the desired "type_str", an api_error::validation
@@ -388,38 +380,20 @@ clustering_key ck_from_json(const rjson::value& item, schema_ptr schema) {
return clustering_key::make_empty();
}
std::vector<bytes> raw_ck;
// Note: it's possible to get more than one clustering column here, as
// Alternator can be used to read scylla internal tables.
// FIXME: this is a loop, but we really allow only one clustering key column.
for (const column_definition& cdef : schema->clustering_key_columns()) {
auto raw_value = get_key_column_value(item, cdef);
bytes raw_value = get_key_column_value(item, cdef);
raw_ck.push_back(std::move(raw_value));
}
return clustering_key::from_exploded(raw_ck);
}
clustering_key_prefix ck_prefix_from_json(const rjson::value& item, schema_ptr schema) {
if (schema->clustering_key_size() == 0) {
return clustering_key_prefix::make_empty();
}
std::vector<bytes> raw_ck;
for (const column_definition& cdef : schema->clustering_key_columns()) {
auto raw_value = try_get_key_column_value(item, cdef);
if (!raw_value) {
break;
}
raw_ck.push_back(std::move(*raw_value));
}
return clustering_key_prefix::from_exploded(raw_ck);
}
position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema) {
const bool is_alternator_ks = is_alternator_keyspace(schema->ks_name());
if (is_alternator_ks) {
return position_in_partition::for_key(ck_from_json(item, schema));
auto ck = ck_from_json(item, schema);
if (is_alternator_keyspace(schema->ks_name())) {
return position_in_partition::for_key(std::move(ck));
}
const auto region_item = rjson::find(item, scylla_paging_region);
const auto weight_item = rjson::find(item, scylla_paging_weight);
if (bool(region_item) != bool(weight_item)) {
@@ -439,9 +413,8 @@ position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema)
} else {
throw std::runtime_error(fmt::format("Invalid value for weight: {}", weight_view));
}
return position_in_partition(region, weight, region == partition_region::clustered ? std::optional(ck_prefix_from_json(item, schema)) : std::nullopt);
return position_in_partition(region, weight, region == partition_region::clustered ? std::optional(std::move(ck)) : std::nullopt);
}
auto ck = ck_from_json(item, schema);
if (ck.is_empty()) {
return position_in_partition::for_partition_start();
}

View File

@@ -13,7 +13,7 @@
#include <optional>
#include "types/types.hh"
#include "schema/schema_fwd.hh"
#include "keys/keys.hh"
#include "keys.hh"
#include "utils/rjson.hh"
#include "utils/big_decimal.hh"

View File

@@ -13,7 +13,7 @@
#include <seastar/http/function_handlers.hh>
#include <seastar/http/short_streams.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/maybe_yield.hh>
#include <seastar/json/json_elements.hh>
#include <seastar/util/defer.hh>
#include <seastar/util/short_streams.hh>
#include "seastarx.hh"
@@ -31,9 +31,6 @@
#include "gms/gossiper.hh"
#include "utils/overloaded_functor.hh"
#include "utils/aws_sigv4.hh"
#include "client_data.hh"
#include "utils/updateable_value.hh"
#include <zlib.h>
static logging::logger slogger("alternator-server");
@@ -103,13 +100,6 @@ static void handle_CORS(const request& req, reply& rep, bool preflight) {
// the user directly. Other exceptions are unexpected, and reported as
// Internal Server Error.
class api_handler : public handler_base {
// Although the the DynamoDB API responses are JSON, additional
// conventions apply to these responses. For this reason, DynamoDB uses
// the content type "application/x-amz-json-1.0" instead of the standard
// "application/json". Some other AWS services use later versions instead
// of "1.0", but DynamoDB currently uses "1.0". Note that this content
// type applies to all replies, both success and error.
static constexpr const char* REPLY_CONTENT_TYPE = "application/x-amz-json-1.0";
public:
api_handler(const std::function<future<executor::request_return_type>(std::unique_ptr<request> req)>& _handle) : _f_handle(
[this, _handle](std::unique_ptr<request> req, std::unique_ptr<reply> rep) {
@@ -134,19 +124,22 @@ public:
}
auto res = resf.get();
std::visit(overloaded_functor {
[&] (std::string&& str) {
// Note that despite the move, there is a copy here -
// as str is std::string and rep->_content is sstring.
rep->_content = std::move(str);
rep->set_content_type(REPLY_CONTENT_TYPE);
},
[&] (executor::body_writer&& body_writer) {
rep->write_body(REPLY_CONTENT_TYPE, std::move(body_writer));
},
[&] (const api_error& err) {
generate_error_reply(*rep, err);
}
}, std::move(res));
[&] (const json::json_return_type& json_return_value) {
slogger.trace("api_handler success case");
if (json_return_value._body_writer) {
// Unfortunately, write_body() forces us to choose
// from a fixed and irrelevant list of "mime-types"
// at this point. But we'll override it with the
// one (application/x-amz-json-1.0) below.
rep->write_body("json", std::move(json_return_value._body_writer));
} else {
rep->_content += json_return_value._res;
}
},
[&] (const api_error& err) {
generate_error_reply(*rep, err);
}
}, res);
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
});
@@ -158,6 +151,7 @@ public:
handle_CORS(*req, *rep, false);
return _f_handle(std::move(req), std::move(rep)).then(
[](std::unique_ptr<reply> rep) {
rep->set_mime_type("application/x-amz-json-1.0");
rep->done();
return make_ready_future<std::unique_ptr<reply>>(std::move(rep));
});
@@ -173,7 +167,6 @@ protected:
rjson::add(results, "message", err._msg);
rep._content = rjson::print(std::move(results));
rep._status = err._http_code;
rep.set_content_type(REPLY_CONTENT_TYPE);
slogger.trace("api_handler error case: {}", rep._content);
}
@@ -235,8 +228,9 @@ protected:
// If the rack does not exist, we return an empty list - not an error.
sstring query_rack = req->get_query_param("rack");
for (auto& id : local_dc_nodes) {
auto ip = _gossiper.get_address_map().get(id);
if (!query_rack.empty()) {
auto rack = _gossiper.get_application_state_value(id, gms::application_state::RACK);
auto rack = _gossiper.get_application_state_value(ip, gms::application_state::RACK);
if (rack != query_rack) {
continue;
}
@@ -244,10 +238,10 @@ protected:
// Note that it's not enough for the node to be is_alive() - a
// node joining the cluster is also "alive" but not responsive to
// requests. We alive *and* normal. See #19694, #21538.
if (_gossiper.is_alive(id) && _gossiper.is_normal(id)) {
if (_gossiper.is_alive(ip) && _gossiper.is_normal(ip)) {
// Use the gossiped broadcast_rpc_address if available instead
// of the internal IP address "ip". See discussion in #18711.
rjson::push_back(results, rjson::from_string(_gossiper.get_rpc_address(id)));
rjson::push_back(results, rjson::from_string(_gossiper.get_rpc_address(ip)));
}
}
rep->set_status(reply::status_type::ok);
@@ -273,57 +267,24 @@ protected:
}
};
// This function increments the authentication_failures counter, and may also
// log a warn-level message and/or throw an exception, depending on what
// enforce_authorization and warn_authorization are set to.
// The username and client address are only used for logging purposes -
// they are not included in the error message returned to the client, since
// the client knows who it is.
// Note that if enforce_authorization is false, this function will return
// without throwing. So a caller that doesn't want to continue after an
// authentication_error must explicitly return after calling this function.
template<typename Exception>
static void authentication_error(alternator::stats& stats, bool enforce_authorization, bool warn_authorization, Exception&& e, std::string_view user, gms::inet_address client_address) {
stats.authentication_failures++;
if (enforce_authorization) {
if (warn_authorization) {
slogger.warn("alternator_warn_authorization=true: {} for user {}, client address {}", e.what(), user, client_address);
}
throw std::move(e);
} else {
if (warn_authorization) {
slogger.warn("If you set alternator_enforce_authorization=true the following will be enforced: {} for user {}, client address {}", e.what(), user, client_address);
}
}
}
future<std::string> server::verify_signature(const request& req, const chunked_content& content) {
if (!_enforce_authorization.get() && !_warn_authorization.get()) {
if (!_enforce_authorization) {
slogger.debug("Skipping authorization");
return make_ready_future<std::string>();
}
auto host_it = req._headers.find("Host");
if (host_it == req._headers.end()) {
authentication_error(_executor._stats, _enforce_authorization.get(), _warn_authorization.get(),
api_error::invalid_signature("Host header is mandatory for signature verification"),
"", req.get_client_address());
return make_ready_future<std::string>();
throw api_error::invalid_signature("Host header is mandatory for signature verification");
}
auto authorization_it = req._headers.find("Authorization");
if (authorization_it == req._headers.end()) {
authentication_error(_executor._stats, _enforce_authorization.get(), _warn_authorization.get(),
api_error::missing_authentication_token("Authorization header is mandatory for signature verification"),
"", req.get_client_address());
return make_ready_future<std::string>();
throw api_error::missing_authentication_token("Authorization header is mandatory for signature verification");
}
std::string host = host_it->second;
std::string_view authorization_header = authorization_it->second;
auto pos = authorization_header.find_first_of(' ');
if (pos == std::string_view::npos || authorization_header.substr(0, pos) != "AWS4-HMAC-SHA256") {
authentication_error(_executor._stats, _enforce_authorization.get(), _warn_authorization.get(),
api_error::invalid_signature(fmt::format("Authorization header must use AWS4-HMAC-SHA256 algorithm: {}", authorization_header)),
"", req.get_client_address());
return make_ready_future<std::string>();
throw api_error::invalid_signature(fmt::format("Authorization header must use AWS4-HMAC-SHA256 algorithm: {}", authorization_header));
}
authorization_header.remove_prefix(pos+1);
std::string credential;
@@ -358,9 +319,7 @@ future<std::string> server::verify_signature(const request& req, const chunked_c
std::vector<std::string_view> credential_split = split(credential, '/');
if (credential_split.size() != 5) {
authentication_error(_executor._stats, _enforce_authorization.get(), _warn_authorization.get(),
api_error::validation(fmt::format("Incorrect credential information format: {}", credential)), "", req.get_client_address());
return make_ready_future<std::string>();
throw api_error::validation(fmt::format("Incorrect credential information format: {}", credential));
}
std::string user(credential_split[0]);
std::string datestamp(credential_split[1]);
@@ -384,7 +343,7 @@ future<std::string> server::verify_signature(const request& req, const chunked_c
auto cache_getter = [&proxy = _proxy, &as = _auth_service] (std::string username) {
return get_key_from_roles(proxy, as, std::move(username));
};
return _key_cache.get_ptr(user, cache_getter).then_wrapped([this, &req, &content,
return _key_cache.get_ptr(user, cache_getter).then([this, &req, &content,
user = std::move(user),
host = std::move(host),
datestamp = std::move(datestamp),
@@ -392,32 +351,18 @@ future<std::string> server::verify_signature(const request& req, const chunked_c
signed_headers_map = std::move(signed_headers_map),
region = std::move(region),
service = std::move(service),
user_signature = std::move(user_signature)] (future<key_cache::value_ptr> key_ptr_fut) {
key_cache::value_ptr key_ptr(nullptr);
try {
key_ptr = key_ptr_fut.get();
} catch (const api_error& e) {
authentication_error(_executor._stats, _enforce_authorization.get(), _warn_authorization.get(),
e, user, req.get_client_address());
return std::string();
}
user_signature = std::move(user_signature)] (key_cache::value_ptr key_ptr) {
std::string signature;
try {
signature = utils::aws::get_signature(user, *key_ptr, std::string_view(host), "/", req._method,
datestamp, signed_headers_str, signed_headers_map, &content, region, service, "");
} catch (const std::exception& e) {
authentication_error(_executor._stats, _enforce_authorization.get(), _warn_authorization.get(),
api_error::invalid_signature(fmt::format("invalid signature: {}", e.what())),
user, req.get_client_address());
return std::string();
throw api_error::invalid_signature(e.what());
}
if (signature != std::string_view(user_signature)) {
_key_cache.remove(user);
authentication_error(_executor._stats, _enforce_authorization.get(), _warn_authorization.get(),
api_error::unrecognized_client("wrong signature"),
user, req.get_client_address());
return std::string();
throw api_error::unrecognized_client("The security token included in the request is invalid.");
}
return user;
});
@@ -430,82 +375,35 @@ static tracing::trace_state_ptr create_tracing_session(tracing::tracing& tracing
return tracing_instance.create_session(tracing::trace_type::QUERY, props);
}
// A helper class to represent a potentially truncated view of a chunked_content.
// If the content is short enough and single chunked, it just holds a view into the content.
// Otherwise it will be copied into an internal buffer, possibly truncated (depending on maximum allowed size passed in),
// and the view will point into that buffer.
// `as_view()` method will return the view.
// `take_as_sstring()` will either move out the internal buffer (if any), or create a new sstring from the view.
// You should consider `as_view()` valid as long both the original chunked_content and the truncated_content object are alive.
class truncated_content {
std::string_view _view;
sstring _content_maybe;
void copy_from_content(const chunked_content& content) {
size_t offset = 0;
for(auto &tmp : content) {
size_t to_copy = std::min(tmp.size(), _content_maybe.size() - offset);
std::copy(tmp.get(), tmp.get() + to_copy, _content_maybe.data() + offset);
offset += to_copy;
if (offset >= _content_maybe.size()) {
break;
}
}
// truncated_content_view() prints a potentially long chunked_content for
// debugging purposes. In the common case when the content is not excessively
// long, it just returns a view into the given content, without any copying.
// But when the content is very long, it is truncated after some arbitrary
// max_len (or one chunk, whichever comes first), with "<truncated>" added at
// the end. To do this modification to the string, we need to create a new
// std::string, so the caller must pass us a reference to one, "buf", where
// we can store the content. The returned view is only alive for as long this
// buf is kept alive.
static std::string_view truncated_content_view(const chunked_content& content, std::string& buf) {
constexpr size_t max_len = 1024;
if (content.empty()) {
return std::string_view();
} else if (content.size() == 1 && content.begin()->size() <= max_len) {
return std::string_view(content.begin()->get(), content.begin()->size());
} else {
buf = std::string(content.begin()->get(), std::min(content.begin()->size(), max_len)) + "<truncated>";
return std::string_view(buf);
}
public:
truncated_content(const chunked_content& content, size_t max_len = std::numeric_limits<size_t>::max()) {
if (content.empty()) return;
if (content.size() == 1 && content.begin()->size() <= max_len) {
_view = std::string_view(content.begin()->get(), content.begin()->size());
return;
}
constexpr std::string_view truncated_text = "<truncated>";
size_t content_size = 0;
for(auto &tmp : content) {
content_size += tmp.size();
}
if (content_size <= max_len) {
_content_maybe = sstring{ sstring::initialized_later{}, content_size };
copy_from_content(content);
}
else {
_content_maybe = sstring{ sstring::initialized_later{}, max_len + truncated_text.size() };
copy_from_content(content);
std::copy(truncated_text.begin(), truncated_text.end(), _content_maybe.data() + _content_maybe.size() - truncated_text.size());
}
_view = std::string_view(_content_maybe);
}
std::string_view as_view() const { return _view; }
sstring take_as_sstring() && {
if (_content_maybe.empty() && !_view.empty()) {
return sstring{_view};
}
return std::move(_content_maybe);
}
};
// `truncated_content_view` will produce an object representing a view to a passed content
// possibly truncated at some length. The value returned is used in two ways:
// - to print it in logs (use `as_view()` method for this)
// - to pass it to tracing object, where it will be stored and used later
// (use `take_as_sstring()` method as this produces a copy in form of a sstring)
// `truncated_content` delays constructing `sstring` object until it's actually needed.
// `truncated_content` is valid as long as passed `content` is alive.
// if the content is truncated, `<truncated>` will be appended at the maximum size limit
// and total size will be `max_users_query_size_in_trace_output() + strlen("<truncated>")`.
static truncated_content truncated_content_view(const chunked_content& content, size_t max_size) {
return truncated_content{content, max_size};
}
static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_state, std::string_view username, std::string_view op, const chunked_content& query, size_t max_users_query_size_in_trace_output) {
static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_state, std::string_view username, std::string_view op, const chunked_content& query) {
tracing::trace_state_ptr trace_state;
tracing::tracing& tracing_instance = tracing::tracing::get_local_tracing_instance();
if (tracing_instance.trace_next_query() || tracing_instance.slow_query_tracing_enabled()) {
trace_state = create_tracing_session(tracing_instance);
std::string buf;
tracing::add_session_param(trace_state, "alternator_op", op);
tracing::add_query(trace_state, truncated_content_view(query, max_users_query_size_in_trace_output).take_as_sstring());
tracing::add_query(trace_state, truncated_content_view(query, buf));
tracing::begin(trace_state, seastar::format("Alternator {}", op), client_state.get_client_address());
if (!username.empty()) {
tracing::set_username(trace_state, auth::authenticated_user(username));
@@ -514,207 +412,30 @@ static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_
return trace_state;
}
// This read_entire_stream() is similar to Seastar's read_entire_stream()
// which reads the given content_stream until its end into non-contiguous
// memory. The difference is that this implementation takes an extra length
// limit, and throws an error if we read more than this limit.
// This length-limited variant would not have been needed if Seastar's HTTP
// server's set_content_length_limit() worked in every case, but unfortunately
// it does not - it only works if the request has a Content-Length header (see
// issue #8196). In contrast this function can limit the request's length no
// matter how it's encoded. We need this limit to protect Alternator from
// oversized requests that can deplete memory.
static future<chunked_content>
read_entire_stream(input_stream<char>& inp, size_t length_limit) {
chunked_content ret;
// We try to read length_limit + 1 bytes, so that we can throw an
// exception if we managed to read more than length_limit.
ssize_t remain = length_limit + 1;
do {
temporary_buffer<char> buf = co_await inp.read_up_to(remain);
if (buf.empty()) {
break;
}
remain -= buf.size();
ret.push_back(std::move(buf));
} while (remain > 0);
// If we read the full length_limit + 1 bytes, we went over the limit:
if (remain <= 0) {
// By throwing here an error, we may send a reply (the error message)
// without having read the full request body. Seastar's httpd will
// realize that we have not read the entire content stream, and
// correctly mark the connection unreusable, i.e., close it.
// This means we are currently exposed to issue #12166 caused by
// Seastar issue 1325), where the client may get an RST instead of
// a FIN, and may rarely get a "Connection reset by peer" before
// reading the error we send.
throw api_error::payload_too_large(fmt::format("Request content length limit of {} bytes exceeded", length_limit));
}
co_return ret;
}
// safe_gzip_stream is an exception-safe wrapper for zlib's z_stream.
// The "z_stream" struct is used by zlib to hold state while decompressing a
// stream of data. It allocates memory which must be freed with inflateEnd(),
// which the destructor of this class does.
class safe_gzip_zstream {
z_stream _zs;
public:
safe_gzip_zstream() {
memset(&_zs, 0, sizeof(_zs));
// The strange 16 + WMAX_BITS tells zlib to expect and decode
// a gzip header, not a zlib header.
if (inflateInit2(&_zs, 16 + MAX_WBITS) != Z_OK) {
// Should only happen if memory allocation fails
throw std::bad_alloc();
}
}
~safe_gzip_zstream() {
inflateEnd(&_zs);
}
z_stream* operator->() {
return &_zs;
}
z_stream* get() {
return &_zs;
}
void reset() {
inflateReset(&_zs);
}
};
// ungzip() takes a chunked_content with a gzip-compressed request body,
// uncompresses it, and returns the uncompressed content as a chunked_content.
// If the uncompressed content exceeds length_limit, an error is thrown.
static future<chunked_content>
ungzip(chunked_content&& compressed_body, size_t length_limit) {
chunked_content ret;
// output_buf can be any size - when uncompressing input_buf, it doesn't
// need to fit in a single output_buf, we'll use multiple output_buf for
// a single input_buf if needed.
constexpr size_t OUTPUT_BUF_SIZE = 4096;
temporary_buffer<char> output_buf;
safe_gzip_zstream strm;
bool complete_stream = false; // empty input is not a valid gzip
size_t total_out_bytes = 0;
for (const temporary_buffer<char>& input_buf : compressed_body) {
if (input_buf.empty()) {
continue;
}
complete_stream = false;
strm->next_in = (Bytef*) input_buf.get();
strm->avail_in = (uInt) input_buf.size();
do {
co_await coroutine::maybe_yield();
if (output_buf.empty()) {
output_buf = temporary_buffer<char>(OUTPUT_BUF_SIZE);
}
strm->next_out = (Bytef*) output_buf.get();
strm->avail_out = OUTPUT_BUF_SIZE;
int e = inflate(strm.get(), Z_NO_FLUSH);
size_t out_bytes = OUTPUT_BUF_SIZE - strm->avail_out;
if (out_bytes > 0) {
// If output_buf is nearly full, we save it as-is in ret. But
// if it only has little data, better copy to a small buffer.
if (out_bytes > OUTPUT_BUF_SIZE/2) {
ret.push_back(std::move(output_buf).prefix(out_bytes));
// output_buf is now empty. if this loop finds more input,
// we'll allocate a new output buffer.
} else {
ret.push_back(temporary_buffer<char>(output_buf.get(), out_bytes));
}
total_out_bytes += out_bytes;
if (total_out_bytes > length_limit) {
throw api_error::payload_too_large(fmt::format("Request content length limit of {} bytes exceeded", length_limit));
}
}
if (e == Z_STREAM_END) {
// There may be more input after the first gzip stream - in
// either this input_buf or the next one. The additional input
// should be a second concatenated gzip. We need to allow that
// by resetting the gzip stream and continuing the input loop
// until there's no more input.
strm.reset();
if (strm->avail_in == 0) {
complete_stream = true;
break;
}
} else if (e != Z_OK && e != Z_BUF_ERROR) {
// DynamoDB returns an InternalServerError when given a bad
// gzip request body. See test test_broken_gzip_content
throw api_error::internal("Error during gzip decompression of request body");
}
} while (strm->avail_in > 0 || strm->avail_out == 0);
}
if (!complete_stream) {
// The gzip stream was not properly finished with Z_STREAM_END
throw api_error::internal("Truncated gzip in request body");
}
co_return ret;
}
future<executor::request_return_type> server::handle_api_request(std::unique_ptr<request> req) {
_executor._stats.total_operations++;
sstring target = req->get_header("X-Amz-Target");
// target is DynamoDB API version followed by a dot '.' and operation type (e.g. CreateTable)
auto dot = target.find('.');
std::string_view op = (dot == sstring::npos) ? std::string_view() : std::string_view(target).substr(dot+1);
if (req->content_length > request_content_length_limit) {
// If we have a Content-Length header and know the request will be too
// long, we don't need to wait for read_entire_stream() below to
// discover it. And we definitely mustn't try to get_units() below for
// for such a size.
co_return api_error::payload_too_large(fmt::format("Request content length limit of {} bytes exceeded", request_content_length_limit));
}
// JSON parsing can allocate up to roughly 2x the size of the raw
// document, + a couple of bytes for maintenance.
// If the Content-Length of the request is not available, we assume
// the largest possible request (request_content_length_limit, i.e., 16 MB)
// and after reading the request we return_units() the excess.
size_t mem_estimate = (req->content_length ? req->content_length : request_content_length_limit) * 2 + 8000;
// TODO: consider the case where req->content_length is missing. Maybe
// we need to take the content_length_limit and return some of the units
// when we finish read_content_and_verify_signature?
size_t mem_estimate = req->content_length * 2 + 8000;
auto units_fut = get_units(*_memory_limiter, mem_estimate);
if (_memory_limiter->waiters()) {
++_executor._stats.requests_blocked_memory;
}
auto units = co_await std::move(units_fut);
SCYLLA_ASSERT(req->content_stream);
chunked_content content = co_await read_entire_stream(*req->content_stream, request_content_length_limit);
// If the request had no Content-Length, we reserved too many units
// so need to return some
if (req->content_length == 0) {
size_t content_length = 0;
for (const auto& chunk : content) {
content_length += chunk.size();
}
size_t new_mem_estimate = content_length * 2 + 8000;
units.return_units(mem_estimate - new_mem_estimate);
}
chunked_content content = co_await util::read_entire_stream(*req->content_stream);
auto username = co_await verify_signature(*req, content);
// If the request is compressed, uncompress it now, after we checked
// the signature (the signature is computed on the compressed content).
// We apply the request_content_length_limit again to the uncompressed
// content - we don't want to allow a tiny compressed request to
// expand to a huge uncompressed request.
sstring content_encoding = req->get_header("Content-Encoding");
if (content_encoding == "gzip") {
content = co_await ungzip(std::move(content), request_content_length_limit);
} else if (!content_encoding.empty()) {
// DynamoDB returns a 500 error for unsupported Content-Encoding.
// I'm not sure if this is the best error code, but let's do it too.
// See the test test_garbage_content_encoding confirming this case.
co_return api_error::internal("Unsupported Content-Encoding");
}
// As long as the system_clients_entry object is alive, this request will
// be visible in the "system.clients" virtual table. When requested, this
// entry will be formatted by server::ongoing_request::make_client_data().
auto system_clients_entry = _ongoing_requests.emplace(
req->get_client_address(), req->get_header("User-Agent"),
username, current_scheduling_group(),
req->get_protocol_name() == "https");
if (slogger.is_enabled(log_level::trace)) {
slogger.trace("Request: {} {} {}", op, truncated_content_view(content, _max_users_query_size_in_trace_output).as_view(), req->_headers);
std::string buf;
slogger.trace("Request: {} {} {}", op, truncated_content_view(content, buf), req->_headers);
}
auto callback_it = _callbacks.find(op);
if (callback_it == _callbacks.end()) {
@@ -734,7 +455,7 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr
}
co_await client_state.maybe_update_per_service_level_params();
tracing::trace_state_ptr trace_state = maybe_trace_query(client_state, username, op, content, _max_users_query_size_in_trace_output.get());
tracing::trace_state_ptr trace_state = maybe_trace_query(client_state, username, op, content);
tracing::trace(trace_state, "{}", op);
auto user = client_state.user();
@@ -742,9 +463,6 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr
client_state = std::move(client_state), trace_state = std::move(trace_state),
units = std::move(units), req = std::move(req)] () mutable -> future<executor::request_return_type> {
rjson::value json_request = co_await _json_parser.parse(std::move(content));
if (!json_request.IsObject()) {
co_return api_error::validation("Request content must be an object");
}
co_return co_await callback(_executor, client_state, trace_state,
make_service_permit(std::move(units)), std::move(json_request), std::move(req));
};
@@ -785,9 +503,9 @@ server::server(executor& exec, service::storage_proxy& proxy, gms::gossiper& gos
, _auth_service(auth_service)
, _sl_controller(sl_controller)
, _key_cache(1024, 1min, slogger)
, _max_users_query_size_in_trace_output(1024)
, _enforce_authorization(false)
, _enabled_servers{}
, _pending_requests("alternator::server::pending_requests")
, _pending_requests{}
, _timeout_config(_proxy.data_dictionary().get_config())
, _callbacks{
{"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
@@ -866,13 +584,10 @@ server::server(executor& exec, service::storage_proxy& proxy, gms::gossiper& gos
}
future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,
utils::updateable_value<bool> enforce_authorization, utils::updateable_value<bool> warn_authorization, utils::updateable_value<uint64_t> max_users_query_size_in_trace_output,
semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests) {
utils::updateable_value<bool> enforce_authorization, semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests) {
_memory_limiter = memory_limiter;
_enforce_authorization = std::move(enforce_authorization);
_warn_authorization = std::move(warn_authorization);
_max_concurrent_requests = std::move(max_concurrent_requests);
_max_users_query_size_in_trace_output = std::move(max_users_query_size_in_trace_output);
if (!port && !https_port) {
return make_exception_future<>(std::runtime_error("Either regular port or TLS port"
" must be specified in order to init an alternator HTTP server instance"));
@@ -882,31 +597,23 @@ future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std:
if (port) {
set_routes(_http_server._routes);
_http_server.set_content_length_limit(server::content_length_limit);
_http_server.set_content_streaming(true);
_http_server.listen(socket_address{addr, *port}).get();
_enabled_servers.push_back(std::ref(_http_server));
}
if (https_port) {
set_routes(_https_server._routes);
_https_server.set_content_length_limit(server::content_length_limit);
_https_server.set_content_streaming(true);
if (this_shard_id() == 0) {
_credentials = creds->build_reloadable_server_credentials([this](const tls::credentials_builder& b, const std::unordered_set<sstring>& files, std::exception_ptr ep) -> future<> {
if (ep) {
slogger.warn("Exception loading {}: {}", files, ep);
} else {
co_await container().invoke_on_others([&b](server& s) {
if (s._credentials) {
b.rebuild(*s._credentials);
}
});
slogger.info("Reloaded {}", files);
}
}).get();
} else {
_credentials = creds->build_server_credentials();
}
_https_server.listen(socket_address{addr, *https_port}, _credentials).get();
auto server_creds = creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {
if (ep) {
slogger.warn("Exception loading {}: {}", files, ep);
} else {
slogger.info("Reloaded {}", files);
}
}).get();
_https_server.listen(socket_address{addr, *https_port}, std::move(server_creds)).get();
_enabled_servers.push_back(std::ref(_https_server));
}
});
@@ -962,37 +669,6 @@ future<> server::json_parser::stop() {
return std::move(_run_parse_json_thread);
}
// Convert an entry in the server's list of ongoing Alternator requests
// (_ongoing_requests) into a client_data object. This client_data object
// will then be used to produce a row for the "system.clients" virtual table.
client_data server::ongoing_request::make_client_data() const {
client_data cd;
cd.ct = client_type::alternator;
cd.ip = _client_address.addr();
cd.port = _client_address.port();
cd.shard_id = this_shard_id();
cd.connection_stage = client_connection_stage::established;
cd.username = _username;
cd.scheduling_group_name = _scheduling_group.name();
cd.ssl_enabled = _is_https;
// For now, we save the full User-Agent header as the "driver name"
// and keep "driver_version" unset.
cd.driver_name = _user_agent;
// Leave "protocol_version" unset, it has no meaning in Alternator.
// Leave "hostname", "ssl_protocol" and "ssl_cipher_suite" unset.
// As reported in issue #9216, we never set these fields in CQL
// either (see cql_server::connection::make_client_data()).
return cd;
}
future<utils::chunked_vector<client_data>> server::get_client_data() {
utils::chunked_vector<client_data> ret;
co_await _ongoing_requests.for_each_gently([&ret] (const ongoing_request& r) {
ret.emplace_back(r.make_client_data());
});
co_return ret;
}
const char* api_error::what() const noexcept {
if (_what_string.empty()) {
_what_string = fmt::format("{} {}: {}", std::to_underlying(_http_code), _type, _msg);

View File

@@ -9,7 +9,6 @@
#pragma once
#include "alternator/executor.hh"
#include "utils/scoped_item_list.hh"
#include <seastar/core/future.hh>
#include <seastar/core/condition-variable.hh>
#include <seastar/http/httpd.hh>
@@ -21,18 +20,12 @@
#include "utils/updateable_value.hh"
#include <seastar/core/units.hh>
struct client_data;
namespace alternator {
using chunked_content = rjson::chunked_content;
class server : public peering_sharded_service<server> {
// The maximum size of a request body that Alternator will accept,
// in bytes. This is a safety measure to prevent Alternator from
// running out of memory when a client sends a very large request.
// DynamoDB also has the same limit set to 16 MB.
static constexpr size_t request_content_length_limit = 16*MB;
class server {
static constexpr size_t content_length_limit = 16*MB;
using alternator_callback = std::function<future<executor::request_return_type>(executor&, executor::client_state&,
tracing::trace_state_ptr, service_permit, rjson::value, std::unique_ptr<http::request>)>;
using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;
@@ -47,10 +40,8 @@ class server : public peering_sharded_service<server> {
key_cache _key_cache;
utils::updateable_value<bool> _enforce_authorization;
utils::updateable_value<bool> _warn_authorization;
utils::updateable_value<uint64_t> _max_users_query_size_in_trace_output;
utils::small_vector<std::reference_wrapper<seastar::httpd::http_server>, 2> _enabled_servers;
named_gate _pending_requests;
gate _pending_requests;
// In some places we will need a CQL updateable_timeout_config object even
// though it isn't really relevant for Alternator which defines its own
// timeouts separately. We can create this object only once.
@@ -61,8 +52,6 @@ class server : public peering_sharded_service<server> {
semaphore* _memory_limiter;
utils::updateable_value<uint32_t> _max_concurrent_requests;
::shared_ptr<seastar::tls::server_credentials> _credentials;
class json_parser {
static constexpr size_t yieldable_parsing_threshold = 16*KB;
chunked_content _raw_document;
@@ -83,31 +72,12 @@ class server : public peering_sharded_service<server> {
};
json_parser _json_parser;
// The server maintains a list of ongoing requests, that are being handled
// by handle_api_request(). It uses this list in get_client_data(), which
// is called when reading the "system.clients" virtual table.
struct ongoing_request {
socket_address _client_address;
sstring _user_agent;
sstring _username;
scheduling_group _scheduling_group;
bool _is_https;
client_data make_client_data() const;
};
utils::scoped_item_list<ongoing_request> _ongoing_requests;
public:
server(executor& executor, service::storage_proxy& proxy, gms::gossiper& gossiper, auth::service& service, qos::service_level_controller& sl_controller);
future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,
utils::updateable_value<bool> enforce_authorization, utils::updateable_value<bool> warn_authorization, utils::updateable_value<uint64_t> max_users_query_size_in_trace_output,
semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests);
utils::updateable_value<bool> enforce_authorization, semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests);
future<> stop();
// get_client_data() is called (on each shard separately) when the virtual
// table "system.clients" is read. It is expected to generate a list of
// clients connected to this server (on this shard). This function is
// called by alternator::controller::get_client_data().
future<utils::chunked_vector<client_data>> get_client_data();
private:
void set_routes(seastar::httpd::routes& r);
// If verification succeeds, returns the authenticated user's username

View File

@@ -9,63 +9,32 @@
#include "stats.hh"
#include "utils/histogram_metrics_helper.hh"
#include <seastar/core/metrics.hh>
#include "utils/labels.hh"
namespace alternator {
const char* ALTERNATOR_METRICS = "alternator";
static seastar::metrics::histogram estimated_histogram_to_metrics(const utils::estimated_histogram& histogram) {
seastar::metrics::histogram res;
res.buckets.resize(histogram.bucket_offsets.size());
uint64_t cumulative_count = 0;
res.sample_count = histogram._count;
res.sample_sum = histogram._sample_sum;
for (size_t i = 0; i < res.buckets.size(); i++) {
auto& v = res.buckets[i];
v.upper_bound = histogram.bucket_offsets[i];
cumulative_count += histogram.buckets[i];
v.count = cumulative_count;
}
return res;
}
static seastar::metrics::label column_family_label("cf");
static seastar::metrics::label keyspace_label("ks");
static void register_metrics_with_optional_table(seastar::metrics::metric_groups& metrics, const stats& stats, const sstring& ks, const sstring& table) {
stats::stats() : api_operations{} {
// Register the
seastar::metrics::label op("op");
bool has_table = table.length();
std::vector<seastar::metrics::label> aggregate_labels;
std::vector<seastar::metrics::label_instance> labels = {alternator_label};
sstring group_name = (has_table)? "alternator_table" : "alternator";
if (has_table) {
labels.push_back(column_family_label(table));
labels.push_back(keyspace_label(ks));
aggregate_labels.push_back(seastar::metrics::shard_label);
}
metrics.add_group(group_name, {
#define OPERATION(name, CamelCaseName) \
seastar::metrics::make_total_operations("operation", stats.api_operations.name, \
seastar::metrics::description("number of operations via Alternator API"), labels)(basic_level)(op(CamelCaseName)).aggregate(aggregate_labels).set_skip_when_empty(),
#define OPERATION_LATENCY(name, CamelCaseName) \
metrics.add_group(group_name, { \
seastar::metrics::make_histogram("op_latency", \
seastar::metrics::description("Latency histogram of an operation via Alternator API"), labels, [&stats]{return to_metrics_histogram(stats.api_operations.name.histogram());})(op(CamelCaseName))(basic_level).aggregate({seastar::metrics::shard_label}).set_skip_when_empty()}); \
if (!has_table) {\
metrics.add_group("alternator", { \
seastar::metrics::make_summary("op_latency_summary", \
seastar::metrics::description("Latency summary of an operation via Alternator API"), [&stats]{return to_metrics_summary(stats.api_operations.name.summary());})(op(CamelCaseName))(basic_level)(alternator_label).set_skip_when_empty()}); \
}
_metrics.add_group("alternator", {
#define OPERATION(name, CamelCaseName) \
seastar::metrics::make_total_operations("operation", api_operations.name, \
seastar::metrics::description("number of operations via Alternator API"), {op(CamelCaseName)}).set_skip_when_empty(),
#define OPERATION_LATENCY(name, CamelCaseName) \
seastar::metrics::make_histogram("op_latency", \
seastar::metrics::description("Latency histogram of an operation via Alternator API"), {op(CamelCaseName)}, [this]{return to_metrics_histogram(api_operations.name.histogram());}).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(), \
seastar::metrics::make_summary("op_latency_summary", \
seastar::metrics::description("Latency summary of an operation via Alternator API"), [this]{return to_metrics_summary(api_operations.name.summary());})(op(CamelCaseName)).set_skip_when_empty(),
OPERATION(batch_get_item, "BatchGetItem")
OPERATION(batch_write_item, "BatchWriteItem")
OPERATION(create_backup, "CreateBackup")
OPERATION(create_global_table, "CreateGlobalTable")
OPERATION(create_table, "CreateTable")
OPERATION(delete_backup, "DeleteBackup")
OPERATION(delete_item, "DeleteItem")
OPERATION(delete_table, "DeleteTable")
OPERATION(describe_backup, "DescribeBackup")
OPERATION(describe_continuous_backups, "DescribeContinuousBackups")
OPERATION(describe_endpoints, "DescribeEndpoints")
@@ -94,117 +63,55 @@ static void register_metrics_with_optional_table(seastar::metrics::metric_groups
OPERATION(update_item, "UpdateItem")
OPERATION(update_table, "UpdateTable")
OPERATION(update_time_to_live, "UpdateTimeToLive")
OPERATION_LATENCY(put_item_latency, "PutItem")
OPERATION_LATENCY(get_item_latency, "GetItem")
OPERATION_LATENCY(delete_item_latency, "DeleteItem")
OPERATION_LATENCY(update_item_latency, "UpdateItem")
OPERATION_LATENCY(batch_write_item_latency, "BatchWriteItem")
OPERATION_LATENCY(batch_get_item_latency, "BatchGetItem")
OPERATION(list_streams, "ListStreams")
OPERATION(describe_stream, "DescribeStream")
OPERATION(get_shard_iterator, "GetShardIterator")
OPERATION(get_records, "GetRecords")
OPERATION_LATENCY(get_records_latency, "GetRecords")
});
OPERATION_LATENCY(put_item_latency, "PutItem")
OPERATION_LATENCY(get_item_latency, "GetItem")
OPERATION_LATENCY(delete_item_latency, "DeleteItem")
OPERATION_LATENCY(update_item_latency, "UpdateItem")
OPERATION_LATENCY(batch_write_item_latency, "BatchWriteItem")
OPERATION_LATENCY(batch_get_item_latency, "BatchGetItem")
OPERATION_LATENCY(get_records_latency, "GetRecords")
if (!has_table) {
// Create and delete operations are not applicable to a per-table metrics
// only register it for the global metrics
metrics.add_group("alternator", {
OPERATION(create_table, "CreateTable")
OPERATION(delete_table, "DeleteTable")
});
}
metrics.add_group(group_name, {
seastar::metrics::make_total_operations("unsupported_operations", stats.unsupported_operations,
seastar::metrics::description("number of unsupported operations via Alternator API"), labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("total_operations", stats.total_operations,
seastar::metrics::description("number of total operations via Alternator API"), labels)(basic_level).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("reads_before_write", stats.reads_before_write,
seastar::metrics::description("number of performed read-before-write operations"), labels).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("write_using_lwt", stats.write_using_lwt,
seastar::metrics::description("number of writes that used LWT"), labels).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("shard_bounce_for_lwt", stats.shard_bounce_for_lwt,
seastar::metrics::description("number writes that had to be bounced from this shard because of LWT requirements"), labels).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("requests_blocked_memory", stats.requests_blocked_memory,
seastar::metrics::description("Counts a number of requests blocked due to memory pressure."), labels).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("requests_shed", stats.requests_shed,
seastar::metrics::description("Counts a number of requests shed due to overload."), labels).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("filtered_rows_read_total", stats.cql_stats.filtered_rows_read_total,
seastar::metrics::description("number of rows read during filtering operations"), labels).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("filtered_rows_matched_total", stats.cql_stats.filtered_rows_matched_total,
seastar::metrics::description("number of rows read and matched during filtering operations"), labels).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_counter("rcu_total", [&stats]{return 0.5 * stats.rcu_half_units_total;},
seastar::metrics::description("total number of consumed read units"), labels).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_counter("wcu_total", stats.wcu_total[stats::wcu_types::PUT_ITEM],
seastar::metrics::description("total number of consumed write units"), labels)(op("PutItem")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_counter("wcu_total", stats.wcu_total[stats::wcu_types::DELETE_ITEM],
seastar::metrics::description("total number of consumed write units"), labels)(op("DeleteItem")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_counter("wcu_total", stats.wcu_total[stats::wcu_types::UPDATE_ITEM],
seastar::metrics::description("total number of consumed write units"), labels)(op("UpdateItem")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_counter("wcu_total", stats.wcu_total[stats::wcu_types::INDEX],
seastar::metrics::description("total number of consumed write units"), labels)(op("Index")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("filtered_rows_dropped_total", [&stats] { return stats.cql_stats.filtered_rows_read_total - stats.cql_stats.filtered_rows_matched_total; },
seastar::metrics::description("number of rows read and dropped during filtering operations"), labels).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_counter("batch_item_count", seastar::metrics::description("The total number of items processed across all batches"), labels,
stats.api_operations.batch_write_item_batch_total)(op("BatchWriteItem")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_counter("batch_item_count", seastar::metrics::description("The total number of items processed across all batches"), labels,
stats.api_operations.batch_get_item_batch_total)(op("BatchGetItem")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_histogram("batch_item_count_histogram", seastar::metrics::description("Histogram of the number of items in a batch request"), labels,
[&stats]{ return estimated_histogram_to_metrics(stats.api_operations.batch_get_item_histogram);})(op("BatchGetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_histogram("batch_item_count_histogram", seastar::metrics::description("Histogram of the number of items in a batch request"), labels,
[&stats]{ return estimated_histogram_to_metrics(stats.api_operations.batch_write_item_histogram);})(op("BatchWriteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
[&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.get_item_op_size_kb);})(op("GetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
[&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.put_item_op_size_kb);})(op("PutItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
[&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.delete_item_op_size_kb);})(op("DeleteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
[&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.update_item_op_size_kb);})(op("UpdateItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
[&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.batch_get_item_op_size_kb);})(op("BatchGetItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_histogram("operation_size_kb", seastar::metrics::description("Histogram of item sizes involved in a request"), labels,
[&stats]{ return estimated_histogram_to_metrics(stats.operation_sizes.batch_write_item_op_size_kb);})(op("BatchWriteItem")).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
_metrics.add_group("alternator", {
seastar::metrics::make_total_operations("unsupported_operations", unsupported_operations,
seastar::metrics::description("number of unsupported operations via Alternator API")),
seastar::metrics::make_total_operations("total_operations", total_operations,
seastar::metrics::description("number of total operations via Alternator API")),
seastar::metrics::make_total_operations("reads_before_write", reads_before_write,
seastar::metrics::description("number of performed read-before-write operations")),
seastar::metrics::make_total_operations("write_using_lwt", write_using_lwt,
seastar::metrics::description("number of writes that used LWT")),
seastar::metrics::make_total_operations("shard_bounce_for_lwt", shard_bounce_for_lwt,
seastar::metrics::description("number writes that had to be bounced from this shard because of LWT requirements")),
seastar::metrics::make_total_operations("requests_blocked_memory", requests_blocked_memory,
seastar::metrics::description("Counts a number of requests blocked due to memory pressure.")),
seastar::metrics::make_total_operations("requests_shed", requests_shed,
seastar::metrics::description("Counts a number of requests shed due to overload.")),
seastar::metrics::make_total_operations("filtered_rows_read_total", cql_stats.filtered_rows_read_total,
seastar::metrics::description("number of rows read during filtering operations")),
seastar::metrics::make_total_operations("filtered_rows_matched_total", cql_stats.filtered_rows_matched_total,
seastar::metrics::description("number of rows read and matched during filtering operations")),
seastar::metrics::make_counter("rcu_total", rcu_total,
seastar::metrics::description("total number of consumed read units, counted as half units")).set_skip_when_empty(),
seastar::metrics::make_counter("wcu_total", wcu_total[wcu_types::PUT_ITEM],
seastar::metrics::description("total number of consumed write units, counted as half units"),{op("PutItem")}).set_skip_when_empty(),
seastar::metrics::make_counter("wcu_total", wcu_total[wcu_types::DELETE_ITEM],
seastar::metrics::description("total number of consumed write units, counted as half units"),{op("DeleteItem")}).set_skip_when_empty(),
seastar::metrics::make_counter("wcu_total", wcu_total[wcu_types::UPDATE_ITEM],
seastar::metrics::description("total number of consumed write units, counted as half units"),{op("UpdateItem")}).set_skip_when_empty(),
seastar::metrics::make_counter("wcu_total", wcu_total[wcu_types::INDEX],
seastar::metrics::description("total number of consumed write units, counted as half units"),{op("Index")}).set_skip_when_empty(),
seastar::metrics::make_total_operations("filtered_rows_dropped_total", [this] { return cql_stats.filtered_rows_read_total - cql_stats.filtered_rows_matched_total; },
seastar::metrics::description("number of rows read and dropped during filtering operations")),
seastar::metrics::make_counter("batch_item_count", seastar::metrics::description("The total number of items processed across all batches"),{op("BatchWriteItem")},
api_operations.batch_write_item_batch_total).set_skip_when_empty(),
seastar::metrics::make_counter("batch_item_count", seastar::metrics::description("The total number of items processed across all batches"),{op("BatchGetItem")},
api_operations.batch_get_item_batch_total).set_skip_when_empty(),
});
seastar::metrics::label expression_label("expression");
metrics.add_group(group_name, {
seastar::metrics::make_total_operations("expression_cache_evictions", stats.expression_cache.evictions,
seastar::metrics::description("Counts number of entries evicted from expressions cache"), labels).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("expression_cache_hits", stats.expression_cache.requests[stats::expression_types::UPDATE_EXPRESSION].hits,
seastar::metrics::description("Counts number of hits of cached expressions"), labels)(expression_label("UpdateExpression")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("expression_cache_misses", stats.expression_cache.requests[stats::expression_types::UPDATE_EXPRESSION].misses,
seastar::metrics::description("Counts number of misses of cached expressions"), labels)(expression_label("UpdateExpression")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("expression_cache_hits", stats.expression_cache.requests[stats::expression_types::CONDITION_EXPRESSION].hits,
seastar::metrics::description("Counts number of hits of cached expressions"), labels)(expression_label("ConditionExpression")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("expression_cache_misses", stats.expression_cache.requests[stats::expression_types::CONDITION_EXPRESSION].misses,
seastar::metrics::description("Counts number of misses of cached expressions"), labels)(expression_label("ConditionExpression")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("expression_cache_hits", stats.expression_cache.requests[stats::expression_types::PROJECTION_EXPRESSION].hits,
seastar::metrics::description("Counts number of hits of cached expressions"), labels)(expression_label("ProjectionExpression")).aggregate(aggregate_labels).set_skip_when_empty(),
seastar::metrics::make_total_operations("expression_cache_misses", stats.expression_cache.requests[stats::expression_types::PROJECTION_EXPRESSION].misses,
seastar::metrics::description("Counts number of misses of cached expressions"), labels)(expression_label("ProjectionExpression")).aggregate(aggregate_labels).set_skip_when_empty()
});
// Only register the following metrics for the global metrics, not per-table
if (!has_table) {
metrics.add_group("alternator", {
seastar::metrics::make_counter("authentication_failures", stats.authentication_failures,
seastar::metrics::description("total number of authentication failures"), labels).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
seastar::metrics::make_counter("authorization_failures", stats.authorization_failures,
seastar::metrics::description("total number of authorization failures"), labels).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(),
});
}
}
void register_metrics(seastar::metrics::metric_groups& metrics, const stats& stats) {
register_metrics_with_optional_table(metrics, stats, "", "");
}
table_stats::table_stats(const sstring& ks, const sstring& table) {
_stats = make_lw_shared<stats>();
register_metrics_with_optional_table(_metrics, *_stats, ks, table);
}
}

View File

@@ -12,7 +12,6 @@
#include <seastar/core/metrics_registration.hh>
#include "utils/histogram.hh"
#include "utils/estimated_histogram.hh"
#include "cql3/stats.hh"
namespace alternator {
@@ -22,6 +21,7 @@ namespace alternator {
// visible by the metrics REST API, with the "alternator" prefix.
class stats {
public:
stats();
// Count of DynamoDB API operations by types
struct {
uint64_t batch_get_item = 0;
@@ -75,47 +75,7 @@ public:
utils::timed_rate_moving_average_summary_and_histogram batch_write_item_latency;
utils::timed_rate_moving_average_summary_and_histogram batch_get_item_latency;
utils::timed_rate_moving_average_summary_and_histogram get_records_latency;
utils::estimated_histogram batch_get_item_histogram{22}; // a histogram that covers the range 1 - 100
utils::estimated_histogram batch_write_item_histogram{22}; // a histogram that covers the range 1 - 100
} api_operations;
// Operation size metrics
struct {
// Item size statistics collected per table and aggregated per node.
// Each histogram covers the range 0 - 446. Resolves #25143.
// A size is the retrieved item's size.
utils::estimated_histogram get_item_op_size_kb{30};
// A size is the maximum of the new item's size and the old item's size.
utils::estimated_histogram put_item_op_size_kb{30};
// A size is the deleted item's size. If the deleted item's size is
// unknown (i.e. read-before-write wasn't necessary and it wasn't
// forced by a configuration option), it won't be recorded on the
// histogram.
utils::estimated_histogram delete_item_op_size_kb{30};
// A size is the maximum of existing item's size and the estimated size
// of the update. This will be changed to the maximum of the existing item's
// size and the new item's size in a subsequent PR.
utils::estimated_histogram update_item_op_size_kb{30};
// A size is the sum of the sizes of all items per table. This means
// that a single BatchGetItem / BatchWriteItem updates the histogram
// for each table that it has items in.
// The sizes are the retrieved items' sizes grouped per table.
utils::estimated_histogram batch_get_item_op_size_kb{30};
// The sizes are the the written items' sizes grouped per table.
utils::estimated_histogram batch_write_item_op_size_kb{30};
} operation_sizes;
// Count of authentication and authorization failures, counted if either
// alternator_enforce_authorization or alternator_warn_authorization are
// set to true. If both are false, no authentication or authorization
// checks are performed, so failures are not recognized or counted.
// "authentication" failure means the request was not signed with a valid
// user and key combination. "authorization" failure means the request was
// authenticated to a valid user - but this user did not have permissions
// to perform the operation (considering RBAC settings and the user's
// superuser status).
uint64_t authentication_failures = 0;
uint64_t authorization_failures = 0;
// Miscellaneous event counters
uint64_t total_operations = 0;
uint64_t unsupported_operations = 0;
@@ -124,7 +84,7 @@ public:
uint64_t shard_bounce_for_lwt = 0;
uint64_t requests_blocked_memory = 0;
uint64_t requests_shed = 0;
uint64_t rcu_half_units_total = 0;
uint64_t rcu_total = 0;
// wcu can results from put, update, delete and index
// Index related will be done on top of the operation it comes with
enum wcu_types {
@@ -138,33 +98,10 @@ public:
uint64_t wcu_total[NUM_TYPES] = {0};
// CQL-derived stats
cql3::cql_stats cql_stats;
// Enumeration of expression types only for stats
// if needed it can be extended e.g. per operation
enum expression_types {
UPDATE_EXPRESSION,
CONDITION_EXPRESSION,
PROJECTION_EXPRESSION,
NUM_EXPRESSION_TYPES
};
struct {
struct {
uint64_t hits = 0;
uint64_t misses = 0;
} requests[NUM_EXPRESSION_TYPES];
uint64_t evictions = 0;
} expression_cache;
};
struct table_stats {
table_stats(const sstring& ks, const sstring& table);
private:
// The metric_groups object holds this stat object's metrics registered
// as long as the stats object is alive.
seastar::metrics::metric_groups _metrics;
lw_shared_ptr<stats> _stats;
};
void register_metrics(seastar::metrics::metric_groups& metrics, const stats& stats);
inline uint64_t bytes_to_kb_ceil(uint64_t bytes) {
return (bytes + 1023) / 1024;
}
}

View File

@@ -13,6 +13,7 @@
#include <seastar/json/formatter.hh>
#include "auth/permission.hh"
#include "db/config.hh"
#include "cdc/log.hh"
@@ -31,7 +32,6 @@
#include "executor.hh"
#include "data_dictionary/data_dictionary.hh"
#include "utils/rjson.hh"
/**
* Base template type to implement rapidjson::internal::TypeHelper<...>:s
@@ -126,7 +126,7 @@ public:
}
};
} // namespace alternator
}
template<typename ValueType>
struct rapidjson::internal::TypeHelper<ValueType, alternator::stream_arn>
@@ -217,7 +217,7 @@ future<alternator::executor::request_return_type> alternator::executor::list_str
rjson::add(ret, "LastEvaluatedStreamArn", *last);
}
return make_ready_future<executor::request_return_type>(rjson::print(std::move(ret)));
return make_ready_future<executor::request_return_type>(make_jsonable(std::move(ret)));
}
struct shard_id {
@@ -296,7 +296,7 @@ sequence_number::sequence_number(std::string_view v)
}())
{}
} // namespace alternator
}
template<typename ValueType>
struct rapidjson::internal::TypeHelper<ValueType, alternator::shard_id>
@@ -356,7 +356,7 @@ static stream_view_type cdc_options_to_steam_view_type(const cdc::options& opts)
return type;
}
} // namespace alternator
}
template<typename ValueType>
struct rapidjson::internal::TypeHelper<ValueType, alternator::stream_view_type>
@@ -475,10 +475,10 @@ future<executor::request_return_type> executor::describe_stream(client_state& cl
} else {
status = "ENABLED";
}
}
}
auto ttl = std::chrono::seconds(opts.ttl());
rjson::add(stream_desc, "StreamStatus", rjson::from_string(status));
stream_view_type type = cdc_options_to_steam_view_type(opts);
@@ -491,7 +491,7 @@ future<executor::request_return_type> executor::describe_stream(client_state& cl
if (!opts.enabled()) {
rjson::add(ret, "StreamDescription", std::move(stream_desc));
return make_ready_future<executor::request_return_type>(rjson::print(std::move(ret)));
return make_ready_future<executor::request_return_type>(make_jsonable(std::move(ret)));
}
// TODO: label
@@ -617,7 +617,7 @@ future<executor::request_return_type> executor::describe_stream(client_state& cl
rjson::add(stream_desc, "Shards", std::move(shards));
rjson::add(ret, "StreamDescription", std::move(stream_desc));
return make_ready_future<executor::request_return_type>(rjson::print(std::move(ret)));
return make_ready_future<executor::request_return_type>(make_jsonable(std::move(ret)));
});
}
@@ -714,7 +714,7 @@ future<executor::request_return_type> executor::get_shard_iterator(client_state&
auto type = rjson::get<shard_iterator_type>(request, "ShardIteratorType");
auto seq_num = rjson::get_opt<sequence_number>(request, "SequenceNumber");
if (type < shard_iterator_type::TRIM_HORIZON && !seq_num) {
throw api_error::validation("Missing required parameter \"SequenceNumber\"");
}
@@ -724,7 +724,7 @@ future<executor::request_return_type> executor::get_shard_iterator(client_state&
auto stream_arn = rjson::get<alternator::stream_arn>(request, "StreamArn");
auto db = _proxy.data_dictionary();
schema_ptr schema = nullptr;
std::optional<shard_id> sid;
@@ -770,7 +770,7 @@ future<executor::request_return_type> executor::get_shard_iterator(client_state&
auto ret = rjson::empty_object();
rjson::add(ret, "ShardIterator", iter);
return make_ready_future<executor::request_return_type>(rjson::print(std::move(ret)));
return make_ready_future<executor::request_return_type>(make_jsonable(std::move(ret)));
}
struct event_id {
@@ -789,7 +789,7 @@ struct event_id {
return os;
}
};
} // namespace alternator
}
template<typename ValueType>
struct rapidjson::internal::TypeHelper<ValueType, alternator::event_id>
@@ -827,7 +827,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());
co_await verify_permission(_enforce_authorization, _warn_authorization, client_state, schema, auth::permission::SELECT, _stats);
co_await verify_permission(_enforce_authorization, client_state, schema, auth::permission::SELECT);
db::consistency_level cl = db::consistency_level::LOCAL_QUORUM;
partition_key pk = iter.shard.id.to_partition_key(*schema);
@@ -871,12 +871,10 @@ future<executor::request_return_type> executor::get_records(client_state& client
std::transform(pks.begin(), pks.end(), std::back_inserter(columns), [](auto& c) { return &c; });
std::transform(cks.begin(), cks.end(), std::back_inserter(columns), [](auto& c) { return &c; });
auto regular_column_start_idx = columns.size();
auto regular_column_filter = std::views::filter([](const column_definition& cdef) { return cdef.name() == op_column_name || cdef.name() == eor_column_name || !cdc::is_cdc_metacolumn_name(cdef.name_as_text()); });
std::ranges::transform(schema->regular_columns() | regular_column_filter, std::back_inserter(columns), [](auto& c) { return &c; });
auto regular_columns = std::ranges::subrange(columns.begin() + regular_column_start_idx, columns.end())
| std::views::transform(&column_definition::id)
auto regular_columns = schema->regular_columns()
| std::views::filter([](const column_definition& cdef) { return cdef.name() == op_column_name || cdef.name() == eor_column_name || !cdc::is_cdc_metacolumn_name(cdef.name_as_text()); })
| std::views::transform([&] (const column_definition& cdef) { columns.emplace_back(&cdef); return cdef.id; })
| std::ranges::to<query::column_id_vector>()
;
@@ -927,7 +925,6 @@ future<executor::request_return_type> executor::get_records(client_state& client
std::optional<utils::UUID> timestamp;
auto dynamodb = rjson::empty_object();
auto record = rjson::empty_object();
const auto dc_name = _proxy.get_token_metadata_ptr()->get_topology().get_datacenter();
using op_utype = std::underlying_type_t<cdc::operation>;
@@ -937,10 +934,9 @@ future<executor::request_return_type> executor::get_records(client_state& client
dynamodb = rjson::empty_object();
}
if (!record.ObjectEmpty()) {
rjson::add(record, "awsRegion", rjson::from_string(dc_name));
// TODO: awsRegion?
rjson::add(record, "eventID", event_id(iter.shard.id, *timestamp));
rjson::add(record, "eventSource", "scylladb:alternator");
rjson::add(record, "eventVersion", "1.1");
rjson::push_back(records, std::move(record));
record = rjson::empty_object();
--limit;
@@ -959,7 +955,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
rjson::add(dynamodb, "ApproximateCreationDateTime", utils::UUID_gen::unix_timestamp_in_sec(ts).count());
rjson::add(dynamodb, "SequenceNumber", sequence_number(ts));
rjson::add(dynamodb, "StreamViewType", type);
// TODO: SizeBytes
//TODO: SizeInBytes
}
/**
@@ -999,16 +995,6 @@ future<executor::request_return_type> executor::get_records(client_state& client
case cdc::operation::insert:
rjson::add(record, "eventName", "INSERT");
break;
case cdc::operation::service_row_delete:
case cdc::operation::service_partition_delete:
{
auto user_identity = rjson::empty_object();
rjson::add(user_identity, "Type", "Service");
rjson::add(user_identity, "PrincipalId", "dynamodb.amazonaws.com");
rjson::add(record, "userIdentity", std::move(user_identity));
rjson::add(record, "eventName", "REMOVE");
break;
}
default:
rjson::add(record, "eventName", "REMOVE");
break;
@@ -1035,7 +1021,7 @@ future<executor::request_return_type> executor::get_records(client_state& client
// will notice end end of shard and not return NextShardIterator.
rjson::add(ret, "NextShardIterator", next_iter);
_stats.api_operations.get_records_latency.mark(std::chrono::steady_clock::now() - start_time);
return make_ready_future<executor::request_return_type>(rjson::print(std::move(ret)));
return make_ready_future<executor::request_return_type>(make_jsonable(std::move(ret)));
}
// ugh. figure out if we are and end-of-shard
@@ -1061,19 +1047,21 @@ future<executor::request_return_type> executor::get_records(client_state& client
if (is_big(ret)) {
return make_ready_future<executor::request_return_type>(make_streamed(std::move(ret)));
}
return make_ready_future<executor::request_return_type>(rjson::print(std::move(ret)));
return make_ready_future<executor::request_return_type>(make_jsonable(std::move(ret)));
});
});
}
bool executor::add_stream_options(const rjson::value& stream_specification, schema_builder& builder, service::storage_proxy& sp) {
void executor::add_stream_options(const rjson::value& stream_specification, schema_builder& builder, service::storage_proxy& sp) {
auto stream_enabled = rjson::find(stream_specification, "StreamEnabled");
if (!stream_enabled || !stream_enabled->IsBool()) {
throw api_error::validation("StreamSpecification needs boolean StreamEnabled");
}
if (stream_enabled->GetBool()) {
if (!sp.features().alternator_streams) {
auto db = sp.data_dictionary();
if (!db.features().alternator_streams) {
throw api_error::validation("StreamSpecification: alternator streams feature not enabled in cluster.");
}
@@ -1098,12 +1086,10 @@ bool executor::add_stream_options(const rjson::value& stream_specification, sche
break;
}
builder.with_cdc_options(opts);
return true;
} else {
cdc::options opts;
opts.enabled(false);
builder.with_cdc_options(opts);
return false;
}
}
@@ -1132,4 +1118,4 @@ void executor::supplement_table_stream_info(rjson::value& descr, const schema& s
}
}
} // namespace alternator
}

View File

@@ -16,8 +16,8 @@
#include <seastar/core/future.hh>
#include <seastar/core/lowres_clock.hh>
#include <seastar/coroutine/maybe_yield.hh>
#include <boost/multiprecision/cpp_int.hpp>
#include "cdc/log.hh"
#include "exceptions/exceptions.hh"
#include "gms/gossiper.hh"
#include "gms/inet_address.hh"
@@ -28,7 +28,7 @@
#include "replica/database.hh"
#include "service/client_state.hh"
#include "service_permit.hh"
#include "mutation/timestamp.hh"
#include "timestamp.hh"
#include "service/storage_proxy.hh"
#include "service/pager/paging_state.hh"
#include "service/pager/query_pagers.hh"
@@ -49,7 +49,6 @@
#include "dht/sharder.hh"
#include "db/config.hh"
#include "db/tags/utils.hh"
#include "utils/labels.hh"
#include "ttl.hh"
@@ -57,18 +56,18 @@ static logging::logger tlogger("alternator_ttl");
namespace alternator {
// We write the expiration-time attribute enabled on a table in a
// We write the expiration-time attribute enabled on a table using a
// tag TTL_TAG_KEY.
// Currently, the *value* of this tag is simply the name of the attribute,
// and the expiration scanner interprets it as an Alternator attribute name -
// It can refer to a real column or if that doesn't exist, to a member of
// the ":attrs" map column. Although this is designed for Alternator, it may
// be good enough for CQL as well (there, the ":attrs" column won't exist).
extern const sstring TTL_TAG_KEY;
static const sstring TTL_TAG_KEY("system:ttl_attribute");
future<executor::request_return_type> executor::update_time_to_live(client_state& client_state, service_permit permit, rjson::value request) {
_stats.api_operations.update_time_to_live++;
if (!_proxy.features().alternator_ttl) {
if (!_proxy.data_dictionary().features().alternator_ttl) {
co_return api_error::unknown_operation("UpdateTimeToLive not yet supported. Experimental support is available if the 'alternator-ttl' experimental feature is enabled on all nodes.");
}
@@ -82,6 +81,11 @@ future<executor::request_return_type> executor::update_time_to_live(client_state
co_return api_error::validation("UpdateTimeToLive requires boolean Enabled");
}
bool enabled = v->GetBool();
// Alternator TTL doesn't yet work when the table uses tablets (#16567)
if (enabled && _proxy.local_db().find_keyspace(schema->ks_name()).get_replication_strategy().uses_tablets()) {
co_return api_error::validation("TTL not yet supported on a table using tablets (issue #16567). "
"Create a table with the tag 'experimental:initial_tablets' set to 'none' to use vnodes.");
}
v = rjson::find(*spec, "AttributeName");
if (!v || !v->IsString()) {
co_return api_error::validation("UpdateTimeToLive requires string AttributeName");
@@ -95,7 +99,7 @@ future<executor::request_return_type> executor::update_time_to_live(client_state
}
sstring attribute_name(v->GetString(), v->GetStringLength());
co_await verify_permission(_enforce_authorization, _warn_authorization, client_state, schema, auth::permission::ALTER, _stats);
co_await verify_permission(_enforce_authorization, client_state, schema, auth::permission::ALTER);
co_await db::modify_tags(_mm, schema->ks_name(), schema->cf_name(), [&](std::map<sstring, sstring>& tags_map) {
if (enabled) {
if (tags_map.contains(TTL_TAG_KEY)) {
@@ -119,7 +123,7 @@ future<executor::request_return_type> executor::update_time_to_live(client_state
// basically identical to the request's
rjson::value response = rjson::empty_object();
rjson::add(response, "TimeToLiveSpecification", std::move(*spec));
co_return rjson::print(std::move(response));
co_return make_jsonable(std::move(response));
}
future<executor::request_return_type> executor::describe_time_to_live(client_state& client_state, service_permit permit, rjson::value request) {
@@ -136,7 +140,7 @@ future<executor::request_return_type> executor::describe_time_to_live(client_sta
}
rjson::value response = rjson::empty_object();
rjson::add(response, "TimeToLiveDescription", std::move(desc));
co_return rjson::print(std::move(response));
co_return make_jsonable(std::move(response));
}
// expiration_service is a sharded service responsible for cleaning up expired
@@ -287,18 +291,13 @@ static future<> expire_item(service::storage_proxy& proxy,
auto ck = clustering_key::from_exploded(exploded_ck);
m.partition().clustered_row(*schema, ck).apply(tombstone(ts, gc_clock::now()));
}
utils::chunked_vector<mutation> mutations;
std::vector<mutation> mutations;
mutations.push_back(std::move(m));
return proxy.mutate(std::move(mutations),
db::consistency_level::LOCAL_QUORUM,
executor::default_timeout(), // FIXME - which timeout?
qs.get_trace_state(), qs.get_permit(),
db::allow_per_partition_rate_limit::no,
false,
cdc::per_request_options{
.is_system_originated = true,
}
);
db::allow_per_partition_rate_limit::no);
}
static size_t random_offset(size_t min, size_t max) {
@@ -316,10 +315,8 @@ static size_t random_offset(size_t min, size_t max) {
// this range's primary node is down. For this we need to return not just
// a list of this node's secondary ranges - but also the primary owner of
// each of those ranges.
//
// The function is to be used with vnodes only
static future<std::vector<std::pair<dht::token_range, locator::host_id>>> get_secondary_ranges(
const locator::effective_replication_map* erm,
const locator::effective_replication_map_ptr& erm,
locator::host_id ep) {
const auto& tm = *erm->get_token_metadata_ptr();
const auto& sorted_tokens = tm.sorted_tokens();
@@ -330,7 +327,6 @@ static future<std::vector<std::pair<dht::token_range, locator::host_id>>> get_se
auto prev_tok = sorted_tokens.back();
for (const auto& tok : sorted_tokens) {
co_await coroutine::maybe_yield();
// FIXME: pass is_vnode=true to get_natural_replicas since the token is in tm.sorted_tokens()
host_id_vector_replica_set eps = erm->get_natural_replicas(tok);
if (eps.size() <= 1 || eps[1] != ep) {
prev_tok = tok;
@@ -400,7 +396,7 @@ class ranges_holder_primary {
dht::token_range_vector _token_ranges;
public:
explicit ranges_holder_primary(dht::token_range_vector token_ranges) : _token_ranges(std::move(token_ranges)) {}
static future<ranges_holder_primary> make(const locator::vnode_effective_replication_map* erm, locator::host_id ep) {
static future<ranges_holder_primary> make(const locator::vnode_effective_replication_map_ptr& erm, locator::host_id ep) {
co_return ranges_holder_primary(co_await erm->get_primary_ranges(ep));
}
std::size_t size() const { return _token_ranges.size(); }
@@ -420,7 +416,7 @@ public:
explicit ranges_holder_secondary(std::vector<std::pair<dht::token_range, locator::host_id>> token_ranges, const gms::gossiper& g)
: _token_ranges(std::move(token_ranges))
, _gossiper(g) {}
static future<ranges_holder_secondary> make(const locator::vnode_effective_replication_map* erm, locator::host_id ep, const gms::gossiper& g) {
static future<ranges_holder_secondary> make(const locator::effective_replication_map_ptr& erm, locator::host_id ep, const gms::gossiper& g) {
co_return ranges_holder_secondary(co_await get_secondary_ranges(erm, ep), g);
}
std::size_t size() const { return _token_ranges.size(); }
@@ -433,8 +429,6 @@ public:
}
};
// The token_ranges_owned_by_this_shard class is only used for vnodes, where the vnodes give a partition range for the entire node
// and such range still needs to be divided between the shards.
template<class primary_or_secondary_t>
class token_ranges_owned_by_this_shard {
schema_ptr _s;
@@ -528,7 +522,7 @@ struct scan_ranges_context {
// should be possible (and a must for issue #7751!).
lw_shared_ptr<service::pager::paging_state> paging_state = nullptr;
auto regular_columns =
s->regular_columns() | std::views::transform(&column_definition::id)
s->regular_columns() | std::views::transform([] (const column_definition& cdef) { return cdef.id; })
| std::ranges::to<query::column_id_vector>();
selection = cql3::selection::selection::wildcard(s);
query::partition_slice::option_set opts = selection->get_query_options();
@@ -661,17 +655,6 @@ static future<> scan_table_ranges(
}
}
static future<> scan_tablet(locator::tablet_id tablet, service::storage_proxy& proxy, abort_source& abort_source, named_semaphore& page_sem,
expiration_service::stats& expiration_stats, const scan_ranges_context& scan_ctx, const locator::tablet_map& tablet_map) {
auto tablet_token_range = tablet_map.get_token_range(tablet);
dht::ring_position tablet_start(tablet_token_range.start()->value(), dht::ring_position::token_bound::start),
tablet_end(tablet_token_range.end()->value(), dht::ring_position::token_bound::end);
auto partition_range = dht::partition_range::make(std::move(tablet_start), std::move(tablet_end));
// Note that because of issue #9167 we need to run a separate query on each partition range, and can't pass
// several of them into one partition_range_vector that is passed to scan_table_ranges().
return scan_table_ranges(proxy, scan_ctx, {partition_range}, abort_source, page_sem, expiration_stats);
}
// scan_table() scans, in one table, data "owned" by this shard, looking for
// expired items and deleting them.
// We consider each node to "own" its primary token ranges, i.e., the tokens
@@ -747,69 +730,34 @@ static future<bool> scan_table(
expiration_stats.scan_table++;
// FIXME: need to pace the scan, not do it all at once.
scan_ranges_context scan_ctx{s, proxy, std::move(column_name), std::move(member)};
if (s->table().uses_tablets()) {
locator::effective_replication_map_ptr erm = s->table().get_effective_replication_map();
auto my_host_id = erm->get_topology().my_host_id();
const auto &tablet_map = erm->get_token_metadata().tablets().get_tablet_map(s->id());
for (std::optional tablet = tablet_map.first_tablet(); tablet; tablet = tablet_map.next_tablet(*tablet)) {
auto tablet_primary_replica = tablet_map.get_primary_replica(*tablet, erm->get_topology());
// check if this is the primary replica for the current tablet
if (tablet_primary_replica.host == my_host_id && tablet_primary_replica.shard == this_shard_id()) {
co_await scan_tablet(*tablet, proxy, abort_source, page_sem, expiration_stats, scan_ctx, tablet_map);
} else if(erm->get_replication_factor() > 1) {
// Check if this is the secondary replica for the current tablet
// and if the primary replica is down which means we will take over this work.
// If each node only scans its own primary ranges, then when any node is
// down part of the token range will not get scanned. This can be viewed
// as acceptable (when the comes back online, it will resume its scan),
// but as noted in issue #9787, we can allow more prompt expiration
// by tasking another node to take over scanning of the dead node's primary
// ranges. What we do here is that this node will also check expiration
// on its *secondary* ranges - but only those whose primary owner is down.
auto tablet_secondary_replica = tablet_map.get_secondary_replica(*tablet); // throws if no secondary replica
if (tablet_secondary_replica.host == my_host_id && tablet_secondary_replica.shard == this_shard_id()) {
if (!gossiper.is_alive(tablet_primary_replica.host)) {
co_await scan_tablet(*tablet, proxy, abort_source, page_sem, expiration_stats, scan_ctx, tablet_map);
}
}
}
}
} else { // VNodes
locator::static_effective_replication_map_ptr ermp =
db.real_database().find_keyspace(s->ks_name()).get_static_effective_replication_map();
auto* erm = ermp->maybe_as_vnode_effective_replication_map();
if (!erm) {
on_internal_error(tlogger, format("Keyspace {} is local", s->ks_name()));
}
auto my_host_id = erm->get_topology().my_host_id();
token_ranges_owned_by_this_shard my_ranges(s, co_await ranges_holder_primary::make(erm, my_host_id));
while (std::optional<dht::partition_range> range = my_ranges.next_partition_range()) {
// Note that because of issue #9167 we need to run a separate
// query on each partition range, and can't pass several of
// them into one partition_range_vector.
dht::partition_range_vector partition_ranges;
partition_ranges.push_back(std::move(*range));
// FIXME: if scanning a single range fails, including network errors,
// we fail the entire scan (and rescan from the beginning). Need to
// reconsider this. Saving the scan position might be a good enough
// solution for this problem.
co_await scan_table_ranges(proxy, scan_ctx, std::move(partition_ranges), abort_source, page_sem, expiration_stats);
}
// If each node only scans its own primary ranges, then when any node is
// down part of the token range will not get scanned. This can be viewed
// as acceptable (when the comes back online, it will resume its scan),
// but as noted in issue #9787, we can allow more prompt expiration
// by tasking another node to take over scanning of the dead node's primary
// ranges. What we do here is that this node will also check expiration
// on its *secondary* ranges - but only those whose primary owner is down.
token_ranges_owned_by_this_shard my_secondary_ranges(s, co_await ranges_holder_secondary::make(erm, my_host_id, gossiper));
while (std::optional<dht::partition_range> range = my_secondary_ranges.next_partition_range()) {
expiration_stats.secondary_ranges_scanned++;
dht::partition_range_vector partition_ranges;
partition_ranges.push_back(std::move(*range));
co_await scan_table_ranges(proxy, scan_ctx, std::move(partition_ranges), abort_source, page_sem, expiration_stats);
}
auto erm = db.real_database().find_keyspace(s->ks_name()).get_vnode_effective_replication_map();
auto my_host_id = erm->get_topology().my_host_id();
token_ranges_owned_by_this_shard my_ranges(s, co_await ranges_holder_primary::make(erm, my_host_id));
while (std::optional<dht::partition_range> range = my_ranges.next_partition_range()) {
// Note that because of issue #9167 we need to run a separate
// query on each partition range, and can't pass several of
// them into one partition_range_vector.
dht::partition_range_vector partition_ranges;
partition_ranges.push_back(std::move(*range));
// FIXME: if scanning a single range fails, including network errors,
// we fail the entire scan (and rescan from the beginning). Need to
// reconsider this. Saving the scan position might be a good enough
// solution for this problem.
co_await scan_table_ranges(proxy, scan_ctx, std::move(partition_ranges), abort_source, page_sem, expiration_stats);
}
// If each node only scans its own primary ranges, then when any node is
// down part of the token range will not get scanned. This can be viewed
// as acceptable (when the comes back online, it will resume its scan),
// but as noted in issue #9787, we can allow more prompt expiration
// by tasking another node to take over scanning of the dead node's primary
// ranges. What we do here is that this node will also check expiration
// on its *secondary* ranges - but only those whose primary owner is down.
token_ranges_owned_by_this_shard my_secondary_ranges(s, co_await ranges_holder_secondary::make(erm, my_host_id, gossiper));
while (std::optional<dht::partition_range> range = my_secondary_ranges.next_partition_range()) {
expiration_stats.secondary_ranges_scanned++;
dht::partition_range_vector partition_ranges;
partition_ranges.push_back(std::move(*range));
co_await scan_table_ranges(proxy, scan_ctx, std::move(partition_ranges), abort_source, page_sem, expiration_stats);
}
co_return true;
}
@@ -903,13 +851,13 @@ future<> expiration_service::stop() {
expiration_service::stats::stats() {
_metrics.add_group("expiration", {
seastar::metrics::make_total_operations("scan_passes", scan_passes,
seastar::metrics::description("number of passes over the database"))(alternator_label).set_skip_when_empty(),
seastar::metrics::description("number of passes over the database")),
seastar::metrics::make_total_operations("scan_table", scan_table,
seastar::metrics::description("number of table scans (counting each scan of each table that enabled expiration)"))(alternator_label).set_skip_when_empty(),
seastar::metrics::description("number of table scans (counting each scan of each table that enabled expiration)")),
seastar::metrics::make_total_operations("items_deleted", items_deleted,
seastar::metrics::description("number of items deleted after expiration"))(basic_level)(alternator_label).set_skip_when_empty(),
seastar::metrics::description("number of items deleted after expiration")),
seastar::metrics::make_total_operations("secondary_ranges_scanned", secondary_ranges_scanned,
seastar::metrics::description("number of token ranges scanned by this node while their primary owner was down"))(alternator_label).set_skip_when_empty(),
seastar::metrics::description("number of token ranges scanned by this node while their primary owner was down")),
});
}

View File

@@ -106,8 +106,5 @@ target_link_libraries(api
wasmtime_bindings
absl::headers)
if (Scylla_USE_PRECOMPILED_HEADER_USE)
target_precompile_headers(api REUSE_FROM scylla-precompiled-header)
endif()
check_headers(check-headers api
GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)

View File

@@ -246,24 +246,6 @@
}
}
},
"sstableinfo":{
"id":"sstableinfo",
"description":"Compacted sstable information",
"properties":{
"generation":{
"type": "string",
"description":"Generation of the sstable"
},
"origin":{
"type":"string",
"description":"Origin of the sstable"
},
"size":{
"type":"long",
"description":"Size of the sstable"
}
}
},
"compaction_info" :{
"id": "compaction_info",
"description":"A key value mapping",
@@ -345,10 +327,6 @@
"type":"string",
"description":"The UUID"
},
"shard_id":{
"type":"int",
"description":"The shard id the compaction was executed on"
},
"cf":{
"type":"string",
"description":"The column family name"
@@ -357,17 +335,9 @@
"type":"string",
"description":"The keyspace name"
},
"compaction_type":{
"type":"string",
"description":"Type of compaction"
},
"started_at":{
"type":"long",
"description":"The time compaction started"
},
"compacted_at":{
"type":"long",
"description":"The time compaction completed"
"description":"The time of compaction"
},
"bytes_in":{
"type":"long",
@@ -383,32 +353,6 @@
"type":"row_merged"
},
"description":"The merged rows"
},
"sstables_in": {
"type":"array",
"items":{
"type":"sstableinfo"
},
"description":"List of input sstables for compaction"
},
"sstables_out": {
"type":"array",
"items":{
"type":"sstableinfo"
},
"description":"List of output sstables from compaction"
},
"total_tombstone_purge_attempt":{
"type":"long",
"description":"Total number of tombstone purge attempts"
},
"total_tombstone_purge_failure_due_to_overlapping_with_memtable":{
"type":"long",
"description":"Number of tombstone purge failures due to data overlapping with memtables"
},
"total_tombstone_purge_failure_due_to_overlapping_with_uncompacting_sstable":{
"type":"long",
"description":"Number of tombstone purge failures due to data overlapping with non-compacting sstables"
}
}
}

View File

@@ -136,6 +136,14 @@
"allowMultiple":false,
"type":"string",
"paramType":"path"
},
{
"name":"unsafe",
"description":"Set to True to perform an unsafe assassination",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}

View File

@@ -220,25 +220,6 @@
}
]
},
{
"path":"/storage_service/nodes/excluded",
"operations":[
{
"method":"GET",
"summary":"Retrieve host ids of nodes which are marked as excluded",
"type":"array",
"items":{
"type":"string"
},
"nickname":"get_excluded_nodes",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_service/nodes/joining",
"operations":[
@@ -613,50 +594,6 @@
}
]
},
{
"path": "/storage_service/natural_endpoints/v2/{keyspace}",
"operations": [
{
"method": "GET",
"summary":"This method returns the N endpoints that are responsible for storing the specified key i.e for replication. the endpoint responsible for this key",
"type": "array",
"items": {
"type": "string"
},
"nickname": "get_natural_endpoints_v2",
"produces": [
"application/json"
],
"parameters": [
{
"name": "keyspace",
"description": "The keyspace to query about.",
"required": true,
"allowMultiple": false,
"type": "string",
"paramType": "path"
},
{
"name": "cf",
"description": "Column family name.",
"required": true,
"allowMultiple": false,
"type": "string",
"paramType": "query"
},
{
"name": "key_component",
"description": "Each component of the key for which we need to find the endpoint (e.g. ?key_component=part1&key_component=part2).",
"required": true,
"allowMultiple": true,
"type": "string",
"paramType": "query"
}
]
}
]
},
{
"path":"/storage_service/cdc_streams_check_and_repair",
"operations":[
@@ -876,14 +813,6 @@
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"move_files",
"description":"Move component files instead of copying them",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
@@ -961,14 +890,6 @@
"type":"string",
"paramType":"query",
"enum": ["all", "dc", "rack", "node"]
},
{
"name":"primary_replica_only",
"description":"Load the sstables and stream to the primary replica node within the scope, if one is specified. If not, stream to the global primary replica.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
@@ -1055,7 +976,7 @@
]
},
{
"path":"/storage_service/cleanup_all/",
"path":"/storage_service/cleanup_all",
"operations":[
{
"method":"POST",
@@ -1065,30 +986,6 @@
"produces":[
"application/json"
],
"parameters":[
{
"name":"global",
"description":"true if cleanup of entire cluster is requested",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/mark_node_as_clean",
"operations":[
{
"method":"POST",
"summary":"Mark the node as clean. After that the node will not be considered as needing cleanup during automatic cleanup which is triggered by some topology operations",
"type":"void",
"nickname":"reset_cleanup_needed",
"produces":[
"application/json"
],
"parameters":[]
}
]
@@ -1195,14 +1092,6 @@
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name": "drop_unfixable_sstables",
"description": "When set to true, drop unfixable sstables. Applies only to scrub mode SEGREGATE.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
@@ -1622,30 +1511,6 @@
}
]
},
{
"path":"/storage_service/exclude_node",
"operations":[
{
"method":"POST",
"summary":"Marks the node as permanently down (excluded).",
"type":"void",
"nickname":"exclude_node",
"produces":[
"application/json"
],
"parameters":[
{
"name":"hosts",
"description":"Comma-separated list of host ids to exclude",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/removal_status",
"operations":[
@@ -2271,31 +2136,6 @@
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"skip_cleanup",
"description":"Don't cleanup keys from loaded sstables. Invalid if load_and_stream is true",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"skip_reshape",
"description":"Don't reshape the loaded sstables. Invalid if load_and_stream is true",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"scope",
"description":"Defines the set of nodes to which mutations can be streamed",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query",
"enum": ["all", "dc", "rack", "node"]
}
]
}
@@ -3048,14 +2888,6 @@
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"incremental_mode",
"description":"Set the incremental repair mode. Can be 'disabled', 'incremental', or 'full'. 'incremental': The incremental repair logic is enabled. Unrepaired sstables will be included for repair. Repaired sstables will be skipped. The incremental repair states will be updated after repair. 'full': The incremental repair logic is enabled. Both repaired and unrepaired sstables will be included for repair. The incremental repair states will be updated after repair. 'disabled': The incremental repair logic is disabled completely. The incremental repair states, e.g., repaired_at in sstables and sstables_repaired_at in the system.tablets table, will not be updated after repair. When the option is not provided, it defaults to incremental mode.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
@@ -3187,73 +3019,6 @@
}
]
},
{
"path":"/storage_service/retrain_dict",
"operations":[
{
"method":"POST",
"summary":"Retrain the SSTable compression dictionary for the target table.",
"type":"void",
"nickname":"retrain_dict",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"Name of the keyspace containing the target table.",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"cf",
"description":"Name of the target table.",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/estimate_compression_ratios",
"operations":[
{
"method":"GET",
"summary":"Compute an estimated compression ratio for SSTables of the given table, for various compression configurations.",
"type":"array",
"items":{
"type":"compression_config_result"
},
"nickname":"estimate_compression_ratios",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"Name of the keyspace containing the target table.",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"cf",
"description":"Name of the target table.",
"required":true,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/raft_topology/reload",
"operations":[
@@ -3296,54 +3061,6 @@
]
}
]
},
{
"path":"/storage_service/raft_topology/cmd_rpc_status",
"operations":[
{
"method":"GET",
"summary":"Get information about currently running topology cmd rpc",
"type":"string",
"nickname":"raft_topology_get_cmd_status",
"produces":[
"application/json"
],
"parameters":[
]
}
]
},
{
"path":"/storage_service/drop_quarantined_sstables",
"operations":[
{
"method":"POST",
"summary":"Drops all quarantined sstables in all keyspaces or specified keyspace and tables",
"type":"void",
"nickname":"drop_quarantined_sstables",
"produces":[
"application/json"
],
"parameters":[
{
"name":"keyspace",
"description":"The keyspace name to drop quarantined sstables from.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
},
{
"name":"tables",
"description":"Comma-separated table names to drop quarantined sstables from.",
"required":false,
"allowMultiple":false,
"type":"string",
"paramType":"query"
}
]
}
]
}
],
"models":{
@@ -3480,11 +3197,11 @@
"properties":{
"start_token":{
"type":"string",
"description":"The range start token (exclusive)"
"description":"The range start token"
},
"end_token":{
"type":"string",
"description":"The range end token (inclusive)"
"description":"The range start token"
},
"endpoints":{
"type":"array",
@@ -3557,7 +3274,7 @@
"version":{
"type":"string",
"enum":[
"ka", "la", "mc", "md", "me", "ms"
"ka", "la", "mc", "md", "me"
],
"description":"SSTable version"
},
@@ -3603,32 +3320,6 @@
"type":"string"
}
}
},
"compression_config_result":{
"id":"compression_config_result",
"description":"Compression ratio estimation result for one config",
"properties":{
"level":{
"type":"long",
"description":"The used value of `compression_level`"
},
"chunk_length_in_kb":{
"type":"long",
"description":"The used value of `chunk_length_in_kb`"
},
"dict":{
"type":"string",
"description":"The used dictionary: `none`, `past` (== current), or `future`"
},
"sstable_compression":{
"type":"string",
"description":"The used compressor name (aka `sstable_compression`)"
},
"ratio":{
"type":"float",
"description":"The resulting compression ratio (estimated on a random sample of files)"
}
}
}
}
}

View File

@@ -344,22 +344,6 @@
"sequence_number":{
"type":"long",
"description":"The running sequence number of the task"
},
"shard":{
"type":"long",
"description":"The shard the task is running on"
},
"creation_time":{
"type":"datetime",
"description":"The creation time of the task (when it was queued); extracted from the task_id UUID"
},
"start_time":{
"type":"datetime",
"description":"The start time of the task (when execution began); unspecified (equal to epoch) when state == created"
},
"end_time":{
"type":"datetime",
"description":"The end time of the task; unspecified (equal to epoch) when the task is not completed"
}
}
},
@@ -402,17 +386,13 @@
"type":"boolean",
"description":"Boolean flag indicating whether the task can be aborted"
},
"creation_time":{
"type":"datetime",
"description":"The creation time of the task (when it was queued); extracted from the task_id UUID"
},
"start_time":{
"type":"datetime",
"description":"The start time of the task (when execution began); unspecified (equal to epoch) when state == created"
"description":"The start time of the task"
},
"end_time":{
"type":"datetime",
"description":"The end time of the task (when execution completed); unspecified (equal to epoch) when the task is not completed"
"description":"The end time of the task (unspecified when the task is not completed)"
},
"error":{
"type":"string",
@@ -455,7 +435,7 @@
"description":"The number of units completed so far"
},
"children_ids":{
"type":"chunked_array",
"type":"array",
"items":{
"type":"task_identity"
},

View File

@@ -42,14 +42,6 @@
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
},
{
"name":"consider_only_existing_data",
"description":"Set to \"true\" to flush all memtables and force tombstone garbage collection to check only the sstables being compacted (false by default). The memtable, commitlog and other uncompacted sstables will not be checked during tombstone garbage collection.",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}

View File

@@ -137,6 +137,14 @@ future<> unset_load_meter(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_load_meter(ctx, r); });
}
future<> set_format_selector(http_context& ctx, db::sstables_format_selector& sel) {
return ctx.http_server.set_routes([&ctx, &sel] (routes& r) { set_format_selector(ctx, r, sel); });
}
future<> unset_format_selector(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_format_selector(ctx, r); });
}
future<> set_server_sstables_loader(http_context& ctx, sharded<sstables_loader>& sst_loader) {
return ctx.http_server.set_routes([&ctx, &sst_loader] (routes& r) { set_sstables_loader(ctx, r, sst_loader); });
}
@@ -216,22 +224,15 @@ future<> unset_server_gossip(http_context& ctx) {
});
}
future<> set_server_column_family(http_context& ctx, sharded<replica::database>& db) {
co_await register_api(ctx, "column_family",
"The column family API", [&db] (http_context& ctx, routes& r) {
set_column_family(ctx, r, db);
});
co_await register_api(ctx, "cache_service",
"The cache service API", [&db] (http_context& ctx, routes& r) {
set_cache_service(ctx, db, r);
future<> set_server_column_family(http_context& ctx, sharded<db::system_keyspace>& sys_ks) {
return register_api(ctx, "column_family",
"The column family API", [&sys_ks] (http_context& ctx, routes& r) {
set_column_family(ctx, r, sys_ks);
});
}
future<> unset_server_column_family(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) {
unset_column_family(ctx, r);
unset_cache_service(ctx, r);
});
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_column_family(ctx, r); });
}
future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms) {
@@ -263,6 +264,15 @@ future<> unset_server_stream_manager(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_stream_manager(ctx, r); });
}
future<> set_server_cache(http_context& ctx) {
return register_api(ctx, "cache_service",
"The cache service API", set_cache_service);
}
future<> unset_server_cache(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_cache_service(ctx, r); });
}
future<> set_hinted_handoff(http_context& ctx, sharded<service::storage_proxy>& proxy, sharded<gms::gossiper>& g) {
return register_api(ctx, "hinted_handoff",
"The hinted handoff API", [&proxy, &g] (http_context& ctx, routes& r) {
@@ -274,7 +284,7 @@ future<> unset_hinted_handoff(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_hinted_handoff(ctx, r); });
}
future<> set_server_compaction_manager(http_context& ctx, sharded<compaction::compaction_manager>& cm) {
future<> set_server_compaction_manager(http_context& ctx, sharded<compaction_manager>& cm) {
return register_api(ctx, "compaction_manager", "The Compaction manager API", [&cm] (http_context& ctx, routes& r) {
set_compaction_manager(ctx, r, cm);
});
@@ -307,13 +317,13 @@ future<> unset_server_commitlog(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_commitlog(ctx, r); });
}
future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg, sharded<gms::gossiper>& gossiper) {
future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg) {
auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);
return ctx.http_server.set_routes([rb, &ctx, &tm, &cfg = *cfg, &gossiper](routes& r) {
return ctx.http_server.set_routes([rb, &ctx, &tm, &cfg = *cfg](routes& r) {
rb->register_function(r, "task_manager",
"The task manager API");
set_task_manager(ctx, r, tm, cfg, gossiper);
set_task_manager(ctx, r, tm, cfg);
});
}
@@ -381,5 +391,32 @@ future<> unset_server_raft(http_context& ctx) {
return ctx.http_server.set_routes([&ctx] (routes& r) { unset_raft(ctx, r); });
}
void req_params::process(const request& req) {
// Process mandatory parameters
for (auto& [name, ent] : params) {
if (!ent.is_mandatory) {
continue;
}
try {
ent.value = req.get_path_param(name);
} catch (std::out_of_range&) {
throw httpd::bad_param_exception(fmt::format("Mandatory parameter '{}' was not provided", name));
}
}
// Process optional parameters
for (auto& [name, value] : req.query_parameters) {
try {
auto& ent = params.at(name);
if (ent.is_mandatory) {
throw httpd::bad_param_exception(fmt::format("Parameter '{}' is expected to be provided as part of the request url", name));
}
ent.value = value;
} catch (std::out_of_range&) {
throw httpd::bad_param_exception(fmt::format("Unsupported optional parameter '{}'", name));
}
}
}
}

View File

@@ -23,6 +23,17 @@
namespace api {
template<class T>
std::vector<sstring> container_to_vec(const T& container) {
std::vector<sstring> res;
res.reserve(std::size(container));
for (const auto& i : container) {
res.push_back(fmt::to_string(i));
}
return res;
}
template<class T>
std::vector<T> map_to_key_value(const std::map<sstring, sstring>& map) {
std::vector<T> res;
@@ -56,6 +67,17 @@ T map_sum(T&& dest, const S& src) {
return std::move(dest);
}
template <typename MAP>
std::vector<sstring> map_keys(const MAP& map) {
std::vector<sstring> res;
res.reserve(std::size(map));
for (const auto& i : map) {
res.push_back(fmt::to_string(i.first));
}
return res;
}
/**
* General sstring splitting function
*/
@@ -73,7 +95,7 @@ inline std::vector<sstring> split(const sstring& text, const char* separator) {
*
*/
template<class T, class F, class V>
future<json::json_return_type> sum_stats(sharded<T>& d, V F::*f) {
future<json::json_return_type> sum_stats(distributed<T>& d, V F::*f) {
return d.map_reduce0([f](const T& p) {return p.get_stats().*f;}, 0,
std::plus<V>()).then([](V val) {
return make_ready_future<json::json_return_type>(val);
@@ -106,7 +128,7 @@ httpd::utils_json::rate_moving_average_and_histogram timer_to_json(const utils::
}
template<class T, class F>
future<json::json_return_type> sum_histogram_stats(sharded<T>& d, utils::timed_rate_moving_average_and_histogram F::*f) {
future<json::json_return_type> sum_histogram_stats(distributed<T>& d, utils::timed_rate_moving_average_and_histogram F::*f) {
return d.map_reduce0([f](const T& p) {return (p.get_stats().*f).hist;}, utils::ihistogram(),
std::plus<utils::ihistogram>()).then([](const utils::ihistogram& val) {
@@ -115,7 +137,7 @@ future<json::json_return_type> sum_histogram_stats(sharded<T>& d, utils::timed_
}
template<class T, class F>
future<json::json_return_type> sum_timer_stats(sharded<T>& d, utils::timed_rate_moving_average_and_histogram F::*f) {
future<json::json_return_type> sum_timer_stats(distributed<T>& d, utils::timed_rate_moving_average_and_histogram F::*f) {
return d.map_reduce0([f](const T& p) {return (p.get_stats().*f).rate();}, utils::rate_moving_average_and_histogram(),
std::plus<utils::rate_moving_average_and_histogram>()).then([](const utils::rate_moving_average_and_histogram& val) {
@@ -124,7 +146,7 @@ future<json::json_return_type> sum_timer_stats(sharded<T>& d, utils::timed_rate
}
template<class T, class F>
future<json::json_return_type> sum_timer_stats(sharded<T>& d, utils::timed_rate_moving_average_summary_and_histogram F::*f) {
future<json::json_return_type> sum_timer_stats(distributed<T>& d, utils::timed_rate_moving_average_summary_and_histogram F::*f) {
return d.map_reduce0([f](const T& p) {return (p.get_stats().*f).rate();}, utils::rate_moving_average_and_histogram(),
std::plus<utils::rate_moving_average_and_histogram>()).then([](const utils::rate_moving_average_and_histogram& val) {
return make_ready_future<json::json_return_type>(timer_to_json(val));
@@ -230,6 +252,67 @@ public:
operator T() const { return value; }
};
using mandatory = bool_class<struct mandatory_tag>;
class req_params {
public:
struct def {
std::optional<sstring> value;
mandatory is_mandatory = mandatory::no;
def(std::optional<sstring> value_ = std::nullopt, mandatory is_mandatory_ = mandatory::no)
: value(std::move(value_))
, is_mandatory(is_mandatory_)
{ }
def(mandatory is_mandatory_)
: is_mandatory(is_mandatory_)
{ }
};
private:
std::unordered_map<sstring, def> params;
public:
req_params(std::initializer_list<std::pair<sstring, def>> l) {
for (const auto& [name, ent] : l) {
add(std::move(name), std::move(ent));
}
}
void add(sstring name, def ent) {
params.emplace(std::move(name), std::move(ent));
}
void process(const request& req);
const std::optional<sstring>& get(const char* name) const {
return params.at(name).value;
}
template <typename T = sstring>
const std::optional<T> get_as(const char* name) const {
return get(name);
}
template <typename T = sstring>
requires std::same_as<T, bool>
const std::optional<bool> get_as(const char* name) const {
auto value = get(name);
if (!value) {
return std::nullopt;
}
std::transform(value->begin(), value->end(), value->begin(), ::tolower);
if (value == "true" || value == "yes" || value == "1") {
return true;
}
if (value == "false" || value == "no" || value == "0") {
return false;
}
throw boost::bad_lexical_cast{};
}
};
httpd::utils_json::estimated_histogram time_to_json_histogram(const utils::time_estimated_histogram& val);
}

View File

@@ -18,9 +18,7 @@
using request = http::request;
using reply = http::reply;
namespace compaction {
class compaction_manager;
}
namespace service {
@@ -58,6 +56,7 @@ class sstables_format_selector;
namespace view {
class view_builder;
}
class system_keyspace;
}
namespace netw { class messaging_service; }
class repair_service;
@@ -84,9 +83,9 @@ struct http_context {
sstring api_dir;
sstring api_doc;
httpd::http_server_control http_server;
sharded<replica::database>& db;
distributed<replica::database>& db;
http_context(sharded<replica::database>& _db)
http_context(distributed<replica::database>& _db)
: db(_db)
{
}
@@ -117,7 +116,7 @@ future<> set_server_token_metadata(http_context& ctx, sharded<locator::shared_to
future<> unset_server_token_metadata(http_context& ctx);
future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g);
future<> unset_server_gossip(http_context& ctx);
future<> set_server_column_family(http_context& ctx, sharded<replica::database>& db);
future<> set_server_column_family(http_context& ctx, sharded<db::system_keyspace>& sys_ks);
future<> unset_server_column_family(http_context& ctx);
future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms);
future<> unset_server_messaging_service(http_context& ctx);
@@ -127,10 +126,12 @@ future<> set_server_stream_manager(http_context& ctx, sharded<streaming::stream_
future<> unset_server_stream_manager(http_context& ctx);
future<> set_hinted_handoff(http_context& ctx, sharded<service::storage_proxy>& p, sharded<gms::gossiper>& g);
future<> unset_hinted_handoff(http_context& ctx);
future<> set_server_compaction_manager(http_context& ctx, sharded<compaction::compaction_manager>& cm);
future<> set_server_cache(http_context& ctx);
future<> unset_server_cache(http_context& ctx);
future<> set_server_compaction_manager(http_context& ctx, sharded<compaction_manager>& cm);
future<> unset_server_compaction_manager(http_context& ctx);
future<> set_server_done(http_context& ctx);
future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg, sharded<gms::gossiper>& gossiper);
future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg);
future<> unset_server_task_manager(http_context& ctx);
future<> set_server_task_manager_test(http_context& ctx, sharded<tasks::task_manager>& tm);
future<> unset_server_task_manager_test(http_context& ctx);
@@ -140,6 +141,8 @@ future<> set_server_raft(http_context&, sharded<service::raft_group_registry>&);
future<> unset_server_raft(http_context&);
future<> set_load_meter(http_context& ctx, service::load_meter& lm);
future<> unset_load_meter(http_context& ctx);
future<> set_format_selector(http_context& ctx, db::sstables_format_selector& sel);
future<> unset_format_selector(http_context& ctx);
future<> set_server_cql_server_test(http_context& ctx, cql_transport::controller& ctl);
future<> unset_server_cql_server_test(http_context& ctx);
future<> set_server_service_levels(http_context& ctx, cql_transport::controller& ctl, sharded<cql3::query_processor>& qp);

View File

@@ -16,7 +16,7 @@ using namespace json;
using namespace seastar::httpd;
namespace cs = httpd::cache_service_json;
void set_cache_service(http_context& ctx, sharded<replica::database>& db, routes& r) {
void set_cache_service(http_context& ctx, routes& r) {
cs::get_row_cache_save_period_in_seconds.set(r, [](std::unique_ptr<http::request> req) {
// We never save the cache
// Origin uses 0 for never
@@ -204,53 +204,53 @@ void set_cache_service(http_context& ctx, sharded<replica::database>& db, routes
});
});
cs::get_row_hits.set(r, [&db] (std::unique_ptr<http::request> req) {
return map_reduce_cf(db, uint64_t(0), [](const replica::column_family& cf) {
cs::get_row_hits.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, uint64_t(0), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.count();
}, std::plus<uint64_t>());
});
cs::get_row_requests.set(r, [&db] (std::unique_ptr<http::request> req) {
return map_reduce_cf(db, uint64_t(0), [](const replica::column_family& cf) {
cs::get_row_requests.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, uint64_t(0), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.count() + cf.get_row_cache().stats().misses.count();
}, std::plus<uint64_t>());
});
cs::get_row_hit_rate.set(r, [&db] (std::unique_ptr<http::request> req) {
return map_reduce_cf(db, ratio_holder(), [](const replica::column_family& cf) {
cs::get_row_hit_rate.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx, ratio_holder(), [](const replica::column_family& cf) {
return ratio_holder(cf.get_row_cache().stats().hits.count() + cf.get_row_cache().stats().misses.count(),
cf.get_row_cache().stats().hits.count());
}, std::plus<ratio_holder>());
});
cs::get_row_hits_moving_avrage.set(r, [&db] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(db, utils::rate_moving_average(), [](const replica::column_family& cf) {
cs::get_row_hits_moving_avrage.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
cs::get_row_requests_moving_avrage.set(r, [&db] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(db, utils::rate_moving_average(), [](const replica::column_family& cf) {
cs::get_row_requests_moving_avrage.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf_raw(ctx, utils::rate_moving_average(), [](const replica::column_family& cf) {
return cf.get_row_cache().stats().hits.rate() + cf.get_row_cache().stats().misses.rate();
}, std::plus<utils::rate_moving_average>()).then([](const utils::rate_moving_average& m) {
return make_ready_future<json::json_return_type>(meter_to_json(m));
});
});
cs::get_row_size.set(r, [&db] (std::unique_ptr<http::request> req) {
cs::get_row_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {
// In origin row size is the weighted size.
// We currently do not support weights, so we use raw size in bytes instead
return db.map_reduce0([](replica::database& db) -> uint64_t {
return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {
return db.row_cache_tracker().region().occupancy().used_space();
}, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {
return make_ready_future<json::json_return_type>(res);
});
});
cs::get_row_entries.set(r, [&db] (std::unique_ptr<http::request> req) {
return db.map_reduce0([](replica::database& db) -> uint64_t {
cs::get_row_entries.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {
return db.row_cache_tracker().partitions();
}, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {
return make_ready_future<json::json_return_type>(res);

View File

@@ -7,20 +7,15 @@
*/
#pragma once
#include <seastar/core/sharded.hh>
namespace seastar::httpd {
class routes;
}
namespace replica {
class database;
}
namespace api {
struct http_context;
void set_cache_service(http_context& ctx, seastar::sharded<replica::database>& db, seastar::httpd::routes& r);
void set_cache_service(http_context& ctx, seastar::httpd::routes& r);
void unset_cache_service(http_context& ctx, seastar::httpd::routes& r);
}

View File

@@ -10,6 +10,7 @@
#include "api/api-doc/collectd.json.hh"
#include <seastar/core/scollectd.hh>
#include <seastar/core/scollectd_api.hh>
#include <boost/range/irange.hpp>
#include <ranges>
#include <regex>
#include "api/api_init.hh"

File diff suppressed because it is too large Load Diff

View File

@@ -13,25 +13,25 @@
#include <any>
#include "api/api_init.hh"
namespace db {
class system_keyspace;
}
namespace api {
void set_column_family(http_context& ctx, httpd::routes& r, sharded<replica::database>& db);
void set_column_family(http_context& ctx, httpd::routes& r, sharded<db::system_keyspace>& sys_ks);
void unset_column_family(http_context& ctx, httpd::routes& r);
table_info parse_table_info(const sstring& name, const replica::database& db);
table_id get_uuid(const sstring& name, const replica::database& db);
template<class Mapper, class I, class Reducer>
future<I> map_reduce_cf_raw(sharded<replica::database>& db, const sstring& name, I init,
future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer) {
auto uuid = parse_table_info(name, db.local()).id;
using mapper_type = std::function<future<std::unique_ptr<std::any>>(replica::database&)>;
auto uuid = get_uuid(name, ctx.db.local());
using mapper_type = std::function<std::unique_ptr<std::any>(replica::database&)>;
using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;
return db.map_reduce0(mapper_type([mapper, uuid](replica::database& db) {
return futurize_invoke([mapper, &db, uuid] {
return mapper(db.find_column_family(uuid));
}).then([] (auto result) {
return std::make_unique<std::any>(I(std::move(result)));
});
return ctx.db.map_reduce0(mapper_type([mapper, uuid](replica::database& db) {
return std::make_unique<std::any>(I(mapper(db.find_column_family(uuid))));
}), std::make_unique<std::any>(std::move(init)), reducer_type([reducer = std::move(reducer)] (std::unique_ptr<std::any> a, std::unique_ptr<std::any> b) mutable {
return std::make_unique<std::any>(I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));
})).then([] (std::unique_ptr<std::any> r) {
@@ -41,30 +41,33 @@ future<I> map_reduce_cf_raw(sharded<replica::database>& db, const sstring& name,
template<class Mapper, class I, class Reducer>
future<json::json_return_type> map_reduce_cf(sharded<replica::database>& db, const sstring& name, I init,
future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer) {
return map_reduce_cf_raw(db, name, init, mapper, reducer).then([](const I& res) {
return map_reduce_cf_raw(ctx, name, init, mapper, reducer).then([](const I& res) {
return make_ready_future<json::json_return_type>(res);
});
}
template<class Mapper, class I, class Reducer, class Result>
future<json::json_return_type> map_reduce_cf(sharded<replica::database>& db, const sstring& name, I init,
future<json::json_return_type> map_reduce_cf(http_context& ctx, const sstring& name, I init,
Mapper mapper, Reducer reducer, Result result) {
return map_reduce_cf_raw(db, name, init, mapper, reducer).then([result](const I& res) mutable {
return map_reduce_cf_raw(ctx, name, init, mapper, reducer).then([result](const I& res) mutable {
result = res;
return make_ready_future<json::json_return_type>(result);
});
}
future<json::json_return_type> map_reduce_cf_time_histogram(http_context& ctx, const sstring& name, std::function<utils::time_estimated_histogram(const replica::column_family&)> f);
struct map_reduce_column_families_locally {
std::any init;
std::function<future<std::unique_ptr<std::any>>(replica::column_family&)> mapper;
std::function<std::unique_ptr<std::any>(replica::column_family&)> mapper;
std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)> reducer;
future<std::unique_ptr<std::any>> operator()(replica::database& db) const {
auto res = seastar::make_lw_shared<std::unique_ptr<std::any>>(std::make_unique<std::any>(init));
return db.get_tables_metadata().for_each_table_gently([res, this] (table_id, seastar::lw_shared_ptr<replica::table> table) -> future<> {
*res = reducer(std::move(*res), co_await mapper(*table.get()));
return db.get_tables_metadata().for_each_table_gently([res, this] (table_id, seastar::lw_shared_ptr<replica::table> table) {
*res = reducer(std::move(*res), mapper(*table.get()));
return make_ready_future();
}).then([res] () {
return std::move(*res);
});
@@ -72,21 +75,17 @@ struct map_reduce_column_families_locally {
};
template<class Mapper, class I, class Reducer>
future<I> map_reduce_cf_raw(sharded<replica::database>& db, I init,
future<I> map_reduce_cf_raw(http_context& ctx, I init,
Mapper mapper, Reducer reducer) {
using mapper_type = std::function<future<std::unique_ptr<std::any>>(replica::column_family&)>;
using mapper_type = std::function<std::unique_ptr<std::any>(replica::column_family&)>;
using reducer_type = std::function<std::unique_ptr<std::any>(std::unique_ptr<std::any>, std::unique_ptr<std::any>)>;
auto wrapped_mapper = mapper_type([mapper = std::move(mapper)] (replica::column_family& cf) mutable {
return futurize_invoke([&cf, mapper] {
return mapper(cf);
}).then([] (auto result) {
return std::make_unique<std::any>(I(std::move(result)));
});
return std::make_unique<std::any>(I(mapper(cf)));
});
auto wrapped_reducer = reducer_type([reducer = std::move(reducer)] (std::unique_ptr<std::any> a, std::unique_ptr<std::any> b) mutable {
return std::make_unique<std::any>(I(reducer(std::any_cast<I>(std::move(*a)), std::any_cast<I>(std::move(*b)))));
});
return db.map_reduce0(map_reduce_column_families_locally{init,
return ctx.db.map_reduce0(map_reduce_column_families_locally{init,
std::move(wrapped_mapper), wrapped_reducer}, std::make_unique<std::any>(init), wrapped_reducer).then([] (std::unique_ptr<std::any> res) {
return std::any_cast<I>(std::move(*res));
});
@@ -94,13 +93,20 @@ future<I> map_reduce_cf_raw(sharded<replica::database>& db, I init,
template<class Mapper, class I, class Reducer>
future<json::json_return_type> map_reduce_cf(sharded<replica::database>& db, I init,
future<json::json_return_type> map_reduce_cf(http_context& ctx, I init,
Mapper mapper, Reducer reducer) {
return map_reduce_cf_raw(db, init, mapper, reducer).then([](const I& res) {
return map_reduce_cf_raw(ctx, init, mapper, reducer).then([](const I& res) {
return make_ready_future<json::json_return_type>(res);
});
}
future<json::json_return_type> get_cf_stats(http_context& ctx, const sstring& name,
int64_t replica::column_family_stats::*f);
future<json::json_return_type> get_cf_stats(http_context& ctx,
int64_t replica::column_family_stats::*f);
std::tuple<sstring, sstring> parse_fully_qualified_cf_name(sstring name);
}

View File

@@ -14,7 +14,6 @@
#include "api/api.hh"
#include "api/api-doc/compaction_manager.json.hh"
#include "api/api-doc/storage_service.json.hh"
#include "db/compaction_history_entry.hh"
#include "db/system_keyspace.hh"
#include "column_family.hh"
#include "unimplemented.hh"
@@ -29,9 +28,9 @@ namespace ss = httpd::storage_service_json;
using namespace json;
using namespace seastar::httpd;
static future<json::json_return_type> get_cm_stats(sharded<compaction::compaction_manager>& cm,
int64_t compaction::compaction_manager::stats::*f) {
return cm.map_reduce0([f](compaction::compaction_manager& cm) {
static future<json::json_return_type> get_cm_stats(sharded<compaction_manager>& cm,
int64_t compaction_manager::stats::*f) {
return cm.map_reduce0([f](compaction_manager& cm) {
return cm.get_stats().*f;
}, int64_t(0), std::plus<int64_t>()).then([](const int64_t& res) {
return make_ready_future<json::json_return_type>(res);
@@ -47,9 +46,9 @@ static std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_ha
return std::move(a);
}
void set_compaction_manager(http_context& ctx, routes& r, sharded<compaction::compaction_manager>& cm) {
void set_compaction_manager(http_context& ctx, routes& r, sharded<compaction_manager>& cm) {
cm::get_compactions.set(r, [&cm] (std::unique_ptr<http::request> req) {
return cm.map_reduce0([](compaction::compaction_manager& cm) {
return cm.map_reduce0([](compaction_manager& cm) {
std::vector<cm::summary> summaries;
for (const auto& c : cm.get_compactions()) {
@@ -58,7 +57,7 @@ void set_compaction_manager(http_context& ctx, routes& r, sharded<compaction::co
s.ks = c.ks_name;
s.cf = c.cf_name;
s.unit = "keys";
s.task_type = compaction::compaction_name(c.type);
s.task_type = sstables::compaction_name(c.type);
s.completed = c.total_keys_written;
s.total = c.total_partitions;
summaries.push_back(std::move(s));
@@ -72,9 +71,10 @@ void set_compaction_manager(http_context& ctx, routes& r, sharded<compaction::co
cm::get_pending_tasks_by_table.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return ctx.db.map_reduce0([](replica::database& db) {
return do_with(std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>(), [&db](std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_hash>& tasks) {
return db.get_tables_metadata().for_each_table_gently([&tasks] (table_id, lw_shared_ptr<replica::table> table) -> future<> {
return db.get_tables_metadata().for_each_table_gently([&tasks] (table_id, lw_shared_ptr<replica::table> table) {
replica::table& cf = *table.get();
tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = co_await cf.estimate_pending_compactions();
tasks[std::make_pair(cf.schema()->ks_name(), cf.schema()->cf_name())] = cf.estimate_pending_compactions();
return make_ready_future<>();
}).then([&tasks] {
return std::move(tasks);
});
@@ -103,20 +103,23 @@ void set_compaction_manager(http_context& ctx, routes& r, sharded<compaction::co
cm::stop_compaction.set(r, [&cm] (std::unique_ptr<http::request> req) {
auto type = req->get_query_param("type");
return cm.invoke_on_all([type] (compaction::compaction_manager& cm) {
return cm.invoke_on_all([type] (compaction_manager& cm) {
return cm.stop_compaction(type);
}).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
cm::stop_keyspace_compaction.set(r, [&ctx, &cm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto [ks_name, tables] = parse_table_infos(ctx, *req, "tables");
cm::stop_keyspace_compaction.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto ks_name = validate_keyspace(ctx, req);
auto table_names = parse_tables(ks_name, ctx, req->query_parameters, "tables");
auto type = req->get_query_param("type");
co_await cm.invoke_on_all([&] (compaction::compaction_manager& cm) {
return parallel_for_each(tables, [&] (const table_info& ti) {
return cm.stop_compaction(type, [id = ti.id] (const compaction::compaction_group_view* x) {
return x->schema()->id() == id;
co_await ctx.db.invoke_on_all([&] (replica::database& db) {
auto& cm = db.get_compaction_manager();
return parallel_for_each(table_names, [&] (sstring& table_name) {
auto& t = db.find_column_family(ks_name, table_name);
return t.parallel_foreach_table_state([&] (compaction::table_state& ts) {
return cm.stop_compaction(type, &ts);
});
});
});
@@ -124,13 +127,13 @@ void set_compaction_manager(http_context& ctx, routes& r, sharded<compaction::co
});
cm::get_pending_tasks.set(r, [&ctx] (std::unique_ptr<http::request> req) {
return map_reduce_cf(ctx.db, int64_t(0), [](replica::column_family& cf) {
return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {
return cf.estimate_pending_compactions();
}, std::plus<int64_t>());
});
cm::get_completed_tasks.set(r, [&cm] (std::unique_ptr<http::request> req) {
return get_cm_stats(cm, &compaction::compaction_manager::stats::completed_tasks);
return get_cm_stats(cm, &compaction_manager::stats::completed_tasks);
});
cm::get_total_compactions_completed.set(r, [] (std::unique_ptr<http::request> req) {
@@ -148,7 +151,7 @@ void set_compaction_manager(http_context& ctx, routes& r, sharded<compaction::co
});
cm::get_compaction_history.set(r, [&cm] (std::unique_ptr<http::request> req) {
noncopyable_function<future<>(output_stream<char>&&)> f = [&cm] (output_stream<char>&& out) -> future<> {
std::function<future<>(output_stream<char>&&)> f = [&cm] (output_stream<char>&& out) -> future<> {
auto s = std::move(out);
bool first = true;
std::exception_ptr ex;
@@ -157,11 +160,8 @@ void set_compaction_manager(http_context& ctx, routes& r, sharded<compaction::co
co_await cm.local().get_compaction_history([&s, &first](const db::compaction_history_entry& entry) mutable -> future<> {
cm::history h;
h.id = fmt::to_string(entry.id);
h.shard_id = entry.shard_id;
h.ks = std::move(entry.ks);
h.cf = std::move(entry.cf);
h.compaction_type = entry.compaction_type;
h.started_at = entry.started_at;
h.compacted_at = entry.compacted_at;
h.bytes_in = entry.bytes_in;
h.bytes_out = entry.bytes_out;
@@ -173,24 +173,6 @@ void set_compaction_manager(http_context& ctx, routes& r, sharded<compaction::co
e.value = it.second;
h.rows_merged.push(std::move(e));
}
for (const auto& data : entry.sstables_in) {
httpd::compaction_manager_json::sstableinfo sstable;
sstable.generation = fmt::to_string(data.generation),
sstable.origin = data.origin,
sstable.size = data.size,
h.sstables_in.push(std::move(sstable));
}
for (const auto& data : entry.sstables_out) {
httpd::compaction_manager_json::sstableinfo sstable;
sstable.generation = fmt::to_string(data.generation),
sstable.origin = data.origin,
sstable.size = data.size,
h.sstables_out.push(std::move(sstable));
}
h.total_tombstone_purge_attempt = entry.total_tombstone_purge_attempt;
h.total_tombstone_purge_failure_due_to_overlapping_with_memtable = entry.total_tombstone_purge_failure_due_to_overlapping_with_memtable;
h.total_tombstone_purge_failure_due_to_overlapping_with_uncompacting_sstable = entry.total_tombstone_purge_failure_due_to_overlapping_with_uncompacting_sstable;
if (!first) {
co_await s.write(", ");
}

View File

@@ -13,13 +13,11 @@ namespace seastar::httpd {
class routes;
}
namespace compaction {
class compaction_manager;
}
namespace api {
struct http_context;
void set_compaction_manager(http_context& ctx, seastar::httpd::routes& r, seastar::sharded<compaction::compaction_manager>& cm);
void set_compaction_manager(http_context& ctx, seastar::httpd::routes& r, seastar::sharded<compaction_manager>& cm);
void unset_compaction_manager(http_context& ctx, seastar::httpd::routes& r);
}

View File

@@ -23,6 +23,22 @@ using namespace seastar::httpd;
namespace sp = httpd::storage_proxy_json;
namespace ss = httpd::storage_service_json;
template<class T>
json::json_return_type get_json_return_type(const T& val) {
return json::json_return_type(val);
}
/*
* As commented on db::seed_provider_type is not used
* and probably never will.
*
* Just in case, we will return its name
*/
template<>
json::json_return_type get_json_return_type(const db::seed_provider_type& val) {
return json::json_return_type(val.class_name);
}
std::string_view format_type(std::string_view type) {
if (type == "int") {
return "integer";
@@ -171,7 +187,7 @@ void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx
});
ss::get_all_data_file_locations.set(r, [&cfg](const_req req) {
return cfg.data_file_directories();
return container_to_vec(cfg.data_file_directories());
});
ss::get_saved_caches_location.set(r, [&cfg](const_req req) {

View File

@@ -21,10 +21,10 @@ namespace hf = httpd::error_injection_json;
void set_error_injection(http_context& ctx, routes& r) {
hf::enable_injection.set(r, [](std::unique_ptr<request> req) -> future<json::json_return_type> {
hf::enable_injection.set(r, [](std::unique_ptr<request> req) {
sstring injection = req->get_path_param("injection");
bool one_shot = req->get_query_param("one_shot") == "True";
auto params = co_await util::read_entire_stream_contiguous(*req->content_stream);
auto params = req->content;
const size_t max_params_size = 1024 * 1024;
if (params.size() > max_params_size) {
@@ -39,11 +39,12 @@ void set_error_injection(http_context& ctx, routes& r) {
: rjson::parse_to_map<utils::error_injection_parameters>(params);
auto& errinj = utils::get_local_injector();
co_await errinj.enable_on_all(injection, one_shot, std::move(parameters));
return errinj.enable_on_all(injection, one_shot, std::move(parameters)).then([] {
return make_ready_future<json::json_return_type>(json::json_void());
});
} catch (const rjson::error& e) {
throw httpd::bad_param_exception(format("Failed to parse injections parameters: {}", e.what()));
}
co_return json::json_void();
});
hf::get_enabled_injections_on_all.set(r, [](std::unique_ptr<request> req) {

View File

@@ -22,10 +22,10 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
return g.container().invoke_on(0, [] (gms::gossiper& g) {
std::vector<fd::endpoint_state> res;
res.reserve(g.num_endpoints());
g.for_each_endpoint_state([&] (const gms::endpoint_state& eps) {
g.for_each_endpoint_state([&] (const gms::inet_address& addr, const gms::endpoint_state& eps) {
fd::endpoint_state val;
val.addrs = fmt::to_string(eps.get_ip());
val.is_alive = g.is_alive(eps.get_host_id());
val.addrs = fmt::to_string(addr);
val.is_alive = g.is_alive(addr);
val.generation = eps.get_heart_beat_state().get_generation().value();
val.version = eps.get_heart_beat_state().get_heart_beat_version().value();
val.update_time = eps.get_update_timestamp().time_since_epoch().count();
@@ -40,9 +40,7 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
}
res.emplace_back(std::move(val));
});
return make_ready_future<json::json_return_type>(json::stream_range_as_array(res, [](const fd::endpoint_state& i){
return i;
}));
return make_ready_future<json::json_return_type>(res);
});
});
@@ -66,15 +64,11 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
fd::get_simple_states.set(r, [&g] (std::unique_ptr<request> req) {
return g.container().invoke_on(0, [] (gms::gossiper& g) {
std::vector<fd::mapper> nodes_status;
nodes_status.reserve(g.num_endpoints());
g.for_each_endpoint_state([&] (const gms::endpoint_state& es) {
fd::mapper val;
val.key = fmt::to_string(es.get_ip());
val.value = g.is_alive(es.get_host_id()) ? "UP" : "DOWN";
nodes_status.emplace_back(std::move(val));
std::map<sstring, sstring> nodes_status;
g.for_each_endpoint_state([&] (const gms::inet_address& node, const gms::endpoint_state&) {
nodes_status.emplace(fmt::to_string(node), g.is_alive(node) ? "UP" : "DOWN");
});
return make_ready_future<json::json_return_type>(std::move(nodes_status));
return make_ready_future<json::json_return_type>(map_to_key_value<fd::mapper>(nodes_status));
});
});
@@ -87,7 +81,7 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {
fd::get_endpoint_state.set(r, [&g] (std::unique_ptr<request> req) {
return g.container().invoke_on(0, [req = std::move(req)] (gms::gossiper& g) {
auto state = g.get_endpoint_state_ptr(g.get_host_id(gms::inet_address(req->get_path_param("addr"))));
auto state = g.get_endpoint_state_ptr(gms::inet_address(req->get_path_param("addr")));
if (!state) {
return make_ready_future<json::json_return_type>(format("unknown endpoint {}", req->get_path_param("addr")));
}

View File

@@ -21,45 +21,51 @@ using namespace json;
void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {
httpd::gossiper_json::get_down_endpoint.set(r, [&g] (std::unique_ptr<request> req) -> future<json::json_return_type> {
auto res = co_await g.get_unreachable_members_synchronized();
co_return json::json_return_type(res | std::views::transform([] (auto& ep) { return fmt::to_string(ep); }) | std::ranges::to<std::vector>());
co_return json::json_return_type(container_to_vec(res));
});
httpd::gossiper_json::get_live_endpoint.set(r, [&g] (std::unique_ptr<request> req) -> future<json::json_return_type> {
auto res = co_await g.get_live_members_synchronized();
co_return json::json_return_type(res | std::views::transform([] (auto& ep) { return fmt::to_string(ep); }) | std::ranges::to<std::vector>());
httpd::gossiper_json::get_live_endpoint.set(r, [&g] (std::unique_ptr<request> req) {
return g.get_live_members_synchronized().then([] (auto res) {
return make_ready_future<json::json_return_type>(container_to_vec(res));
});
});
httpd::gossiper_json::get_endpoint_downtime.set(r, [&g] (std::unique_ptr<request> req) -> future<json::json_return_type> {
gms::inet_address ep(req->get_path_param("addr"));
// synchronize unreachable_members on all shards
co_await g.get_unreachable_members_synchronized();
co_return g.get_endpoint_downtime(g.get_host_id(ep));
co_return g.get_endpoint_downtime(ep);
});
httpd::gossiper_json::get_current_generation_number.set(r, [&g] (std::unique_ptr<http::request> req) {
gms::inet_address ep(req->get_path_param("addr"));
return g.get_current_generation_number(g.get_host_id(ep)).then([] (gms::generation_type res) {
return g.get_current_generation_number(ep).then([] (gms::generation_type res) {
return make_ready_future<json::json_return_type>(res.value());
});
});
httpd::gossiper_json::get_current_heart_beat_version.set(r, [&g] (std::unique_ptr<http::request> req) {
gms::inet_address ep(req->get_path_param("addr"));
return g.get_current_heart_beat_version(g.get_host_id(ep)).then([] (gms::version_type res) {
return g.get_current_heart_beat_version(ep).then([] (gms::version_type res) {
return make_ready_future<json::json_return_type>(res.value());
});
});
httpd::gossiper_json::assassinate_endpoint.set(r, [&g](std::unique_ptr<http::request> req) {
return g.assassinate_endpoint(req->get_path_param("addr")).then([] {
if (req->get_query_param("unsafe") != "True") {
return g.assassinate_endpoint(req->get_path_param("addr")).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
}
return g.unsafe_assassinate_endpoint(req->get_path_param("addr")).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});
httpd::gossiper_json::force_remove_endpoint.set(r, [&g](std::unique_ptr<http::request> req) {
gms::inet_address ep(req->get_path_param("addr"));
return g.force_remove_endpoint(g.get_host_id(ep), gms::null_permit_id).then([] () {
return g.force_remove_endpoint(ep, gms::null_permit_id).then([] {
return make_ready_future<json::json_return_type>(json_void());
});
});

View File

@@ -148,7 +148,7 @@ void set_messaging_service(http_context& ctx, routes& r, sharded<netw::messaging
hf::inject_disconnect.set(r, [&ms] (std::unique_ptr<request> req) -> future<json::json_return_type> {
auto ip = msg_addr(req->get_path_param("ip"));
co_await ms.invoke_on_all([ip] (netw::messaging_service& ms) {
ms.remove_rpc_client(ip, std::nullopt);
ms.remove_rpc_client(ip);
});
co_return json::json_void();
});

View File

@@ -71,7 +71,7 @@ void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_regis
co_return json_void{};
});
r::get_leader_host.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {
if (req->get_query_param("group_id").empty()) {
if (!req->query_parameters.contains("group_id")) {
const auto leader_id = co_await raft_gr.invoke_on(0, [] (service::raft_group_registry& raft_gr) {
auto& srv = raft_gr.group0();
return srv.current_leader();
@@ -100,7 +100,7 @@ void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_regis
r::read_barrier.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {
auto timeout = get_request_timeout(*req);
if (req->get_query_param("group_id").empty()) {
if (!req->query_parameters.contains("group_id")) {
// Read barrier on group 0 by default
co_await raft_gr.invoke_on(0, [timeout] (service::raft_group_registry& raft_gr) -> future<> {
co_await raft_gr.group0_with_timeouts().read_barrier(nullptr, timeout);
@@ -131,7 +131,7 @@ void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_regis
const auto stepdown_timeout_ticks = dur / service::raft_tick_interval;
auto timeout_dur = raft::logical_clock::duration(stepdown_timeout_ticks);
if (req->get_query_param("group_id").empty()) {
if (!req->query_parameters.contains("group_id")) {
// Stepdown on group 0 by default
co_await raft_gr.invoke_on(0, [timeout_dur] (service::raft_group_registry& raft_gr) {
apilog.info("Triggering stepdown for group0");

View File

@@ -11,7 +11,7 @@
#include "cql3/query_processor.hh"
#include "cql3/untyped_result_set.hh"
#include "db/consistency_level_type.hh"
#include <seastar/json/json_elements.hh>
#include "seastar/json/json_elements.hh"
#include "transport/controller.hh"
#include <unordered_map>

View File

@@ -39,7 +39,7 @@ utils::time_estimated_histogram timed_rate_moving_average_summary_merge(utils::t
* @return A future that resolves to the result of the aggregation.
*/
template<typename V, typename Reducer, typename InnerMapper>
future<V> two_dimensional_map_reduce(sharded<service::storage_proxy>& d,
future<V> two_dimensional_map_reduce(distributed<service::storage_proxy>& d,
InnerMapper mapper, Reducer reducer, V initial_value) {
return d.map_reduce0( [mapper, reducer, initial_value] (const service::storage_proxy& sp) {
return map_reduce_scheduling_group_specific<service::storage_proxy_stats::stats>(
@@ -59,7 +59,7 @@ future<V> two_dimensional_map_reduce(sharded<service::storage_proxy>& d,
* @return A future that resolves to the result of the aggregation.
*/
template<typename V, typename Reducer, typename F, typename C>
future<V> two_dimensional_map_reduce(sharded<service::storage_proxy>& d,
future<V> two_dimensional_map_reduce(distributed<service::storage_proxy>& d,
C F::*f, Reducer reducer, V initial_value) {
return two_dimensional_map_reduce(d, [f] (F& stats) -> V {
return stats.*f;
@@ -75,20 +75,20 @@ future<V> two_dimensional_map_reduce(sharded<service::storage_proxy>& d,
*
*/
template<typename V, typename F>
future<json::json_return_type> sum_stats_storage_proxy(sharded<proxy>& d, V F::*f) {
future<json::json_return_type> sum_stats_storage_proxy(distributed<proxy>& d, V F::*f) {
return two_dimensional_map_reduce(d, [f] (F& stats) { return stats.*f; }, std::plus<V>(), V(0)).then([] (V val) {
return make_ready_future<json::json_return_type>(val);
});
}
static future<utils::rate_moving_average> sum_timed_rate(sharded<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {
static future<utils::rate_moving_average> sum_timed_rate(distributed<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {
return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {
return (stats.*f).rate();
}, std::plus<utils::rate_moving_average>(), utils::rate_moving_average());
}
static future<json::json_return_type> sum_timed_rate_as_obj(sharded<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {
static future<json::json_return_type> sum_timed_rate_as_obj(distributed<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {
return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {
httpd::utils_json::rate_moving_average m;
m = val;
@@ -100,7 +100,7 @@ httpd::utils_json::rate_moving_average_and_histogram get_empty_moving_average()
return timer_to_json(utils::rate_moving_average_and_histogram());
}
static future<json::json_return_type> sum_timed_rate_as_long(sharded<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {
static future<json::json_return_type> sum_timed_rate_as_long(distributed<proxy>& d, utils::timed_rate_moving_average service::storage_proxy_stats::stats::*f) {
return sum_timed_rate(d, f).then([](const utils::rate_moving_average& val) {
return make_ready_future<json::json_return_type>(val.count);
});
@@ -152,7 +152,7 @@ static future<json::json_return_type> total_latency(sharded<service::storage_pr
*/
template<typename F>
future<json::json_return_type>
sum_histogram_stats_storage_proxy(sharded<proxy>& d,
sum_histogram_stats_storage_proxy(distributed<proxy>& d,
utils::timed_rate_moving_average_summary_and_histogram F::*f) {
return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {
return (stats.*f).hist;
@@ -172,7 +172,7 @@ sum_histogram_stats_storage_proxy(sharded<proxy>& d,
*/
template<typename F>
future<json::json_return_type>
sum_timer_stats_storage_proxy(sharded<proxy>& d,
sum_timer_stats_storage_proxy(distributed<proxy>& d,
utils::timed_rate_moving_average_summary_and_histogram F::*f) {
return two_dimensional_map_reduce(d, [f] (service::storage_proxy_stats::stats& stats) {

File diff suppressed because it is too large Load Diff

View File

@@ -43,28 +43,33 @@ sstring validate_keyspace(const http_context& ctx, sstring ks_name);
// containing the description of the respective keyspace error.
sstring validate_keyspace(const http_context& ctx, const std::unique_ptr<http::request>& req);
// verify that the keyspace:table is found, otherwise a bad_param_exception exception is thrown
// returns the table_id of the table if found
table_id validate_table(const replica::database& db, sstring ks_name, sstring table_name);
// verify that the table parameter is found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective table error.
void validate_table(const http_context& ctx, sstring ks_name, sstring table_name);
// splits a request parameter assumed to hold a comma-separated list of table names
// verify that the tables are found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective no_such_column_family error.
// Returns an empty vector if no parameter was found.
// If the parameter is found and empty, returns a list of all table names in the keyspace.
std::vector<sstring> parse_tables(const sstring& ks_name, const http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name);
// splits a request parameter assumed to hold a comma-separated list of table names
// verify that the tables are found, otherwise a bad_param_exception exception is thrown
// containing the description of the respective no_such_column_family error.
// Returns a vector of all table infos given by the parameter, or
// if the parameter is not found or is empty, returns a list of all table infos in the keyspace.
std::vector<table_info> parse_table_infos(const sstring& ks_name, const http_context& ctx, const std::unordered_map<sstring, sstring>& query_params, sstring param_name);
std::vector<table_info> parse_table_infos(const sstring& ks_name, const http_context& ctx, sstring value);
std::pair<sstring, std::vector<table_info>> parse_table_infos(const http_context& ctx, const http::request& req, sstring cf_param_name = "cf");
struct scrub_info {
compaction::compaction_type_options::scrub opts;
sstables::compaction_type_options::scrub opts;
sstring keyspace;
std::vector<sstring> column_families;
sstring snapshot_tag;
};
scrub_info parse_scrub_options(const http_context& ctx, std::unique_ptr<http::request> req);
future<scrub_info> parse_scrub_options(const http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl, std::unique_ptr<http::request> req);
void set_storage_service(http_context& ctx, httpd::routes& r, sharded<service::storage_service>& ss, service::raft_group0_client&);
void unset_storage_service(http_context& ctx, httpd::routes& r);
@@ -82,13 +87,6 @@ void set_snapshot(http_context& ctx, httpd::routes& r, sharded<db::snapshot_ctl>
void unset_snapshot(http_context& ctx, httpd::routes& r);
void set_load_meter(http_context& ctx, httpd::routes& r, service::load_meter& lm);
void unset_load_meter(http_context& ctx, httpd::routes& r);
seastar::future<json::json_return_type> run_toppartitions_query(db::toppartitions_query& q, bool legacy_request = false);
// converts string value of boolean parameter into bool
// maps (case insensitively)
// "true", "yes" and "1" into true
// "false", "no" and "0" into false
// otherwise throws runtime_error
bool validate_bool_x(const sstring& param, bool default_value);
seastar::future<json::json_return_type> run_toppartitions_query(db::toppartitions_query& q, http_context &ctx, bool legacy_request = false);
} // namespace api

View File

@@ -10,7 +10,7 @@
#include "api/api-doc/system.json.hh"
#include "api/api-doc/metrics.json.hh"
#include "replica/database.hh"
#include "sstables/sstables_manager.hh"
#include "db/sstables-format-selector.hh"
#include <rapidjson/document.h>
#include <boost/lexical_cast.hpp>
@@ -54,8 +54,7 @@ void set_system(http_context& ctx, routes& r) {
hm::set_metrics_config.set(r, [](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
rapidjson::Document doc;
auto content = co_await util::read_entire_stream_contiguous(*req->content_stream);
doc.Parse(content.c_str());
doc.Parse(req->content.c_str());
if (!doc.IsArray()) {
throw bad_param_exception("Expected a json array");
}
@@ -88,19 +87,21 @@ void set_system(http_context& ctx, routes& r) {
relabels[i].expr = element["regex"].GetString();
}
}
bool failed = false;
co_await smp::invoke_on_all([&relabels, &failed] {
return metrics::set_relabel_configs(relabels).then([&failed](const metrics::metric_relabeling_result& result) {
if (result.metrics_relabeled_due_to_collision > 0) {
failed = true;
return do_with(std::move(relabels), false, [](const std::vector<seastar::metrics::relabel_config>& relabels, bool& failed) {
return smp::invoke_on_all([&relabels, &failed] {
return metrics::set_relabel_configs(relabels).then([&failed](const metrics::metric_relabeling_result& result) {
if (result.metrics_relabeled_due_to_collision > 0) {
failed = true;
}
return;
});
}).then([&failed](){
if (failed) {
throw bad_param_exception("conflicts found during relabeling");
}
return;
return make_ready_future<json::json_return_type>(seastar::json::json_void());
});
});
if (failed) {
throw bad_param_exception("conflicts found during relabeling");
}
co_return seastar::json::json_void();
});
hs::get_system_uptime.set(r, [](const_req req) {
@@ -183,13 +184,18 @@ void set_system(http_context& ctx, routes& r) {
apilog.info("Profile dumped to {}", profile_dest);
return make_ready_future<json::json_return_type>(json::json_return_type(json::json_void()));
}) ;
}
hs::get_highest_supported_sstable_version.set(r, [&ctx] (std::unique_ptr<request> req) {
return smp::submit_to(0, [&ctx] {
auto format = ctx.db.local().get_user_sstables_manager().get_highest_supported_format();
return make_ready_future<json::json_return_type>(seastar::to_sstring(format));
void set_format_selector(http_context& ctx, routes& r, db::sstables_format_selector& sel) {
hs::get_highest_supported_sstable_version.set(r, [&sel] (std::unique_ptr<request> req) {
return smp::submit_to(0, [&sel] {
return make_ready_future<json::json_return_type>(seastar::to_sstring(sel.selected_format()));
});
});
}
void unset_format_selector(http_context& ctx, routes& r) {
hs::get_highest_supported_sstable_version.unset(r);
}
}

View File

@@ -12,9 +12,14 @@ namespace seastar::httpd {
class routes;
}
namespace db { class sstables_format_selector; }
namespace api {
struct http_context;
void set_system(http_context& ctx, seastar::httpd::routes& r);
void set_format_selector(http_context& ctx, seastar::httpd::routes& r, db::sstables_format_selector& sel);
void unset_format_selector(http_context& ctx, seastar::httpd::routes& r);
}

View File

@@ -6,7 +6,6 @@
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include <seastar/core/chunked_fifo.hh>
#include <seastar/core/coroutine.hh>
#include <seastar/coroutine/exception.hh>
#include <seastar/http/exception.hh>
@@ -15,7 +14,6 @@
#include "api/api.hh"
#include "api/api-doc/task_manager.json.hh"
#include "db/system_keyspace.hh"
#include "gms/gossiper.hh"
#include "tasks/task_handler.hh"
#include "utils/overloaded_functor.hh"
@@ -27,26 +25,20 @@ namespace tm = httpd::task_manager_json;
using namespace json;
using namespace seastar::httpd;
static ::tm get_time(db_clock::time_point tp) {
auto time = db_clock::to_time_t(tp);
::tm t;
::gmtime_r(&time, &t);
return t;
}
tm::task_status make_status(tasks::task_status status) {
auto start_time = db_clock::to_time_t(status.start_time);
auto end_time = db_clock::to_time_t(status.end_time);
::tm st, et;
::gmtime_r(&end_time, &et);
::gmtime_r(&start_time, &st);
tm::task_status make_status(tasks::task_status status, sharded<gms::gossiper>& gossiper) {
chunked_fifo<tm::task_identity> tis;
tis.reserve(status.children.size());
for (const auto& child : status.children) {
std::vector<tm::task_identity> tis{status.children.size()};
std::ranges::transform(status.children, tis.begin(), [] (const auto& child) {
tm::task_identity ident;
gms::inet_address addr{};
if (gossiper.local_is_initialized()) {
addr = gossiper.local().get_address_map().find(child.host_id).value_or(gms::inet_address{});
}
ident.task_id = child.task_id.to_sstring();
ident.node = fmt::format("{}", addr);
tis.push_back(std::move(ident));
}
ident.node = fmt::format("{}", child.node);
return ident;
});
tm::task_status res{};
res.id = status.task_id.to_sstring();
@@ -55,9 +47,8 @@ tm::task_status make_status(tasks::task_status status, sharded<gms::gossiper>& g
res.scope = status.scope;
res.state = status.state;
res.is_abortable = bool(status.is_abortable);
res.creation_time = get_time(status.creation_time);
res.start_time = get_time(status.start_time);
res.end_time = get_time(status.end_time);
res.start_time = st;
res.end_time = et;
res.error = status.error;
res.parent_id = status.parent_id ? status.parent_id.to_sstring() : "none";
res.sequence_number = status.sequence_number;
@@ -83,14 +74,10 @@ tm::task_stats make_stats(tasks::task_stats stats) {
res.keyspace = stats.keyspace;
res.table = stats.table;
res.entity = stats.entity;
res.shard = stats.shard;
res.creation_time = get_time(stats.creation_time);
res.start_time = get_time(stats.start_time);
res.end_time = get_time(stats.end_time);;
return res;
}
void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>& tm, db::config& cfg, sharded<gms::gossiper>& gossiper) {
void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>& tm, db::config& cfg) {
tm::get_modules.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
std::vector<std::string> v = tm.local().get_modules() | std::views::keys | std::ranges::to<std::vector>();
co_return v;
@@ -109,11 +96,11 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
throw bad_param_exception(fmt::format("{}", std::current_exception()));
}
if (auto param = req->get_query_param("keyspace"); !param.empty()) {
keyspace = param;
if (auto it = req->query_parameters.find("keyspace"); it != req->query_parameters.end()) {
keyspace = it->second;
}
if (auto param = req->get_query_param("table"); !param.empty()) {
table = param;
if (auto it = req->query_parameters.find("table"); it != req->query_parameters.end()) {
table = it->second;
}
return module->get_stats(internal, [keyspace = std::move(keyspace), table = std::move(table)] (std::string& ks, std::string& t) {
@@ -121,7 +108,7 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
});
});
noncopyable_function<future<>(output_stream<char>&&)> f = [r = std::move(res)] (output_stream<char>&& os) -> future<> {
std::function<future<>(output_stream<char>&&)> f = [r = std::move(res)] (output_stream<char>&& os) -> future<> {
auto s = std::move(os);
std::exception_ptr ex;
try {
@@ -148,7 +135,7 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
co_return std::move(f);
});
tm::get_task_status.set(r, [&tm, &gossiper] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
tm::get_task_status.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};
tasks::task_status status;
try {
@@ -157,7 +144,7 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
} catch (tasks::task_manager::task_not_found& e) {
throw bad_param_exception(e.what());
}
co_return make_status(status, gossiper);
co_return make_status(status);
});
tm::abort_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
@@ -173,12 +160,12 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
co_return json_void();
});
tm::wait_task.set(r, [&tm, &gossiper] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
tm::wait_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};
tasks::task_status status;
std::optional<std::chrono::seconds> timeout = std::nullopt;
if (auto param = req->get_query_param("timeout"); !param.empty()) {
timeout = std::chrono::seconds(boost::lexical_cast<uint32_t>(param));
if (auto it = req->query_parameters.find("timeout"); it != req->query_parameters.end()) {
timeout = std::chrono::seconds(boost::lexical_cast<uint32_t>(it->second));
}
try {
auto task = tasks::task_handler{tm.local(), id};
@@ -188,24 +175,24 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
} catch (timed_out_error& e) {
throw httpd::base_exception{e.what(), http::reply::status_type::request_timeout};
}
co_return make_status(status, gossiper);
co_return make_status(status);
});
tm::get_task_status_recursively.set(r, [&_tm = tm, &gossiper] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
tm::get_task_status_recursively.set(r, [&_tm = tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& tm = _tm;
auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};
try {
auto task = tasks::task_handler{tm.local(), id};
auto res = co_await task.get_status_recursively(true);
noncopyable_function<future<>(output_stream<char>&&)> f = [r = std::move(res), &gossiper] (output_stream<char>&& os) -> future<> {
std::function<future<>(output_stream<char>&&)> f = [r = std::move(res)] (output_stream<char>&& os) -> future<> {
auto s = std::move(os);
auto res = std::move(r);
co_await s.write("[");
std::string delim = "";
for (auto& status: res) {
co_await s.write(std::exchange(delim, ", "));
co_await formatter::write(s, make_status(status, gossiper));
co_await formatter::write(s, make_status(status));
}
co_await s.write("]");
co_await s.close();
@@ -219,7 +206,7 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
tm::get_and_update_ttl.set(r, [&cfg] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
uint32_t ttl = cfg.task_ttl_seconds();
try {
co_await cfg.task_ttl_seconds.set_value_on_all_shards(req->get_query_param("ttl"), utils::config_file::config_source::API);
co_await cfg.task_ttl_seconds.set_value_on_all_shards(req->query_parameters["ttl"], utils::config_file::config_source::API);
} catch (...) {
throw bad_param_exception(fmt::format("{}", std::current_exception()));
}
@@ -234,7 +221,7 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>
tm::get_and_update_user_ttl.set(r, [&cfg] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
uint32_t user_ttl = cfg.user_task_ttl_seconds();
try {
co_await cfg.user_task_ttl_seconds.set_value_on_all_shards(req->get_query_param("user_ttl"), utils::config_file::config_source::API);
co_await cfg.user_task_ttl_seconds.set_value_on_all_shards(req->query_parameters["user_ttl"], utils::config_file::config_source::API);
} catch (...) {
throw bad_param_exception(fmt::format("{}", std::current_exception()));
}

View File

@@ -18,7 +18,7 @@ namespace tasks {
namespace api {
void set_task_manager(http_context& ctx, httpd::routes& r, sharded<tasks::task_manager>& tm, db::config& cfg, sharded<gms::gossiper>& gossiper);
void set_task_manager(http_context& ctx, httpd::routes& r, sharded<tasks::task_manager>& tm, db::config& cfg);
void unset_task_manager(http_context& ctx, httpd::routes& r);
}

View File

@@ -57,16 +57,20 @@ void set_task_manager_test(http_context& ctx, routes& r, sharded<tasks::task_man
tmt::register_test_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
sharded<tasks::task_manager>& tms = tm;
const auto id_param = req->get_query_param("task_id");
auto id = !id_param.empty() ? tasks::task_id{utils::UUID{id_param}} : tasks::task_id::create_null_id();
const auto shard_param = req->get_query_param("shard");
unsigned shard = shard_param.empty() ? 0 : boost::lexical_cast<unsigned>(shard_param);
std::string keyspace = req->get_query_param("keyspace");
std::string table = req->get_query_param("table");
std::string entity = req->get_query_param("entity");
auto it = req->query_parameters.find("task_id");
auto id = it != req->query_parameters.end() ? tasks::task_id{utils::UUID{it->second}} : tasks::task_id::create_null_id();
it = req->query_parameters.find("shard");
unsigned shard = it != req->query_parameters.end() ? boost::lexical_cast<unsigned>(it->second) : 0;
it = req->query_parameters.find("keyspace");
std::string keyspace = it != req->query_parameters.end() ? it->second : "";
it = req->query_parameters.find("table");
std::string table = it != req->query_parameters.end() ? it->second : "";
it = req->query_parameters.find("entity");
std::string entity = it != req->query_parameters.end() ? it->second : "";
it = req->query_parameters.find("parent_id");
tasks::task_info data;
if (auto parent_id = req->get_query_param("parent_id"); !parent_id.empty()) {
data.id = tasks::task_id{utils::UUID{parent_id}};
if (it != req->query_parameters.end()) {
data.id = tasks::task_id{utils::UUID{it->second}};
auto parent_ptr = co_await tasks::task_manager::lookup_task_on_all_shards(tm, data.id);
data.shard = parent_ptr->get_status().shard;
}
@@ -84,7 +88,7 @@ void set_task_manager_test(http_context& ctx, routes& r, sharded<tasks::task_man
});
tmt::unregister_test_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->get_query_param("task_id")}};
auto id = tasks::task_id{utils::UUID{req->query_parameters["task_id"]}};
try {
co_await tasks::task_manager::invoke_on_task(tm, id, [] (tasks::task_manager::task_variant task_v, tasks::virtual_task_hint) -> future<> {
return std::visit(overloaded_functor{
@@ -105,8 +109,9 @@ void set_task_manager_test(http_context& ctx, routes& r, sharded<tasks::task_man
tmt::finish_test_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};
std::string error = req->get_query_param("error");
bool fail = !error.empty();
auto it = req->query_parameters.find("error");
bool fail = it != req->query_parameters.end();
std::string error = fail ? it->second : "";
try {
co_await tasks::task_manager::invoke_on_task(tm, id, [fail, error = std::move(error)] (tasks::task_manager::task_variant task_v, tasks::virtual_task_hint) -> future<> {

View File

@@ -12,7 +12,6 @@
#include "api/api.hh"
#include "api/storage_service.hh"
#include "api/api-doc/tasks.json.hh"
#include "api/api-doc/storage_service.json.hh"
#include "compaction/compaction_manager.hh"
#include "compaction/task_manager_module.hh"
#include "service/storage_service.hh"
@@ -26,163 +25,96 @@ extern logging::logger apilog;
namespace api {
namespace t = httpd::tasks_json;
namespace ss = httpd::storage_service_json;
using namespace json;
using ks_cf_func = std::function<future<json::json_return_type>(http_context&, std::unique_ptr<http::request>, sstring, std::vector<table_info>)>;
static auto wrap_ks_cf(http_context &ctx, ks_cf_func f) {
return [&ctx, f = std::move(f)](std::unique_ptr<http::request> req) {
auto [keyspace, table_infos] = parse_table_infos(ctx, *req);
auto keyspace = validate_keyspace(ctx, req);
auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
return f(ctx, std::move(req), std::move(keyspace), std::move(table_infos));
};
}
static future<shared_ptr<compaction::major_keyspace_compaction_task_impl>> force_keyspace_compaction(http_context& ctx, std::unique_ptr<http::request> req) {
auto& db = ctx.db;
auto [ keyspace, table_infos ] = parse_table_infos(ctx, *req, "cf");
auto flush = validate_bool_x(req->get_query_param("flush_memtables"), true);
auto consider_only_existing_data = validate_bool_x(req->get_query_param("consider_only_existing_data"), false);
apilog.info("force_keyspace_compaction: keyspace={} tables={}, flush={} consider_only_existing_data={}", keyspace, table_infos, flush, consider_only_existing_data);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
std::optional<compaction::flush_mode> fmopt;
if (!flush && !consider_only_existing_data) {
fmopt = compaction::flush_mode::skip;
}
return compaction_module.make_and_start_task<compaction::major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), db, table_infos, fmopt, consider_only_existing_data);
}
static future<shared_ptr<compaction::upgrade_sstables_compaction_task_impl>> upgrade_sstables(http_context& ctx, std::unique_ptr<http::request> req, sstring keyspace, std::vector<table_info> table_infos) {
auto& db = ctx.db;
bool exclude_current_version = req_param<bool>(*req, "exclude_current_version", false);
apilog.info("upgrade_sstables: keyspace={} tables={} exclude_current_version={}", keyspace, table_infos, exclude_current_version);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
return compaction_module.make_and_start_task<compaction::upgrade_sstables_compaction_task_impl>({}, std::move(keyspace), db, table_infos, exclude_current_version);
}
static future<shared_ptr<compaction::cleanup_keyspace_compaction_task_impl>> force_keyspace_cleanup(http_context& ctx, sharded<service::storage_service>& ss, std::unique_ptr<http::request> req) {
auto& db = ctx.db;
auto [keyspace, table_infos] = parse_table_infos(ctx, *req);
const auto& rs = db.local().find_keyspace(keyspace).get_replication_strategy();
if (rs.is_local() || !rs.is_vnode_based()) {
auto reason = rs.is_local() ? "require" : "support";
apilog.info("Keyspace {} does not {} cleanup", keyspace, reason);
co_return nullptr;
}
apilog.info("force_keyspace_cleanup: keyspace={} tables={}", keyspace, table_infos);
if (!co_await ss.local().is_vnodes_cleanup_allowed(keyspace)) {
auto msg = "Can not perform cleanup operation when topology changes";
apilog.warn("force_keyspace_cleanup: keyspace={} tables={}: {}", keyspace, table_infos, msg);
co_await coroutine::return_exception(std::runtime_error(msg));
}
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
co_return co_await compaction_module.make_and_start_task<compaction::cleanup_keyspace_compaction_task_impl>(
{}, std::move(keyspace), db, table_infos, compaction::flush_mode::all_tables, tasks::is_user_task::yes);
}
void set_tasks_compaction_module(http_context& ctx, routes& r, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& snap_ctl) {
t::force_keyspace_compaction_async.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto task = co_await force_keyspace_compaction(ctx, std::move(req));
auto& db = ctx.db;
auto params = req_params({
std::pair("keyspace", mandatory::yes),
std::pair("cf", mandatory::no),
std::pair("flush_memtables", mandatory::no),
});
params.process(*req);
auto keyspace = validate_keyspace(ctx, *params.get("keyspace"));
auto table_infos = parse_table_infos(keyspace, ctx, params.get("cf").value_or(""));
auto flush = params.get_as<bool>("flush_memtables").value_or(true);
apilog.debug("force_keyspace_compaction_async: keyspace={} tables={}, flush={}", keyspace, table_infos, flush);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
std::optional<flush_mode> fmopt;
if (!flush) {
fmopt = flush_mode::skip;
}
auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), db, table_infos, fmopt);
co_return json::json_return_type(task->get_status().id.to_sstring());
});
ss::force_keyspace_compaction.set(r, [&ctx](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto task = co_await force_keyspace_compaction(ctx, std::move(req));
co_await task->done();
co_return json_void();
});
t::force_keyspace_cleanup_async.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
tasks::task_id id = tasks::task_id::create_null_id();
auto task = co_await force_keyspace_cleanup(ctx, ss, std::move(req));
if (task) {
id = task->get_status().id;
auto& db = ctx.db;
auto keyspace = validate_keyspace(ctx, req);
auto table_infos = parse_table_infos(keyspace, ctx, req->query_parameters, "cf");
apilog.info("force_keyspace_cleanup_async: keyspace={} tables={}", keyspace, table_infos);
if (!co_await ss.local().is_cleanup_allowed(keyspace)) {
auto msg = "Can not perform cleanup operation when topology changes";
apilog.warn("force_keyspace_cleanup_async: keyspace={} tables={}: {}", keyspace, table_infos, msg);
co_await coroutine::return_exception(std::runtime_error(msg));
}
co_return json::json_return_type(id.to_sstring());
});
ss::force_keyspace_cleanup.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto task = co_await force_keyspace_cleanup(ctx, ss, std::move(req));
if (task) {
co_await task->done();
}
co_return json::json_return_type(0);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<cleanup_keyspace_compaction_task_impl>({}, std::move(keyspace), db, table_infos, flush_mode::all_tables, tasks::is_user_task::yes);
co_return json::json_return_type(task->get_status().id.to_sstring());
});
t::perform_keyspace_offstrategy_compaction_async.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<http::request> req, sstring keyspace, std::vector<table_info> table_infos) -> future<json::json_return_type> {
apilog.info("perform_keyspace_offstrategy_compaction: keyspace={} tables={}", keyspace, table_infos);
auto& compaction_module = ctx.db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<compaction::offstrategy_keyspace_compaction_task_impl>({}, std::move(keyspace), ctx.db, table_infos, nullptr);
auto task = co_await compaction_module.make_and_start_task<offstrategy_keyspace_compaction_task_impl>({}, std::move(keyspace), ctx.db, table_infos, nullptr);
co_return json::json_return_type(task->get_status().id.to_sstring());
}));
ss::perform_keyspace_offstrategy_compaction.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<http::request> req, sstring keyspace, std::vector<table_info> table_infos) -> future<json::json_return_type> {
apilog.info("perform_keyspace_offstrategy_compaction: keyspace={} tables={}", keyspace, table_infos);
bool res = false;
auto& compaction_module = ctx.db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<compaction::offstrategy_keyspace_compaction_task_impl>({}, std::move(keyspace), ctx.db, table_infos, &res);
co_await task->done();
co_return json::json_return_type(res);
}));
t::upgrade_sstables_async.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<http::request> req, sstring keyspace, std::vector<table_info> table_infos) -> future<json::json_return_type> {
auto task = co_await upgrade_sstables(ctx, std::move(req), std::move(keyspace), std::move(table_infos));
co_return json::json_return_type(task->get_status().id.to_sstring());
}));
auto& db = ctx.db;
bool exclude_current_version = req_param<bool>(*req, "exclude_current_version", false);
ss::upgrade_sstables.set(r, wrap_ks_cf(ctx, [] (http_context& ctx, std::unique_ptr<http::request> req, sstring keyspace, std::vector<table_info> table_infos) -> future<json::json_return_type> {
auto task = co_await upgrade_sstables(ctx, std::move(req), std::move(keyspace), std::move(table_infos));
co_await task->done();
co_return json::json_return_type(0);
apilog.info("upgrade_sstables: keyspace={} tables={} exclude_current_version={}", keyspace, table_infos, exclude_current_version);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<upgrade_sstables_compaction_task_impl>({}, std::move(keyspace), db, table_infos, exclude_current_version);
co_return json::json_return_type(task->get_status().id.to_sstring());
}));
t::scrub_async.set(r, [&ctx, &snap_ctl] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& db = ctx.db;
auto info = parse_scrub_options(ctx, std::move(req));
if (!info.snapshot_tag.empty()) {
co_await snap_ctl.local().take_column_family_snapshot(info.keyspace, info.column_families, info.snapshot_tag, db::snapshot_ctl::skip_flush::no);
}
auto info = co_await parse_scrub_options(ctx, snap_ctl, std::move(req));
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
auto task = co_await compaction_module.make_and_start_task<compaction::scrub_sstables_compaction_task_impl>({}, std::move(info.keyspace), db, std::move(info.column_families), info.opts, nullptr);
auto task = co_await compaction_module.make_and_start_task<scrub_sstables_compaction_task_impl>({}, std::move(info.keyspace), db, std::move(info.column_families), info.opts, nullptr);
co_return json::json_return_type(task->get_status().id.to_sstring());
});
ss::force_compaction.set(r, [&ctx] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
auto& db = ctx.db;
auto flush = validate_bool_x(req->get_query_param("flush_memtables"), true);
auto consider_only_existing_data = validate_bool_x(req->get_query_param("consider_only_existing_data"), false);
apilog.info("force_compaction: flush={} consider_only_existing_data={}", flush, consider_only_existing_data);
auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();
std::optional<compaction::flush_mode> fmopt;
if (!flush && !consider_only_existing_data) {
fmopt = compaction::flush_mode::skip;
}
auto task = co_await compaction_module.make_and_start_task<compaction::global_major_compaction_task_impl>({}, db, fmopt, consider_only_existing_data);
co_await task->done();
co_return json_void();
});
}
void unset_tasks_compaction_module(http_context& ctx, httpd::routes& r) {
t::force_keyspace_compaction_async.unset(r);
ss::force_keyspace_compaction.unset(r);
t::force_keyspace_cleanup_async.unset(r);
ss::force_keyspace_cleanup.unset(r);
t::perform_keyspace_offstrategy_compaction_async.unset(r);
ss::perform_keyspace_offstrategy_compaction.unset(r);
t::upgrade_sstables_async.unset(r);
ss::upgrade_sstables.unset(r);
t::scrub_async.unset(r);
ss::force_compaction.unset(r);
}
}

View File

@@ -54,23 +54,12 @@ void set_token_metadata(http_context& ctx, routes& r, sharded<locator::shared_to
for (const auto host_id: leaving_host_ids) {
eps.insert(g.local().get_address_map().get(host_id));
}
return eps | std::views::transform([] (auto& i) { return fmt::to_string(i); }) | std::ranges::to<std::vector>();
return container_to_vec(eps);
});
ss::get_moving_nodes.set(r, [](const_req req) {
std::unordered_set<sstring> addr;
return addr | std::ranges::to<std::vector>();
});
ss::get_excluded_nodes.set(r, [&tm](const_req req) {
const auto& local_tm = *tm.local().get();
std::vector<sstring> eps;
local_tm.get_topology().for_each_node([&] (auto& node) {
if (node.is_excluded()) {
eps.push_back(node.host_id().to_sstring());
}
});
return eps;
return container_to_vec(addr);
});
ss::get_joining_nodes.set(r, [&tm, &g](const_req req) {
@@ -81,21 +70,15 @@ void set_token_metadata(http_context& ctx, routes& r, sharded<locator::shared_to
for (const auto& [token, host_id]: points) {
eps.insert(g.local().get_address_map().get(host_id));
}
return eps | std::views::transform([] (auto& i) { return fmt::to_string(i); }) | std::ranges::to<std::vector>();
return container_to_vec(eps);
});
ss::get_host_id_map.set(r, [&tm, &g](const_req req) {
if (!g.local().is_enabled()) {
throw std::runtime_error("The gossiper is not ready yet");
}
return tm.local().get()->get_host_ids()
| std::views::transform([&g] (locator::host_id id) {
ss::mapper m;
m.key = fmt::to_string(g.local().get_address_map().get(id));
m.value = fmt::to_string(id);
return m;
})
| std::ranges::to<std::vector<ss::mapper>>();
std::vector<ss::mapper> res;
auto map = tm.local().get()->get_host_ids() |
std::views::transform([&g] (locator::host_id id) { return std::make_pair(g.local().get_address_map().get(id), id); }) |
std::ranges::to<std::unordered_map>();
return map_to_key_value(std::move(map), res);
});
static auto host_or_broadcast = [&tm](const_req req) {
@@ -141,7 +124,6 @@ void unset_token_metadata(http_context& ctx, routes& r) {
ss::get_leaving_nodes.unset(r);
ss::get_moving_nodes.unset(r);
ss::get_joining_nodes.unset(r);
ss::get_excluded_nodes.unset(r);
ss::get_host_id_map.unset(r);
httpd::endpoint_snitch_info_json::get_datacenter.unset(r);
httpd::endpoint_snitch_info_json::get_rack.unset(r);

View File

@@ -5,7 +5,6 @@ target_sources(scylla_audit
PRIVATE
audit.cc
audit_cf_storage_helper.cc
audit_composite_storage_helper.cc
audit_syslog_storage_helper.cc)
target_include_directories(scylla_audit
PUBLIC
@@ -17,7 +16,4 @@ target_link_libraries(scylla_audit
PRIVATE
cql3)
if (Scylla_USE_PRECOMPILED_HEADER_USE)
target_precompile_headers(scylla_audit REUSE_FROM scylla-precompiled-header)
endif()
add_whole_archive(audit scylla_audit)

View File

@@ -13,11 +13,9 @@
#include "cql3/statements/batch_statement.hh"
#include "cql3/statements/modification_statement.hh"
#include "storage_helper.hh"
#include "audit_cf_storage_helper.hh"
#include "audit_syslog_storage_helper.hh"
#include "audit_composite_storage_helper.hh"
#include "audit.hh"
#include "../db/config.hh"
#include "utils/class_registrator.hh"
#include <boost/algorithm/string/split.hpp>
#include <boost/algorithm/string/trim.hpp>
@@ -28,50 +26,8 @@ namespace audit {
logging::logger logger("audit");
static std::set<sstring> parse_audit_modes(const sstring& data) {
std::set<sstring> result;
if (!data.empty()) {
std::vector<sstring> audit_modes;
boost::split(audit_modes, data, boost::is_any_of(","));
if (audit_modes.empty()) {
return {};
}
for (sstring& audit_mode : audit_modes) {
boost::trim(audit_mode);
if (audit_mode == "none") {
return {};
}
if (audit_mode != "table" && audit_mode != "syslog") {
throw audit_exception(fmt::format("Bad configuration: invalid 'audit': {}", audit_mode));
}
result.insert(std::move(audit_mode));
}
}
return result;
}
static std::unique_ptr<storage_helper> create_storage_helper(const std::set<sstring>& audit_modes, cql3::query_processor& qp, service::migration_manager& mm) {
SCYLLA_ASSERT(!audit_modes.empty() && !audit_modes.contains("none"));
std::vector<std::unique_ptr<storage_helper>> helpers;
for (const sstring& audit_mode : audit_modes) {
if (audit_mode == "table") {
helpers.emplace_back(std::make_unique<audit_cf_storage_helper>(qp, mm));
} else if (audit_mode == "syslog") {
helpers.emplace_back(std::make_unique<audit_syslog_storage_helper>(qp, mm));
}
}
SCYLLA_ASSERT(!helpers.empty());
if (helpers.size() == 1) {
return std::move(helpers.front());
}
return std::make_unique<audit_composite_storage_helper>(std::move(helpers));
}
static sstring category_to_string(statement_category category)
{
switch (category) {
sstring audit_info::category_string() const {
switch (_category) {
case statement_category::QUERY: return "QUERY";
case statement_category::DML: return "DML";
case statement_category::DDL: return "DDL";
@@ -82,9 +38,19 @@ static sstring category_to_string(statement_category category)
return "";
}
sstring audit_info::category_string() const {
return category_to_string(_category);
}
audit::audit(locator::shared_token_metadata& token_metadata,
sstring&& storage_helper_name,
std::set<sstring>&& audited_keyspaces,
std::map<sstring, std::set<sstring>>&& audited_tables,
category_set&& audited_categories)
: _token_metadata(token_metadata)
, _audited_keyspaces(std::move(audited_keyspaces))
, _audited_tables(std::move(audited_tables))
, _audited_categories(std::move(audited_categories))
, _storage_helper_class_name(std::move(storage_helper_name))
{ }
audit::~audit() = default;
static category_set parse_audit_categories(const sstring& data) {
category_set result;
@@ -145,56 +111,49 @@ static std::set<sstring> parse_audit_keyspaces(const sstring& data) {
return result;
}
audit::audit(locator::shared_token_metadata& token_metadata,
cql3::query_processor& qp,
service::migration_manager& mm,
std::set<sstring>&& audit_modes,
std::set<sstring>&& audited_keyspaces,
std::map<sstring, std::set<sstring>>&& audited_tables,
category_set&& audited_categories,
const db::config& cfg)
: _token_metadata(token_metadata)
, _audited_keyspaces(std::move(audited_keyspaces))
, _audited_tables(std::move(audited_tables))
, _audited_categories(std::move(audited_categories))
, _cfg(cfg)
, _cfg_keyspaces_observer(cfg.audit_keyspaces.observe([this] (sstring const& new_value){ update_config<std::set<sstring>>(new_value, parse_audit_keyspaces, _audited_keyspaces); }))
, _cfg_tables_observer(cfg.audit_tables.observe([this] (sstring const& new_value){ update_config<std::map<sstring, std::set<sstring>>>(new_value, parse_audit_tables, _audited_tables); }))
, _cfg_categories_observer(cfg.audit_categories.observe([this] (sstring const& new_value){ update_config<category_set>(new_value, parse_audit_categories, _audited_categories); }))
{
_storage_helper_ptr = create_storage_helper(std::move(audit_modes), qp, mm);
}
audit::~audit() = default;
future<> audit::start_audit(const db::config& cfg, sharded<locator::shared_token_metadata>& stm, sharded<cql3::query_processor>& qp, sharded<service::migration_manager>& mm) {
std::set<sstring> audit_modes = parse_audit_modes(cfg.audit());
if (audit_modes.empty()) {
future<> audit::create_audit(const db::config& cfg, sharded<locator::shared_token_metadata>& stm) {
sstring storage_helper_name;
if (cfg.audit() == "table") {
storage_helper_name = "audit_cf_storage_helper";
} else if (cfg.audit() == "syslog") {
storage_helper_name = "audit_syslog_storage_helper";
} else if (cfg.audit() == "none") {
// Audit is off
logger.info("Audit is disabled");
return make_ready_future<>();
} else {
throw audit_exception(fmt::format("Bad configuration: invalid 'audit': {}", cfg.audit()));
}
category_set audited_categories = parse_audit_categories(cfg.audit_categories());
if (!audited_categories) {
return make_ready_future<>();
}
std::map<sstring, std::set<sstring>> audited_tables = parse_audit_tables(cfg.audit_tables());
std::set<sstring> audited_keyspaces = parse_audit_keyspaces(cfg.audit_keyspaces());
if (audited_tables.empty()
&& audited_keyspaces.empty()
&& !audited_categories.contains(statement_category::AUTH)
&& !audited_categories.contains(statement_category::ADMIN)
&& !audited_categories.contains(statement_category::DCL)) {
return make_ready_future<>();
}
logger.info("Audit is enabled. Auditing to: \"{}\", with the following categories: \"{}\", keyspaces: \"{}\", and tables: \"{}\"",
cfg.audit(), cfg.audit_categories(), cfg.audit_keyspaces(), cfg.audit_tables());
return audit_instance().start(std::ref(stm),
std::ref(qp),
std::ref(mm),
std::move(audit_modes),
std::move(storage_helper_name),
std::move(audited_keyspaces),
std::move(audited_tables),
std::move(audited_categories),
std::cref(cfg))
.then([&cfg] {
if (!audit_instance().local_is_initialized()) {
return make_ready_future<>();
}
return audit_instance().invoke_on_all([&cfg] (audit& local_audit) {
return local_audit.start(cfg);
});
std::move(audited_categories));
}
future<> audit::start_audit(const db::config& cfg, sharded<cql3::query_processor>& qp, sharded<service::migration_manager>& mm) {
if (!audit_instance().local_is_initialized()) {
return make_ready_future<>();
}
return audit_instance().invoke_on_all([&cfg, &qp, &mm] (audit& local_audit) {
return local_audit.start(cfg, qp.local(), mm.local());
});
}
@@ -220,7 +179,15 @@ audit_info_ptr audit::create_no_audit_info() {
return audit_info_ptr();
}
future<> audit::start(const db::config& cfg) {
future<> audit::start(const db::config& cfg, cql3::query_processor& qp, service::migration_manager& mm) {
try {
_storage_helper_ptr = create_object<storage_helper>(_storage_helper_class_name, qp, mm);
} catch (no_such_class& e) {
logger.error("Can't create audit storage helper {}: not supported", _storage_helper_class_name);
throw;
} catch (...) {
throw;
}
return _storage_helper_ptr->start(cfg);
}
@@ -240,11 +207,6 @@ future<> audit::log(const audit_info* audit_info, service::query_state& query_st
static const sstring anonymous_username("anonymous");
const sstring& username = client_state.user() ? client_state.user()->name.value_or(anonymous_username) : no_username;
socket_address client_ip = client_state.get_client_address().addr();
if (logger.is_enabled(logging::log_level::debug)) {
logger.debug("Log written: node_ip {} category {} cl {} error {} keyspace {} query '{}' client_ip {} table {} username {}",
node_ip, audit_info->category_string(), cl, error, audit_info->keyspace(),
audit_info->query(), client_ip, audit_info->table(), username);
}
return futurize_invoke(std::mem_fn(&storage_helper::write), _storage_helper_ptr, audit_info, node_ip, client_ip, cl, username, error)
.handle_exception([audit_info, node_ip, client_ip, cl, username, error] (auto ep) {
logger.error("Unexpected exception when writing log with: node_ip {} category {} cl {} error {} keyspace {} query '{}' client_ip {} table {} username {} exception {}",
@@ -255,10 +217,6 @@ future<> audit::log(const audit_info* audit_info, service::query_state& query_st
future<> audit::log_login(const sstring& username, socket_address client_ip, bool error) noexcept {
socket_address node_ip = _token_metadata.get()->get_topology().my_address().addr();
if (logger.is_enabled(logging::log_level::debug)) {
logger.debug("Login log written: node_ip {}, client_ip {}, username {}, error {}",
node_ip, client_ip, username, error ? "true" : "false");
}
return futurize_invoke(std::mem_fn(&storage_helper::write_login), _storage_helper_ptr, username, node_ip, client_ip, error)
.handle_exception([username, node_ip, client_ip, error] (auto ep) {
logger.error("Unexpected exception when writing login log with: node_ip {} client_ip {} username {} error {} exception {}",
@@ -302,33 +260,4 @@ bool audit::should_log(const audit_info* audit_info) const {
|| audit_info->category() == statement_category::DCL);
}
template<class T>
void audit::update_config(const sstring & new_value, std::function<T(const sstring&)> parse_func, T& cfg_parameter)
{
try {
cfg_parameter = parse_func(new_value);
} catch (...) {
logger.error("Audit configuration update failed because cannot parse value=\"{}\".", new_value);
return;
}
// If update_config is called with an invalid new_value, this line is not reached.
// But logging the invalid value must be avoided later, when a different configuration parameter is changed to a correct value.
// That's why values from _audited_{categories, keyspaces, tables} are logged instead of _cfg.audit_{categories, keyspaces, tables}
// Each table as "keyspace.table_name" like in the configuration file
auto table_entries = _audited_tables | std::views::transform([](const auto& pair) {
return pair.second | std::views::transform([&](const std::string& table_name) {
return fmt::format("{}.{}", pair.first, table_name);
});
}) | std::views::join;
logger.info(
"Audit configuration is updated. Auditing to: \"{}\", with the following categories: \"{}\", keyspaces: \"{}\", and tables: \"{}\".",
_cfg.audit(),
fmt::join(std::views::transform(_audited_categories, category_to_string), ","),
fmt::join(_audited_keyspaces, ","),
fmt::join(table_entries, ","));
}
}

View File

@@ -9,7 +9,6 @@
#include "seastarx.hh"
#include "utils/log.hh"
#include "utils/observable.hh"
#include "db/consistency_level.hh"
#include "locator/token_metadata_fwd.hh"
#include <seastar/core/sharded.hh>
@@ -97,21 +96,13 @@ class storage_helper;
class audit final : public seastar::async_sharded_service<audit> {
locator::shared_token_metadata& _token_metadata;
std::set<sstring> _audited_keyspaces;
const std::set<sstring> _audited_keyspaces;
// Maps keyspace name to set of table names in that keyspace
std::map<sstring, std::set<sstring>> _audited_tables;
category_set _audited_categories;
const std::map<sstring, std::set<sstring>> _audited_tables;
const category_set _audited_categories;
sstring _storage_helper_class_name;
std::unique_ptr<storage_helper> _storage_helper_ptr;
const db::config& _cfg;
utils::observer<sstring> _cfg_keyspaces_observer;
utils::observer<sstring> _cfg_tables_observer;
utils::observer<sstring> _cfg_categories_observer;
template<class T>
void update_config(const sstring & new_value, std::function<T(const sstring&)> parse_func, T& cfg_parameter);
bool should_log_table(const sstring& keyspace, const sstring& name) const;
public:
static seastar::sharded<audit>& audit_instance() {
@@ -124,20 +115,17 @@ public:
static audit& local_audit_instance() {
return audit_instance().local();
}
static future<> start_audit(const db::config& cfg, sharded<locator::shared_token_metadata>& stm, sharded<cql3::query_processor>& qp, sharded<service::migration_manager>& mm);
static future<> create_audit(const db::config& cfg, sharded<locator::shared_token_metadata>& stm);
static future<> start_audit(const db::config& cfg, sharded<cql3::query_processor>& qp, sharded<service::migration_manager>& mm);
static future<> stop_audit();
static audit_info_ptr create_audit_info(statement_category cat, const sstring& keyspace, const sstring& table);
static audit_info_ptr create_no_audit_info();
audit(locator::shared_token_metadata& stm,
cql3::query_processor& qp,
service::migration_manager& mm,
std::set<sstring>&& audit_modes,
audit(locator::shared_token_metadata& stm, sstring&& storage_helper_name,
std::set<sstring>&& audited_keyspaces,
std::map<sstring, std::set<sstring>>&& audited_tables,
category_set&& audited_categories,
const db::config& cfg);
category_set&& audited_categories);
~audit();
future<> start(const db::config& cfg);
future<> start(const db::config& cfg, cql3::query_processor& qp, service::migration_manager& mm);
future<> stop();
future<> shutdown();
bool should_log(const audit_info* audit_info) const;

View File

@@ -11,11 +11,11 @@
#include "cql3/query_processor.hh"
#include "data_dictionary/keyspace_metadata.hh"
#include "utils/UUID_gen.hh"
#include "utils/class_registrator.hh"
#include "cql3/query_options.hh"
#include "cql3/statements/ks_prop_defs.hh"
#include "service/migration_manager.hh"
#include "service/storage_proxy.hh"
#include "locator/abstract_replication_strategy.hh"
namespace audit {
@@ -64,8 +64,8 @@ future<> audit_cf_storage_helper::migrate_audit_table(service::group0_guard grou
data_dictionary::database db = _qp.db();
cql3::statements::ks_prop_defs old_ks_prop_defs;
auto old_ks_metadata = old_ks_prop_defs.as_ks_metadata_update(
ks->metadata(), *_qp.proxy().get_token_metadata_ptr(), db.features(), db.get_config());
locator::replication_strategy_config_options strategy_opts;
ks->metadata(), *_qp.proxy().get_token_metadata_ptr(), db.features());
std::map<sstring, sstring> strategy_opts;
for (const auto &dc: _qp.proxy().get_token_metadata_ptr()->get_topology().get_datacenters())
strategy_opts[dc] = "3";
@@ -73,7 +73,6 @@ future<> audit_cf_storage_helper::migrate_audit_table(service::group0_guard grou
"org.apache.cassandra.locator.NetworkTopologyStrategy",
strategy_opts,
std::nullopt, // initial_tablets
std::nullopt, // consistency_option
old_ks_metadata->durable_writes(),
old_ks_metadata->get_storage_options(),
old_ks_metadata->tables());
@@ -197,4 +196,7 @@ cql3::query_options audit_cf_storage_helper::make_login_data(socket_address node
return cql3::query_options(cql3::default_cql_config, db::consistency_level::ONE, std::nullopt, std::move(values), false, cql3::query_options::specific_options::DEFAULT);
}
using registry = class_registrator<storage_helper, audit_cf_storage_helper, cql3::query_processor&, service::migration_manager&>;
static registry registrator1("audit_cf_storage_helper");
}

View File

@@ -1,68 +0,0 @@
/*
* Copyright (C) 2025 ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#include <seastar/core/loop.hh>
#include <seastar/core/future-util.hh>
#include "audit/audit_composite_storage_helper.hh"
#include "utils/class_registrator.hh"
namespace audit {
audit_composite_storage_helper::audit_composite_storage_helper(std::vector<std::unique_ptr<storage_helper>>&& storage_helpers)
: _storage_helpers(std::move(storage_helpers))
{}
future<> audit_composite_storage_helper::start(const db::config& cfg) {
auto res = seastar::parallel_for_each(
_storage_helpers,
[&cfg] (std::unique_ptr<storage_helper>& h) {
return h->start(cfg);
}
);
return res;
}
future<> audit_composite_storage_helper::stop() {
auto res = seastar::parallel_for_each(
_storage_helpers,
[] (std::unique_ptr<storage_helper>& h) {
return h->stop();
}
);
return res;
}
future<> audit_composite_storage_helper::write(const audit_info* audit_info,
socket_address node_ip,
socket_address client_ip,
db::consistency_level cl,
const sstring& username,
bool error) {
return seastar::parallel_for_each(
_storage_helpers,
[audit_info, node_ip, client_ip, cl, &username, error](std::unique_ptr<storage_helper>& h) {
return h->write(audit_info, node_ip, client_ip, cl, username, error);
}
);
}
future<> audit_composite_storage_helper::write_login(const sstring& username,
socket_address node_ip,
socket_address client_ip,
bool error) {
return seastar::parallel_for_each(
_storage_helpers,
[&username, node_ip, client_ip, error](std::unique_ptr<storage_helper>& h) {
return h->write_login(username, node_ip, client_ip, error);
}
);
}
} // namespace audit

View File

@@ -1,37 +0,0 @@
/*
* Copyright (C) 2025 ScyllaDB
*/
/*
* SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
*/
#pragma once
#include "audit/audit.hh"
#include <seastar/core/future.hh>
#include "storage_helper.hh"
namespace audit {
class audit_composite_storage_helper : public storage_helper {
std::vector<std::unique_ptr<storage_helper>> _storage_helpers;
public:
explicit audit_composite_storage_helper(std::vector<std::unique_ptr<storage_helper>>&&);
virtual ~audit_composite_storage_helper() = default;
virtual future<> start(const db::config& cfg) override;
virtual future<> stop() override;
virtual future<> write(const audit_info* audit_info,
socket_address node_ip,
socket_address client_ip,
db::consistency_level cl,
const sstring& username,
bool error) override;
virtual future<> write_login(const sstring& username,
socket_address node_ip,
socket_address client_ip,
bool error) override;
};
} // namespace audit

View File

@@ -21,6 +21,7 @@
#include <fmt/chrono.h>
#include "cql3/query_processor.hh"
#include "utils/class_registrator.hh"
namespace cql3 {
@@ -39,18 +40,6 @@ static auto syslog_address_helper(const db::config& cfg)
: unix_domain_addr(_PATH_LOG);
}
static std::string json_escape(std::string_view str) {
std::string result;
result.reserve(str.size() * 1.2);
for (auto c : str) {
if (c == '"' || c == '\\') {
result.push_back('\\');
}
result.push_back(c);
}
return result;
}
}
future<> audit_syslog_storage_helper::syslog_send_helper(const sstring& msg) {
@@ -107,7 +96,7 @@ future<> audit_syslog_storage_helper::write(const audit_info* audit_info,
auto now = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
tm time;
localtime_r(&now, &time);
sstring msg = seastar::format(R"(<{}>{:%h %e %T} scylla-audit: node="{}", category="{}", cl="{}", error="{}", keyspace="{}", query="{}", client_ip="{}", table="{}", username="{}")",
sstring msg = seastar::format("<{}>{:%h %e %T} scylla-audit: \"{}\", \"{}\", \"{}\", \"{}\", \"{}\", \"{}\", \"{}\", \"{}\", \"{}\"",
LOG_NOTICE | LOG_USER,
time,
node_ip,
@@ -115,7 +104,7 @@ future<> audit_syslog_storage_helper::write(const audit_info* audit_info,
cl,
(error ? "true" : "false"),
audit_info->keyspace(),
json_escape(audit_info->query()),
audit_info->query(),
client_ip,
audit_info->table(),
username);
@@ -131,15 +120,18 @@ future<> audit_syslog_storage_helper::write_login(const sstring& username,
auto now = std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
tm time;
localtime_r(&now, &time);
sstring msg = seastar::format(R"(<{}>{:%h %e %T} scylla-audit: node="{}", category="AUTH", cl="", error="{}", keyspace="", query="", client_ip="{}", table="", username="{}")",
sstring msg = seastar::format("<{}>{:%h %e %T} scylla-audit: \"{}\", \"AUTH\", \"\", \"\", \"\", \"\", \"{}\", \"{}\", \"{}\"",
LOG_NOTICE | LOG_USER,
time,
node_ip,
(error ? "true" : "false"),
client_ip,
username);
username,
(error ? "true" : "false"));
co_await syslog_send_helper(msg.c_str());
}
using registry = class_registrator<storage_helper, audit_syslog_storage_helper, cql3::query_processor&, service::migration_manager&>;
static registry registrator1("audit_syslog_storage_helper");
}

View File

@@ -9,7 +9,6 @@ target_sources(scylla_auth
allow_all_authorizer.cc
authenticated_user.cc
authenticator.cc
cache.cc
certificate_authenticator.cc
common.cc
default_authorizer.cc
@@ -45,8 +44,5 @@ target_link_libraries(scylla_auth
add_whole_archive(auth scylla_auth)
if (Scylla_USE_PRECOMPILED_HEADER_USE)
target_precompile_headers(scylla_auth REUSE_FROM scylla-precompiled-header)
endif()
check_headers(check-headers scylla_auth
GLOB_RECURSE ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)

Some files were not shown because too many files have changed in this diff Show More