scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 11:00:35 +00:00

Author	SHA1	Message	Date
Nadav Har'El	2c02e463ff	test/alternator: fix test's expected error message on DynamoDB The Alternator test test_tag.py::test_tag_lsi_gsi expects to see an error - it's not allowed to set a tag on a GSI or LSI - but the error message that DynamoDB prints recently changed - instead of saying "ResourceArn" the new error message says "resource arn". Change the test to allow both forms, so it will pass on both Alternator (which still uses the word ResourceArn - which is the name of the parameter) and on DynamoDB (which uses "resource arn"). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-01-07 12:51:10 +02:00
Nadav Har'El	4f3150c282	test/alternator: mark Alternator-only test scylla_only The test test_batch.py::test_batch_write_item_large_broken_connection failed on DynamoDB (Refs #26079). It turns out this test has many problems: 1. This test wrongly assumes a batch write needs to complete in one attempt - and this fails on DynamoDB with low WCU capacity where the batch needs to be resumed in multiple requests. Using boto3's batch_writer() fixes this problem. 2. This test has NOTHING to do with batches - so is mis-named and mis-placed. The batch write is just a way to prepare some data in the table, and the real test is about Query'ing the data back and observing the long response and reproducing issue #14454. I did not rename or move the test, but left a comment explaining the situation. 3. This test is written to assume the Query's response uses HTTP chunked encoding. Which isn't actually true for DynamoDB, at least not at the time of this writing. So the test fails on DynamoDB. For the last reason, I made this test scylla_only. This test can't really be run on DynamoDB without rewriting it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-01-07 12:51:10 +02:00
Nadav Har'El	df6b347911	test/alternator: fix test on DynamoDB The test test_batch.py::test_batch_write_item_large often fails when running on DynamoDB, and this patch fixes it. The test checks that a large but not over-the-limits large batch works. However, "works" only means that the batch is not an error - it doesn't guarantee that all the items in the batch are performed. If the WCU limits of the table are exceeded, DynamoDB may perform only part of the batch and return the remaining items as UnprocessedItems. This not only can happen, it usually does happen on DynamoDB - because a new on-demand-billing table always starts with a very low WCU capacity. So in this patch we update the test to recognize and perform the UnprocessedItems, instead of assuming it needs to be empty. The test continues to pass on Alternator, and finally passes on DynamoDB. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-01-07 12:51:10 +02:00
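The UnprocessedItems handling described in the commit above can be sketched as a resubmission loop. This is a minimal illustration, not the code from the patch: `batch_write_all` and `FakeThrottledClient` are hypothetical names, and the fake client only imitates the shape of a boto3 `batch_write_item()` response so that the sketch runs without a real table.

```python
def batch_write_all(client, request_items, max_attempts=10):
    """Call batch_write_item() until no UnprocessedItems remain.

    Returns the number of requests that were needed.
    """
    pending = request_items
    for attempt in range(1, max_attempts + 1):
        response = client.batch_write_item(RequestItems=pending)
        # Whatever the server could not perform is handed back to the caller.
        pending = response.get("UnprocessedItems", {})
        if not pending:
            return attempt
    raise RuntimeError("items still unprocessed after %d attempts" % max_attempts)


class FakeThrottledClient:
    """Hypothetical stand-in for a client of a table with low write capacity.

    It accepts at most `capacity` write requests per call and returns the
    rest as UnprocessedItems.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.written = []

    def batch_write_item(self, RequestItems):
        unprocessed = {}
        budget = self.capacity
        for table, requests in RequestItems.items():
            accepted, rejected = requests[:budget], requests[budget:]
            budget -= len(accepted)
            self.written.extend(accepted)
            if rejected:
                unprocessed[table] = rejected
        return {"UnprocessedItems": unprocessed}


client = FakeThrottledClient(capacity=3)
items = {"tbl": [{"PutRequest": {"Item": {"p": {"S": str(i)}}}} for i in range(8)]}
attempts = batch_write_all(client, items)
```

With a real table, a production loop would normally also back off between attempts; that is left out here to keep the sketch short.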
Nadav Har'El	9d6a463324	test/alternator: increase wait_for_gsi() timeout In Alternator tests, the wait_for_gsi() utility function is used in tests that add a GSI to an existing table, to wait for this new GSI to become ready. Although this takes a fraction of a second on Alternator, we noticed that this takes many minutes (!) on DynamoDB so we used an absurdly high 10 minute timeout to allow tests to also pass on DynamoDB. But it turns out that 10 minutes wasn't absurdly high enough, and tests using it in test_gsi_updatetable.py started to fail on DynamoDB. Empirically, 10 minutes was enough in the past but it seems that today adding a GSI to an empty table routinely takes as much as 20 minutes. So this patch increases the wait_for_gsi() timeout to a whopping 30 minutes. After this patch, the tests in test_gsi_updatetable.py which used to fail - test_gsi_backfill_with_lsi, test_gsi_backfill_with_real_column, test_gsi_creates_and_deletes and test_gsi_backfill_oversized_key now all pass on DynamoDB - but each takes more than 20 minutes to pass. To allow the test to fail much more quickly on Alternator (where creating a GSI takes a fraction of a second), we set a much lower but still very high timeout when running on Alternator - 60 seconds. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-01-07 12:50:54 +02:00
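The two timeouts mentioned in the commit above (30 minutes on DynamoDB, 60 seconds on Alternator) can be illustrated with a generic polling helper. This is a sketch under assumptions: `wait_until` and `gsi_timeout_seconds` are hypothetical names and do not reproduce the real `wait_for_gsi()` utility.

```python
import time


def gsi_timeout_seconds(running_on_dynamodb):
    """Pick the timeout from the commit message: 30 minutes vs. 60 seconds."""
    return 30 * 60 if running_on_dynamodb else 60


def wait_until(is_ready, timeout_seconds, poll_interval=0.01,
               clock=time.monotonic, sleep=time.sleep):
    """Poll is_ready() until it returns True or the timeout expires.

    Returns True on success and False on timeout.
    """
    deadline = clock() + timeout_seconds
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


# Usage: a condition that becomes true on the third poll.
states = iter([False, False, True])
became_ready = wait_until(lambda: next(states), timeout_seconds=1.0, poll_interval=0)
timed_out = not wait_until(lambda: False, timeout_seconds=0.0, poll_interval=0)
```

Keeping the Alternator timeout short means a genuinely broken GSI fails the test within a minute instead of half an hour.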
Nadav Har'El	5c2ca56adf	test/alternator: fix test passing a spurious parameter The test test_streams.py::test_streams_putitem_new_item_overrides_old_lsi failed on DynamoDB (Refs #26079) because we passed an unused parameter NonKeyAttributes to the Projection setting an LSI. NonKeyAttributes is only allowed when ProjectionType=INCLUDE, but we used ProjectionType=ALL. DynamoDB refuses to create an LSI with such inconsistent parameters, and we just need to remove this unnecessary parameter from this test. The reason why this test didn't fail on Alternator is that Alternator doesn't yet support or even parse the Projection parameter (Refs #5036). We also add an xfailing test (passes on DynamoDB, fails on Alternator) checking that a spurious NonKeyAttributes parameter is rejected. When we get around to implement the projection feature (#5036), this will be yet another acceptance test for this feature. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-01-05 13:51:01 +02:00
Nadav Har'El	6c8ddfc018	test/alternator: fix typo in test_returnvalues.py Different DynamoDB operations have different settings allowed for their "ReturnValues" argument. In particular, some operations allow ReturnValues=UPDATED_OLD but the DeleteItem operation does not. We have a test, test_delete_item_returnvalues, aimed to verify this but it had a typo and didn't actually check "UPDATED_OLD". This patch fixes this typo. The test still passes because the code itself (executor.cc, delete_item_operation's constructor) has the correct check - it was just the test that was wrong. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27918	2026-01-01 19:33:23 +02:00
Israel Fruchter	40ada3f187	Update tools/cqlsh submodule (v6.0.32) * tools/cqlsh scylladb/scylla-cqlsh@9e5a91d7...scylladb/scylla-cqlsh@5a1d7842 (9): > fix wrong reference in copyutil.py > Add GitHub Action workflow to create releases on new tags > test_copyutil.py: introdcue test for ImportTask > fix(copyutil.py): avoid situatuions file might be move withing multiple processes > Fix Unix socket port display in show_host() method > Merge pull request #157 from scylladb/alert-autofix-1 .github/workflows/build-push.yml: Potential fix for code scanning alert no. 1: Workflow does not contain permissions > .github/workflows/dockerhub-description.yml: Potential fix for code scanning alert no. 9: Workflow does not contain permissions > test_cqlsh_output: skip some cassandra 5.0 table options > tests: template compression cql to use `class` insted of `sstable_comprission` > Pin Cassandra version to 5.0 for reproducible builds > Remove scylla-enterprise integration test and update Cassandra to latest Closes scylladb/scylladb#27924	2026-01-01 19:30:34 +02:00
Łukasz Paszkowski	76b84b71d1	storage/test_out_of_space_prevention.py: Fix async/await bugs - Add missing await keywords for async operations on s2_log.wait_for() and coord_log.wait_for() - Fix incorrect regex: "compaction .* Split {cf}" → "compaction.*Split {cf}" - The commit https://github.com/scylladb/scylladb/commit/f7324a4 demoted compaction start/end log messages to debug level. Hence add compaction=debug log messages to the following tests: test_split_compaction_not_triggered test_node_restart_while_tablet_split test_repair_failure_on_split_rejection Fixes https://github.com/scylladb/scylladb/issues/27931 Closes scylladb/scylladb#27932	2026-01-01 14:24:30 +02:00
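The missing-await class of bug fixed in the commit above can be shown in isolation. In this sketch `FakeLog` is a hypothetical stand-in for the log helper used by the tests; only the fact that its `wait_for()` is a coroutine function matters.

```python
import asyncio


class FakeLog:
    """Hypothetical log watcher whose wait_for() is a coroutine function."""

    def __init__(self):
        self.matched = []

    async def wait_for(self, pattern):
        await asyncio.sleep(0)
        self.matched.append(pattern)


async def buggy(log):
    # Missing await: this only creates a coroutine object, the body never runs.
    coro = log.wait_for("compaction.*Split ks.cf")
    coro.close()  # closed here only to silence the "never awaited" warning
    return list(log.matched)


async def fixed(log):
    # With await, the test really waits for the log line.
    await log.wait_for("compaction.*Split ks.cf")
    return list(log.matched)


buggy_result = asyncio.run(buggy(FakeLog()))
fixed_result = asyncio.run(fixed(FakeLog()))
```

The buggy variant returns immediately without having waited for anything, so a test written that way can pass or fail for reasons unrelated to the log message.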
Anna Stuchlik	624869de86	doc: remove cassandra-stress from installation instructions The cassandra-stress tool is no longer part of the default package and cannot be run in the way described. This commit removes the instruction to run cassandra-stress. Fixes https://github.com/scylladb/scylladb/issues/24994 Closes scylladb/scylladb#27726	2026-01-01 14:20:58 +02:00
Jenkins Promoter	69d6e63a58	Update pgo profiles - aarch64	2026-01-01 05:10:51 +02:00
Jenkins Promoter	d6e2d3d34c	Update pgo profiles - x86_64	2026-01-01 04:27:14 +02:00
Nadav Har'El	e28df9b3d0	test: fix Python warnings in regular expressions Like C, Python supports some escape sequences in strings such as the familiar "\n" that converts to a newline character. Originally, when backslash was used before a random character, for example, "\.", Python used to just use these literal characters backslash and dot, in the string - and not make a fuss about it. This made it ok to use a string like "hi\.there" as a regular expression. We have a few instances of this in our Python tests. But recent releases of Python started to produce ugly warnings about these cases. The error message looks like: SyntaxWarning: "\." is an invalid escape sequence. Such sequences will not work in the future. Did you mean "\\."? A raw string is also an option. Indeed in most cases the easiest solution is to use a "raw string", a string literal preceded with r. For example, r"hi\.there". In such strings Python doesn't replace escape sequences like \n in the string, and also leaves the \. unchanged for the regular expression to see. So in this patch we use raw strings in all places in test/ where Python warns about this problem. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27856	2025-12-31 20:44:01 +02:00
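A short illustration of the raw-string behavior described in the commit above, using only the standard `re` module and the same "hi\.there" pattern as the commit message; the variable names are chosen for this example.

```python
import re

# A raw string keeps the backslash, so the regex engine sees an escaped dot.
raw_pattern = r"hi\.there"
# The equivalent spelling with a doubled backslash in a regular string.
escaped_pattern = "hi\\.there"

same_pattern = raw_pattern == escaped_pattern

# The escaped dot matches only a literal dot, not any character.
matches_dot = re.fullmatch(raw_pattern, "hi.there") is not None
matches_other = re.fullmatch(raw_pattern, "hiXthere") is not None

# In a raw string "\n" stays as two characters; in a regular string it is a newline.
raw_newline_length = len(r"\n")
plain_newline_length = len("\n")
```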
Yaniv Michael Kaul	597d300527	main.cc: remove warning: 'metric_help' is deprecated Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Backport: no, benign issue. Closes scylladb/scylladb#27680	2025-12-31 18:36:55 +02:00
Avi Kivity	b690ddb9e5	tools: toolchain: dbuild: bind-mount full ~/.cache to container In `afb96b6387`, we added support for sccache. As a side effect it changed the method of invoking ccache from transparent via PATH (if it contains /usr/lib64/ccache) to explicit, by changing the compiler command line from 'clang++' (which may or may not resolve to the ccache binary) to 'ccache /usr/local/bin/clang++', which always invokes ccache. In the default dbuild configuration, PATH does not contain /usr/lib64/ccache, so ccache isn't invoked by default. Users can change this via the SCYLLADB_DBUILD environment variable. As a result of ccache being suddenly enabled for dbuild builds, ccache will now attempt to create ~/.cache/ccache. Under docker, this does not work, because we bind-mount ~/.cache/dbuild. Docker will create the intermediate ~/.cache, but under the root user, not $USER. The intermediate directory being root-owned prevents ~/.cache/ccache from being created. Under podman, this does work, because everything runs under the container's root user. The fix is to bind-mount the entire ~/.cache into the container. This not only lets ccache create the directory, it will also find an existing ~/.cache/ccache directory and use it, enabling reuse across invocations. Since ccache will not respect configuration changes without access to its configuration file (notably, the maximum cache size), we also bind-mount ~/.config. Since ~/.cache and ~/.config are not automatically created, we create them explicitly so the bind mounts can work. This is for new nodes enlisted from the cloud; developer machines will have those directories preexisting. Note that the ccache directory used to be ~/.ccache, but was later changed. Had the author known, we would have bind-mounted ~/.cache much earlier. Fixes #27919. Closes scylladb/scylladb#27920	2025-12-31 14:08:41 +01:00
Asias He	3abda7d15e	topology_coordinator: Ensure repair_update_compaction_ctrl is executed Consider this: - n1 is a coordinator and schedules tablet repair - n1 detects tablet repair failed, so it schedules tablet transition to end_repair state - n1 loses leadership and n2 becomes the new topology coordinator - n2 runs end_repair on the tablet with session_id=00000000-0000-0000-0000-000000000000 - when a new tablet repair is scheduled, it hangs since the lock is already taken because it was not removed in previous step To fix, we use the global_tablet_id to index the lock instead of the session id. In addition, we retry the repair_update_compaction_ctrl verb in case of error to ensure the verb is eventually executed. The verb handler is also updated to check if it is still in end_repair stage. Fixes #26346 Closes scylladb/scylladb#27740	2025-12-31 13:17:18 +01:00
Benny Halevy	3e9b071838	Update seastar submodule * seastar f0298e40...4dcd4df5 (29): > file: provide a default implementation for file_impl::statat > util: Genralize memory_data_sink > defer: Replace static_assert() with concept > treewide: drop the support of fmtlib < 9.0.0 > test: Improve resilience of netsed scheduling fairness test > Merge 'file: Use query_device_alignment_info in blkdev_alignments ' from Kefu Chai file: Put alignment helpers in anonymous namespace file: Use query_device_alignment_info in blkdev_alignments > Merge 'file: Query physical block size and minimum I/O size' from Kefu Chai file: Apply physical_block_size override to filesystem files file: Use designated initializers in xfs_alignments iotune: Add physical block size detection disk_params: Add support for physical_block_size overrides from io_properties.yaml block_device: Query alignment requirements separately for memory and I/O > Merge 'json: formatter: fix formatting of std:string_view' from Benny Halevy json: formatter: fix formatting of std:string_view json: formatter: make sure std::string_view conforms to is_string_like Fixes #27887 > demos:improve the output of demo_with_io_intent() in file_demo > test: Add accept() vs accept_abort() socket test > file: Refine posix_file_impl alignments initialization > Add file::statat and a corresponding file_stat overload > cmake: don't compile memcached app for API < 9 > Merge 'Revert to ~old lifetime semantics for lvalues passed to then()-alikes' from Travis Downs future: adjust lifetime for lvalue continuations future: fix value class operator() > pollable_fd: Unfriend everything > Merge 'file: experimental_list_directory: use buffered generator' from Benny Halevy file: experimental_list_directory: use buffered generator file: define list_directory_generator_type > Merge 'Make datagram API use temporary_buffer<>-s' from Pavel Emelyanov net: Deprecate datagram::get_data() returning packet memcache: Fix indentation after previous patch 
memcache: Use new datagram::get_buffers() API dns: Use new datagram::get_buffers() API tests: Use new datagram::get_buffers() API demo: Use new datagram::get_buffers() API udp: Make datagram implementations return span of temporary_buffer-s > Merge 'Remove callback from timer_set::complete()' from Pavel Emelyanov reactor: Fix indentation after previous patch timers: Remove enabling callback from timer_set::complete() > treewide: avoid 'static sstring' in favor of 'constexpr string_view' > resource: Hide hwloc from public interface > Merge 'Fix handle_exception_type for lvalues' from Travis Downs futures_test: compile-time tests function_traits: handle reference_wrapper > posix_data_sink_impl: Assert to guard put UB > treewide: fix build with `SEASTAR_SSTRING` undefined > avoid deprecation warnings for json_exception > `util/variant_utils`: correct type deduction for `seastar::visit` > net/dns: fixed socket concurrent access > treewide: add missing headers > Merge 'Remove posix file helper file_read_state class' from Pavel Emelyanov file: Remove file_read_state test: Add a test for posix_file_impl::do_dma_read_bulk() > membarrier: simplify locking Adjust scylla to the following changes in scylla: - file_stat became polymorphic - needs explicit inference in table::snapshot_exists, table::get_snapshot_details - file::experimental_list_directory now returns list_directory_generator_type Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#27916	2025-12-30 19:37:13 +03:00
Yaniv Kaul	0264ec3c1d	test: test_downgrade_after_partial_upgrade: check that feature is disabled on all nodes after partial upgrade We should check that the test feature is disabled on all nodes after a partial upgrade. This hardens the test a bit, although the old code wasn't that bad, since enabled features are a part of the group 0 state shared by all nodes. Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Closes scylladb/scylladb#27654	2025-12-30 17:34:56 +01:00
Nadav Har'El	ffcce1ffc8	test/boost: fix flaky test node_view_update_backlog The boost test view_schema_test.cc::node_view_update_backlog can be flaky if the test machine has a hiccup of 100ms, and this patch fixes it: The test is a unit test for db::view::node_update_backlog, which is supposed to cache the backlog calculation for a given interval. The test asks to cache the backlog for 100ms, and then without sleeping at all tries to fetch a value again and expects the unchanged cached value to be returned. However, if the test run experiences a context switch of 100ms, it can fail, and it did once as reported in #27876. The fix is to change the interval in this test from 100ms to something much larger, like 10 seconds. We don't sleep this amount - we just need the second fetch to happen before 10 seconds has passed, so there's no harm in using a very large interval. However, the second half of this test wants to check that after the interval is over, we do get a new backlog calculation. So for the second half of this test we can and should use a shorter interval - e.g., 10ms. We don't care if the test machine is slow or context switched; for this half of the test we want to sleep more than 10ms, and that's easy. The fixed test is faster than the old one (10ms instead of 100ms) and more reliable on a shared test machine. Fixes #27876. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27878	2025-12-30 10:10:42 +01:00
Benny Halevy	c9eab7fbd4	test: test_refresh: add test_refresh_deletes_uploaded_sstables The refresh api is expected to automatically delete the sstable files from the uploads/ dir. Verify that. The code that does that is currently called by sstables_loader::load_new_sstables: ```c++ if (load_and_stream) { ... co_await loader.load_and_stream(ks_name, cf_name, table_id, std::move(sstables_on_shards[this_shard_id()]), primary_replica_only(primary), true /* unlink */, scope, {}); ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#27586	2025-12-30 10:51:24 +03:00
Nadav Har'El	80e5860a8c	docs/alternator: document that Streams needs vnodes The current state (after PR #26836) is that Alternator tables are created by default using tablets. But due to issue #23838, Alternator Streams cannot be enabled on a table that uses tablets... An attempt to enable Streams on such a table results in a clear error: "Streams not yet supported on a table using tablets (issue #23838). If you want to use streams, create a table with vnodes by setting the tag 'system:initial_tablets' set to 'none'." But users should be able to learn this fact from the documentation - not just retroactively from an error message. This is especially important because a user might create and fill a table using tablets, and only get this error when attempting to enable Streams on the existing table - when it is too late to change anything. So this patch adds a paragraph on this to compatibility.md, where several other requirements of Alternator Streams are already mentioned. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27000	2025-12-30 10:45:34 +03:00
Avi Kivity	853f3dadda	Merge 'treewide: fix some spelling errors' from Piotr Smaron Irritated by prevailing spellchecker comments attached to every PR, I aim to fix them all. No need to backport, just cosmetic changes. Closes scylladb/scylladb#27897 * github.com:scylladb/scylladb: treewide: fix some spelling errors codespell: ignore `iif` and `tread`	2025-12-29 20:45:31 +02:00
Patryk Jędrzejczak	4e63e74438	messaging: improve the error messages of closed_errors The default error message of `closed_error` is "connection is closed". It lacks the host ID and the IP address of the connected node, which makes debugging harder. Also, it can be more specific when `closed_error` is thrown due to the local node shutting down. Fixes #16923 Closes scylladb/scylladb#27699	2025-12-29 18:36:07 +02:00
Avi Kivity	567c28dd0d	Merge 'Decouple sstables::storage::snapshot() and ::clone() functionality' from Pavel Emelyanov The storage::snapshot() is used in two different modes -- one to save sstable as snapshot somewhere, and another one to create a copy of sstable. The latter use-case is "optimized" by snapshotting an sstable under new generation, but it's only true for local storage. Despite for S3 storage snapshot is not implemented, _cloning_ sstable stored on S3 is not necessarily going to be the same as doing a snapshot. Another sign of snapshot and clone being different is that calling snapshot() for snapshot itself and for clone use two very different sets of arguments -- snapshotting specifies relative name and omits new generation, while cloning doesn't need "name" and instead provides generation. Recently (#26528) cloning got extra "leave_unsealed" tag, that makes no sense for snapshotting. Having said that, this PR introduces sstables::storage::clone() method and modifies both, callers and implementations, according to the above features of each. As a result, code logic in both methods become much simpler and a bunch of bool classes and "_tag" helper structures goes away. Improving internal APIs, no need to backport Closes scylladb/scylladb#27871 * github.com:scylladb/scylladb: sstables, storage: Drop unused bool classes and tags sstables/storage: Drop create_links_common() overloads sstable: Simplify storage::snapshot() sstables: Introduce storage::clone()	2025-12-29 17:50:54 +02:00
Avi Kivity	9927c6a3d4	Merge 'Reapply "audit: enable some subset of auditing by default"' from Piotr Smaron This reverts commit a5edbc7d612df237a1dd9d46fd5cecf251ccfd13. <h3>Why re-enabling table audit</h3> Audit has been disabled (scylladb/scylla-enterprise/pull/3094) over many concerns raised against the table implementation, e.g. scylladb/scylla-enterprise/issues/2939 / scylladb/scylla-enterprise/issues/2759 + there's whole outstanding backlog of issues . One of the concerns was also a possible loss of availability, and since then we migrated audit keyspace from SimpleStrategy RF=1 to NetworkTopologyStrategy RF=3 (scylladb/scylla-enterprise/pull/3399) and stopped failing queries when auditing fails (scylladb/scylla-enterprise/pull/3118 & scylladb/scylla-enterprise/pull/3117), which improves the situation but doesn't address all the concerns. Eventually we want to use syslog as audit's sink, but it's not fully ready just yet, and so we'll restore table audit for now to increase the security, but later switch to syslog. BTW. cloud will enable table audit for AUTH category scylladb/sre-ops-automation/issues/2970 separately from this effort. <h3>Performance considerations</h3> We are assuming that the events for the enabled categories, i.e. DCL, DDL, AUTH & ADMIN, should appear at about the same, low cadence, with AUTH perhaps having the biggest impact of them all under some workloads. The performance penalty of enabling just the AUTH category [has been measured](https://scylladb.atlassian.net/wiki/spaces/RND/pages/148308005/Audit+performance+impact+test) and while authentication throughput and read/write throughput remain stable, the queries' P99 latency may decrease by a couple of % in the most hardcore scenarios. Fixes: https://github.com/scylladb/scylladb/issues/26020 Gradually re-enabling audit feature, no need to backport. 
Closes scylladb/scylladb#27262 * github.com:scylladb/scylladb: doc: audit: set audit as enabled by default Reapply "audit: enable some subset of auditing by default"	2025-12-29 16:41:04 +02:00
Tomasz Grabiec	bbf9ce18ef	Merge 'load_balancer: compute node load based on tablet sizes' from Ferenc Szili Currently, the tablet load balancer performs capacity based balancing by collecting the gross disk capacity of the nodes, and computes balance assuming that all tablet sizes are the same. This change introduces size-based load balancing. The load balancer does not assume identical tablet sizes any more, and computes load based on actual tablet sizes. The size-based load balancer computes the difference between the most and least loaded nodes in the balancing set (nodes in DC, or nodes in a rack in case of `rf-rack-valid-keyspaces`) and stops further balancing if this difference is below the config option `size_based_balance_threshold_percentage`. This config option does not apply to the absolute load, but instead to the percentage of how much the most loaded node is more loaded than the least loaded node: `delta = (most_loaded - least_loaded) / most_loaded` If this delta is smaller than the config threshold, the balancer will consider the nodes balanced. This PR is a part of a series of PRs which are based on top of each other. - First part for tablet size collection via load_stats: #26035 - Second part reconcile load_stats: #26152 - The third part for load_sketch changes: #26153 - The fourth part which performs tablet load balancing based on tablet size: #26254 - The fifth part changes the load balancing simulator: #26438 This is a new feature, backport is not needed. Fixes #26254 Closes scylladb/scylladb#26254 * github.com:scylladb/scylladb: test, load balancing: add test for table balance load_balancer: add cluster feature for size based balancing load_balancer: implement size-based load balancing config: add size based load balancing config params load_stats: use trinfo to decide how to reconcile tablet size load_sketch: use tablet sizes in load computation load_stats: add get_tablet_size_in_transition()	2025-12-29 15:01:38 +01:00
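The balance check quoted in the commit above, `delta = (most_loaded - least_loaded) / most_loaded`, can be worked through numerically. This is a sketch under assumptions: `is_balanced` is a hypothetical function, node load is represented as a plain number, and the threshold is assumed to be expressed in percent as the option name `size_based_balance_threshold_percentage` suggests.

```python
def load_delta(node_loads):
    """Relative gap between the most and least loaded node in the balancing set."""
    most_loaded = max(node_loads)
    least_loaded = min(node_loads)
    if most_loaded == 0:
        return 0.0  # no load anywhere, so nothing to balance
    return (most_loaded - least_loaded) / most_loaded


def is_balanced(node_loads, threshold_percentage):
    """Treat the nodes as balanced when the delta is below the threshold."""
    return load_delta(node_loads) * 100 < threshold_percentage


# Loads of 0.50, 0.48 and 0.49 give a delta of about 4%, below a 5% threshold.
close_loads_balanced = is_balanced([0.50, 0.48, 0.49], threshold_percentage=5)
# Loads of 0.8 and 0.4 give a delta of 50%, so balancing would continue.
far_loads_balanced = is_balanced([0.8, 0.4], threshold_percentage=5)
```

Because the delta is relative to the most loaded node, the same threshold applies whether the cluster is nearly empty or nearly full.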
Pavel Emelyanov	d892140655	Merge 'Reduce allocations when traversing compaction_groups' from Benny Halevy - table, storage_group: add compaction_group_count - And use to reserve vector capacity before adding an item per compaction_group - table: reduce allocations by using for_each_compaction_group rather than compaction_groups() - compaction_groups() may allocate memory, but when called from a synchronous call site, the caller can use for_each_compaction_group instead. * Improvement, no backport needed Closes scylladb/scylladb#27479 * github.com:scylladb/scylladb: table: reduce allocations by using for_each_compaction_group rather than compaction_groups() replica: storage_group: rename compaction_groups to compaction_groups_immediate	2025-12-29 16:26:33 +03:00
Gleb Natapov	4a5292e815	raft topology: Notify that a node was removed only once Raft topology goes over all nodes in a 'left' state and triggers 'remove node' notification in case id/ip mapping is available (meaning the node left recently), but the problem is that, since the mapping is not removed immediately, when multiple nodes are removed in succession a notification for the same node can be sent several times. Fix that by sending notification only if the node still exists in the peers table. It will be removed by the first notification and following notification will not be sent. Closes scylladb/scylladb#27743	2025-12-29 14:22:34 +01:00
Piotr Smaron	fb4d89f789	treewide: fix some spelling errors	2025-12-29 13:53:56 +01:00
Piotr Smaron	ba5c70d5ab	codespell: ignore `iif` and `tread` These are correct: - iif is the name of a Boost header - `tread carefully` is an actual English phrase	2025-12-29 13:53:56 +01:00
Nadav Har'El	8df9cfcde8	Merge 'Add table size bytes to describe table' from Radosław Cybulski Add table size to DescribeTable's reply in Alternator Fills DescribeTable's reply with missing field TableSizeBytes. - add helper class simple_value_with_expiry, which is like std::optional but the value put has a timeout. - add ignore_errors to estimate_total_sstable_volume function - if set to true the function will catch errors during RPC and ignore them, substituting 0 for missing value. - add a reference to storage_service to executor class (needed to call estimate_total_sstable_volume function). - add fill_table_description and create_table_on_shard0 as non static methods to executor class - calculate TableSizeBytes value for a given table and return it as part of DescribeTable's return value. The value calculated is cached for approximately 6 hours (as per DescribeTable's specification). The algorithm is as follows: - if the requested value is in cache and is still valid it's returned, nothing else happens. - otherwise: - every shard of every node is requested to calculate size of its data - if an error happens, the error is ignored and we assume the given shard has a size of 0 - all such values are summed producing total size - produced value is returned to a caller - on the node the call for a size happened every shard is requested to cache produced value with a 6 hour timeout. - if the next call comes for a different shard on the same node that doesn't yet have cached value, the shard will request the value to be calculated again. The new value will overwrite the old one on every shard on this node. - if the next call comes to a different node, the process of calculation will happen from start, possibly producing different value. The value will have its own timeout, there's no attempt made to synchronize value between nodes.
- add an alternator_describe_table_info_timeout_in_seconds parameter, which controls how long DescribeTable's table information is held in cache. Default is 6 hours. - update test to use parameter `alternator_describe_table_info_timeout_in_seconds` - setting it to 0 and forcing flushing memtables to disk allows checking that table size has grown. Fixes #7551 Closes scylladb/scylladb#24634 * github.com:scylladb/scylladb: alternator: fix invalid rebase Update tests Update documentation Add table size to DescribeTable's output Promote fill_table_description and create_table_on_shard0 to methods Modify estimate_total_sstable_volume to opt ignore errors Add alternator_describe_table_info_cache_validity_in_seconds config option Add ref to service::storage_service to executor Add simple_value_with_expiry util class	2025-12-29 14:47:36 +02:00
Benny Halevy	f60033db63	db: system_keyspace: get_group0_history: unfreeze_gently Prevent stall when the group0 history is too long using unfreeze_gently rather than the synchronous unfreeze() function Fixes #27872 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#27873	2025-12-29 12:00:54 +02:00
Radosław Cybulski	df20f178aa	alternator: fix invalid rebase Fix an invalid rebase that merged the code coming from master correctly, except that the merged code ignored the refactoring done in the patch.	2025-12-29 08:33:10 +01:00
Radosław Cybulski	a31c8762ca	Update tests	2025-12-29 08:33:09 +01:00
Radosław Cybulski	5e1254eef0	Update documentation	2025-12-29 08:33:08 +01:00
Radosław Cybulski	a86b782d3f	Add table size to DescribeTable's output Add a table size to DescribeTable's output.	2025-12-29 08:33:07 +01:00
Radosław Cybulski	1bd855a650	Promote fill_table_description and create_table_on_shard0 to methods Promote `executor::fill_table_description` and `executor::create_table_on_shard0` to methods (from static functions).	2025-12-29 08:33:06 +01:00
Radosław Cybulski	6a26381f4f	Modify estimate_total_sstable_volume to opt ignore errors Modify the `storage_service::estimate_total_sstable_volume` function to optionally ignore errors (substituting 0 instead) when the `ignore_errors` parameter is set to `yes`.	2025-12-29 08:33:06 +01:00
Radosław Cybulski	a532fc73bc	Add alternator_describe_table_info_cache_validity_in_seconds config option Add an `alternator_describe_table_info_cache_validity_in_seconds` configuration option with a default value of 6 hours.	2025-12-29 08:33:05 +01:00
Radosław Cybulski	e246abec4d	Add ref to service::storage_service to executor Add a reference to `service::storage_service` to executor object.	2025-12-29 08:33:03 +01:00
Radosław Cybulski	dfa600fb8f	Add simple_value_with_expiry util class Add a `simple_value_with_expiry` utility class, which functions like a `std::optional` with an added timeout. When emplacing a value, the user needs to provide a timeout, after which the value expires (in which case the `simple_value_with_expiry` object behaves as if it was never set at all). Add boost tests for the new class.	2025-12-29 08:32:52 +01:00
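
The behaviour described for `simple_value_with_expiry` (an optional value that reverts to "unset" once its timeout passes) can be illustrated with a minimal sketch. This is a Python analogue written for illustration only; the class name `ValueWithExpiry`, its methods, and the injectable clock are assumptions and do not reflect the actual C++ interface in the patch.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ValueWithExpiry(Generic[T]):
    """Hypothetical analogue of an optional value that expires after a timeout."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        # The clock is injectable so that expiry can be tested without sleeping.
        self._clock = clock
        self._value: Optional[T] = None
        self._deadline: Optional[float] = None

    def emplace(self, value: T, timeout_seconds: float) -> None:
        """Store a value together with the moment at which it expires."""
        self._value = value
        self._deadline = self._clock() + timeout_seconds

    def has_value(self) -> bool:
        """True only if a value was stored and its deadline has not passed."""
        return self._deadline is not None and self._clock() < self._deadline

    def get(self) -> Optional[T]:
        """Return the stored value, or None if it is unset or expired."""
        return self._value if self.has_value() else None
```

In this sketch a timeout of 0 makes the value expire immediately, which is one plausible way a "set the cache timeout to 0" test setting could force every lookup to recompute.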
Pavel Emelyanov	2e33234e91	util: Remove lister::rmdir() There's seastar helper that does the same, no need to carry yet another implementation Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#27851	2025-12-28 19:46:19 +02:00
Avi Kivity	63e3a22f2e	Merge 'group0_state_machine: don't update in-memory state machine until start' from Piotr Dulikowski Group0 commands consist of one or more mutations and are supposed to be atomic - i.e. the data structures that reflect the group0 tables state are not supposed to be updated while only some mutations of a command are applied, the logic responsible for that is not supposed to observe an inconsistent state of group0 tables. It turns out that this assumption can be broken if a node crashes in the middle of applying a multi-mutation group0 command. Because these mutations are, in general, applied separately, only some mutations might survive a crash and a restart, so the group0 tables might be in an inconsistent state. The current logic of group0_state_machine will attempt to read the group0 tables' state as it was left after restart, so it may observe inconsistent state. This can confuse the node as it may observe a state that it was not supposed to observe, or the state will just outright break some invariants and trigger some sanity checks. One of those was observed in https://github.com/scylladb/scylladb/issues/26945, where a command from the CDC generation publisher fiber was partially applied. In addition to publishing generations, the fiber removes old, expired generations as well. Removal is done by removing data that describes the generation from cdc_generations_v3 and by removing the generation's ID from the committed generation list in the topology table. If only the first mutation gets through but not the other one, on reload the node will see a committed CDC generation without data, which will trigger an on_internal_error check. Fix this by delaying the moment when the in-memory data structures are first loaded. In `579dcf187a`, a mechanism was introduced which persists the commit index before applying commands that are considered committed. Starting a raft server waits until commands are replayed up to that point. 
The fix is to start the group0_state_machine in a mode which only applies mutations - the aforementioned mechanism will re-apply the commands which will, thanks to the mutation idempotency, bring the group0 to a consistent state. After the group0 is known to be in a consistent state (so, after raft::server_impl::start) the in-memory data structures of group0 are loaded for the first time. There is an exception, however: schema tables. Information about schema is actually loaded into memory earlier than the moment when group0 is started. Applying changes to schema is done through the migration manager module which compares the persisted state before and after the schema mutations are applied and acts on that. Refactoring migration manager is out of scope of this PR. However, this is not a problem because the migration manager takes care to apply all of the mutations given in a command in a single commitlog segment, so the initial schema loading code should not see an inconsistent state due to the state being partially applied. The fix is accompanied by a reproducer of scylladb/scylladb#26945. Fixes: scylladb/scylladb#26945 This is not a regression, so no need to backport. Closes scylladb/scylladb#27528 * github.com:scylladb/scylladb: test: cluster: test for recovery after partial group0 command group0_state_machine: remove obsolete comment about group0 consistency group0_state_machine: don't update in-memory state machine until start group0_state_machine: move reloading out of std::visit service: raft: add state machine ref to raft_server_for_group	2025-12-28 13:59:26 +02:00
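
The reasoning in the merge message above (a crash can leave a multi-mutation command half applied, and replaying the whole command restores consistency because mutations are idempotent) can be illustrated with a toy model. Everything in this Python sketch is hypothetical: the dictionary standing in for persisted tables, the key names, the timestamp-based merge rule, and the invariant check are simplifications chosen for illustration, not the actual group0 code.

```python
from typing import Dict, List, Optional, Tuple

# A mutation writes (value, timestamp) under a key; a value of None is a deletion.
Mutation = Tuple[str, Optional[tuple], int]
Table = Dict[str, Tuple[Optional[tuple], int]]


def apply_mutation(table: Table, mutation: Mutation) -> None:
    """Apply one mutation; re-applying the same mutation changes nothing."""
    key, value, ts = mutation
    current = table.get(key)
    if current is None or current[1] <= ts:
        table[key] = (value, ts)


def is_consistent(table: Table) -> bool:
    """Toy invariant: every committed generation ID must still have data."""
    committed = table["topology/committed"][0] or ()
    return all(
        table.get(f"cdc_generations_v3/{gen}", (None, 0))[0] is not None
        for gen in committed
    )


# Persisted state before the command: one committed generation with data.
persisted: Table = {
    "cdc_generations_v3/gen1": (("ranges",), 1),
    "topology/committed": (("gen1",), 1),
}

# One command made of two mutations: delete the data, then drop the ID.
command: List[Mutation] = [
    ("cdc_generations_v3/gen1", None, 2),
    ("topology/committed", (), 2),
]

# Simulated crash: only the first mutation reached disk.
apply_mutation(persisted, command[0])
state_after_crash_consistent = is_consistent(persisted)

# On restart, replay the whole command before loading any in-memory state.
for m in command:
    apply_mutation(persisted, m)
state_after_replay_consistent = is_consistent(persisted)
```

Reading the tables right after the simulated crash would observe a committed generation without data, while reading them only after the replay sees a state in which the invariant holds.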
Pavel Emelyanov	e963a8d603	checked-file: Implement experimental_list_directory() The method in question returns coroutine generator that co_yields directory_entry-s. In case the method is not implemented, seastar creates a fallback generator, that calls existing subscription-based list_directory() and co_yields them. And since checked file doesn't yet have it, fallback generator is used, thus skipping the lower file yielding lister. Not nice. This patch implements the generator lister for checked file, thus making full use of lower file generator lister too. A side note. It's not enough to implement it like return do_io_check([] { return lower_file->experimental_list_directory(); }); like list_directory() does, since io-checking will _not_ happen on directory reading itself, as it's supposed to. This is the problem of the check_file::list_directory() implementation -- it only checks for exception when creating the subscription (and it really never happens), but reading the directory itself happens without io checks. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#27850	2025-12-28 13:37:44 +02:00
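
The side note above distinguishes between checking for I/O errors only when the listing object is created and checking while entries are actually being read. A small Python sketch of that distinction follows, using generators as a stand-in for the coroutine generator; the function names and the `on_error` callback are hypothetical and are not Seastar or ScyllaDB APIs.

```python
from typing import Callable, Iterator, List


def failing_listing() -> Iterator[str]:
    """Stand-in for a directory lister that fails part-way through reading."""
    yield "a"
    raise OSError("simulated disk error")


def check_on_create(make_listing: Callable[[], Iterator[str]],
                    on_error: Callable[[OSError], None]) -> Iterator[str]:
    """Only guards the creation of the listing, which rarely fails."""
    try:
        return make_listing()
    except OSError as e:
        on_error(e)
        raise


def check_on_iterate(make_listing: Callable[[], Iterator[str]],
                     on_error: Callable[[OSError], None]) -> Iterator[str]:
    """Guards the reading itself by wrapping the iteration in the check."""
    try:
        for entry in make_listing():
            yield entry
    except OSError as e:
        on_error(e)
        raise


def drain(listing: Iterator[str]) -> List[str]:
    """Consume a listing, returning the entries read before any error."""
    entries: List[str] = []
    try:
        for entry in listing:
            entries.append(entry)
    except OSError:
        pass
    return entries
```

With `check_on_create`, the error raised during reading bypasses `on_error` entirely; with `check_on_iterate`, the same error is reported before it propagates.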
Yaron Kaikov	1ee89c9682	Revert "scripts: benign fixes flagged by CodeQL/PyLens" This reverts commit `377c3ac072`. This breaks all artifact tests and cloud image build process Closes scylladb/scylladb#27881	2025-12-28 09:49:49 +02:00
Ferenc Szili	6d3c720a08	test, load balancing: add test for table balance This change adds a boost test which validates the resulting table balance of size based load balancing. The threshold was set to a conservative 1.5 overcommit to avoid flakiness.	2025-12-27 11:39:08 +01:00
Ferenc Szili	b7ebd73e53	load_balancer: add cluster feature for size based balancing This patch adds a cluster feature size_based_load_balancing which, until enabled, will force capacity based balancing. This is needed because during rolling upgrades some of the nodes will have incomplete data in load_stats (missing tablet sizes and effective_capacity) which are needed for size based balancing to make good decisions and issue correct migrations.	2025-12-27 11:39:08 +01:00
Ferenc Szili	10eb364821	load_balancer: implement size-based load balancing This change introduces tablet size based load balancing. It is an extension of capacity based balancing with the addition of actual tablet sizes. It computes the difference between the most and least loaded nodes in the DC and stops further balancing if this difference is below the config option size_based_balance_threshold_percentage. This config option does not apply to the absolute load, but instead to the percentage of how much the most loaded node is more loaded than the least loaded node: delta = (most_loaded - least_loaded) / most_loaded If this delta is smaller than the config threshold, the balancer will consider the nodes balanced.	2025-12-27 11:20:20 +01:00
Ferenc Szili	cc9e125f12	config: add size based load balancing config params This change adds: - The config parameter force_capacity_based_balancing which, when enabled, performs capacity based balancing instead of size based. - The config parameter size_based_balance_threshold_percentage which sets the balance threshold for the size based load balancer. - The config parameter minimal_tablet_size_for_balancing which sets the minimal tablet size for the load balancer.	2025-12-27 10:37:38 +01:00
Ferenc Szili	0c9b93905e	load_stats: use trinfo to decide how to reconcile tablet size This patch corrects the way update_load_stats_on_end_migration() decides which tablet transition occurred, in order to reconcile tablet sizes in load_stats. Before, the transition kind was inferred from the value of leaving and pending replicas. This patch changes this to use the value of trinfo.transition. In case of a rebuild, and in case there is only one replica, the new tablet size will be set to 0.	2025-12-27 10:37:38 +01:00
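
The balance check described in the commit above can be written out as a short sketch. The formula for `delta` is taken from the commit message; the function name `is_balanced`, the conversion of the threshold from a percentage to a fraction, and the handling of a zero load are assumptions made for illustration and may differ from the actual load balancer.

```python
def is_balanced(most_loaded: float, least_loaded: float,
                threshold_percentage: float) -> bool:
    """Return True when the relative load gap is under the threshold.

    delta = (most_loaded - least_loaded) / most_loaded, as in the commit message.
    Assumption: the threshold is given in percent and compared against delta
    after dividing by 100.
    """
    if most_loaded <= 0:
        # Assumption: with no load at all there is nothing to balance.
        return True
    delta = (most_loaded - least_loaded) / most_loaded
    return delta < threshold_percentage / 100.0
```

For example, with loads of 100 and 95 the gap is 5% of the most loaded node, so a 10% threshold treats the nodes as balanced, while loads of 100 and 50 give a 50% gap and balancing would continue.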
Ferenc Szili	621cb19045	load_sketch: use tablet sizes in load computation This commit changes load_sketch so that it computes node and shard load based on tablet sizes instead of tablet count.	2025-12-27 10:37:23 +01:00

51254 Commits