scylladb

Author	SHA1	Message	Date
Dani Tweig	5078a054d6	github: bug_report.yml: Improve bug report template structure V1: Perform a yaml "face lift" on the old bug report md template, making bug reporting more efficient. Add dedicated textarea fields for problem description and expected behavior Include pre-filled placeholders to guide issue reporting Add formatted log output section with shell syntax highlighting V2: updated the contact details of scylla and performed some code cleanup.	2024-11-14 12:47:06 +02:00
Dani Tweig	8cc510a51b	Update bug_report.yml Perform a yaml "face lift" to the old bug report md template. Asking to fill in addition to the former data details about reproduction steps and description of the problem. Making bug reporting more efficient.	2024-11-11 16:03:53 +02:00
Yaron Kaikov	cc71077e33	.github/scripts/label_promoted_commits.py: only match the Close tag in the last line in the commit message When a backport PR is promoted to the release branch, we automatically close the backport PR (since GitHub will only close the one based on the default branch) and update the labels in the original PRs. When a commit message has multiple `closes` prefixes, the script uses the first one (which is not the correct one), see `3ddb61c90e`. Fix this by always using the last line with the `closes` prefix. Closes scylladb/scylladb#21498	2024-11-11 11:04:33 +02:00
Dani Tweig	381faa2649	Rename .github/ISSUE_TEMPLATE.md to .github/ISSUE_TEMPLATE/bug_report.yml GitHub issue template process has changed. The issue template file should be replaced and renamed. Closes scylladb/scylladb#21518	2024-11-11 11:00:38 +02:00
Nikita Kurashkin	3032d8ccbf	add check to refuse usage of DESC TABLE on a materialized view Fixes #21026 Closes scylladb/scylladb#21500	2024-11-11 10:23:30 +02:00
Yaron Kaikov	2596d1577b	./github/workflows/add-label-when-promoted.yaml: Run auto-backport only on default branch In https://github.com/scylladb/scylladb/pull/21496#event-15221789614 ``` scylladbbot force-pushed the backport/21459/to-6.1 branch from 414691c to `59a4ccd` Compare 2 days ago ``` Backport automation triggered by `push` but also should either start from `master` branch (or `enterprise` branch from Enterprise), we need to verify it by checking also the default branch. Fixes: https://github.com/scylladb/scylladb/issues/21514 Closes scylladb/scylladb#21515	2024-11-11 09:16:35 +02:00
Pavel Emelyanov	57af69e15f	Merge 'Add retries to the S3 client' from Ernest Zaslavsky 1. Add `retry_strategy` interface and default implementation for exponential back-off retry strategy. 2. Add new S3 related errors, also introduce additional errors to describe pure http errors that have no additional information in the body. 3. Add retries to the s3 client, all retries are coordinated by an instance of `retry_strategy`. In case of an error, also parse the response body in an attempt to retrieve additional and more focused error information as suggested by AWS. See https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html. Introduce `aws_exception` to carry the original `aws_error`. 4. Discard whatever exception is thrown in `abort_upload` when aborting a multipart upload: we don't care about cleanly aborting it, since there are other means to clean up dangling parts, for example `rclone cleanup` or the S3 bucket's Lifecycle Management Policy. 5. Add tests to cover retries, and retry exhaustion. Also add tests for jumbo upload. 6. Add the S3 proxy which is used to randomly inject retryable S3 errors to test the "retry" part of the S3 client. Switch the `s3_test` to use the S3 proxy. `s3_test` exposed a `put_object` problem that caused a segmentation fault when retrying; this is fixed. 7. Extend the `s3_test` to use both `minio` and `proxy` configurations. 8. Add parameter to the proxy to seed the error injection randomization to make it replayable. fixes: #20611 fixes: #20613 Closes scylladb/scylladb#21054 * github.com:scylladb/scylladb: aws_errors: Make error messages more verbose. 
test: Make the minio proxy randomization re-playable test/boost/s3_test: add error injection scenarios to existing test suite test: Switch `s3_test` to use proxy test: Add more tests client: Stop returning error on `DELETE` in multipart upload abortion client: Fix sigsegv when retrying client: Add retries client: Adjust `map_s3_client_exception` to return exception instance aws_errors: Change aws_error::parse to return std::optional<> aws_errors: Add http errors mapping into aws_error client: Add aws_exception mapping aws_error: Add `aws_exception` to carry original `aws_error` aws_errors: Add new error codes client: Introduce retry strategy	2024-11-11 08:35:55 +03:00
Takuya ASADA	92af373fab	unified: drop scylla-tools from unified package On `b8634fb`, we dropped scylla-tools from rpm and deb, we should drop it from unified package as well. Closes #20739 Closes scylladb/scylladb#20740	2024-11-10 12:56:43 +02:00
Avi Kivity	b58dbe57aa	Merge 'repair: introduce and use buffer size hint for mixed-shard multishard reader' from Botond Dénes Add a buffer hint to the multishard reader. This is an internal hint, used by the multishard reader to provide a hint to the shard reader, on how much data exactly is needed by the multishard reader from the respective shard. This hint allows eliminating extraneous cross-shard round-trips and possible shard reader evict-recreate cycles. Building on this, repair sets its own row buffer size as the max buffer size on the multishard reader, ensuring that the row buffer is filled with the minimum amount of cross-shard round trips and minimal reader recreation. To further eliminate unnecessary evictions, this PR also disables the multishard reader's read-ahead which is a mechanism that was designed to reduce latency for user-reads but it can be too aggressive for repair, causing unnecessary extra congestion on the already struggling streaming semaphores. Refs: https://github.com/scylladb/scylladb/issues/18269 Fixes: https://github.com/scylladb/scylladb/issues/21113 The performance impact was measured with an SCT test, which creates a cluster of 3 nodes with 16 shards, then adds a 4th one with 12 shards. Currently, it is the bootstrap time which is the worst in the case of mixed shard clusters, see below for the improvement measured during bootstrap:
| | master | buffer-hint | metric |
| ------------ | ------------- | ------------- | --------------------------------------------------- |
| evictions | 0.9M | 93.0K | scylla_database_paused_reads_permit_based_evictions |
| read (bytes) | 9.0T | 3.9T | scylla_reactor_aio_bytes_read |
| read (ops) | 88.0M | 33.5M | scylla_reactor_aio_reads |
| time | 56min | 20min | N/A |
This is a performance improvement, no backport required. 
Closes scylladb/scylladb#20815 * github.com:scylladb/scylladb: test/boost/mutation_reader_test: add test for multishard reader buffer hint repair/row_level: disable read-ahead db/config: introduce repair_multishard_reader_enable_read_ahead readers/multishard: implement the read_ahead flag replica/database: make_multishard_streaming_reader(): expose the read_ahead parameter readers/multishard: add read_ahead parameter repair/row_level: set max buffer size on multishard reader replica/database: make_multishard_streaming_reader(): expose buffer_hint parameter db/config: introduce enable_repair_multishard_reader_buffer_hint readers/multishard: multishard_reader: pass hint to shard_reader readers/multishard: shard_reader_v2::fill_reader_buffer(): respect the hint readers/multishard: propagate fill_buffer_hint to shard_reader:fill_reader_buffer() readers/multishard: shard_reader: extract buffer-fill into its own method	2024-11-10 12:55:19 +02:00
Kefu Chai	961a53f716	dist: systemd: use default KillMode before this change, we specify the KillMode of the scylla-service service unit explicitly to "process". according to https://www.freedesktop.org/software/systemd/man/latest/systemd.kill.html, > If set to process, only the main process itself is killed (not recommended!). and the document suggests using "control-group" over "process". but scylla server is not a multi-process server, it is a multi-threaded server. so it should not make any difference even if we switch to the recommended "control-group". given that we've been seeing "defunct" scylla processes after stopping the scylla service using systemd, we are wondering if we should try to change the `KillMode` to "control-group", which is the default value of this setting. in this change, we just drop the setting so that systemd stops the service by stopping all processes in the control group of this unit. Refs scylladb/scylladb#21507 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21508	2024-11-09 20:07:11 +02:00
Kefu Chai	1f940d56b2	build: cmake: s/idle_compiler/idl_compiler/ before this change, the header files generated with `idl-compiler.py` are not regenerated if `idl-compiler.py` is updated. but they should, as the change to the script could in turn change the generated header files. because we have a typo in the `DEPENDS` argument, `${idle_compiler}` is expanded to an empty string. in this change, the typo is corrected, and the dependency from the generated headers to the script is correctly reflected in the building rules. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21475	2024-11-09 20:06:23 +02:00
Piotr Dulikowski	7021efd6b0	Merge 'main,cql_test_env: start group0_service before view_builder' from Michał Jadwiszczak In scylladb/scylladb#19745, view_builder was migrated to group0 and since then it is dependent on group0_service. Because of this, group0_service should be initialized/destroyed before/after view_builder. This patch also adds error injection to `raft_server_with_timeouts::read_barrier`, which does 1s sleep before doing the read barrier. There is a new test which reproduces the use after free bug using the error injection. Fixes scylladb/scylladb#20772 scylladb/scylladb#19745 is present in 6.2, so this fix should be backported to it. Closes scylladb/scylladb#21471 * github.com:scylladb/scylladb: test/boost/secondary_index_test: add test for use after free api/raft: use `get_server_with_timeouts().read_barrier()` in coroutines main,cql_test_env: start group0_service before view_builder	2024-11-08 20:27:09 +01:00
Kefu Chai	aebb532906	bytes, utils: include fmt/ostream.h and iostream when appropriate in seastar e96932b05f394b27cd0101e24f0584736795b50f, we stopped including unused `fmt/ostream.h`. this helped to reduce the header dependency. but this also broke the build of scylladb, as we rely on the `fmt/ostream.h` indirectly included by seastar's header project. in this change, we include `fmt/ostream.h` and `iostream` explicitly when we are using the declarations in them. this enables us to - bump up the seastar submodule - potentially reduce the header dependency as we will be able to include seastar/core/format.hh instead of a more bloated seastar/core/print.hh after bumping up seastar submodule Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21494	2024-11-08 16:43:25 +03:00
Michał Jadwiszczak	f998f027a2	test/boost/secondary_index_test: add test for use after free Reproduces scylladb/scylladb#20772. Add error injection to `raft_server_with_timeouts::read_barrier`, which does 1s sleep before doing the read barrier.	2024-11-08 14:16:19 +01:00
Michał Jadwiszczak	de7b58e8d4	api/raft: use `get_server_with_timeouts().read_barrier()` in coroutines It is unsafe to do `get_server_with_timeouts().read_barrier()` in continuations because `get_server_with_timeouts()` returns raft server by value and it may be deallocated when `read_barrier()` yields, causing use-after-return. Simple workaround is to use the read barrier in coroutine and co_await it. Then the raft server is kept on stack until the read barrier is finished. I've checked the whole codebase and it looks like the only place where `group0_with_timeouts().read_barrier()` is used in a continuation is api/raft.cc. Co-authored-by: Piotr Dulikowski <piodul@scylladb.com>	2024-11-08 14:15:13 +01:00
Botond Dénes	e3e8a94c9a	Merge 'Allow explicitly enabling or disabling tablets when creating a new keyspace' from Benny Halevy Separate the configuration for enabling the tablets feature from the enablement of tablets when creating new keyspaces. This change always enables the TABLETS cluster feature and the tablets logic respectively. The `enable_tablets` config option just controls whether tablets are enabled or disabled by default for new keyspaces. If `enable_tablets` is set to `true`, tablets can be disabled using `CREATE KEYSPACE WITH tablets = { 'enabled': false }` as it is today. If `enable_tablets` is set to `false`, tablets can be enabled using `CREATE KEYSPACE WITH tablets = { 'enabled': true }`. The motivation for this change is to simplify the user experience of using tablets by setting the default for new keyspaces to false and allowing the user to simply opt-in by using tablets = {enabled: true }. This is not possible today. The user has to enable tablets by default for all new keyspaces (that use the NetworkTopologyStrategy) and then actively opt-out to use vnodes. * Not required to be backported to OSS versions. May be backported to specific enterprise versions * This PR resubmits https://github.com/scylladb/scylladb/pull/20729 that was reverted in `73b1f66b70` due to https://github.com/scylladb/scylladb/issues/21159 which is now fixed Closes scylladb/scylladb#21451 * github.com:scylladb/scylladb: data_dictionary: keyspace_metadata::describe: print tablets enabled also when defaulted tablets_test: test enable/disable tablets when creating a new keyspace treewide: always allow tablets keyspaces feature_service: prevent enabling both tablets and gossip topology changes alternator: create_keyspace_metadata: enable tablets using feature_service	2024-11-08 09:15:42 +02:00
Michał Chojnowski	35921eb67e	mvcc_test: fix a benign failure of test_apply_to_incomplete_respects_continuity For performance reasons, mutation_partition_v2::maybe_drop(), and by extension also mutation_partition_v2::apply_monotonically(mutation_partition_v2&&) can evict empty row entries, and hence change the continuity of the merged entry. For checking that apply_to_incomplete respects continuity, test_apply_to_incomplete_respects_continuity obtains the continuity of the partition entry before and after apply_to_incomplete by calling e.squashed().get_continuity(). But squashed() uses apply_monotonically(), so in some circumstances the result of squashed() can have smaller continuity than the argument of squashed(), which messes with the thing that the test is trying to check, and causes spurious failures. This patch changes the method of calculating the continuity set, so that it matches the entry exactly, fixing the test failures. Fixes scylladb/scylladb#13757 Closes scylladb/scylladb#21459	2024-11-08 06:08:39 +01:00
Ernest Zaslavsky	029837a4a1	aws_errors: Make error messages more verbose. Add more information to the error messages to make the failure reason clearer. Also add tests to check exceptions propagated from s3 client failure.	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	14f3832749	test: Make the minio proxy randomization re-playable Provide a seed to the proxy randomization. The idea is that `test.py` initializes the seed from `/dev/urandom` and prints it when starting; if some tests fail, the developer can replay the run locally with the same seed (if it didn't reproduce otherwise) using `start_s3_proxy.py`, passing the seed via the `--rnd-seed` command line argument	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	0c62635f05	test/boost/s3_test: add error injection scenarios to existing test suite Add variants of existing S3 tests that route through a proxy instead of connecting directly to MinIO. The proxy allows injecting errors to validate error handling and recovery mechanisms under failure conditions.	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	8919e0abab	test: Switch `s3_test` to use proxy Switch `s3_test` to use the S3 proxy which is used to randomly inject retryable S3 errors to test the "retry" part of the S3 client. Fix `put_object` to make it retryable	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	b1e36c868c	test: Add more tests Add tests to cover retries, and retry exhaustion. Also add tests for jumbo upload.	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	7fd1ff8d79	client: Stop returning error on `DELETE` in multipart upload abortion Discard whatever exception is thrown in `abort_upload` when aborting multipart upload since we don't care about cleanly aborting it since there are other means to clean up dangling parts, for example `rclone cleanup` or S3 bucket's Lifecycle Management Policy	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	064a239180	client: Fix sigsegv when retrying Stop moving the `file` into the `make_file_input_stream` since it will try to use it again on retry	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	dc6e4c0d97	client: Add retries Add retries to the s3 client, all retries are coordinated by an instance of `retry_strategy`. In a case of error also parse response body in attempt to retrieve additional and more focused error information as suggested by AWS. See https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html. Also move the expected http status check to the `make_s3_error_handler` since the http::client::make_request call is done with `nullopt` - we want to manage all the aws errors handling in s3 client to prevent the http client to validate it and fail before we have a chance to analyze the error properly	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	244635ebd8	client: Adjust `map_s3_client_exception` to return exception instance "Unfuturize" the `map_s3_client_exception` since the retryable client is going to be implemented using coroutines and no `future` is needed here, just to save unnecessary `co_await` on it	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	bd3d4ed417	aws_errors: Change aws_error::parse to return std::optional<> Change aws_error::parse to return std::optional<> to signify that no error was found in the response body	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	58decef509	aws_errors: Add http errors mapping into aws_error Add http errors mapping into aws_error since the retry strategy is going to operate on aws_error and should not be aware of HTTP status codes	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	fa9e8b7ed0	client: Add aws_exception mapping Map aws_exceptions in `map_s3_client_exception`, will be needed in retryable client calls to remap newly added AWS errors to `storage_io_error`	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	54e250a6f1	aws_error: Add `aws_exception` to carry original `aws_error` Add `aws_exception` to carry original `aws_error` for proper error handling in retryable s3 client	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	e6ff34046f	aws_errors: Add new error codes Add new S3 related errors, also introduce additional errors to describe pure http errors that have no additional information in the body	2024-11-07 21:01:25 +02:00
Ernest Zaslavsky	8dbe351888	client: Introduce retry strategy Add `retry_strategy` interface and default implementation for exponential back-off retry strategy	2024-11-07 21:01:25 +02:00
Michał Jadwiszczak	7bad8378c7	main,cql_test_env: start group0_service before view_builder In scylladb/scylladb#19745, view_builder was migrated to group0 and since then it is dependent on group0_service. Because of this, group0_service should be initialized/destroyed before/after view_builder. Fixes scylladb/scylladb#20772 Co-authored-by: Dawid Mędrek <dawid.medrek@scylladb.com>	2024-11-07 14:08:11 +01:00
Kamil Braun	c268cf2e33	Merge 'test: rename "cql-pytest" to "cqlpy"' from Nadav Har'El Python and Python developers don't like directory names to include a minus sign, like "cql-pytest". In this patch we rename test/cql-pytest to test/cqlpy, and also change a few references in other code (e.g., code that used test/cql-pytest/run.py) and also references to this test suite in documentation and comments. Arguably, the word "test" was always redundant in test/cql-pytest, and I want to leave the "py" in test/cqlpy to emphasize that it's Python-based tests, contrasting with test/cql which are CQL-request-only approval tests. The second patch in the series fixes a small regression in the test/cqlpy/run script. Fixes #20846 Test organization only, so backports not strictly necessary, but let's do them anyway because otherwise it will make any future backporting of tests in the cqlpy directory more messy than it needs to be. Closes scylladb/scylladb#21446 * github.com:scylladb/scylladb: test/cqlpy: fix "run" script without any parameters test: rename "cql-pytest" to "cqlpy"	2024-11-07 13:26:07 +01:00
Benny Halevy	40928bd886	data_dictionary: keyspace_metadata::describe: print tablets enabled also when defaulted Now that tablets may be explicitly enabled when creating a new keyspace, describe tablets as enabled even when the default initial_tablets==0 is used. Refs https://github.com/scylladb/scylla-enterprise/issues/4860 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-11-07 13:59:59 +02:00
Benny Halevy	8620d9f672	tablets_test: test enable/disable tablets when creating a new keyspace Test both configuration values for `enable_tablets` and the possibility to explicitly enable or disable tablets, respectively, when creating a keyspace using the `tablets = {'enabled': true\|false}` CREATE KEYSPACE option. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-11-07 13:57:40 +02:00
Benny Halevy	4b21cca443	treewide: always allow tablets keyspaces With the tablets feature always enabled (unless gossip topology changes are forced), the enable_tablets option now controls only the default for newly created keyspaces. Even when set to `false`, tablets are still enabled as a feature and the user may explicitly enable tablets using `CREATE KEYSPACE <name> WITH tablets = {'enabled': true}` Note: best viewed with `git show -w` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-11-07 13:57:39 +02:00
Benny Halevy	974b0f2080	feature_service: prevent enabling both tablets and gossip topology changes Tablets require raft consistent topology changes. Therefore, document that they are incompatible in the config help and prevent their usage in `feature_config_from_db_config` Fixes scylladb/scylladb#21075 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-11-07 13:56:59 +02:00
Benny Halevy	4cf3b683bc	alternator: create_keyspace_metadata: enable tablets using feature_service Rather than using the local configuration option on this node, check the cluster feature instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-11-07 13:56:59 +02:00
Botond Dénes	e21346179c	test/boost/mutation_reader_test: add test for multishard reader buffer hint	2024-11-07 02:47:54 -05:00
Botond Dénes	5c5c77746e	repair/row_level: disable read-ahead The multishard reader's read-ahead was designed to reduce the latency of range scans. But in the case of repair, read-ahead is suspected to contribute significant extra load on the congested streaming semaphore and thus contribute to the subsequent thrashing (excessive reader eviction). First off, read-ahead was designed with pages of limited size in mind. Repair can read much more, even for a single repair buffer. This can lead to read-ahead concurrency continuing to ramp up, creating and using more and more readers. Secondly, repair is not latency sensitive, so even when working well and there is no congestion, the benefits are negligible. The use of read-ahead is now controllable by the new repair_multishard_reader_enable_read_ahead config item, defaulting to false.	2024-11-07 02:47:54 -05:00
Botond Dénes	a248520201	db/config: introduce repair_multishard_reader_enable_read_ahead Not used yet.	2024-11-07 02:47:54 -05:00
Botond Dénes	36a8756028	readers/multishard: implement the read_ahead flag Don't do read-aheads when read-ahead was not enabled.	2024-11-07 02:47:54 -05:00
Botond Dénes	8938e06ebe	replica/database: make_multishard_streaming_reader(): expose the read_ahead parameter Continuing the previous patch, expose the just added read_ahead parameter of make_multishard_combining_reader_v2(). Set to read_ahead::yes by all callers, keeping the current default.	2024-11-07 02:47:54 -05:00
Botond Dénes	c6c62deaa5	readers/multishard: add read_ahead parameter And propagate to the reader itself. Not used yet.	2024-11-07 02:47:54 -05:00
Botond Dénes	784f89f585	repair/row_level: set max buffer size on multishard reader The multishard reader is used in the mixed-shard case, when a repair has to read from all other shards. It is very important that cross-shard roundtrips and possible evict-recreate cycles for the shard readers are avoided. To this end, make use of the recently introduced internal buffer hint feature in the multishard reader and set its buffer size to match that of the row level repair buffer size. The use of the buffer-hint can be controlled with the recently introduced repair_multishard_reader_buffer_hint_size config param.	2024-11-07 02:47:54 -05:00
Botond Dénes	e2344e28b6	replica/database: make_multishard_streaming_reader(): expose buffer_hint parameter Expose the buffer hint functionality added by the previous commits, to callers of make_multishard_streaming_reader(). All callers disable it currently, it will be used in the next patch.	2024-11-07 02:47:46 -05:00
Yaron Kaikov	ef104b7b96	.github/scripts/auto-backport.py: update method to get closed prs `commit.get_pulls()` in PyGithub returns pull requests that are directly associated with the given commit. Since in a closed PR the relevant commit is an event type, the backport automation didn't get the PR info for backporting. Ref: https://github.com/scylladb/scylladb/issues/18973 Closes scylladb/scylladb#21468	2024-11-07 09:28:46 +02:00
Avi Kivity	9e67649fe5	utils: loading_cache: tighten clock sampling Sample the clock once to avoid the filter returning different results. Range algorithms may use multiple passes, so it's better to return consistent results. Closes scylladb/scylladb#21400	2024-11-07 10:28:01 +03:00
Kefu Chai	50fbab29ca	compaction: remove unused "#include" we don't use `std::list` in compaction/compaction_manager.hh, neither is this header responsible for exposing the declarations in `<list>`. so let's stop `#include` this header. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21436	2024-11-07 10:25:27 +03:00
Avi Kivity	f5489ba4a1	locator: tablet_metadata_guard: forward declare database No need to bring in a heavy database.hh dependency. Closes scylladb/scylladb#21447	2024-11-07 10:24:35 +03:00
Kefu Chai	ba021f72a6	api: s/mulformatted/malformatted mulformatted was a typo, let's fix it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21442	2024-11-07 10:07:11 +03:00
Pavel Emelyanov	49949092ad	Merge 'Make s3 client ops use abort source + use in backup task' from Calle Wilund Fixes #20716 Adds optional abort_source to all s3 client operations. If provided, will propagate to actual HTTP client and allow for aborting actual net op. Note: this uses an abort source per call, not a client-local one. This is for two reasons: 1.) The usage pattern of the client object is to create it outside the eventual owning object (task) that hosts the relevant abort source 2.) It is quite possible to want to have different/no abort source for some operation usage. Also adds forward usage of task abort_source in backup tasks upload s3 call, making it more readily abort-able. Closes scylladb/scylladb#21431 * github.com:scylladb/scylladb: backup_task: Use task abort source in s3 client call s3::client: Make operations (individually) abortable	2024-11-07 10:03:25 +03:00
Yaron Kaikov	9d8562caf3	Add conflict_reminder action for backport PR To make sure conflicts in backport PRs get resolved, add reminders to the PR author. The new action will run twice a week and will send a reminder only for PRs that have been open with conflicts for 3 days or more. Fixes: https://github.com/scylladb/scylladb/issues/21448 Closes scylladb/scylladb#21449	2024-11-07 06:55:37 +02:00
Calle Wilund	0db4b9fd94	backup_task: Use task abort source in s3 client call Fixes #20716 Propagates abort source in task object to actual network call, thus making the upload workload more quickly abortable. v2: Fix test to handle two versions after each other	2024-11-06 15:20:23 +00:00
Nadav Har'El	1fd7b797c7	test/cqlpy: fix "run" script without any parameters A recent improvement to test/cqlpy/run to add the "--release" option broke the ability to run this script without any options (no test name, etc.). This patch fixes this case. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-11-06 16:48:36 +02:00
Nadav Har'El	8c215141a1	test: rename "cql-pytest" to "cqlpy" Python and Python developers don't like directory names to include a minus sign, like "cql-pytest". In this patch we rename test/cql-pytest to test/cqlpy, and also change a few references in other code (e.g., code that used test/cql-pytest/run.py) and also references to this test suite in documentation and comments. Arguably, the word "test" was always redundant in test/cql-pytest, and I want to leave the "py" in test/cqlpy to emphasize that it's Python-based tests, contrasting with test/cql which are CQL-request-only approval tests. Fixes #20846 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-11-06 16:48:36 +02:00
Kefu Chai	6efde20939	utils/to_string: do not include fmt/ostream.h to_string.hh does not use this header, neither is it obliged to expose the content of this header. so, let's remove this include. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21440	2024-11-06 17:21:29 +03:00
Botond Dénes	3c25e6fcb4	db/config: introduce enable_repair_multishard_reader_buffer_hint Allows enabling/disabling the multishard reader buffer hint optimization. Not wired yet.	2024-11-06 08:51:00 -05:00
Botond Dénes	b052c5df62	readers/multishard: multishard_reader: pass hint to shard_reader Calculate a buffer fill hint and pass it to shard_reader_v2::fill_buffer(), so the underlying buffer-fill can be optimized to avoid multiple cross shard round-trips, as well as possible evict-recreate cycles. The buffer hint mechanism is opt-in, enabled via the new multishard_reader_buffer_hint parameter.	2024-11-06 08:51:00 -05:00
Botond Dénes	912b4dfba3	readers/multishard: shard_reader_v2::fill_reader_buffer(): respect the hint When the hint is provided, respect it: make sure the returned buffer is of the requested size, stopping early if the stop_token is seen. To reduce the amount of possible eviction-recreate cycles while the buffer is filled, disable auto-pause for the duration of the fill_reader_buffer() call. For this purpose, auto_pause_disable_guard is added to evictable_reader_v2.	2024-11-06 08:51:00 -05:00
Botond Dénes	8d5283f036	readers/multishard: propagate fill_buffer_hint to shard_reader:fill_reader_buffer() The hint will tell the shard reader exactly how much data to produce, to avoid multiple cross-shard round-trips and possible evict-recreate cycles. The hint is neither used nor calculated yet, this is coming in the next patches.	2024-11-06 08:51:00 -05:00
Botond Dénes	ee7ecb9155	readers/multishard: shard_reader: extract buffer-fill into its own method It is about to get a bit more complicated, so worth to extract into a method so it can be shared by the two call-sites.	2024-11-06 08:51:00 -05:00
Tomasz Grabiec	f7d35d535e	Merge 'bytes_ostream: replace boost ranges with std ranges' from Avi Kivity To reduce the dependency load, replace boost ranges with std::ranges. Cleanup; no backport. Closes scylladb/scylladb#21450 * github.com:scylladb/scylladb: bytes_ostream: replace boost ranges with std ranges bytes_ostream: extract fragment_iterator into namespace scope	2024-11-06 14:01:27 +01:00
Yaron Kaikov	77604b4ac7	.github/script/auto-backport.py: push backport PR to `scylladbbot` fork Since Scylla is a public repo, when we create a fork, it doesn't fork the team and permissions (unlike private repos where it does). When we have a backport PR with conflicts, the developers need to be able to update the branch to fix the conflicts. To do so, we modified the logic of the backport automation as follows: - Every backport PR (with and without conflicts) will be opened directly on the `scylladbbot` fork repo - When there are conflicts, an email will be sent to the original PR author with an invitation to become a contributor in the `scylladbbot` fork with `push` permissions. This will happen only once if the author is not a contributor. - Together with sending the invite, all backport labels will be removed and a comment will be added to the original PR with instructions - The PR author must add the backport labels after the invitation is accepted Fixes: https://github.com/scylladb/scylladb/issues/18973 Closes scylladb/scylladb#21401	2024-11-06 14:29:37 +02:00
David Garcia	a072478f4f	docs: enable tooltips Updates the theme to the latest version to enable tooltips and modifies the db_options.tmpl to show the new role in action. Closes scylladb/scylladb#21324	2024-11-06 14:09:28 +02:00
Andrei Chekun	afd1fc8e9f	test.py: Add pytest-xdist to the toolchain Add new dependency pytest-xdist to the toolchain. This will allow executing boost and unit tests from pytest in parallel, reducing the time needed for the run. Closes scylladb/scylladb#21222	2024-11-06 14:09:01 +02:00
Botond Dénes	0ad32c153d	Merge 'test_tablets: add rack decommission test cases' from Benny Halevy test_tablets: add rack decommission test cases Test scenarios where decommissioning a complete rack should succeed, and reproduce scylladb/scylladb#19475 where decommissioning a rack would fail since the number of remaining racks is insufficient to satisfy the replication factor, even though the number of nodes is sufficient, enshrining this behavior. Refs scylladb/scylladb#19475 * This PR adds unit tests and improves an error message. No backport required. Closes scylladb/scylladb#20747 * github.com:scylladb/scylladb: tablet_allocator: improve error message when unable to find replicas when draining test_tablets: add rack decommission test cases topology_experimental_raft/test_tablets: get_tablet_count_per_shard_for_host: move shards_count param to be last test/pylib: ServerInfo: add datacenter and rack attributes test: everywhere: drop unused imports of ServerInfo	2024-11-06 14:07:47 +02:00
Calle Wilund	3321820c67	s3::client: Make operations (individually) abortable Refs #20716 Adds optional abort_source to all s3 client operations. If provided, will propagate to actual HTTP client and allow for aborting actual net op. Note: this uses an abort source per call, not a client-local one. This is for two reasons: 1.) The usage pattern of the client object is to create it outside the eventual owning object (task) that hosts the relevant abort source 2.) It is quite possible to want to have different/no abort source for some operation usage.	2024-11-05 14:23:24 +00:00
Avi Kivity	8fb6d98ba3	Merge "various gossiper code cleanups" from Gleb * 'gleb/gossip-cleanup-v3' of github.com:scylladb/scylla-dev: gossiper: start failure_detector_loop on shard 0 only gossiper: use 1 seconds instead of 1000 milliseconds gossiper: remove unused code gossiper: co-routinize do_send_ack2_msg gossiper: do not needlessly call get_endpoint_state_ptr in handle_major_state_change gossiper: fix weird logic in get_live_members gossiper: drop unneeded this-> gossiper: fold get_or_create_endpoint_state into my_endpoint_state gossiper: co-routinize do_send_ack_msg	2024-11-05 15:31:58 +02:00
Avi Kivity	baaa92c6f5	bytes_ostream: replace boost ranges with std ranges Have fragment_iterator support iterator_concept for compatibility with std ranges, and switch from boost iterator_range to std::ranges::subrange.	2024-11-05 14:50:38 +02:00
Avi Kivity	cb026c347e	bytes_ostream: extract fragment_iterator into namespace scope C++ concept evaluation rules clash with nested class definition rules with the result that evaluating concepts about the nested class within the enclosing class doesn't work. Extract bytes_ostream::fragment_iterator to avoid that.	2024-11-05 14:43:49 +02:00
Avi Kivity	4dab2473a2	Merge 'treewide: trade boost's any_of and all_of for std's any_of and all_of' from Kefu Chai now that we are allowed to use C++23. we now have the luxury of using `std::ranges::all_of` and `std::ranges::any_of` in this change, we replace `boost::algorithm::all_of` and `boost::algorithm::any_of` with `std::ranges::all_of` and `std::ranges::any_of` respectively. to reduce the dependency to boost for better maintainability, and leverage standard library features for better long-term support. this change is part of our ongoing effort to modernize our codebase and reduce external dependencies where possible. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#21411 * github.com:scylladb/scylladb: treewide: s/boost::algorithm::any_of/std::ranges::any_of/ treewide: s/boost::algorithm::all_of/std::ranges::all_of/	2024-11-05 12:48:24 +02:00
Piotr Dulikowski	7f17894c88	Merge 'cql3: Allow for describing CDC log tables' from Dawid Mędrek In the past, DESC SCHEMA would produce create statements for both the base and the log table. That was incorrect as the log table is automatically created alongside the base one. That was solved in scylladb/scylladb@9ab57b1 (scylladb/scylladb#18467). The mentioned changes implemented the following solution: * DESC SCHEMA/KEYSPACE/TABLE would still print a create statement for the CDC base table, * DESC SCHEMA/KEYSPACE would start printing an alter statement for the CDC log table. That statement would ensure that the restored log table has the same parameters as the original one, * DESC TABLE <base table> would behave as DESC SCHEMA/KEYSPACE, i.e. it would print a create statement for the base table and an alter statement for the log table, * DESC TABLE <log table> would result in an error. While that solution was good and behaved correctly in the context of restoring the schema, it had one flaw: describe statement aren't only used as a means for producing a backup; they also serve an informative purpose to learn about the schema, e.g. to learn what parameters a specific table uses. Because we didn't allow for describing CDC log tables, the user couldn't look them up directly via a describe statement -- they had to describe the base table for that. Attempting to describe a log table ended with an error, e.g.: ``` $ DESC TABLE ks.t_scylla_cdc_log; ks.t_scylla_cdc_log is a cdc log table and it cannot be described directly. Try `DESC TABLE ks.t` to describe cdc base table and it's log table. ``` In these changes, we allow for describing CDC log tables again. The semantics of the first three bullets above remains unchanged, but we impose new behavior for DESC TABLE <log table>: * When the user executes DESC TABLE <log table>, a create statement will be returned, treating the table as if it were a regular one, * The create statement will be wrapped in CQL comment markers. 
The rationale for the second bullet is that although we want to give the user a means to look into the structure and options of a CDC log table, the returned statement is not supposed to be ever executed by them. We want to minimize the risk of that. An example of the behavior after the change: ``` $ DESC TABLE ks.t_scylla_cdc_log; /* Do NOT execute this statement! It's only for informational purposes. A CDC log table is created automatically when the base is created. CREATE TABLE ks.t_scylla_cdc_log ( "cdc$stream_id" blob, "cdc$time" timeuuid, "cdc$batch_seq_no" int, "cdc$end_of_batch" boolean, "cdc$operation" tinyint, "cdc$ttl" bigint, p int, PRIMARY KEY ("cdc$stream_id", "cdc$time", "cdc$batch_seq_no") ) WITH CLUSTERING ORDER BY ("cdc$time" ASC, "cdc$batch_seq_no" ASC) AND bloom_filter_fp_chance = 0.01 AND caching = {'enabled': 'false', 'keys': 'NONE', 'rows_per_partition': 'NONE'} AND comment = 'CDC log for ks.t' AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '60', 'compaction_window_unit': 'MINUTES', 'expired_sstable_check_frequency_seconds': '1800'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1 AND default_time_to_live = 0 AND gc_grace_seconds = 0 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND speculative_retry = '99.0PERCENTILE'; */ ``` We also extend the developer documentation regarding DESCRIBE statements on CDC tables. Fixes scylladb/scylladb#21235 Backport: these changes are an enhancement, so not needed. Closes scylladb/scylladb#21228 * github.com:scylladb/scylladb: docs/dev: Document semantics of describing CDC tables cql3: Allow for describing CDC log tables	2024-11-05 10:06:13 +01:00
Pavel Emelyanov	440c1e3e3f	error_injection: Remove unused inject(sleep, then invoke) overload The overload was introduced by `a8b14b0227` (utils: add timeout error injection with lambda), but is only used by the test nowadays. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#21377	2024-11-05 09:56:08 +02:00
Yaniv Michael Kaul	4c5e102aee	node_exporter: use fewer collectors Remove unused / less useful collectors by default. While it doesn't seem to reduce memory usage, it may reduce potential performance or security issues in the future. This is what we are left with (snippet of log when loading node exporter manually with the changed command line): ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:111 level=info msg="Enabled collectors" ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=arp ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=bonding ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=conntrack ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=cpu ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=cpufreq ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=diskstats ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=dmi ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=edac ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=entropy ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=filefd ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=filesystem ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=interrupts ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=loadavg ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=mdadm ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=meminfo ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=netclass ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=netdev ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=netstat ts=2024-11-03T15:41:06.855Z 
caller=node_exporter.go:118 level=info collector=nvme ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=os ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=pressure ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=schedstat ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=selinux ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=sockstat ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=softnet ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=stat ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=textfile ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=time ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=timex ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=uname ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=vmstat ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=watchdog ts=2024-11-03T15:41:06.855Z caller=node_exporter.go:118 level=info collector=xfs Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Improvement, no need to backport. Closes scylladb/scylladb#21419	2024-11-05 10:41:09 +03:00
Avi Kivity	b292aeecac	replica: query.hh: drop dependency on database.hh database.hh has large fan-in and therefore can trigger a lot of recompilations if included. Replace with smaller dependencies. Closes scylladb/scylladb#21424	2024-11-05 10:40:33 +03:00
Pavel Emelyanov	a98b57212e	Merge 'lang, .github: remove unused includes, add more directories to CLEANER_DIR' from Kefu Chai in this series: - remove unused `#include` in "lang" subdirectory - add index and lang to CLEANER_DIR --- cleanup and improvements in the CI, hence no need to backport. Closes scylladb/scylladb#21437 * github.com:scylladb/scylladb: .github: add index and lang to CLEANER_DIR lang: remove unused "#includes"	2024-11-05 10:37:04 +03:00
Kefu Chai	f1d4812ad6	test: lib: rest_client: use isinstance() over type() in addition to the inheritance support, `isinstance()` is also the recommended way to check for types by PEP8. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21438	2024-11-05 10:36:31 +03:00
Kefu Chai	59eb2ab119	treewide: s/boost::algorithm::any_of/std::ranges::any_of/ now that we are allowed to use C++23. we now have the luxury of using `std::ranges::any_of`. in this change, we replace `boost::algorithm::any_of` with `std::ranges::any_of` to reduce the dependency to boost for better maintainability, and leverage standard library features for better long-term support. this change is part of our ongoing effort to modernize our codebase and reduce external dependencies where possible. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-11-05 14:06:09 +08:00
Kefu Chai	f8bb1c64f1	treewide: s/boost::algorithm::all_of/std::ranges::all_of/ now that we are allowed to use C++23. we now have the luxury of using `std::ranges::all_of`. in this change, we replace `boost::algorithm::all_of` with `std::ranges::all_of` to reduce the dependency to boost for better maintainability, and leverage standard library features for better long-term support. this change is part of our ongoing effort to modernize our codebase and reduce external dependencies where possible. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-11-05 14:05:24 +08:00
Kefu Chai	e651b6dc69	.github: add index and lang to CLEANER_DIR also explain why we don't run the cleaner against the "idl" subdirectory. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-11-05 10:01:04 +08:00
Kefu Chai	ee2a9419b3	lang: remove unused "#includes" these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-11-05 10:01:04 +08:00
Avi Kivity	ee92784098	serialization: replace boost::type with std::type_identity Recently, seastar rpc started accepting std::type_identity in addition to boost::type as a type marker (while labeling the latter with an ominous deprecation warning). Reduce our dependency on boost by switching to std::type_identity.	2024-11-05 00:43:27 +01:00
Avi Kivity	075b13597d	serializer: drop dependency on boost ranges The call to boost::range::for_each is easily replaced with ranged for. Closes scylladb/scylladb#21422	2024-11-04 17:48:17 +02:00
Gleb Natapov	2dbae78542	gossiper: start failure_detector_loop on shard 0 only failure_detector_loop does nothing on all other shards.	2024-11-04 17:15:06 +02:00
Gleb Natapov	323b04137d	gossiper: use 1 seconds instead of 1000 milliseconds	2024-11-04 17:15:06 +02:00
Gleb Natapov	0cb4c71846	gossiper: remove unused code	2024-11-04 17:15:06 +02:00
Gleb Natapov	0e4f149dee	gossiper: co-routinize do_send_ack2_msg	2024-11-04 17:15:06 +02:00
Gleb Natapov	1fbac54fb8	gossiper: do not needlessly call get_endpoint_state_ptr in handle_major_state_change The code calls for get_endpoint_state_ptr several times instead of using the result of the first call. Change it.	2024-11-04 17:15:06 +02:00
Gleb Natapov	e2cf93abb9	gossiper: fix weird logic in get_live_members The code adds a node to a set and then removes it if a condition is met. Add to the set if the condition is not met instead. Note that the original set never has local endpoint (it is only added locally), so the code is equivalent.	2024-11-04 17:14:55 +02:00
Benny Halevy	caedcf20c6	tablet_allocator: improve error message when unable to find replicas when draining Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-11-04 14:54:54 +02:00
Benny Halevy	d8be1cafb5	test_tablets: add rack decommission test cases Test scenarios where decommissioning a complete rack should succeed, and reproduce scylladb/scylladb#19475 where decommissioning a rack would fail since the number of remaining racks is insufficient to satisfy the replication factor, even though the number of nodes is sufficient, enshrining this behavior. Refs scylladb/scylladb#19475 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-11-04 14:54:13 +02:00
Avi Kivity	b706e3e9e4	Merge 'sstables/index_reader: avoid unnecessary index page reads in single-partition reads' from Michał Chojnowski Terminology note: in the context of this series, "index page" means a contiguous segment of the index file starting (inclusive) at a key corresponding to a summary entry and ending (exclusive) before the key corresponding to the next summary entry. "Index pages" are not related to filesystem pages. --- In a single-partition read, if the searched partition key is the first key in its index page, we start scanning the index for that key starting at the previous index page (inclusive), even though we could start directly from the key's page. Similarly, if the searched partition key is absent from the sstable and lies after all other keys in its appropriate page, we additionally scan the next page, even though it's known from the summary that it can't possibly contain the key. Those cases are wasteful. It's worse than it might seem at first glance. When partitions are small, only a small fraction of search keys fulfills those conditions (i.e. "first key in its page" or "an absent key greater than the last key in its page"), so the waste doesn't matter much. But when partitions are big enough, every index page contains only one partition key (and a promoted index for that partition), which directly means that all search keys fulfill the conditions, which means that total index reading work is two times bigger than what it should be. In addition, there is a secondary performance bug which, when the aforementioned conditions are fulfilled, causes additional I/O to happen past the index reads which are actually parsed and used. In effect, the index I/O in single-partition reads might be not just doubled, but even tripled (that's for IOPS; throughput might be multiplied even more), all because of a slight inaccuracy in the edge cases. 
This series fixes those inefficiencies by tightening the edge cases and ensuring that single-partition reads always read only a single index page. Here's an example where we query the first row (i.e. `LIMIT 1`) of a certain partition key, in a table with large (1 MB) promoted indexes. Before the patch, the lookup of the lower bound involves 3 serialized disk reads (as described above) to subsequent index pages, and even the lookup of the upper bound involves 2 disk reads: ``` Execute CQL3 query Parsing a statement [shard 0] Processing a statement for authenticated user: anonymous [shard 0] Executing read query (reversed false) [shard 0] Creating read executor for token -1297921881139976049 with all: [127.11.11.1] targets: [127.11.11.1] repair decision: NONE [shard 0] Creating never_speculating_read_executor - speculative retry is disabled or there are no extra replicas to speculate with [shard 0] read_data: querying locally [shard 0] Start querying singular range {{-1297921881139976049, pk{00023130}}} [shard 0] [reader concurrency semaphore user] admitted immediately [shard 0] [reader concurrency semaphore user] executing read [shard 0] Reading key {-1297921881139976049, pk{00023130}} from sstable ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Data.db [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 38359040 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 38391808 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 38359040, successfully read 32768 bytes [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 
32768 at offset 38391808, successfully read 32768 bytes [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 39370752 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 39403520 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 39370752, successfully read 32768 bytes [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 39403520, successfully read 32768 bytes [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 40378368 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 40411136 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 40378368, successfully read 32768 bytes [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 40411136, successfully read 32768 bytes [shard 0] upper_bound_cache_only({position: clustered, ckp{}, 1}): no upper bound [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 40378368 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 40411136 [shard 0] 
./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 40378368, successfully read 32768 bytes [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 40411136, successfully read 32768 bytes [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 41390080 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 41422848 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 41390080, successfully read 32768 bytes [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 41422848, successfully read 32768 bytes [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Data.db: scheduling bulk DMA read of size 21926 at offset 819200 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Data.db: finished bulk DMA read of size 21926 at offset 819200, successfully read 24576 bytes [shard 0] Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead), 0 range tombstone(s) and 0 cell(s) (0 live, 0 dead) [shard 0] Querying is done [shard 0] Done processing - preparing a result [shard 0] Request complete ``` After the patch, the lookup of each bound involves 1 read: ``` Execute CQL3 query Parsing a statement [shard 0] Processing a statement for authenticated user: anonymous [shard 0] Executing read query (reversed false) [shard 0] Creating read 
executor for token -1297921881139976049 with all: [127.11.11.1] targets: [127.11.11.1] repair decision: NONE [shard 0] Creating never_speculating_read_executor - speculative retry is disabled or there are no extra replicas to speculate with [shard 0] read_data: querying locally [shard 0] Start querying singular range {{-1297921881139976049, pk{00023130}}} [shard 0] [reader concurrency semaphore user] admitted immediately [shard 0] [reader concurrency semaphore user] executing read [shard 0] Reading key {-1297921881139976049, pk{00023130}} from sstable ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Data.db [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 39370752 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 39403520 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 39370752, successfully read 32768 bytes [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 39403520, successfully read 32768 bytes [shard 0] upper_bound_cache_only({position: clustered, ckp{}, 1}): no upper bound [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 40378368 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: scheduling bulk DMA read of size 32768 at offset 40411136 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 40378368, successfully 
read 32768 bytes [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Index.db: finished bulk DMA read of size 32768 at offset 40411136, successfully read 32768 bytes [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Data.db: scheduling bulk DMA read of size 21926 at offset 819200 [shard 0] ./workdir_01/data/ks/t-536c31f09a9c11efbd5082a6aa3e8d0c/me-3gky_0v18_3rgjk2dsjae431s4uz-big-Data.db: finished bulk DMA read of size 21926 at offset 819200, successfully read 24576 bytes [shard 0] Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead), 0 range tombstone(s) and 0 cell(s) (0 live, 0 dead) [shard 0] Querying is done [shard 0] Done processing - preparing a result [shard 0] Request complete ``` Doesn't have to be backported, since the problem only affects performance, not correctness, and it has been present since forever. Closes scylladb/scylladb#20897 * github.com:scylladb/scylladb: index_reader: remove a piece of misguided code involved in single-partition reads index_reader: in single-partition reads, don't read more than one page index_reader: fix unnecessary reads of preceding index pages	2024-11-04 14:28:27 +02:00
Avi Kivity	2531dc2d80	schema_registry: stop including replica/database.hh database.hh is a hotspot that changes often (or its dependencies do). Avoid including it to reduce recompilations. Closes scylladb/scylladb#21407	2024-11-04 13:16:27 +01:00
Benny Halevy	9ff614da9f	topology_experimental_raft/test_tablets: get_tablet_count_per_shard_for_host: move shards_count param to be last Prepare for the next commit that will add a version accepting a list of servers: `get_tablet_count_per_shard_for_hosts` for which we want `shards_per_node` to be last and have a default value. Also, fix the type hint for `full_tables`, as it had a syntax error, using `:` instead of `,`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-11-04 14:11:30 +02:00
Benny Halevy	0c1e85b6e3	test/pylib: ServerInfo: add datacenter and rack attributes Set to "DEFAULT_DC" and "DEFAULT_RACK" by default. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-11-04 14:11:30 +02:00
Benny Halevy	efa64cb92a	test: everywhere: drop unused imports of ServerInfo Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-11-04 14:11:30 +02:00
Avi Kivity	7cb1ad8c87	Merge 'compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors' from Benny Halevy stop() methods, like destructors, must always succeed, and returning errors from them is futile as there is nothing else we can do with them but continue with shutdown. stop_ongoing_compactions, in particular, currently returns the status of stopped compaction tasks from `stop_tasks`, but still all tasks must be stopped after it, even if they failed, so assert that and ignore the errors. Fixes scylladb/scylladb#21159 * Needs backport to 6.2 and 6.1, as commit `8cc99973eb` causes handles storage that might cause compaction tasks to fail and eventually terminate on shutdown when the exceptions are thrown in noexcept context in the deferred stop destructor body Closes scylladb/scylladb#21299 * github.com:scylladb/scylladb: compaction_manager: stop: await _stop_future if engaged compaction_manager: really_do_stop: assert that no tasks are left behind compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors compaction/compaction_manager: stop_tasks(): unlink stopped tasks compaction/compaction_manager: make _tasks an intrusive list	2024-11-04 13:54:16 +02:00
Gleb Natapov	3b7d9fddbc	gossiper: drop unneeded this->	2024-11-04 12:02:51 +02:00
Gleb Natapov	501b8f6984	gossiper: fold get_or_create_endpoint_state into my_endpoint_state my_endpoint_state() is the only caller of get_or_create_endpoint_state() and calling it is the only thing the function does anyway.	2024-11-04 12:02:51 +02:00
Gleb Natapov	300cbcebf6	gossiper: co-routinize do_send_ack_msg	2024-11-04 12:02:51 +02:00
Pavel Emelyanov	f3f956841f	sstables: Remove unused mp_row_consumer_m::range_tombstone_start It's only used by its operator<< so remove it as well Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#21380	2024-11-03 16:40:02 +02:00
Avi Kivity	704ea9d3b4	Merge 'api: Remove foreach_column_family() helper' from Pavel Emelyanov There's a whole lot of helpers and wrappers in api/ that help handlers manipulate keyspaces and tables. One of those is foreach_column_family which calls the provided callable on a table on each shard. There's exactly the same (but a bit more flexible) helper nearby. While at it, this helper gets a better name. Closes scylladb/scylladb#21398 * github.com:scylladb/scylladb: api: Rename set_tables -> for_tables_on_all_shards api: Remove foreach_column_family() helper	2024-11-03 15:46:27 +02:00
Avi Kivity	856489ded1	cql3: remove unused request_validations methods These methods are not used and therefore removed. Closes scylladb/scylladb#21392	2024-11-03 13:17:32 +02:00
Benny Halevy	6cce67bec8	compaction_manager: stop: await _stop_future if engaged The current condition that consults the compaction manager state for awaiting `_stop_future` works since _stop_future is assigned after the state is set to `stopped`, but it is incidental. What matters is that `_stop_future` is engaged. While at it, exchange _stop_future with a ready future so that stop() can be safely called multiple times. And dropped the superfluous co_return. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-11-03 10:53:35 +02:00
Benny Halevy	a7a55298ea	compaction_manager: really_do_stop: assert that no tasks are left behind stop_ongoing_compactions now ignores any errors returned by tasks, and it should leave no task left behind. Assert that here, before the compaction_manager is destroyed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-11-03 10:53:34 +02:00
Benny Halevy	c08ba8af68	compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors stop() methods, like destructors, must always succeed, and returning errors from them is futile as there is nothing else we can do with them but continue with shutdown. Leaked errors on the stop path may cause termination on shutdown, when called in a deferred action destructor. Fixes scylladb/scylladb#21298 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-11-03 10:52:58 +02:00
Botond Dénes	d8500472b3	compaction/compaction_manager: stop_tasks(): unlink stopped tasks Stopped tasks currently linger in _tasks until the fiber that created the task is scheduled again and unlinks the task. This window between stop and remove prevents reliable checks for empty _tasks list after all tasks are stopped. Unlink the task early so really_do_stop() can safely check for an empty _tasks list (next patch).	2024-11-03 10:17:11 +02:00
Botond Dénes	e942c074f2	compaction/compaction_manager: make _tasks an intrusive list _tasks is currently std::list<shared_ptr<compaction_task_executor>>, but it has no role in keeping the instances alive, this is done by the fibers which create the task (and pin a shared ptr instance). This lends itself to an intrusive list, avoiding that extra allocation upon push_back(). Using an intrusive list also makes it simpler and much cheaper (O(1) vs. O(N)) to remove tasks from the _tasks list. This will be made use of in the next patch. Code using _task has to be updated because the value_type changes from shared_ptr<compaction_task_executor> to compaction_task_executor&.	2024-11-03 10:17:11 +02:00
Avi Kivity	39b55bd3a0	Update seastar submodule * seastar f821bda19...fba36a3d1 (13): > build: do not include -DBoost_TEST_DYN_LINK in seastar_testing_cflags > doc: compatibility: update the notes on supported GCC versions > docker: bump up to clang {18,19} and gcc {13,14} > rpc: optimize small tuple deserialization > rpc: switch rpc::type from boost to std > thread: do not use fortify source > build: suppress CMake warning about CMP0057 > core/units: remove space before literal identifier > signal.md: describe auto signal handling > build: persist Seastar options in SeastarConfig.cmake > sharded.hh: seperate invoke_on decls from defs > test: Add perf test for http client > gate: check: mark as const Closes scylladb/scylladb#21390	2024-11-02 13:58:45 +02:00
Botond Dénes	19a43b5859	Merge 'repair: Reduce hints and batchlog flush' from Asias He The hints and batchlog flush requests are issued to all nodes for each repair request when tombstone_gc repair mode is used. The amount of such flush requests is high when all nodes in the cluster run repair. It is observed it takes a long time, up to 15s, for a repair request to finish such a flush request. To reduce overhead of the flush, each node caches the flush and only executes the real flush when some time has passed. It is safe to do so because the real flush_time is returned. Repair uses the smallest flush_time from peers as the repair time. The nice thing about the cache on the receiver side is that all senders can hit the cache. It is better than cache on the sender side. A slightly smaller flush_time compared to the real flush time will be used with the benefits of significantly dropped hints and batchlog flush. The tradeoff is reasonable. Fixes #20259 Performance improvement. No backports. Closes scylladb/scylladb#20260 * github.com:scylladb/scylladb: test/test_repair.py: Add test_batchlog_flush_in_repair repair: Reduce hints and batchlog flush db/batchlog_manager: Add add_delay_to_batch_replay db/batchlog_manager: Add get_last_replay db/batchlog_manager: wire in batchlog_replay_cleanup_after_replays db/config: introduce batchlog_replay_cleanup_after_replays db/batchlog_manager: do_batch_log_replay(): add cleanup flag	2024-11-01 14:23:27 +02:00
Pavel Emelyanov	292fd52a60	Merge 'utils: chunked_vector: various constructor improvements' from Avi Kivity Optimize the various constructors a little, and add an std::from_range_t constructor. Minor improvement, so no backports. Closes scylladb/scylladb#21399 * github.com:scylladb/scylladb: utils: chunked_vector: add from_range_t constructor utils: chunked_vector: optimize initializer_list constructor utils: chunked_vector: iterator constructor: copy spanwise utils: chunked_vector: reserve for forward iterators, not just random access iterators, on construction	2024-11-01 15:02:56 +03:00
Botond Dénes	4bafaee523	Merge 'tasks: improve task_manager::lookup_virtual_task' from Aleksandra Martyniuk Currently, to find the operation with given id, all operations tracked by a virtual task are listed. This isn't necessary, since we only need info regarding one particular operation. Add a method to check whether a virtual task tracks the operation with the given id. No backport needed Closes scylladb/scylladb#20769 * github.com:scylladb/scylladb: tasks: delete virtual_task::get_ids method as it is unused tasks: improve task_manager::lookup_virtual_task	2024-11-01 13:44:04 +02:00
Kefu Chai	1b8446f92d	compaction: fix the indent in `38ce2c605d`, we left a TODO to reindent the code. in this change, we reindent the code to address this TODO. Refs `38ce2c605d` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21383	2024-11-01 12:55:47 +03:00
Avi Kivity	b5e46077df	sstables: generation_type: replace boost ranges with std ranges Reduce dependency load. Closes scylladb/scylladb#21402	2024-11-01 12:45:24 +03:00
Pavel Emelyanov	d6169630a4	api: Rename set_tables -> for_tables_on_all_shards The former name is not extremely descriptive, hopefully the latter one is better in this sense. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-11-01 12:15:01 +03:00
Pavel Emelyanov	822758dffd	api: Remove foreach_column_family() helper There's a whole lot of helpers and wrappers in api/ that help handlers manipulate keyspaces and tables. One of those is foreach_column_family which calls the provided callable on a table on each shard. There's exactly the same (but a bit more flexible) set_table() helper nearby. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-11-01 12:13:35 +03:00
Botond Dénes	0ee0dd3ef4	Merge 'Collect and report backup progress' from Pavel Emelyanov Task manager GET /status method returns two counters that reflect task progress -- total and completed. To make caller reason about their meaning, additionally there's progress_units field next to those counters. This patch implements this progress report for backup task. The units are bytes, the total counter is total size of files that are being uploaded, and the completed counter is total amount of bytes successfully sent with PUT requests. To get the counters, the client::upload_file() is extended to calculate those. fixes #20653 Closes scylladb/scylladb#21144 * github.com:scylladb/scylladb: backup_task: Report uploading progress s3/client: Account upload progress for real s3/client: Introduce upload_progress s3: Extract client_fwd.hh	2024-11-01 10:57:12 +02:00
Kefu Chai	64122b3df3	treewide: s/boost::transform/std::ranges::transform/ now that we are allowed to use C++23. we now have the luxury of using `std::ranges::transform`. in this change, we: - replace `boost::transform` with `std::ranges::transform` - update affected code to work with `std::ranges::transform` to reduce the dependency to boost for better maintainability, and leverage standard library features for better long-term support. this change is part of our ongoing effort to modernize our codebase and reduce external dependencies where possible. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21318	2024-11-01 08:15:14 +02:00
Avi Kivity	8c67f9b42e	cql3: util: remove unneeded boost/range includes from header files The includes are redistributed to the source files that need them. Closes scylladb/scylladb#21391	2024-10-31 23:49:44 +01:00
Nadav Har'El	ee2d75b088	Merge 'Generalize "breakpoint" type of error injection' from Pavel Emelyanov This pattern is -- if requested (by test) suspend code execution until requestor (the test) explicitly wakes it up. For that the injected place should inject a lambda that is called with so called "handler" at hand and try to read message from the handler. In many cases the inner lambda additionally prints a message into logs that tests waits upon to make sure injection was stepped on. In the end of the day this "breakpoint" is injected like ``` co_await inject("foo", [] (auto& handler) { log.info("foo waiting"); co_await handler.wait_for_message(timeout); }); ``` This PR makes breakpoints shorter and more unified, like this ``` co_await inject("foo", wait_for_message(timeout)); ``` where `wait_for_message` is a wrapper structure used to pick new `inject()` overload. Closes scylladb/scylladb#21342 * github.com:scylladb/scylladb: sstables: Use inject(wait_for_message_overload) treewide,error_injection: Use inject(wait_for_message) and fix tests treewide,error_injection: Use inject(wait_for_message) overload error_injection: Add inject() overload with wait_for_message wrapper	2024-10-31 21:56:27 +02:00
Avi Kivity	6a9852d47b	utils: chunked_vector: add from_range_t constructor std::ranges::to<> has a little protocol with containers. Implement it to get optimized construction. Similar to the iterator pair constructor, if the range's size can be obtained (even with an O(N) algorithm), favor that to avoid reallocations. Copy elements spanwise to promote optimization to memcpy when possible.	2024-10-31 19:32:16 +02:00
Avi Kivity	b2769403d2	utils: chunked_vector: optimize initializer_list constructor Delegate to the previously optimized iterator-pair constructor.	2024-10-31 18:10:14 +02:00
Avi Kivity	0a81be4321	utils: chunked_vector: iterator constructor: copy spanwise Instead of copying element-by-element, copy contiguous spans. This is much faster if the input is a span and the constructor is trivial, since the whole thing translates to a memcpy. Make the two branches constexpr to reduce work for the compiler in optimizing the other branch away.	2024-10-31 18:10:08 +02:00
Avi Kivity	4653430c8e	utils: chunked_vector: reserve for forward iterators, not just random access iterators, on construction For a forward iterator, prefer a two pass algorithm to first count the number of elements, reserve, then copy the elements, to a single pass algorithm that involves reallocation and copying.	2024-10-31 17:55:42 +02:00
Kefu Chai	673b107ffa	github: use GithubException when appropriate `Exception` could be too general, what we really care about is `GithubException`. so let's catch the latter instead for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21364	2024-10-31 18:21:29 +03:00
Kefu Chai	f8221b960f	test: route S3 mock server messages through logger The S3 mock server (introduced in `5a96549c`) currently prints its status messages directly to stdout, which can be distracting when reviewing test results. For example: ```console $ ./test.py --verbose --mode debug object_store/test_backup::test_simple_backup Found 1 tests. Starting S3 mock server on ('127.226.51.1', 2012) ================================================================================ [N/TOTAL] SUITE MODE RESULT TEST ------------------------------------------------------------------------------ [1/1] object_store debug [ PASS ] object_store.test_backup.1 5.99s Stopping S3 mock server ------------------------- CPU utilization: 6.5% ``` Move these messages to use proper logging to give developers more control over their visibility: - Make logger parameter mandatory in MockS3Server constructor - Route "Stopping S3 mock server" message through the provided logger - Add --log-level option to the standalone mock server launcher The message is now hidden: ```console $ ./test.py --verbose --mode debug --save-log-on-success object_store/test_backup::test_simple_backup Found 1 tests. ================================================================================ [N/TOTAL] SUITE MODE RESULT TEST ------------------------------------------------------------------------------ [1/1] object_store debug [ PASS ] object_store.test_backup.1 6.25s ------------------------------------------------------------------------------ CPU utilization: 5.5% ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21384	2024-10-31 18:21:29 +03:00
Benny Halevy	78ceaeabca	compaction_manager: compaction_disabled: return true if not in compaction_state When a compaction_group is removed via `compaction_manager::remove`, it is erased from `_compaction_state`, and therefore compaction is definitely not enabled on it. This triggers an internal error if tablets are cleaned up during drop/truncate, which checks that compaction is disabled in all compaction groups. Note that the callers of `compaction_disabled` aren't really interested in compaction being actively disabled on the compaction_group, but rather if it's enabled or not. A follow-up patch can be considered to reverse the logic and expose `compaction_enabled` rather than `compaction_disabled`. Fixes scylladb/scylladb#20060 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#21378	2024-10-31 18:21:29 +03:00
Dawid Mędrek	495c1188e9	docs/dev: Document semantics of describing CDC tables	2024-10-31 11:25:19 +01:00
Dawid Mędrek	39e0513e1b	cql3: Allow for describing CDC log tables In the past, DESC SCHEMA would produce create statements for both the base and the log table. That was incorrect as the log table is automatically created alongside the base one. That was solved in scylladb/scylladb@9ab57b1 (scylladb/scylladb#18467). The mentioned changes implemented the following solution: * DESC SCHEMA/KEYSPACE/TABLE would still print a create statement for the CDC base table, * DESC SCHEMA/KEYSPACE would start printing an alter statement for the CDC log table. That statement would ensure that the restored log table has the same parameters as the original one, * DESC TABLE <base table> would behave as DESC SCHEMA/KEYSPACE, i.e. it would print a create statement for the base table and an alter statement for the log table, * DESC TABLE <log table> would result in an error. While that solution was good and behaved correctly in the context of restoring the schema, it had one flaw: describe statements aren't only used as a means for producing a backup; they also serve an informative purpose to learn about the schema, e.g. to learn what parameters a specific table uses. Because we didn't allow for describing CDC log tables, the user couldn't look them up directly via a describe statement -- they had to describe the base table for that. Attempting to describe a log table ended with an error, e.g.: ``` $ DESC TABLE ks.t_scylla_cdc_log; ks.t_scylla_cdc_log is a cdc log table and it cannot be described directly. Try `DESC TABLE ks.t` to describe cdc base table and it's log table. ``` In these changes, we allow for describing CDC log tables again. The semantics of the first three bullets above remains unchanged, but we impose new behavior for DESC TABLE <log table>: * When the user executes DESC TABLE <log table>, a create statement will be returned, treating the table as if it were a regular one, * The create statement will be wrapped in CQL comment markers.
The rationale for the second bullet is that although we want to give the user a means to look into the structure and options of a CDC log table, the returned statement is not supposed to be ever executed by them. We want to minimize the risk of that. An example of the behavior after the change: ``` $ DESC TABLE ks.t_scylla_cdc_log; /* Do NOT execute this statement! It's only for informational purposes. A CDC log table is created automatically when the base is created. CREATE TABLE ks.t_scylla_cdc_log ( "cdc$stream_id" blob, "cdc$time" timeuuid, "cdc$batch_seq_no" int, "cdc$end_of_batch" boolean, "cdc$operation" tinyint, "cdc$ttl" bigint, p int, PRIMARY KEY ("cdc$stream_id", "cdc$time", "cdc$batch_seq_no") ) WITH CLUSTERING ORDER BY ("cdc$time" ASC, "cdc$batch_seq_no" ASC) AND bloom_filter_fp_chance = 0.01 AND caching = {'enabled': 'false', 'keys': 'NONE', 'rows_per_partition': 'NONE'} AND comment = 'CDC log for ks.t' AND compaction = {'class': 'TimeWindowCompactionStrategy', 'compaction_window_size': '60', 'compaction_window_unit': 'MINUTES', 'expired_sstable_check_frequency_seconds': '1800'} AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'} AND crc_check_chance = 1 AND default_time_to_live = 0 AND gc_grace_seconds = 0 AND max_index_interval = 2048 AND memtable_flush_period_in_ms = 0 AND min_index_interval = 128 AND speculative_retry = '99.0PERCENTILE'; */ ``` Fixes scylladb/scylladb#21235	2024-10-31 11:25:19 +01:00
Wojciech Mitros	88ab8db944	mv: run view building in streaming scheduling group View building is an expensive process that takes a long time to complete. During the build, its impact on other work should be minimized, even at the expense of slightly slowing it down. Instead, view building is currently performed in the same scheduling group (gossip) as other high-priority tasks, in particular raft processing, which slows it down, making races more likely and increasing the number of retries that need to be done. While view building is still initiated in the gossip group (as it's the result of adding a view, which is a schema change), in this patch the bulk of the view building work is moved to a low-priority, maintenance scheduling group (named "streaming" after its main use case). Additionally, a test is added, where we make sure that the scheduling group is the one most used when building a view. Fixes https://github.com/scylladb/scylladb/issues/21232 Closes scylladb/scylladb#21326	2024-10-31 10:13:20 +01:00
Nadav Har'El	7572c483b1	test/topology_experimental_raft: fix flaky test Today, each test function in test/topology_experimental_raft creates a cluster in the beginning of the test and drops it at the end of the function. This is very inefficient if you hope (like I do) to write many small and pinpointed test functions instead of large test functions that test 20 unrelated things. Trying to propose a way to change this sad state of affairs, in test_alternator.py I created a fixture "alternator3" which I hoped could be used in multiple tests that need a 3-node Alternator cluster. Currently only one test uses this fixture. Unfortunately, it turns out the alternator3 fixture is broken, and led to flaky test runs (sometimes the test using alternator3 picked up an existing cluster instead of starting with an empty cluster, and failed). These problems cannot be completely fixed at the current state of the framework. The framework does not currently allow keeping a 3-node cluster between test functions, while also allowing other test functions to create different clusters. The specific flakiness we saw could be fixed by adding a missing before_test() call, but in the future we would need to ensure that all the test functions that use it are contiguous in the test file, and I don't see how we can (or want to) ensure this. So at this point I am giving up and withdrawing this proposal until the developers of the topology test framework make this one of their design goals. Since there was only one test using this fixture, removing it should make no performance or correctness difference - it should just fix the flakiness. Fixes scylladb/scylladb#21322. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#21370	2024-10-31 10:12:26 +01:00
Calle Wilund	c4361037f7	cql_test_env/gossip: Prevent double shutdown call crash Fixes scylladb/scylladb#21159 When an exception is thrown in sstable write etc such that storage_manager::isolate is initiated, we start a shutdown chain for message service, gossip etc. These are synced (properly) in storage_manager::stop, but if we somehow call gossiper::shutdown outside the normal service::stop cycle, we can end up running the method simultaneously, intertwined (missing the guard because of the state change between check and set). We then end up co_awaiting an invalid future (_failure_detector_loop_done) - a second wait. Fixed by a.) Remove superfluous gossiper::shutdown in cql_test_env. This was added in `20496ed`, ages ago. However, it should not be needed nowadays. b.) Ensure _failure_detector_loop_done is always waitable. Just to be sure. Closes scylladb/scylladb#21379	2024-10-31 10:11:20 +01:00
Nadav Har'El	d3f09638f0	Merge 'compound_compat: replace use of boost ranges with std ranges' from Avi Kivity Replace use of boost::ranges::join() with another construct, as it has no std replacement, and replace other uses with their std equivalent, in order to reduce dependency load. Code cleanup - no backport. Closes scylladb/scylladb#21382 * github.com:scylladb/scylladb: compound_compat: replace use of boost ranges with std ranges compound_compat: simplify serialization of ka/la sstables static cell names	2024-10-31 10:16:41 +02:00
Nadav Har'El	65e29f28bd	Merge 'gms: remove unused #includes ' from Kefu Chai these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#21374 * github.com:scylladb/scylladb: .github: add gms to iwyu's CLEANER_DIR gms: remove unused `#include`s	2024-10-31 09:06:37 +02:00
Kefu Chai	2498e37a2f	mutation_writer,streaming: use reader_consumer_v2 type when appropriate The `reader_consumer_v2` type (`std::function<future<> (mutation_reader)>`) is defined alongside `mutation_reader` in `mutation_reader.hh`. before this change, we sometimes use `std::function<future<> (mutation_reader)>` directly when defining a consumer parameter or a consumer variable. in this change, we improve maintainability by: - Reducing duplicate function type declarations - Centralizing the consumer type definition - Making future signature updates easier to implement Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21369	2024-10-31 07:17:47 +02:00
Avi Kivity	907da210b6	compound_compat: replace use of boost ranges with std ranges To reduce the dependency load, replace use of boost ranges with the std equivalent. Files that lost the indirect boost dependency have it added as a direct dependency.	2024-10-30 19:58:07 +02:00
Avi Kivity	982cebc1f6	compound_compat: simplify serialization of ka/la sstables static cell names compound_compat is used for serializing ka/la sstables static cell names. Since we can no longer write such sstables, the function is used only in some tests. Reduce the use of boost::range::join(): it has no direct equivalent in std (std::views::concat is in C++26), and it is slow due to the need to type-erase. Instead of using boost::range::join, extend the vector used to hold the empty clustering key a bit more, and copy the view representing the static cell name into it.	2024-10-30 19:19:57 +02:00
Kefu Chai	d3a6931b14	.github: add gms to iwyu's CLEANER_DIR to avoid future violations of include-what-you-use. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-30 23:01:34 +08:00
Kefu Chai	52ec315ffd	gms: remove unused `#include`s these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-30 23:01:34 +08:00
Pavel Emelyanov	c16369323b	sstables: Use inject(wait_for_message_overload) This place could be in the pre-previous patch, it just can use the overload, but it seemingly has a bug. It prints _two_ messages -- that the injection handler was suspended and that it was woken up. The bug is in the 2nd message -- it's printed without waiting for the message, so it likely gets printed before wakeup itself. It seems that no tests care about it though. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-30 16:53:33 +03:00
Pavel Emelyanov	39cb93be3c	treewide,error_injection: Use inject(wait_for_message) and fix tests This is continuation of previous patch, this time also update tests that wait for specific message in logs (to make sure injection handler was called and paused the code execution). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-30 16:53:33 +03:00
Pavel Emelyanov	7d8cc3ccc2	treewide,error_injection: Use inject(wait_for_message) overload Many places want to inject a handler that waits for external kick. Now there's convenience inject() method overload for this. It will result in extra messages in logs, but so far no code/test cares about it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-30 16:53:33 +03:00
Pavel Emelyanov	c1432f3657	error_injection: Add inject() overload with wait_for_message wrapper The wrapper object denotes that injection should run a handler and wait_for_message() on it. Wrapper carries the timeout used to call the mentioned method. It's currently unused, next patches will start enjoying it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-30 16:53:33 +03:00
Dawid Mędrek	b984488552	cql3: Rename `SALTED HASH` to `HASHED PASSWORD` Cassandra 4.1 announced a new option to create a role with: `HASHED PASSWORD`. Example: ``` CREATE ROLE bob WITH HASHED PASSWORD = 'hashed_password'; ``` We've already introduced another option following the same semantics: `SALTED HASH`; example: ``` CREATE ROLE bob WITH SALTED HASH = 'salted_hash'; ``` The change hasn't made it to any release yet, so in this commit we rename it to `HASHED PASSWORD` to be compatible with Cassandra. Additionally, we adjust existing tests to work against Cassandra too. Fixes scylladb/scylladb#21350 Closes scylladb/scylladb#21352	2024-10-30 14:07:58 +02:00
Aleksandra Martyniuk	bc5b1f9a5d	tasks: delete virtual_task::get_ids method as it is unused	2024-10-30 12:25:47 +01:00
Aleksandra Martyniuk	9b5d69ae96	tasks: improve task_manager::lookup_virtual_task Currently, lookup_virtual_task gets the list of ids of all operations tracked by a virtual task and checks whether it contains given id. The list of all ids isn't required and the check whether one particular operation id is tracked by the virtual task may be quicker than listing all operations. Add virtual_task::contains method and use it in lookup_virtual_task.	2024-10-30 12:24:38 +01:00
Kefu Chai	d81ed5adb4	compaction: explain make_interpose_consumer() in compaction strategy Add documentation to clarify the purpose and behavior of make_interpose_consumer() in the compaction_strategy_impl class. This method is crucial for building layered processing pipelines but its semantics were previously undocumented. The added documentation explains how: - It decorates end consumers with additional processing steps - It enables construction of processing pipelines - The original consumer's semantics are preserved This improves code maintainability by making the pipeline construction pattern more apparent to developers. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21336	2024-10-30 13:22:00 +03:00
Tomasz Grabiec	f3869dadc6	Merge 'compound: replace boost ranges with std ranges' from Avi Kivity Continue standardization on std::ranges. Since compound contains a custom iterator, we first have to upgrade it to C++20 iterator concepts. Cleanup / minor refactoring, so no backport. Closes scylladb/scylladb#21320 * github.com:scylladb/scylladb: compound: replace boost ranges with std ranges compound: upgrade iterator to be an std::forward_iterator	2024-10-30 11:02:51 +01:00
Asias He	73806f66a5	test/test_repair.py: Add test_batchlog_flush_in_repair It checks batchlog flush request cache in repair.	2024-10-30 11:10:39 +08:00
Asias He	b3b3e880d3	repair: Reduce hints and batchlog flush The hints and batchlog flush requests are issued to all nodes for each repair request when tombstone_gc repair mode is used. The amount of such flush requests is high when all nodes in the cluster run repair. It is observed it takes a long time, up to 15s, for a repair request to finish such a flush request. To reduce overhead of the flush, each node caches the flush and only executes the real flush when the cache time has passed. It is safe to do so because the real flush_time is returned. Repair uses the smallest flush_time returned from peers as the repair time. The nice thing about the cache on the receiver side is that all senders can hit the cache. It is better than cache on the sender side. A slightly smaller flush_time compared to the real flush time will be used with the benefits of significantly dropped hints and batchlog flush. The trade-off looks reasonable. Tests: 2 nodes, with 1s batchlog delay: Before: Repair nr_repairs=20 cache_time_in_ms=0 total_repair_duration=40.04245328903198 After: Repair nr_repairs=20 cache_time_in_ms=5000 total_repair_duration=1.252073049545288 Fixes #20259	2024-10-30 11:07:57 +08:00
Asias He	f8ad78ba1e	db/batchlog_manager: Add add_delay_to_batch_replay It is used to simulate slow replay.	2024-10-30 11:07:57 +08:00
Asias He	fed9b54664	db/batchlog_manager: Add get_last_replay It is used to get the time when the last replay is executed.	2024-10-30 11:07:57 +08:00
Botond Dénes	3361542e84	db/batchlog_manager: wire in batchlog_replay_cleanup_after_replays After the specified amount of replays, trigger a cleanup: flush batchlog table memtables. This allows the cleanup to happen on a configurable interval, instead of on every batchlog replay attempt, which might be too much.	2024-10-30 11:07:57 +08:00
Botond Dénes	1635525526	db/config: introduce batchlog_replay_cleanup_after_replays Not used yet.	2024-10-30 11:07:57 +08:00
Botond Dénes	169c74346d	db/batchlog_manager: do_batch_log_replay(): add cleanup flag Add a flag controlling whether cleanup (memtable flush) will be done after the replay. This is to allow repair to opt out from cleanup -- when many concurrent repairs are running, there can be storms of calls to do_batch_log_replay(), which will be mostly no-op, but they will all attempt to flush the memtable to clean-up after themselves. This is unnecessary and introduces latency to repairs, best to leave the cleanup to the periodic batch-log replay.	2024-10-30 11:07:57 +08:00
Avi Kivity	73b1f66b70	Revert "Merge 'Allow explicitly enabling or disabling tablets when creating a new keyspace' from Benny Halevy" This reverts commit `c286434e4c`, reversing changes made to `6712fcc316`. The commit causes memtable_test to be very flaky in debug mode (specifically, subtests test_exceptions_in_flush_on_sstable_open and test_exceptions_in_flush_on_sstable_write).	2024-10-30 00:55:29 +02:00
Avi Kivity	b9df3aec12	gdb: avoid @classmethod/@property combinations The @classmethod/@property combination was deprecated in Python 3.11 and removed[1] in Python 3.13. It's used in scylla-gdb.py, breaking it with Python 3.13. To fix, just make all users (size_t and _vptr_type) top-level functions. The definitions are all identical and don't need to be in class scope. [1] https://docs.python.org/3.13/library/functions.html#classmethod Closes scylladb/scylladb#21349	2024-10-29 19:37:07 +02:00
Gleb Natapov	cc7f25062a	topology coordinator: take a copy of a replication state in raft_topology_cmd_handler Current code takes a reference and holds it past preemption points. And while the state itself is not supposed to change, the reference may become stale because the state is re-created on each raft topology command. Fix it by taking a copy instead. This is a slow path anyway. Fixes: scylladb/scylladb#21220 Closes scylladb/scylladb#21316	2024-10-29 15:47:43 +01:00
Avi Kivity	020ccbd76a	Merge 'utils: cached_file: Mark permit as awaiting on page miss' from Tomasz Grabiec Otherwise, the read will be considered as on-cpu during promoted index search, which will severely underutilize the disk because by default on-cpu concurrency is 1. I verified this patch on the worst case scenario, where the workload reads missing rows from a large partition. So partition index is cached (no IO) and there is no data file IO (relies on https://github.com/scylladb/scylladb/pull/20522). But there is IO during promoted index search (via cached_file). Before the patch this workload was doing 4k req/s, after the patch it does 30k req/s. The problem is much less pronounced if there is data file or partition index IO involved because that IO will signal read concurrency semaphore to invite more concurrency. Fixes #21325 Closes scylladb/scylladb#21323 * github.com:scylladb/scylladb: utils: cached_file: Mark permit as awaiting on page miss utils: cached_file: Push resource_unit management down to cached_file	2024-10-29 16:15:21 +02:00
Kamil Braun	36cc3bcc90	test: test_crash_coordinator_before_streaming: enable TRACE for `raft_topology` logger Issue scylladb/scylladb#21114 reported that sometimes during the test we timeout when waiting for node to restart after it was killed. Preliminary investigation showed that the node appears to be hanging inside `topology_state_load`, while holding `token_metadata` lock, which prevents `join_topology` from progressing. Enable TRACE level logging for `raft_topology` so we get more accurate info where inside `topology_state_load` the hang happens, once the problem reproduces again in CI. Closes scylladb/scylladb#21247	2024-10-29 12:46:47 +02:00
Kefu Chai	54d438168a	build: cmake: explicitly mark convenience libraries as STATIC before this change, these [convenience libraries](https://www.gnu.org/software/automake/manual/html_node/Libtool-Convenience-Libraries.html) were implicitly built as static libraries by default, but weren't explicitly marked as STATIC in CMake. While this worked with default settings, it could cause issues if `BUILD_SHARED_LIBS` is enabled. So before we are ready for building these components as shared libraries, let's mark all convenience libraries as STATIC for consistency and to prevent potential issues before we properly support shared library builds. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21274	2024-10-29 10:22:19 +01:00
Yaron Kaikov	94a9efbf1c	github: add script for backports automation instead of Mergify Adding an auto-backport.py script to handle backport automation instead of Mergify. The rules of backport are as follows: * Merged or Closed PRs with any backport/x.y label (one or more) and promoted-to-master label * Backport PR will be automatically assigned to the original PR author * In case of conflicts the backport PR will be opened in the original author's fork in draft mode. This will give the PR owner the option to resolve conflicts and push those changes to the PR branch (Today in Scylla when we have conflicts, the developers are forced to open another PR and manually close the backport PR opened by Mergify) * Fixing cherry-pick of the wrong commit SHA. With the new script, we always take the SHA from the stable branch * Support backport for enterprise releases (from Enterprise branch) Fixes: https://github.com/scylladb/scylladb/issues/18973 Closes scylladb/scylladb#21302	2024-10-29 10:04:30 +02:00
Pavel Emelyanov	25ae3d0aed	backup_task: Report uploading progress Do it by passing reference to s3::upload_progress_monitor object that sits on task impl itself. Different files' uploads would then update the monitor with their sizes and uploaded counters. The structure is reported by get_progress() method. Unit size is set to be bytes. Test is updated. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-29 08:40:35 +03:00
Pavel Emelyanov	2efcfc13e8	s3/client: Account upload progress for real Before upload starts file size is checked, so this is the place that updates progress.total counter. Uploading a file happens by reading unit_size bytes from file input stream and writing the buffer into http body writer stream. This is the place to update progress.uploaded counter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-29 08:38:39 +03:00
Pavel Emelyanov	51e03b1025	s3/client: Introduce upload_progress This is a structure with "total" and "uploaded" counters that's passed by user to client::upload_file() method so that client would update it with the progress. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-29 08:38:39 +03:00
Pavel Emelyanov	f9a5e02b53	s3: Extract client_fwd.hh This is to export some simple structures to users without the need to include client.hh itself (rather large already) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-29 08:38:39 +03:00
Avi Kivity	49d3e281d6	Merge 'Sanitize /system/highest_supported_sstable_version API endpoint' from Pavel Emelyanov Its handler dereferences long chain of objects to get to the value it needs. There's shorter way. Also, the endpoint in question is not unregistered on stop. Closes scylladb/scylladb#21279 * github.com:scylladb/scylladb: api: Make get_highest_supported_sstable_version use proper service api: Move system::get_highest_supported_sstable_version set/unset api: Scaffold for sstables-format-selector	2024-10-28 21:42:41 +02:00
Pavel Emelyanov	b09bb6bc19	error_injection: Re-use enter() code in inject() overloads Most of inject() overloads check if the injection is enabled, then optionally clear the one-shot one, then do the injection. Everything but doing the injection is implemented in the enter() method, it's perfectly worth re-using one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#21285	2024-10-28 21:37:20 +02:00
Kefu Chai	7610b907c6	build: include subdirectory rules in compilation database merge Previously in `e65185ba`, when merging Seastar's and ScyllaDB's compilation databases, the "prefix" parameter in merge-compdb.py was too restrictive. It only included build rules for files with "CMakeFiles" prefix, excluding source files in subdirectories like `apps/iotune/CMakeFiles/app_iotune.dir/iotune.cc.o`. In this change, we change the prefix parameter to an empty string to include all source files whose object files are located under build directories, regardless of their path structure. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21312	2024-10-28 21:34:21 +02:00
Avi Kivity	c286434e4c	Merge 'Allow explicitly enabling or disabling tablets when creating a new keyspace' from Benny Halevy Separate the configuration for enabling the tablets feature from the enablement of tablets when creating new keyspaces. This change always enables the TABLETS cluster feature and the tablets logic respectively. The `enable_tablets` config option just controls whether tablets are enabled or disabled by default for new keyspaces. If `enable_tablets` is set to `true`, tablets can be disabled using `CREATE KEYSPACE WITH tablets = { 'enabled': false }` as it is today. If `enable_tablets` is set to `false`, tablets can be enabled using `CREATE KEYSPACE WITH tablets = { 'enabled': true }`. The motivation for this change is to simplify the user experience of using tablets by setting the default for new keyspaces to false and allowing the user to simply opt-in by using tablets = {enabled: true }. This is not possible today. The user has to enable tablets by default for all new keyspaces (that use the NetworkTopologyStrategy) and then actively opt-out to use vnodes. * Not required to be backported to OSS versions. May be backported to specific enterprise versions Closes scylladb/scylladb#20729 * github.com:scylladb/scylladb: data_dictionary: keyspace_metadata::describe: print tablets enabled also when defaulted tablets_test: test enable/disable tablets when creating a new keyspace treewide: always allow tablets keyspaces feature_service: prevent enabling both tablets and gossip topology changes alternator: create_keyspace_metadata: enable tablets using feature_service	2024-10-28 21:33:17 +02:00
Nadav Har'El	6712fcc316	test/cql-pytest: add option to run cql-pytest tests against specific release This patch adds the option "--release <version>" to test/cql-pytest/run, which downloads the pre-compiled Scylla release with the given version number and runs the tests against that version. For example, it can be used to demonstrate that #15559 was indeed a regression between 2022.1 and 2022.2, by running a recently-added test against these two old versions: test/cql-pytest/run --release 2022.1 --runxfail \ test_prepare.py::test_duplicate_named_bind_marker_prepared test/cql-pytest/run --release 2022.2 --runxfail \ test_prepare.py::test_duplicate_named_bind_marker_prepared The first run passes, the second fails - showing the regression. The Scylla releases are downloaded from ScyllaDB's S3 bucket (downloads.scylladb.com). They are saved in the build/ directory (e.g., build/2022.2.9), and if that directory is not removed, when "run --release" requests the same version again, the previous download is reused. Release numbers can look like: * 5.4.7 * 5.4 (will get the latest in the 5.4 branch, e.g., 5.4.7) * 5.4.0~rc2 (a prerelease) * 2021.1.9 (Enterprise release) * 2023.1 (latest in this branch, Enterprise release) Fixes #13189 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19228	2024-10-28 21:29:44 +02:00
Kefu Chai	f3dee5b636	build: enable CMAKE_CXX_EXTENSIONS explicitly before this change, Seastar enables CXX_EXTENSIONS in its own build rules. but it does not expose it to the parent project. but scylladb's CMake building system respect seastar's .pc file and includes the cflags exposed by it. without this change, scylladb included "-std=c++23" from seastar, and "-std=gnu++23" from itself. this is both confusing and inconsistent with the build rules generated by `configure.py`. in this change, we explicitly set `CMAKE_CXX_EXTENSIONS` when creating Seastar's building rules, so that it can populate this setting to its .pc file. in this way, we don't have two different options for specifying the C++ standard when building scylladb with CMake. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21311	2024-10-28 21:23:04 +02:00
Kefu Chai	8b80ef3290	build: Remove GCC ARM warning workaround (originally added in `193d1942`) The workaround was initially added to silence warnings on GCC < 6.4 for ARM platforms due to a compiler bug (gcc.gnu.org/bugzilla/show_bug.cgi?id=77728). Since our codebase now requires modern GCC versions for coroutine support, and the bug was fixed in GCC 6.4+, this workaround is no longer needed. Refs `193d1942f2` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21308	2024-10-28 21:19:56 +02:00
Avi Kivity	94c21e5c05	Merge 'sstables: Reduce amount of I/O for clustering-key-bounded reads from large partitions' from Tomasz Grabiec Single-row reads from large partition issue 64 KiB reads to the data file, which is equal to the default span of the promoted index block in the data file. If users would want to increase selectivity of the index to speed up single-row reads, this won't be effective. The reason is that the reader uses promoted index to look up the start position in the data file of the read, but end position will in practice extend to the next partition, and amount of I/O will be determined by the underlying file input stream implementation and its read-ahead heuristics. By default, that results in at least 2 IOs 32KB each. There is already infrastructure to lookup end position based on upper bound of the read, in anticipation for sharing the promoted index cache, but it's not effective becasue it's a non-populating lookup and the upper bound cursor has its own private cached_promoted_index, which is cold when positions are computed. It's non-populating on purpose, to avoid extra index file IO to read upper bound. In case upper bound is far-enough from the lower bound, this will only increase the cost of the read. The solution employed here is to warm up the lower bound cursor's cache before positions are computed, and use that cursor for non-populating lookup of the upper bound. We use the lower bound cursor and the slice's lower bound so that we read the same blocks as later lower-bound slicing would, so that we don't incur extra IO for cases where looking up upper bound is not worth it, that is when upper bound is far from the lower bound. If upper bound is near lower bound, then warming up using lower bound will populate cached_promoted_index with blocks which will allow us to locate the upper bound block accurately. This is especially important for single-row reads, where the bounds are around the same key. 
In this case we want to read the data file range which belongs to a single promoted index block. It doesn't matter that the upper bound is not exactly the same. They both will likely lie in the same block, and if not, binary search will bring adjacent blocks into cache. Even if upper bound is not near, the binary search will populate the cache with blocks which can be used to narrow down the data file range somewhat. Fixes #10030. The change was tested with perf-fast-forward. I populated the data set with `column_index_size_in_kb` set to 1 scylla perf-fast-forward --populate --run-tests=large-partition-slicing --column-index-size-in-kb=1 Test run: build/release/scylla perf-fast-forward --run-tests=large-partition-select-few-rows -c1 --keep-cache-across-test-cases --test-case-duration=0 This test issues two reads of subsequent keys from the middle of a large partition (1M rows in total). The first read will miss in the index file page cache, the second read will hit. Notice that before the change, the second read issued 2 aio requests worth of 64KiB in total. After the change, the second read issued 1 aio worth of 2 KiB. That's because promoted index block is larger than 1 KiB. I verified using logging that the data file range matches a single promoted index block. Also, the first read which misses in cache is still faster after the change. 
Before: ``` running: large-partition-select-few-rows on dataset large-part-ds1 Testing selecting few rows from a large partition: stride rows time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk allocs tasks insns/f cpu 500000 1 0.009802 1 1 102 0 102 102 21.0 21 196 2 1 0 1 1 0 0 0 568 269 4716050 53.4% 500001 1 0.000321 1 1 3113 0 3113 3113 2.0 2 64 1 0 1 0 0 0 0 0 116 26 555110 45.0% ``` After: ``` running: large-partition-select-few-rows on dataset large-part-ds1 Testing selecting few rows from a large partition: stride rows time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk allocs tasks insns/f cpu 500000 1 0.009609 1 1 104 0 104 104 20.0 20 137 2 1 0 1 1 0 0 0 561 268 4633407 43.1% 500001 1 0.000217 1 1 4602 0 4602 4602 1.0 1 2 1 0 1 0 0 0 0 0 110 26 313882 64.1% ``` Backports: none, not a regression Closes scylladb/scylladb#20522 * github.com:scylladb/scylladb: perf: perf_fast_forward: Add test case for querying missing rows perf-fast-forward: Allow overriding promoted index block size perf-fast-forward: Test subsequent key reads from the middle in test_large_partition_select_few_rows perf-fast-forward: Allow adding key offset in test_large_partition_select_few_rows perf-fast-forward: Use single-partition reads in test_large_partition_select_few_rows sstables: bsearch_clustered_cursor: Add more tracing points sstables: reader: Log data file range sstables: bsearch_clustered_cursor: Unify skip_info logging sstables: bsearch_clustered_cursor: Narrow down range using "end" position of the block sstables: bsearch_clustered_cursor: Skip even to the first block test: sstables: sstable_3_x_test: Improve failure message sstables: mx: writer: Never include partition_end marker in promoted index block width sstables: Reduce amount of I/O for clustering-key-bounded reads from large partitions sstables: 
clustered_cursor: Track current block	2024-10-28 21:13:23 +02:00
Tomasz Grabiec	0f2101b055	utils: cached_file: Mark permit as awaiting on page miss Otherwise, the read will be considered as on-cpu during promoted index search, which will severely underutilize the disk because by default on-cpu concurrency is 1. I verified this patch on the worst case scenario, where the workload reads missing rows from a large partition. So partition index is cached (no IO) and there is no data file IO. But there is IO during promoted index search (via cached_file). Before the patch this workload was doing 4k req/s, after the patch it does 30k req/s. The problem is much less pronounced if there is data file or index file IO involved because that IO will signal read concurrency semaphore to invite more concurrency.	2024-10-28 19:54:58 +01:00
Tomasz Grabiec	868f5b59c4	utils: cached_file: Push resource_unit management down to cached_file It saves us permit operations on the hot path when we hit in cache. Also, it will lay the ground for marking the permit as awaiting later.	2024-10-28 19:49:58 +01:00
Avi Kivity	d3dae09316	compound: replace boost ranges with std ranges Standardize on the standard range library. The serialize_value(initializer_list) overload is disambiguated not to call itself. Apparently it wasn't called before. Since std::ranges::subrange does not provide operator==, replace it with std::ranges::equal().	2024-10-28 18:35:41 +02:00
Avi Kivity	61d7f1f6a5	compound: upgrade iterator to be an std::forward_iterator compound::iterator isn't far from a forward_iterator, and if we want to use it with std::ranges, we have to upgrade it. This is because std::ranges::subrange() only provides front() for forward ranges, and we do use this front(). Boost apparently isn't as strict. To make it a forward_iterator, we have to drop operator-> and make operator* return a value (similar to std::views::transform), since forward iterators require that pointers and references be stable, and this iterator returns a pointer to one of its members. We also add an iterator_concept member to declare the compatibility to std::ranges.	2024-10-28 17:16:36 +02:00
Kamil Braun	101c1d50f0	Merge 'fix nodetool status to show zero-token nodes' from Abhinav Kumar Jha In the current scenario, the nodetool status doesn’t display information regarding zero token nodes. For example, if 5 nodes are spun by the administrator, out of which, 2 nodes are zero token nodes, then nodetool status only shows information regarding the 3 non-zero token nodes. This commit intends to fix this issue by leveraging the “/storage_service/host_id ” API and adding appropriate logic in scylla-nodetool.cc to support zero token nodes. A test is also added in nodetool/test_status.py to verify this logic. This test fails without this commit’s zero token node support logic, hence verifying the behavior. This PR fixes a bug. Hence we need to backport it. Backporting needs to be done only to 6.2 version, since earlier versions don't support zero token nodes. Fixes: scylladb/scylladb#19849 Fixes: scylladb/scylladb#17857 Closes scylladb/scylladb#20909 * github.com:scylladb/scylladb: fix nodetool status to show zero-token nodes test: move `wait_for_first_completed` to pylib/util.py token_metadata: rename endpoint_to_host_id_map getter and add support for joining nodes	2024-10-28 12:19:36 +01:00
Kefu Chai	9f8adcd207	backup_task: track the first failure uploading sstables before this change, we only record the exception returned by `upload_file()`, and rethrow the exception. but the exception thrown by `upload_file()` is not propagated to its caller. instead, the exceptional future is ignored on purpose -- we need to perform the uploads in parallel. this is why the task is not marked as failed even if some of the uploads performed by it fail. in this change, we - coroutinize `backup_task_impl::do_backup()`. strictly speaking, this is not necessary to propagate the exception. but, in order to ensure that the possible exception is captured before the gate is closed, and to reduce the indentation, the teardown steps are performed explicitly. - in addition to noting down the exception in the logging message, we also store it in a local variable, which is rethrown before this function returns. Fixes scylladb/scylladb#21248 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21254	2024-10-28 12:54:27 +03:00
Tzach Livyatan	1878af9399	Update os-support-info.rst - add CentOS ScyllaDB support RHEL 9 and derivatives, including CentOS 9. Fix https://github.com/scylladb/scylladb/issues/21309 Closes scylladb/scylladb#21310	2024-10-28 10:02:31 +02:00
Kefu Chai	8ac471b74b	dht: do not include unused headers in `8d1b3223`, we removed some unused "#include"s, but we failed to address all of them in "dht" subdirectory. and the unaddressed "#include"s are identified by the iwyu workflow. in this change, we address the leftovers. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21291	2024-10-28 09:58:42 +02:00
Anna Stuchlik	44a807f5bc	doc: improve the README file in the docs folder This commit improves the README file so that it's more helpful to documentation contributors. Especially, it: - Adds the link to the prerequisites. - Add information on troubleshooting (checking the links, headings, etc.) - Removes the section on creating a knowledge base article, as we no longer promote adding KBs in favor of creating a coherent documentation set. Fixes https://github.com/scylladb/scylladb/issues/21257 Closes scylladb/scylladb#21262	2024-10-28 09:55:40 +02:00
Anna Stuchlik	212eb204a7	doc: set 6.2 as the latest stable version This commit updates the configuration for ScyllaDB documentation so that: - 6.2 is the latest version. - 6.2 is removed from the list of unstable versions. It must be merged when ScyllaDB 6.2 is released. In addition, this commit uncomments the redirections that should be applied when version 6.2 is the latest stable version (which will happen when this commit is merged). No backport is required. Closes scylladb/scylladb#21133	2024-10-28 09:45:37 +02:00
Pavel Emelyanov	420baf5035	api: Make get_highest_supported_sstable_version use proper service This endpoint now grabs one via database -> table -> sstables manager chain, but there's shorter route, namely via sstables format selector. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-28 10:18:57 +03:00
Pavel Emelyanov	61c8b571e5	api: Move system::get_highest_supported_sstable_version set/unset It's currently registered with all other system endpoints and is not unregistered. Its correct place is in the sstables-format-selector set/unset functions. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-28 10:18:23 +03:00
Pavel Emelyanov	f090bdabbb	api: Scaffold for sstables-format-selector This "service" will have its own endpoint soon Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-28 10:17:38 +03:00
Botond Dénes	31342ecb5d	Merge 'tasks: fix virtual tasks children' from Aleksandra Martyniuk Fix how regular tasks that have a virtual parent are created in task_manager::module::make_task: set sequence number of a task and subscribe to module's abort source. Fixes: #21278. Needs backport to 6.2 Closes scylladb/scylladb#21280 * github.com:scylladb/scylladb: tasks: fix sequence number assignment tasks: fix abort source subscription of virtual task's child	2024-10-28 08:59:40 +02:00
Aleksandra Martyniuk	85d9565158	test: repair: drop log checks from test_repair_succeeds_with_unitialized_bm Currently, test_repair_succeeds_with_unitialized_bm checks whether repair finishes successfully and the error is properly handled if batchlog_manager isn't initialized. Error handling depends on logs, making the test fragile to external conditions and flaky. Drop the error handling check, successful repair is a sufficient passing condition. Fixes: #21167. Closes scylladb/scylladb#21208	2024-10-28 08:39:16 +02:00
Botond Dénes	416159e5d9	Merge 'docs/alternator: explain service discovery HTTP requests' from Nadav Har'El Add a description of the service discovery HTTP requests - `/` and `/localnodes` that was previously not documented except in a design document that is unfortunately no longer available publicly (https://docs.google.com/document/d/1twgrs6IM1B10BswMBUNqm7bwu5HCm47LOYE-Hdhuu_8/edit). Fixes https://github.com/scylladb/scylladb/issues/20989 Developer-oriented documentation so no need to backport. Closes scylladb/scylladb#21000 * github.com:scylladb/scylladb: docs/alternator: explain service discovery HTTP requests docs/alternator: split Alternator-specific APIs from alternator.md	2024-10-28 08:21:28 +02:00
Benny Halevy	2268912589	docs: add documentation for scylla_identifier Commit `3a12ad96c7` added an sstable_identifier uuid to the SSTable scylla_metadata component, however it was under-documented and this patch adds the missing documentation for the sstable component format, and to the scylla sstable tool documentation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#21221	2024-10-28 08:18:08 +02:00
Kefu Chai	0f9d2ab577	build: cmake: disable Seastar exception hack in `cc3953e5`, we disabled Seastar exception hack in configure.py. this change disabled the Seastar exception hack in the following two builds: - build generated directly by configure.py - build configured with multi-config generator using CMake but we also have non-multi-config build using CMake. to be more consistent, let's apply the equivalent change to non-multi-config build of CMake. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21233	2024-10-28 08:11:43 +02:00
Botond Dénes	be70755f47	Merge 'repair: Fix finished ranges metrics for removenode' from Asias He The skipped ranges should be multiplied by the number of tables Otherwise the finished ranges ratio will not reach 100%. Fixes #21174 Closes scylladb/scylladb#21252 * github.com:scylladb/scylladb: test: Add test_node_ops_metrics.py repair: Make the ranges more consistent in the log repair: Fix finished ranges metrics for removenode	2024-10-28 08:09:32 +02:00
Asias He	9868ccbac0	test: Add test_node_ops_metrics.py It tests the node_ops_metrics_done metric reaches 100% when a node ops is done. Refs: #21174	2024-10-28 08:45:37 +08:00
Pavel Emelyanov	2f9f76fddf	sstables_loader: Mark to_replica_set() private It's not called from outside Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#21210	2024-10-27 22:28:54 +02:00
Anna Stuchlik	ef4bcf8b3f	doc: remove the Cassandra references from nodetool This PR removes the reference to Cassandra from the nodetool index, as the native nodetool is no longer a fork. In addition, it removes the Apache copyright. Fixes https://github.com/scylladb/scylladb/issues/21238 Closes scylladb/scylladb#21240	2024-10-27 22:26:33 +02:00
Kefu Chai	e65185ba6f	build: merge scylla's and seastar's compilation database Since commit `415c83fa`, Seastar is built as an external project. As a result, the compile_commands.json file generated by ScyllaDB's CMake build system no longer contains compilation rules for Seastar's object files. This limitation prevents tools from performing static analysis using the complete dependency tree of translation units. This change merges Seastar's compilation database with ScyllaDB's and places the combined database in the source root directory, maintaining backward compatibility. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21234	2024-10-27 22:01:29 +02:00
Tomasz Grabiec	850d9cfb59	node-exporter: Disable hwmon collector This collector reads nvme temperature sensor, which was observed to cause bad performance on Azure cloud following the reading of the sensor for ~6 seconds. During the event, we can see elevated system time (up to 30%) and softirq time. CPU utilization is high, with nvm_queue_rq taking several orders of magnitude more time than normally. There are signs of contention, we can see __pv_queued_spin_lock_slowpath in the perf profile, called. This manifests as latency spikes and potentially also throughput drop due to reduced CPU capacity. By default, the monitoring stack queries it once every 60s. Closes scylladb/scylladb#21165	2024-10-27 21:59:15 +02:00
Kefu Chai	f5b29331a2	build: populate --enable-dist --disable-dist to CMake before this change, the "dist" targets are always enabled in the CMake-based building system. but the build rules generated by `configure.py` do respect the `--enable-dist` and `--disable-dist` command line options, and enable/disable the dist targets respectively. in this change, we - add a CMake option named "Scylla_DIST". the "dist" subdirectory is included in CMake only if this option is ON. - populate the `--enable-dist` and `--disable-dist` options down to cmake by setting the `Scylla_DIST` option, when creating the build system using CMake. this brings the CMake-based build system functionally closer to the legacy building system. Refs scylladb/scylladb#2717 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21253	2024-10-27 21:57:46 +02:00
Kefu Chai	24d14b601b	treewide: s/boost::adaptors::map_values/std::views::values/ now that we are allowed to use C++23. we now have the luxury of using `std::views::values`. in this change, we: - replace `boost::adaptors::map_values` with `std::views::values` - update affected code to work with `std::views::values` - the places where we use `boost::join()` are not changed, because we cannot use `std::views::concat` yet. this helper is only available in C++26. to reduce the dependency to boost for better maintainability, and leverage standard library features for better long-term support. this change is part of our ongoing effort to modernize our codebase and reduce external dependencies where possible. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21265	2024-10-27 21:32:45 +02:00
Avi Kivity	3124711fc4	Merge 'Report rows_merged in compaction_history rest api and nodetool' from Łukasz Paszkowski Currently, running the `nodetool compactionhistory` command or using the rest api `curl -X GET --header "Accept: application/json" "http://localhost:10000/compaction_manager/compaction_history"` returns compaction history without the `rows_merged` field. The series computes rows merged during compaction and provides this information to users via both the nodetool command and the rest api. The `rows_merged` field contains information on merged clustering keys across multiple sstable files. For instance, compacting two sstables of a table consisting of 7 rows where two rows are part of both sstables, the output would have the following format: {1: 5, 2: 2}. No backport is required. It extends the existing compaction history output. Fixes https://github.com/scylladb/scylladb/issues/666 Closes scylladb/scylladb#20481 * github.com:scylladb/scylladb: test/rest_api: Add tests for compactionhistory nodetool: Add rows merged stats into compactionhistory output compaction: Update compaction history with collected histogram compaction: Remove const qualifier from methods creating sstable readers sstable_set: Add optional statistics to make_local_shard_sstable_reader make_combined_reader: Add optional parameter, combined_reader_statistics reader_selector: Extend with maximum reader count mutation_fragment_merger: Create histogram while consuming mutation fragment batches	2024-10-27 21:26:11 +02:00
Kefu Chai	158008dd2c	mutation_writer: simplify classification using with_deserialized() return value Since `with_deserialized()` returns the lambda function's result, we can directly return the bucket from within the lambda instead of relying on side effects. This makes the code more explicit and functional. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21273	2024-10-27 21:20:55 +02:00
Nadav Har'El	6fdd0ebd3b	RBAC: confirm that unprivileged users can't read the roles table A worry was raised that an unprivileged user might be able to read the system.roles table - which contains the Alternator secret keys (and also CQL's hashed passwords). This patch adds tests that show that this worry is unjustified - and acts as a regression test to ensure it never becomes justified. The tests show that an unprivileged user cannot read the system.roles table using either CQL or Alternator APIs. More specifically, the two tests in this patch demonstrate that: * The Alternator API does not allow an unprivileged user to read ANY system table, unless explicitly granted permissions for that table. * The CQL API whitelists (see service::client_state::has_access) specific system tables - e.g., system_schema.tables - that are made readable to any unprivileged user. But the system.auth table is NOT whitelisted in this way - and is unreadable to unprivileged users unless explicitly granted permissions on that table. The new tests pass on both Scylla and Cassandra. Refs #5206 (that issue is about removing the Alternator secret keys from the roles table - but stealing CQL salted hashes is still pretty bad, so it's good to know that unprivileged users can't read them). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#21215	2024-10-27 21:09:38 +02:00
Nadav Har'El	1634a64ffd	cql-pytest: test a few small materialized views CQL issues While documenting materialized view in a new document (Refs #16569) I encountered a few questions on how various CQL operations work on a table that has views, and this patch contains tests that clarify their answer - and can later guarantee that the answer doesn't unintentionally change in the future. The questions that these tests answer are: 1. That TRUNCATE on a base table also TRUNCATEs its views. This is just a basic test, with no attempt to reproduce issue #17635 (which is about the truncation of the base and views not being atomic). 2. That DROP TABLE is not allowed on a base table that has views. 3. That DROP KEYSPACE is allowed, even if there are tables with views. 4. Test that ALTER TABLE tbl DROP is never allowed in Cassandra, but allowed in some cases by Scylla 5. Test that ALTER TABLE tbl ADD is allowed, and "SELECT *" expands to select the new column into the materialized view as well. All the new tests pass on both Scylla and Cassandra. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#21142	2024-10-27 21:08:28 +02:00
Botond Dénes	7c75fc599f	streaming: stream-session: switch to tracking permit The stream-session is the receiving end of streaming, it reads the mutation fragment stream from an RPC stream and writes it onto the disk. As such, this part does no disk IO and therefore, using a permit with count resources is superfluous. Furthermore, after `d98708013c`, the count resources on this permit can cause a deadlock on the receiver end, via the `db::view::check_view_update_path()`, which wants to read the content of a system table and therefore has to obtain a permit of its own. Switch to a tracking-only permit, primarily to resolve the deadlock, but also because admission is not necessary for a read which does no IO. Refs: scylladb/scylladb#20885 (partial fix, solves only one of the deadlocks) Fixes: scylladb/scylladb#21264 Closes scylladb/scylladb#21059	2024-10-27 20:01:25 +02:00
Avi Kivity	7ffbfe8bb3	Merge 'Squash some sstables::test helpers' from Pavel Emelyanov There's a `missing_summary_first_last_sane` test case that uses some very specific way of modifying an sstable -- it loads one from resources, then tries to "write" the loaded stuff elsewhere. For that it uses a special purpose test::store() helper and a bunch of auxiliary ones from the same class. Those aux helpers are not used anywhere else and are also very special for this test case, so it makes sense to keep this whole functionality in a single helper. Closes scylladb/scylladb#21255 * github.com:scylladb/scylladb: test: Squash test::change_generation_number() into test::store() test: Squash test::change_dir() into test::store() test: Coroutinize sstables::test::store()	2024-10-27 19:59:59 +02:00
Anna Stuchlik	aa0dadea48	doc: extend the ToC for CDC This commit adds the missing links to the CDC index page. Fixes https://github.com/scylladb/scylladb/issues/21137 Closes scylladb/scylladb#21286	2024-10-27 19:57:59 +02:00
Anna Stuchlik	b2b9622e32	doc: fix redundant references to version 6.2 This commit removes mentions of version 6.2 that were introduced with https://github.com/scylladb/scylladb/pull/17969. Now that the documentation is versioned, there should be no reference to specific versions. Fixes https://github.com/scylladb/scylladb/issues/21276 Closes scylladb/scylladb#21277	2024-10-27 14:47:40 +02:00
Paweł Zakrzewski	b077685fec	test/cql-pytest: GROUP BY with static columns This commit adds a new test case 'test_group_by_static_column_and_tombstones' to verify the behavior of GROUP BY queries with static columns. The test is adapted from Cassandra's test suite and aims to reproduce issue #21267. Original, larger test: cassandra_tests/validation/operations/select_group_by_test.py::testGroupByWithPaging() Closes scylladb/scylladb#21270	2024-10-27 14:45:53 +02:00
Aleksandra Martyniuk	910a6fc032	tasks: fix sequence number assignment Currently, children of virtual tasks do not have sequence number assigned. Fix it.	2024-10-25 15:30:13 +02:00
Aleksandra Martyniuk	1eb47b0bbf	tasks: fix abort source subscription of virtual task's child Currently, if a regular task does not have a parent or its parent is a virtual task, then it subscribes to module's abort source in task_manager::task::impl constructor. However, at this point the kind of the task's parent isn't set. Due to that, children of virtual tasks aren't aborted on shutdown. Subscribe to module's abort source in task::impl::set_virtual_parent.	2024-10-25 14:18:00 +02:00
Kefu Chai	e7d6ab576b	backup_task: remove unused member variable Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21258	2024-10-25 11:49:06 +03:00
Abhinav	c00d40b239	fix nodetool status to show zero-token nodes In the current scenario, nodetool status doesn't display information regarding zero-token nodes. For example, if 5 nodes are spun up by the administrator, out of which 2 nodes are zero-token nodes, then nodetool status only shows information regarding the 3 non-zero-token nodes. This commit intends to fix this issue by leveraging the "/storage_service/host_id" API and adding appropriate logic in scylla-nodetool.cc to support zero-token nodes. Robust topology tests are added, which spin up scylla nodes and confirm nodetool status output for various cases, providing good coverage. A test is also added in nodetool/test_status.py to verify this logic. These tests fail without this commit's zero-token node support logic, hence verifying the behavior. The test `test_status_keyspace_joining_node` has been removed. This test is based on the case where host_id=None, which is impossible. Since we now use host_id_map for node discovery in nodetool, the nodes with "host_id=None" go undetected. Since this case is impossible anyway, we can get rid of this. This PR fixes a bug. Hence we need to backport it. Backporting needs to be done only to the 6.2 version, since earlier versions don't support zero-token nodes. Fixes: scylladb/scylladb#19849	2024-10-25 13:28:09 +05:30
Abhinav	39dfd2d7ac	test: move `wait_for_first_completed` to pylib/util.py This function is needed in a new test added in the next commit and this refactoring avoids code duplication.	2024-10-25 13:26:42 +05:30
Abhinav	72f3c95a63	token_metadata: rename endpoint_to_host_id_map getter and add support for joining nodes Rename host_id map getter, 'get_endpoint_to_host_id_map_for_reading' to 'get_endpoint_to_host_id_map_' Also modify the getter to return information regarding joining nodes as well. This getter will later be used for retrieving the nodes in nodetool status, hence it needs to show all nodes, including joining ones. The function name suffix `_for_reading` suggests that the function was used in some other places in the past, and indeed if we need endpoints "for reading" then we cannot show joining endpoints. But it was confirmed that this function is currently only used by "/storage_service/host_id" endpoint, hence it can be modified as required. Fixes: scylladb/scylladb#17857	2024-10-25 13:20:27 +05:30
Pavel Emelyanov	5e713b2b14	Merge 'dht: remove unused #includes ' from Kefu Chai these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#21237 * github.com:scylladb/scylladb: .github: add dht to iwyu's CLEANER_DIR dht: remove unused `#include`s	2024-10-24 18:40:49 +03:00
Pavel Emelyanov	7595ef7303	test: Squash test::change_generation_number() into test::store() No other usages of the former helper other than immediately followed by the latter, no point in keeping it around. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-24 11:29:17 +03:00
Pavel Emelyanov	e885b0e6cd	test: Squash test::change_dir() into test::store() No other usages of the former helper other than immediately followed by the latter, no point in keeping it around. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-24 11:28:39 +03:00
Pavel Emelyanov	874cf2ea6f	test: Coroutinize sstables::test::store() Ahead of future changes Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-24 11:28:07 +03:00
Benny Halevy	5498018cbe	data_dictionary: keyspace_metadata::describe: print tablets enabled also when defaulted Now that tablets may be explicitly enabled when creating a new keyspace, describe tablets as enabled even when the default initial_tablets==0 is used. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-10-24 10:18:42 +03:00
Benny Halevy	63cbb6e071	tablets_test: test enable/disable tablets when creating a new keyspace Test both configuration values for `enable_tablets` and the possibility to explicitly enable or disable tablets, respectively, when creating a keyspace using the `tablets = {'enabled': true|false}` CREATE KEYSPACE option. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-10-24 10:18:42 +03:00
Benny Halevy	b0e12cb40d	treewide: always allow tablets keyspaces With the tablets feature always enabled (unless gossip topology changes are forced), the enable_tablets option now controls only the default for newly created keyspaces. Even when set to `false`, tablets are still enabled as a feature and the user may explicitly enable tablets using `CREATE KEYSPACE <name> WITH tablets = {'enabled': true}` Note: best viewed with `git show -w` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-10-24 10:18:42 +03:00
Benny Halevy	bc62407421	feature_service: prevent enabling both tablets and gossip topology changes Tablets require raft consistent topology changes. Therefore, document that they are incompatible in the config help and prevent their usage in `feature_config_from_db_config` Fixes scylladb/scylladb#21075 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-10-24 10:18:42 +03:00
Benny Halevy	9ef2dc2428	alternator: create_keyspace_metadata: enable tablets using feature_service Rather than using the local configuration option on this node, check the cluster feature instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-10-24 10:18:42 +03:00
Asias He	1392a6068d	repair: Make the ranges more consistent in the log Consider the number of tables for the number of ranges logging. Make it more consistent with the log when the ops starts.	2024-10-24 10:31:15 +08:00
Asias He	cffe3dc49f	repair: Fix finished ranges metrics for removenode The skipped ranges should be multiplied by the number of tables. Otherwise the finished ranges ratio will not reach 100%. Fixes #21174	2024-10-24 10:31:15 +08:00
Kefu Chai	a9e18f70b0	Revert submodule change in `6ead5a4696` in `6ead5a46`, we included submodule changes in cqlsh and java by accident. this was not intended. and this broke the artifacts-rocky8-test. in this change, both changes in the submodule are reverted. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21236	2024-10-23 19:49:20 +03:00
Pavel Emelyanov	9014da26e1	Merge 'docs: reference object storage config doc from nodetool commands ' from Kefu Chai this series: - promote object storage configuration to user-facing documentation - reference object storage config doc from nodetool commands --- the nodetool backup/restore commands are not included by any LTS branches yet, hence no need to backport. Closes scylladb/scylladb#21071 * github.com:scylladb/scylladb: docs: move keyspace-storage-option from cql-extensions to admin docs: reference admin.rst for object storage config docs: reference object storage config doc from nodetool commands docs: promote object storage configuration to user-facing documentation	2024-10-23 19:41:46 +03:00
Michał Jadwiszczak	68d0c9a18a	test/auth_cluster/test_raft_service_levels: match enterprise SL limit Although OSS doesn't limit the number of created service levels, match the enterprise limit to decrease divergence in the test between OSS and enterprise. Fixes scylladb/scylladb#21044 Closes scylladb/scylladb#21045	2024-10-23 17:44:19 +02:00
Kefu Chai	bea18f0571	.github: add dht to iwyu's CLEANER_DIR to avoid future violations of include-what-you-use. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-23 17:45:14 +08:00
Kefu Chai	8d1b3223ab	dht: remove unused `#include`s these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-23 17:45:14 +08:00
Dawid Mędrek	298cafff35	cql-pytest/test_describe: Introduce auxiliary type for service levels We introduce an auxiliary type representing a service level for making it easier to adjust the tests in Enterprise. We move the responsibility of producing create statements for service levels to the class, so we only need to modify the code in one place when necessary. All existing relevant tests have been adjusted to this change. Closes scylladb/scylladb#21230	2024-10-23 10:15:25 +02:00
Kamil Braun	f5c60e538d	Merge 'cql/tablets: fix retrying ALTER tablets KEYSPACE' from Piotr Smaron ALTER tablets-enabled KEYSPACES (KS) may fail due to `group0_concurrent_modification`, in which case it's repeated by a `for` loop surrounding the code. But because raft's `add_entry` consumes the raft's guard (by `std::move`'ing the guard object), retries of ALTER KS will use a moved-from guard object, which is UB, potentially a crash. The fix is to remove the before mentioned `for` loop altogether and rethrow the exception, as the `rf_change` event will be repeated by the topology state machine if it receives the concurrent modification exception, because the event will remain present in the global requests queue, hence it's going to be executed as the very next event. Note: refactor is implemented in the follow-up commit. Fixes: scylladb/scylladb#21102 Should be backported to every 6.x branch, as it may lead to a crash. Closes scylladb/scylladb#21121 * github.com:scylladb/scylladb: test: add UT to test retrying ALTER tablets KEYSPACE cql/tablets: fix indentation in `rf_change` event handler cql/tablets: fix retrying ALTER tablets KEYSPACE	2024-10-23 10:01:21 +02:00
Botond Dénes	519e167611	Merge 'replica/table: check memtable before discarding tombstone during read' from Lakshmi Narayanan Sreethar On the read path, the compacting reader is applied only to the sstable reader. This can cause an expired tombstone from an sstable to be purged from the request before it has a chance to merge with deleted data in the memtable leading to data resurrection. Fix this by checking the memtables before deciding to purge tombstones from the request on the read path. A tombstone will not be purged if a key exists in any of the table's memtables with a minimum live timestamp that is lower than the maximum purgeable timestamp. Fixes #20916 `perf-simple-query` stats before and after this fix : `build/Dev/scylla perf-simple-query --smp=1 --flush` : ``` // Before this Fix // --------------- 94941.79 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59393 insns/op, 24029 cycles/op, 0 errors) 97551.14 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59376 insns/op, 23966 cycles/op, 0 errors) 96599.92 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59367 insns/op, 23998 cycles/op, 0 errors) 97774.91 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59370 insns/op, 23968 cycles/op, 0 errors) 97796.13 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59368 insns/op, 23947 cycles/op, 0 errors) throughput: mean=96932.78 standard-deviation=1215.71 median=97551.14 median-absolute-deviation=842.13 maximum=97796.13 minimum=94941.79 instructions_per_op: mean=59374.78 standard-deviation=10.78 median=59369.59 median-absolute-deviation=6.36 maximum=59393.12 minimum=59367.02 cpu_cycles_per_op: mean=23981.67 standard-deviation=32.29 median=23967.76 median-absolute-deviation=16.33 maximum=24029.38 minimum=23947.19 // After this Fix // -------------- 95313.53 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59392 insns/op, 24058 cycles/op, 0 errors) 97311.48 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59375 insns/op, 
24005 cycles/op, 0 errors) 98043.10 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59381 insns/op, 23941 cycles/op, 0 errors) 96750.31 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59396 insns/op, 24025 cycles/op, 0 errors) 93381.21 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59390 insns/op, 24097 cycles/op, 0 errors) throughput: mean=96159.93 standard-deviation=1847.88 median=96750.31 median-absolute-deviation=1151.55 maximum=98043.10 minimum=93381.21 instructions_per_op: mean=59386.60 standard-deviation=8.78 median=59389.55 median-absolute-deviation=6.02 maximum=59396.40 minimum=59374.73 cpu_cycles_per_op: mean=24025.13 standard-deviation=58.39 median=24025.17 median-absolute-deviation=32.67 maximum=24096.66 minimum=23941.22 ``` This PR fixes a regression introduced in `ce96b472d3` and should be backported to older versions. Closes scylladb/scylladb#20985 * github.com:scylladb/scylladb: topology-custom: add test to verify tombstone gc in read path replica/table: check memtable before discarding tombstone during read compaction_group: track maximum timestamp across all sstables	2024-10-23 10:28:00 +03:00
Botond Dénes	d6a79fefda	Merge 'Do not leak S3 file-uploading parts on exceptions' from Pavel Emelyanov File uploading code spawns all parts uploading into background. If this "spawning" fails (not the uploading code itself), any fiber that was spawned before is orphaned. It will eventually stop on its own, but while it's alive it may use(-after-free) the do_upload_file object. Another issue with not handling the spawn exception is that the multipart upload object is not aborted in this case. So it's leaked until the garbage collector picks it up, which is not critical, but unpleasant. Closes scylladb/scylladb#21139 * github.com:scylladb/scylladb: s3/client: Restore indentation after previous patch s3/client: Catch do_upload_file::upload_part() exceptions	2024-10-23 10:12:29 +03:00
Ernest Zaslavsky	59e2ed884d	Update seastar submodule * seastar abd20efd...f821bda1 (17): > http: http status classification > loopback: add pending capacity param and fix deadlock in httpd_test > allow setting buffer sizes on server_socket > core: add missing assert header to chunked_fifo > cmake: Don't emit message when searching for libarchive > stall-analyser: pass args.tmin instead of tmin > build: do not check for CMAKE_CXX_STANDARD < 20 > README.md: specify CMAKE_CXX_STANDARD in the sample > cmake: Fix DPDK libarchive dep > c-ares: update cooking version to 1.32.3 > build: support c-ares >= 1.34.1 > iotune: clarify fsqual error message > Make total_steal_time() monotonic. > Remove account_idle > reactor: add better sleep time accounting > reactor: add cpu and awake time reactor metrics > Zero-init total sleep time Closes scylladb/scylladb#21225	2024-10-23 09:30:56 +03:00
Botond Dénes	b9b778054a	Merge 'test.py: Add option to fail after number of failures' from Petr Hála * Add `--max-failures` flag to test.py, which will stop the execution after a number of failures * Helps with "fails-fast" approach and can be used to improve CI speed, especially the 100times run * Adds the number of cancelled tests to both summary and junit xml. I did not include them in boost, since it does not contain any statistics. * Removes unnecessary list creation in test.py * Completely unrelated change, but it is small enough that I feel it can be included as part of this one. If this is an issue I can create a separate PR for it * Add `Test.started` property * Helps with determining the current status of the Test and differentiating cancelled/not started tests. * Add `Test.failed` and `Test.did_not_run` read-only computed properties * Helper methods to determine status, instead of using `Test.success`, which does not tell the entire story * Fix `ScyllaClusterManager.stop()` method, so it doesn't fail when run multiple times * This happens when tasks are cancelled, not sure yet why; it is almost certainly unwanted behaviour, but this behaviour was already there and with this fix it no longer causes errors I will use backport/None for now as it is a new feature. Fixes https://github.com/scylladb/qa-tasks/issues/1714 Closes scylladb/scylladb#21098 * github.com:scylladb/scylladb: test.py: Add option to fail after number of failures test.py: Add started, failed and did_not_run properties to Test test.py: Remove unnecessary list creation test: lib: Fix ScyllaClusterManager.stop()	2024-10-23 09:11:52 +03:00
Kefu Chai	6a7eaea9f4	mutation_writer/feed_writer: remove redundant check `mutation_reader::is_end_of_stream()` returns `_impl->is_end_of_stream() && is_buffer_empty()`, so `!is_end_of_stream()` equals `!_impl->is_end_of_stream() || !is_buffer_empty()`, which in turn always equals `!_impl->is_end_of_stream() || !is_buffer_empty() || !is_buffer_empty()`. hence there is no need to check `rd.is_buffer_empty()` again. in this change, the redundant condition is dropped. simpler this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21224	2024-10-23 08:48:08 +03:00
Avi Kivity	cc3953e504	build: disable Seastar exception hack In [1], Seastar started to bypass a lock in libgcc's exception throwing mechanism to allow scalability on large machines. The problem is documented in [2] and reported as fixed. In [3], testing results on a 2s96c192t machine are reported. The problem appears indeed fixed with gcc 14's runtime (which we use, even though we build with clang). Given the new results, we can safely drop the exception scalability hack. As [1] states that the hack causes the loss of a translation cache, we may gain some performance this way. With that, we disable the cache by defining some random macro. [1] https://github.com/scylladb/seastar/464f5e3ae43b366b05573018fc46321863bf2fae [2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744 [3] https://github.com/scylladb/seastar/issues/2479#issuecomment-2427098413 Closes scylladb/scylladb#21217	2024-10-22 22:20:07 +03:00
Nadav Har'El	5fd3177057	Merge 'mv: add a dedicated read concurrency semaphore for view update read before writes' from Wojciech Mitros When writing to some tables with materialized views, we need to read from the base table first to perform a delete of the old view row. When doing so, the memory used for the read is tracked by the user read concurrency semaphore. When we have a large number of such reads, we may use up all of the semaphore units, causing the following reads to be queued. When we have some user reads coming at the same time, these reads can have very high latency due to the write workload on the base table. We want to avoid this, so that the write workload doesn't have a high impact on the latency of the read workload. This is fixed in this patch by adding a separate read concurrency semaphore just for view update read-before-writes. With the new semaphore, even if there are many view update read-before-writes, they will be queued on a different semaphore than the user reads, and they won't impact their latency. The second issue fixed by this patch is the concurrency of the view updates that is currently unlimited. Because of that, view updates may take up so much memory that we may run out of memory. This is fixed by using the read admission on the view update concurrency semaphore. This limits the number of concurrent view update reads to max_count_concurrent_view_update_reads; all other incoming view update reads are queued using just a small chunk of memory. Without this, the reads would also get queued after exceeding view_update_reader_concurrency_semaphore_serialize_limit_multiplier, but they would take much more memory while staying in the queue. 
The new semaphore has half the capacity of the regular user read concurrency semaphore and is currently used only for user writes - it's used independently of the scheduling group on which we base the read semaphore selection, but we use a different code path for streaming (not database::do_apply) and we shouldn't have view updates in system writes or during compaction. This patch also adds a test to confirm that the view update workload doesn't impact the read latency, as well as a test which confirms that we do not run out of memory even under heavy view update workload. The issue of view updates causing increased latencies most often occurs in the following scenario: * we have a medium to high write workload to a table with a materialized view which requires reading from the base table before sending the update to delete the old rows * we have any read workload * one replica is slower or is handling more writes due to an imbalance of data distribution * we write with a cl<ALL, the mentioned replica is replying to write requests slower while new ones keep being sent to it. * each write performs a read first taking resources from the user read concurrency semaphore, so when enough writes accumulate the reads using the semaphore start getting queued * the queue is shared by regular reads and view update reads. When there are enough view update reads in the queue, regular reads start getting increased latencies An sct test (perf-regression-latency-mv-read-concurrency) was prepared to somewhat resemble this scenario: * the tables were prepared satisfying the conditions above * we use a medium write workload and a very low read workload * the imbalance is achieved by writing to just a few (10) partitions - some replicas (and shards) can have twice or more used partitions than others. 
We also keep writing to a limited (though high) number of rows, to cause overwrites which require reading before sending the view update * to minimize the test case, we use a cluster of 3 nodes and rf=2, we write with cl=ONE to have background replica writes and read with cl=ALL to wait for the slower replica to respond. In the test above: * without the fix, the latency of reads increases over 50s * with the fix, the latency of reads stays below 20ms Fixes https://github.com/scylladb/scylladb/issues/8873 Fixes https://github.com/scylladb/scylladb/issues/15805 The patch is not that small and it isn't fixing a regression, so no backports Closes scylladb/scylladb#20887 * github.com:scylladb/scylladb: test: add test for high view update concurrency causing bad_allocs test: add test for high view update concurrency degrading read latency mv: add a dedicated read concurrency semaphore for view update read before writes	2024-10-22 22:17:23 +03:00
Aleksandra Martyniuk	878a12c922	test: change quotation marks Before python 3.12 formatted strings couldn't have reused quotes. Change the type of quotation mark in get_cgroup so it could be used with earlier python versions. Closes scylladb/scylladb#21209	2024-10-22 20:42:05 +03:00
Piotr Smaron	522bede8ec	test: add UT to test retrying ALTER tablets KEYSPACE The newly added testcase is based on the already existing `test_alter_dropped_tablets_keyspace`. A new error injection is created, which stops the ALTER execution just before the changes are submitted to RAFT. In the meantime, a new schema change is performed using the 2nd node in the cluster, thus causing the 1st node to retry the ALTER statement.	2024-10-22 18:22:01 +02:00
Piotr Smaron	3f4c8a30e3	cql/tablets: fix indentation in `rf_change` event handler Just moved the code that previously was under a `for` loop by 1 tab, i.e. 4 spaces, to the left.	2024-10-22 18:22:01 +02:00
Piotr Smaron	de511f56ac	cql/tablets: fix retrying ALTER tablets KEYSPACE ALTER tablets-enabled KEYSPACES (KS) may fail due to `group0_concurrent_modification`, in which case it's repeated by a `for` loop surrounding the code. But because raft's `add_entry` consumes the raft's guard (by `std::move`'ing the guard object), retries of ALTER KS will use a moved-from guard object, which is UB, potentially a crash. The fix is to remove the before mentioned `for` loop altogether and rethrow the exception, as the `rf_change` event will be repeated by the topology state machine if it receives the concurrent modification exception, because the event will remain present in the global requests queue, hence it's going to be executed as the very next event. `topology_coordinator::handle_topology_coordinator_error` handling the case of `group0_concurrent_modification` has been extended with logging in order not to write catch-log-throw boilerplate. Note: refactor is implemented in the follow-up commit. Fixes: scylladb/scylladb#21102	2024-10-22 18:22:00 +02:00
Avi Kivity	ec543e3902	Merge 'Remove all_datadirs vector of strings from table::config' from Pavel Emelyanov The all_datadirs keeps paths to directories where local sstables can be. In fact, Scylla doesn't put sstables there, but can try to find them on boot and when checking snapshots. The 0th element of this vector, called datadir, had recently been removed by #20675, now it's time to drop all_datadirs as well. The needed paths can be obtained from table's storage options (see #20542) and db::config::data_file_directories option. Closes scylladb/scylladb#21212 * github.com:scylladb/scylladb: sstables: Open-code format_table_directory_name() moved recently replica,sstables: Move format_table_directory_name() table: Remove all_datadirs sstables: Generate table::all_datadirs from db::config and storage_options replica: Prepare vector of fs::path-s with table dirs table: Check storage options in get_snapshot_details()	2024-10-22 17:21:31 +03:00
Laszlo Ersek	63417f6a57	utils/small_vector: refactor expansion condition in reserve*() Rewrite _begin + n > _capacity_end as n > _capacity_end - _begin and then as n > capacity() for two reasons: - The last form is easier to read than the first form. - Per N4950 (the final C++23 working draft), [expr.add] paragraph 4, the expression _begin + n (i.e., P + J) is defined only if 0 ≤ 0 + n ≤ _capacity_end - _begin (i.e., 0 ≤ i + j ≤ n) equivalently, only if _begin ≤ _begin + n ≤ _capacity_end Therefore, the expression _begin + n invokes undefined behavior exactly when we'd expect our check _begin + n > _capacity_end to evaluate to true. gcc and clang have been aggressively equating undefined behavior to "never happens"; let's prevent that here. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> Closes scylladb/scylladb#21213	2024-10-22 17:12:11 +03:00
Avi Kivity	847c850034	schema: add accessors for primary key columns and non-primary-key columns It's somewhat common to ask for the partition key and clustering key columns, or for the static and regular columns. Provide accessors for them rather than requiring the user to glue them. Some callers are converted. Closes scylladb/scylladb#21191	2024-10-22 15:01:14 +02:00
pehala	870f3b00fc	test.py: Add option to fail after number of failures Add --max-failures configuration option to specify the amount; if not set or not positive, it will never trigger. Also update the junit reporting to include skipped tests	2024-10-22 13:29:34 +02:00
pehala	c1dd97a049	test.py: Add started, failed and did_not_run properties to Test This ensures we can determine where in the execution pipeline the test currently is. failed and did_not_run are helper properties	2024-10-22 13:29:19 +02:00
pehala	e34dec71e7	test.py: Remove unnecessary list creation Using generators & set constructor, we can get rid of unnecessary list creation	2024-10-22 13:29:18 +02:00
pehala	16cd3fccdd	test: lib: Fix ScyllaClusterManager.stop() When cancelling running tasks, stop() could run multiple times and fail. Removed usage of del and added checks to ensure it won't crash.	2024-10-22 13:29:18 +02:00
Kefu Chai	7a1e067b4e	docs: move keyspace-storage-option from cql-extensions to admin as the admin needs to know the name of the experimental feature option they need to enable. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-22 18:30:29 +08:00
Kefu Chai	6f97c86a2b	docs: reference admin.rst for object storage config instead of repeating it in cql-extensions.md, let's reference the object storage related settings in admin.rst Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-22 18:26:19 +08:00
Kefu Chai	fe13b4e10e	docs: reference object storage config doc from nodetool commands Enhance the documentation for nodetool commands that use the `--endpoint` option by linking to the object storage configuration guide. This change provides users with essential context and detailed setup instructions for S3-compatible storage endpoints. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-22 18:26:19 +08:00
Kefu Chai	9bd9ee9f36	docs: promote object storage configuration to user-facing documentation this commit moves the object storage configuration guide from the developer documentation to the user-facing admin documentation. the change reflects the increasing importance of object storage integration in user-facing features. in this change: - move relevant content from `docs/dev/object_storage.md` to `docs/operating-scylla/admin.rst` - reformat the content from Markdown to reStructuredText (RST) - reword and restructure the content to be more user-friendly - add explanations and context suitable for a broader audience this change makes the object storage configuration information more accessible to Scylla administrators and end-users, supporting the adoption of new features built on top of object storage integration. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-22 18:26:19 +08:00
Benny Halevy	04d741bcbb	storage_service: on_change: update_peer_info only if peer info changed Return an optional peer_info from get_peer_info_for_update when the `app_state_map` arg does not change peer_info, so that we can skip calling update_peer_info, if it didn't change. Fixes scylladb/scylladb#20991 Refs scylladb/scylladb#16376 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#21152	2024-10-22 10:26:08 +02:00
Dawid Medrek	4ec0a014e3	docs/hinted-handoff: Add link to API reference We add a link to the API reference for the convenience of the user. Closes scylladb/scylladb#20065	2024-10-22 09:24:14 +03:00
pehala	28aa57f836	test.py: Refactor retry() Instead of metamethod that looks at all subclasses, use OOP with super() calls Closes scylladb/scylladb#21155	2024-10-22 09:23:30 +03:00
David Garcia	6b7b4addf9	docs: add dark theme to api Closes scylladb/scylladb#21161	2024-10-22 09:22:32 +03:00
pehala	59eb4eb528	test.py: Enhance progress report * Do not leave passed tests in between failed ones. * Use ANSI Escape sequences for manipulating console * Simplifies code and removes need for two object parameters Closes scylladb/scylladb#21176	2024-10-22 09:22:08 +03:00
Łukasz Paszkowski	34c05cb94f	test/rest_api: Add tests for compactionhistory For a table with NullCompactionStrategy and TimeWindowCompactionStrategy, the test - inserts a bunch of data and flushes the table - deletes/updates some data, deletes a range of data and flushes the table - triggers a major compaction and calls for compactionhistory to retrieve and validate the histogram	2024-10-22 08:15:02 +02:00
Łukasz Paszkowski	8188a71787	nodetool: Add rows merged stats into compactionhistory output Incorporate rows merged statistics into the output of the compactionhistory command. Depending on the requested format type, the output has a different form. For instance, compacting two sstables of a table consisting of 7 rows where two rows are part of both sstables, the output would have the following format: text: {1: 5, 2: 2} json: [{"key":1,"value":5},{"key":2,"value":2}] yaml: - key: 1 value: 5 - key: 2 value: 2	2024-10-22 08:15:02 +02:00
Łukasz Paszkowski	c01a38f3cf	compaction: Update compaction history with collected histogram A new field has been added to the compaction_stats structure to hold collected combined reader statistics. The struct is then used to update the compaction_history table.	2024-10-22 08:15:02 +02:00
Łukasz Paszkowski	7eac89da73	compaction: Remove const qualifier from methods creating sstable readers Compaction classes start to mutate their internal members in the methods setup_sstable_reader and make_sstable_reader, which create sstable readers and are marked as const. Remove the const qualifier from these methods. Even though it made sense initially to mark them as const, it is no longer applicable.	2024-10-22 08:15:02 +02:00
Łukasz Paszkowski	484655bf0d	sstable_set: Add optional statistics to make_local_shard_sstable_reader The pointer to combined_reader_statistics is propagated down to make_combined_reader in order to collect statistics. By default, a null pointer is propagated. Note that in case the pointer is valid and the sstable_set consists of exactly one sstable, statistics are skipped as all rows originate from exactly a single sstable file. The existing optimization is crucial `f75154afca`	2024-10-22 08:15:02 +02:00
Łukasz Paszkowski	a9f776494c	make_combined_reader: Add optional parameter, combined_reader_statistics All the overloaded make_combined_reader functions accept an optional pointer to combined_reader_statistics, to be propagated down through merging_reader to mutation_fragment_merger. By default, a null pointer is propagated.	2024-10-22 08:15:02 +02:00
Łukasz Paszkowski	84912c3155	reader_selector: Extend with maximum reader count The maximum reader count allows to predict the number of readers that can be created with create_new_readers(). This helps to correctly allocate a vector size in the rows_merged statistics when a combiner reader is created via make_combined_reader.	2024-10-22 08:15:02 +02:00
Łukasz Paszkowski	92f5c56afc	mutation_fragment_merger: Create histogram while consuming mutation fragment batches The mutation_fragment_merger takes one additional parameter in its constructor, that is a pointer to a combined_reader_statistics used to collect various statistics. The histogram is populated with data while the merger consumes batches from the producer and merges them into separate mutation fragments. The size of the batch, that represents the number of streams the mutation fragment originates from, is used as a key in the histogram and its corresponding value is increased by one.	2024-10-22 08:15:02 +02:00
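The histogram update described in this commit can be sketched roughly as follows. `merge_histogram` and `record_batch` are hypothetical names, and storing the count for key `k` at index `k - 1` is an assumption made for illustration; the real `combined_reader_statistics` layout may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the statistics object: rows_merged[i] counts
// the fragments that were merged from exactly i + 1 streams.
struct merge_histogram {
    std::vector<std::size_t> rows_merged;

    // max_streams is the largest batch size that can ever be observed.
    explicit merge_histogram(std::size_t max_streams)
        : rows_merged(max_streams, 0) {}

    // Called once per merged batch; batch_size is the number of streams
    // that contributed to the resulting fragment.
    void record_batch(std::size_t batch_size) {
        assert(batch_size >= 1 && batch_size <= rows_merged.size());
        ++rows_merged[batch_size - 1];
    }
};
```

Sizing the vector up front is why the maximum number of readers has to be known before the merger starts consuming batches.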
Botond Dénes	41de340d93	Merge 'Update get_description.py script' from Amnon Heiman get_description.py script is a document related script that looks for metrics description in the code. Its configuration needs to address changes in the code. This series contains a configuration change and a code fix that allows it to run as a standalone script, and not as a library. No need to backport, this a documentation related script. Closes scylladb/scylladb#19950 * github.com:scylladb/scylladb: scripts/get_description.py: param_mapping was missing scripts/metrics-config.yml: no need to get metrics from the tests	2024-10-22 08:42:15 +03:00
Kefu Chai	27fb893d9b	docs: nodetools-commands/restore: update to reflect the latest implementation in `787ea4b1d4`, we added "sstables" argument to the "nodetool restore" command. but we failed to update the document to reflect the change. in this change, we update the document for "restore" command to reflect the latest implementation changes introduced in commit `787ea4b1d4`: * Add information about the new "sstables" argument * Update command line usage of "--table" argument -- it is now mandatory * Update the example accordingly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21135	2024-10-22 08:30:06 +03:00
Kefu Chai	ce0a86c585	build: cmake: correct some tests' KIND before this change, we build some tests as if they are Seastar tests. but after `415c83fa`, these tests failed to link. because the Seastar::seastar_testing does not expose `-DSEASTAR_TESTING_MAIN` in its cflags. the behavior of the Seastar::seastar_testing is expected. because a test linking against this library is not necessarily driven by the `main()` provided by `testing/seastar_test.hh`. so, in this change, we correct the `KIND` parameter of these tests, so that they use `KIND BOOST`, as these tests can be driven by the `main()` provided by Boost.Test's driver. also there are some tests driven by Boost.Test's `main()`, but in the meanwhile, they utilize seastar_testing, so let's add `Seastar::seastar_testing` to their `LIBRARIES`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21183	2024-10-22 07:10:47 +03:00
Kefu Chai	6ead5a4696	treewide: move log.hh into utils/log.hh the log.hh under the root of the tree was created to keep backward compatibility when seastar was extracted into a separate library. so log.hh should belong to `utils` directory, as it is based solely on seastar, and can be used by all subsystems. in this change, we move log.hh into utils/log.hh so that it is more modularized. and this also improves the readability, when one sees `#include "utils/log.hh"`, it is obvious that this source file needs the logging system, instead of its own log facility -- please note, we do have two other `log.hh` in the tree. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-22 06:54:46 +03:00
Kefu Chai	6645cdf3b6	build: cmake: improve source generated for check_header in check_headers.cmake, we verify the self-containedness of a header file by replicating it and removing the `#pragma once` directive in this header. but this approach failed to compile headers which include a header file with the same name in the root source directory, as we add `-I<directory-of-original-header>` in the cflags when building the generated source file, so that it can include the headers in the same directory. but this confuses the compiler, as, assuming we have "log.hh" in current directory, and under the root source directory, the compiler would always include the "log.hh" in the current directory even if it should have included "log.hh" under the root source directory. in this change, instead of adding `-I<directory-of-original-header>` to cflags, we just include the header under test in a new .cc file solely generated for testing. this should address this problem. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21216	2024-10-22 06:28:16 +03:00
Kefu Chai	2d6af2791e	compaction: simplify time_window_compaction_strategy::get_window_lower_bound() since chrono allows division between durations with different units. let's use it instead for rounding down to the nearest multiple of the window size, for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20476	2024-10-21 16:01:15 +03:00
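A minimal sketch of rounding down with duration division. `window_lower_bound` is a hypothetical helper rather than the actual ScyllaDB function, and the parameter types (a window in seconds, a non-negative timestamp in microseconds) are assumptions.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: round a timestamp down to the start of the time
// window that contains it.
inline std::chrono::microseconds window_lower_bound(
        std::chrono::seconds window, std::chrono::microseconds timestamp) {
    // Assumption: negative timestamps are out of scope, because integer
    // division truncates toward zero rather than rounding down.
    assert(window.count() > 0 && timestamp.count() >= 0);
    // Dividing two durations of different units converts both to their
    // common type and yields a plain integer count.
    const auto whole_windows = timestamp / window;
    // duration * count is a duration again; seconds convert losslessly
    // (and therefore implicitly) to microseconds.
    return window * whole_windows;
}
```

For instance, a 60-second window and a timestamp of 125 seconds give two whole windows, so the lower bound is 120 seconds.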
Pavel Emelyanov	516a5f06a8	sstables: Open-code format_table_directory_name() moved recently This helper is small enough and it's easier to understand how table directory name is formatted without it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-21 15:18:19 +03:00
Pavel Emelyanov	eeb0d637bb	replica,sstables: Move format_table_directory_name() Now this helper is not needed in replica code, as all manipulations of tables' sstables now sit in the sstables/storage.cc. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-21 15:17:30 +03:00
Pavel Emelyanov	74728d3889	table: Remove all_datadirs It's write-only now, all the places that wanted to know where table's storage is (well -- "are", there can be several directories) already use storage_options. This finishes the work started by `9fe64b5d70`. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-21 15:15:54 +03:00
Pavel Emelyanov	dedb9d349c	sstables: Generate table::all_datadirs from db::config and storage_options As mentioned in the previous patch, there are several places that need to scan all datafile directories for a given table. This list is currently stored on table.config.all_datadirs, this patch stops using one and instead generates it from db::config::data_file_directories and table's storage options. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-21 15:13:27 +03:00
Pavel Emelyanov	0358515118	replica: Prepare vector of fs::path-s with table dirs Most of the time table with local storage keeps its sstables in a single directory referenced by its storage_options::local.dir path. However, there are two cases when code needs to check all datafile directories that could be configured -- on boot when distributed loader loads sstables, and when checking table snapshots. Both those places check table.cfg.all_datadirs vector of strings and convert strings to fs::path-s along the way. This patch prepares the vector of fs::path-s in advance and updates the loop code to work with path-s. This is preparation for the next patch, which will generate the vector of paths for a table. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-21 15:10:53 +03:00
Pavel Emelyanov	4e329ba08f	table: Check storage options in get_snapshot_details() This is a continuation of `24589cf00c` and `a734fd5c9c` -- if table is not based on local storage, getting snapshot details makes no sense. Another goal this change pursues is to have storage_options::local object at hand to be used later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-21 15:08:54 +03:00
Wojciech Mitros	4d719bacca	test: add test for high view update concurrency causing bad_allocs This commit adds a test for checking whether a large view update workload can cause Scylla to run out of memory. In the test, we keep writing to a table with a materialized view with a limited number of rows, causing overwrites which require reading from the table to perform view updates. Currently, due to the unlimited concurrency of view update reads, we may use too much memory which can lead to bad_allocs, causing Scylla to fail. To reach the failing state more consistently, we add a sleep after reading the old value of the base row, to keep the reader concurrency semaphore units longer. At the same time, we use high concurrency and large row size to use up all Scylla's memory quickly. The test fails if Scylla runs out of memory and aborts, and succeeds otherwise.	2024-10-21 12:35:20 +02:00
Wojciech Mitros	f2c740710c	test: add test for high view update concurrency degrading read latency This commit adds a test for checking whether a large view update workload impacts the latency of other user reads. In the test, we first create a table for reads and another table with a materialized view. We then start writing to the table with the view with a limited number of rows - when overwriting, we need to read the previous value of the row to prepare a delete of the old row in the view. This should not impact the latency of the read workload from the other table that we start at the same time. The test fails if any of the reads times out. To reach the failing state more consistently, we add a sleep after reading the old value of the base row, to keep the reader concurrency semaphore units longer. At the same time, we use a lower threshold for queueing reads on the semaphore, to see the impact of view update reads earlier. Because of the high load, the writes may time out, but that's expected - we fail the test only if the user reads time out.	2024-10-21 12:34:55 +02:00
Kefu Chai	5cd619a60c	treewide: s/boost::adaptors::map_keys/std::views::keys/ now that we are allowed to use C++23. we now have the luxury of using `std::views::keys`. in this change, we: - replace `boost::adaptors::map_keys` with `std::views::keys` - update affected code to work with `std::views::keys` to reduce the dependency to boost for better maintainability, and leverage standard library features for better long-term support. this change is part of our ongoing effort to modernize our codebase and reduce external dependencies where possible. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21198	2024-10-21 12:47:52 +03:00
Wojciech Mitros	242079d70b	mv: add a dedicated read concurrency semaphore for view update read before writes When writing to some tables with materialized views, we need to read from the base table first to perform a delete of the old view row. When doing so, the memory used for the read is tracked by the user read concurrency semaphore. When we have a large number of such reads, we may use up all of the semaphore units, causing the following reads to be queued. When we have some user reads coming at the same time, these reads can have very high latency due to the write workload on the base table. We want to avoid this, so that the write workload doesn't have a high impact on the latency of the read workload. This is fixed in this patch by adding a separate read concurrency semaphore just for view update read-before-writes. With the new semaphore, even if there are many view update read-before-writes, they will be queued on a different semaphore than the user reads, and they won't impact their latency. The second issue fixed by this patch is the concurrency of the view updates that is currently unlimited. Because of that, view updates may take up so much memory that we may run out of memory. This is fixed by using the read admission on the view update concurrency semaphore. This limits the number of concurrent view update reads to max_count_concurrent_view_update_reads, all other incoming view update reads are queued using just a small chunk of memory. Without this, the reads would also get queued after exceeding view_update_reader_concurrency_semaphore_serialize_limit_multiplier, but they would take much more memory while staying in the queue. 
The new semaphore has half the capacity of the regular user read concurrency semaphore and is currently used only for user writes - it's used independently of the scheduling group on which we base the read semaphore selection, but we use a different code path for streaming (not database::do_apply) and we shouldn't have view updates in system writes or during compaction. Fixes https://github.com/scylladb/scylladb/issues/8873 Fixes https://github.com/scylladb/scylladb/issues/15805	2024-10-21 11:02:06 +02:00
Kefu Chai	5255f18c35	date: do not put space before literal operator when compiling date.h, clang 20 complains: ``` /home/kefu/.local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/build/rust -isystem /home/kefu/dev/scylladb/seastar/include -isystem /home/kefu/dev/scylladb/build/Debug/seastar/gen/include -isystem /usr/include/p11-kit-1 -isystem /home/kefu/dev/scylladb/abseil -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb/build=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -std=c++23 -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -DSEASTAR_API_LEVEL=7 -DSEASTAR_BUILD_SHARED_LIBS -DSEASTAR_SSTRING -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_DEBUG -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEBUG_PROMISE -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_TYPE_ERASE_MORE -DFMT_SHARED -DWITH_GZFILEOP -MD -MT lang/CMakeFiles/lang.dir/Debug/lua.cc.o -MF lang/CMakeFiles/lang.dir/Debug/lua.cc.o.d -o lang/CMakeFiles/lang.dir/Debug/lua.cc.o -c /home/kefu/dev/scylladb/lang/lua.cc In file included from /home/kefu/dev/scylladb/lang/lua.cc:18: /home/kefu/dev/scylladb/utils/date.h:836:34: error: identifier '_d' preceded by whitespace in a literal operator declaration is deprecated [-Werror,-Wdeprecated-literal-operator] 836 \| CONSTCD11 date::day operator "" _d(unsigned long long d) NOEXCEPT; \| ~~~~~~~~~~~~^~ \| operator""_d ``` because, in 
[CWG2521](https://wg21.link/CWG2521), it proposes that compiler should consider ```c++ string operator "" _i18n(const char*, std::size_t); // OK, deprecated ``` as "OK, deprecated". and Clang implemented this proposal, as it was accepted by C++23. since scylladb uses C++23 standard. let's remove the space between `"` and `_` to be more compliant to the C++23 standard and to silence the warning, which is taken as an error. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21194	2024-10-21 11:21:52 +03:00
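A minimal sketch of the non-deprecated spelling. The `_kib` suffix is hypothetical and chosen only for illustration; it is not one of the literals declared in date.h.

```cpp
#include <cassert>

// Hypothetical literal suffix. There is no whitespace between "" and the
// suffix: the spaced form `operator "" _kib` is the deprecated spelling
// that clang 20 reports under -Wdeprecated-literal-operator.
constexpr unsigned long long operator""_kib(unsigned long long n) {
    return n * 1024ULL;
}

// Evaluated at compile time, so a wrong definition fails the build.
static_assert(2_kib == 2048ULL);
```

Both spellings declare the same operator, so call sites such as `4_kib` do not change when the space is removed.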
Kefu Chai	8355056453	build: cmake: expose and use the path to iotune correctly in `415c83fa`, we introduced a regression which broke the build of the "package" target. because - the IMPORTED_LOCATION_<CONFIG> of the imported target of "Seastar::iotune" includes a literal `$<CONFIG>` - we retrieve the property named "IMPORTED_LOCATION" from this target. but value of this property is empty. so, when we copied this file, the "src" parameter passed to `cmake -E copy` is actually an empty string. in this change, we - set the `IMPORTED_LOCATION_${CONFIG}` property with a correct path. - retrieve the property with the right approach -- to use `TARGET_FILE` generator expression. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21181	2024-10-21 10:32:51 +03:00
Avi Kivity	b5a1173880	utils: small_vector: support from_range_t std::ranges::to<>() has a little protocol with containers to allow them to optimize their construction from ranges. Implement it for small_vector. It optimizes ranges that can have their size determined quickly, or that can be traversed twice to determine the size by reserving up front. Single-pass ranges (std::ranges::input_range) use the less efficient push_back method. A unit test (which fails without the new constructor) is added. Closes scylladb/scylladb#21094	2024-10-21 09:31:38 +03:00
Kefu Chai	d28d64f7fe	service: remove extraneous space in `#pragma once` to be more consistent with the rest of the tree. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21188	2024-10-20 20:27:38 +03:00
Avi Kivity	c3be2489ce	treewide: drop includes of <boost/range/adaptors.hpp> This includes way too much, including <boost/regex.hpp>, which is huge. Drop includes of adaptors.hpp and replace by what is needed. Closes scylladb/scylladb#21187	2024-10-20 17:17:11 +03:00
Aleksandra Martyniuk	29c2d4e7eb	tasks: add comments about map_each_task safety Closes scylladb/scylladb#21172	2024-10-19 21:16:38 +03:00
Avi Kivity	9a521c25b5	Merge 'test/boost: stop using ranges::to()' from Kefu Chai now that we are able to use ranges library provided by the C++ standard library. there is no need to use the homebrew `ranges::to()`. in this series, we - switch to `std::ranges::to()` in favor of `ranges::to()`. - and drop the unused `utils/ranges.hh` header file. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#21182 * github.com:scylladb/scylladb: utils: remove unused ranges.hh test/boost: stop using ranges::to()	2024-10-19 16:57:51 +03:00
Kefu Chai	c5e666b7b1	column_computation.hh: include used header when building the check-header target, we have following failure: ``` FAILED: CMakeFiles/check-headers-scylla-main.dir/Dev/check-headers/column_computation.hh.cc.o /home/kefu/.local/bin/clang++ -DDEVEL -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSCYLLA_ENABLE_PREEMPTION_SOURCE -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Dev\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/seastar/include -isystem /home/kefu/dev/scylladb/build/Dev/seastar/gen/include -isystem /usr/include/p11-kit-1 -isystem /home/kefu/dev/scylladb/abseil -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb/build=. 
-march=westmere -Xclang -fexperimental-assignment-tracking=disabled -Wno-unused-const-variable -Wno-unused-function -Wno-unused-variable -std=c++23 -Werror=unused-result -fstack-clash-protection -DSEASTAR_API_LEVEL=7 -DSEASTAR_BUILD_SHARED_LIBS -DSEASTAR_SSTRING -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_TYPE_ERASE_MORE -DFMT_SHARED -DWITH_GZFILEOP -MD -MT CMakeFiles/check-headers-scylla-main.dir/Dev/check-headers/column_computation.hh.cc.o -MF CMakeFiles/check-headers-scylla-main.dir/Dev/check-headers/column_computation.hh.cc.o.d -o CMakeFiles/check-headers-scylla-main.dir/Dev/check-headers/column_computation.hh.cc.o -c /home/kefu/dev/scylladb/build/check-headers/column_computation.hh.cc /home/kefu/dev/scylladb/build/check-headers/column_computation.hh.cc:24:37: error: no template named 'unique_ptr' in namespace 'std' 24 \| using column_computation_ptr = std::unique_ptr<column_computation>; \| ~~~~~^ /home/kefu/dev/scylladb/build/check-headers/column_computation.hh.cc:40:12: error: unknown type name 'column_computation_ptr'; did you mean 'column_computation'? 40 \| static column_computation_ptr deserialize(bytes_view raw); \| ^~~~~~~~~~~~~~~~~~~~~~ \| column_computation ``` it turns out we failed to include `<memory>`. in this change, we include `<memory>` so that this header is self-contained. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21185	2024-10-19 16:56:02 +03:00
Raphael S. Carvalho	dfc217f99a	locator: Always preserve balancing_enabled in tablet_metadata::copy() When there are zero tablets, tablet_metadata::_balancing_enabled is ignored in the copy. The property not being preserved can result in balancer not respecting user's wish to disable balancing when a replica is created later on. Fixes #21175. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#21177	2024-10-19 14:51:36 +02:00
Kefu Chai	4d4b0b35b7	utils: remove unused ranges.hh now that this header is not used, let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-19 13:21:20 +08:00
Kefu Chai	85518463a9	test/boost: stop using ranges::to() now that we are able to use ranges library provided by the C++ standard library. there is no need to use the homebrew `ranges::to()`. in this change, we switch to `std::ranges::to()` in favor of `ranges::to()`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-19 13:21:20 +08:00
Kefu Chai	5c0db8a49e	sstable_directory: remove extraneous semicolon one semicolon is enough to mark the end of a statement. so let's remove the extraneous one. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21171	2024-10-18 21:58:04 +03:00
Kefu Chai	e2b18eb7eb	data_dictionary: compose the location with "/" in `787ea4b1`, we construct a new `storage_options` for each sstable to be restored. the `location` of the new `storage_option` instances is composed of the configured `prefix` and the dirname of each toc component. but instead of separating them with "/", we just concatenate them. this breaks the test if the specified key representing toc components includes "dirname" in them. in this change - data_directory: instead of using "{prefix}{dirname}", we use "{prefix}/{dirname}". - test/object_store: update the existing test to add a suffix in the keys of the toc objects to mimic the typical use case. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21170	2024-10-18 21:57:56 +03:00
Lakshmi Narayanan Sreethar	afad1b3c85	topology-custom: add test to verify tombstone gc in read path Co-authored-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-10-18 19:20:03 +05:30
Lakshmi Narayanan Sreethar	5a93277904	replica/table: check memtable before discarding tombstone during read On the read path, the compacting reader is applied only to the sstable reader. This can cause an expired tombstone from an sstable to be purged from the request before it has a chance to merge with deleted data in the memtable leading to data resurrection. Fix this by checking the memtables before deciding to purge tombstones from the request on the read path. A tombstone will not be purged if a key exists in any of the table's memtables with a minimum live timestamp that is lower than the maximum purgeable timestamp. Fixes #20916 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-10-18 19:19:58 +05:30
Lakshmi Narayanan Sreethar	6a357b55e3	compaction_group: track maximum timestamp across all sstables This will be used in a following patch to decide if the compacting reader has to check the memtables before purging a tombstone. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-10-18 19:19:11 +05:30
Pavel Emelyanov	b11d50f591	Merge 'multishard reader: make it safe to create with admitted permits' from Botond Dénes Passing an admitted permit -- i.e. one with count resources on it -- to the multishard reader, will possibly result in a deadlock, because the permit of the multishard reader is destroyed after the permits of its child readers. Therefore its semaphore resources won't be automatically released until children acquire their own resources. This creates a dependency (an edge in the "resource allocation graph"), where the semaphore used by the multishard reader depends on the semaphores used by children. When such dependencies create a cycle, and permits are acquired by different reads in just the right order, a deadlock will happen. Users of the multishard reader have to be aware of this gotcha -- and of course they aren't. This is small wonder, considering that not even the documentation on the multishard reader mentions this problem. To work around this, the user has to call `reader_permit::release_base_resources()` on the permit, before passing it to the multishard reader. On multiple occasions, developers (including the very author of the multishard reader), forgot or didn't know about this and this resulted in deadlocks down the line. This is a design-flaw of the multishard reader, which is addressed in this PR, after which, it is safe to pass admitted or not admitted permits to the multishard reader, it will handle the call to `release_base_resources()` if needed. After fixing the problem in the multishard reader, the existing calls to `release_base_resources()` on permits passed to multishard readers are removed. A test is added which reproduces the problem and ensures we don't regress. 
Refs: https://github.com/scylladb/scylladb/issues/20885 (partial fix, there is another deadlock in that issue, which this PR doesn't fix) This fixes (indirectly) a regression introduced by `d98708013c` so it has to be backported to 6.2 Closes scylladb/scylladb#21058 * github.com:scylladb/scylladb: test/boost/mutation_test: add test for multishard permit safety test/lib/reader_lifecycle_policy: add semaphore factory to constructor test/lib/reader_lifecycle_policy: rename factory_function repair/row_level: drop now unneeded release_base_resource() calls readers/multishard: make multishard reader safe to create with admitted permits	2024-10-18 13:30:21 +03:00
Pavel Emelyanov	280cd23c13	Merge 'Allow specifying TLS options with internode_encryption=none + add "transitional" mode' from Calle Wilund Fixes #18903 Adds a "transitional" internode encryption mode, under which all _outgoing_ RPC connections will use TLS, but we will still accept any incoming non-tls connection. This allows an operator to perform a move to TLS RPC without cluster downtime: 1. For each server, add certificate etc options to server_encryption_options + internode_encryption=none + set ssl_storage_port + restart (rolling) 2. For each server, set internode_encryption=transitional + RR 3. For each server, set internode_encryption=all + RR Closes scylladb/scylladb#18939 * github.com:scylladb/scylladb: test::topology: Add test for TLS upgrade and downgrade of internode encryption docs: Add internode_encryption=transitional documentation messaging_service: Add "transitional" internode encryption mode messaging_service: Create TLS connector even if internode_enc=none when certs set	2024-10-18 11:01:07 +03:00
Avi Kivity	1bbd1436b4	types: move from boost ranges to standard ranges Reduce dependency load. tuple_deserializing_iterator gained a default constructor so it matches iterator constraints. Closes scylladb/scylladb#21029	2024-10-18 11:00:49 +03:00
Botond Dénes	b6da82dba3	Merge 'build: build seastar as an external project' from Kefu Chai before this change, scylla's CMake-based system consumes Seastar library by including it directly. but this failed to address the needs of linking against Seastar shared libraries in Debug and Dev builds, while linking against the static libraries in other builds. because Seastar uses `BUILD_SHARED_LIBS` CMake variable to determine if it builds shared libraries. and we cannot assign different values to this CMake variable based on current configure type -- CMake does not support this. see https://gitlab.kitware.com/cmake/cmake/-/issues/19467 in order to address this problem, we have a couple of possible solutions: - to enable Seastar to build both shared and static libraries in a pass. without sacrificing the performance, we have to build all object files twice: once with -fPIC, once without. in order to accomplish this goal, we need to develop a machinery to populate the same settings to these two builds. this would complicate the design of Seastar's building system further. - to build Seastar libraries twice in scylla, we could use the ExternalProject module to implement this. but it'd be complicated to extract the compile options, and link options previously populated by Seastar's targets with CMake -- we would have to replicate all of them in scylla. this is out of the question. - to build Seastar libraries twice before building scylla, and let scylla consume them using CMake config files or .pc files. this is a compromise. it enables scylla to drive the build of Seastar libraries and to consume the compile options and link options. the downside is: * the generated compilation database (compile_commands.json) does not include the commands building Seastar anymore. * the building system of scylla does not have finer-grained control over the building process of seastar. 
for instance, we cannot specify the build dependency to a certain seastar library, and just build it instead of building the whole seastar project. turns out the last approach is the best one we can have at this moment. this is also the approach used by the existing `configure.py`. in this change, we - add FindSeastar.cmake to * detect the preconfigured Seastar builds, and * extract the build options from .pc files * expose library targets to be consumed by parent project - add Seastar as an external project, so we can build it from the parent project. this is atypical compared to standard ExternalProject usage: - Seastar's build system should already be configured at this point. - We maintain separate project variants for each configuration type. Benefits of this approach: - Allows the parent project to consume the compile options exposed by .pc file. as the compile options vary from one config to another. - Allows application of config-specific settings - Enables building Seastar within the parent project's build system - Facilitates linking of artifacts with the external project target, establishing proper dependencies between them we will update `configure.py` to merge the compilation database of scylla and seastar. Refs scylladb/scylladb#2717 --- this is a CMake-related change, hence no need to backport. Closes scylladb/scylladb#21131 * github.com:scylladb/scylladb: build: cmake: use GENERATOR_IS_MULTI_CONFIG property to detect multi-config build: cmake: consume Seastar using its .pc files build: do not use `mode` as the index into `modes` build: cmake: detect and link against GnuTLS library build: cmake: detect and link against yaml-cpp build: cmake: link Seastar with Seastar::<COMPONENT> build: cmake: define CMake generate helper funcs in scylla	2024-10-18 09:42:59 +03:00
Amnon Heiman	09fa625672	scripts/get_description.py: param_mapping was missing get_description.py was moved from a standalone script to a library. During the transition, param_mapping was not included in the script option. This patch makes it possible to use the file as a standalone script again.	2024-10-18 08:58:04 +03:00
Amnon Heiman	10af854ec4	scripts/metrics-config.yml: no need to get metrics from the tests	2024-10-18 08:57:53 +03:00
Kefu Chai	b5f5a963ca	build: do not pass Seastar_CXX_DIALECT=gnu++23 when building Seastar Seastar now respects CMAKE_CXX_STANDARD in favor of Seastar_CXX_DIALECT, which has been dropped in Seastar's commit of 60bc8603bd438232614e9b3dcd7537dc83c85206. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21130	2024-10-18 08:57:23 +03:00
Botond Dénes	6811411288	Merge 'Sanitize commitlog API endpoints' from Pavel Emelyanov Endpoints are registered next to the service they use, and the unregistration deferred action is created right after it. When registered, the service in question is passed as argument and then captured by the endpoints' lambdas. This makes sure that the service is not used by endpoints after being stopped. That's not so for commitlog endpoints. These are registered in several places, and /commitlog "function" is not unregistered on stop. This patch fixes some of this misbehavior, in particular: - adds unregistration of commitlog API function - uses sharded<database>& argument in endpoints instead of ctx.db - moves some endpoints from storage_service.cc to commitlog.cc Closes scylladb/scylladb#21053 * github.com:scylladb/scylladb: api: Use captured database, not the one from ctx api: Pass sharded<database> to commitlog endpoints registration api: Move commitlog-related from storage_service.cc api: Unset commitlog API endpoints api: Extract set_server_commitlog() from set_server_done()	2024-10-18 08:56:13 +03:00
Botond Dénes	568b767ec3	Merge 'schema: convert from boost ranges to std ranges' from Avi Kivity To reduce dependency load, change uses of boost ranges to std::ranges. The first patch is preparation, replacing a construct that isn't easy to support with std ranges with something simpler. No backport as this is a code cleanup. Closes scylladb/scylladb#21122 * github.com:scylladb/scylladb: schema: replace boost ranges with std ranges schema: precompute all_columns_in_select_order()	2024-10-18 08:42:50 +03:00
Pavel Emelyanov	df6991edd3	test: Do not duplicate sstable twice The statistics_rewrite test case copies an sstable from resources two times: - first time -- explicitly by listing resource components and copying files to the test temp dir - second time -- implicitly, by calling create_links() linking copied files by new set in the staging/ subdirectory The 2nd step is not needed and the history of changes justifies that. The test itself appeared with `70b793e4d3` and it only contained the 2nd "copying" -- test linked files from resource directory and then worked in the newly created set. Later, commit `59c57861ae` added the first step and copied the files from resource into test temp dir. At this point linking copied files became pointless, but was preserved. Let's remove it now. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#21097	2024-10-18 08:31:08 +03:00
Kefu Chai	26a5a00b20	interval: include used header when building the tree with Clang-20 and libstdc++ shipped with GCC-14.2, we have the following build failure: ``` /home/kefu/dev/scylladb/interval.hh:638:14: error: no member named 'sort' in namespace 'std' 638 \| std::sort(intervals.begin(), intervals.end(), [&](auto&& r1, auto&& r2) { \| ~~~~~^ /home/kefu/dev/scylladb/interval.hh:691:21: error: no member named 'upper_bound' in namespace 'std' 691 \| return std::upper_bound(r.begin(), r.end(), value, std::forward<LessComparator>(cmp)); \| ~~~~~^ /home/kefu/dev/scylladb/interval.hh:723:18: error: no member named 'minmax' in namespace 'std'; did you mean 'fminmag'? 723 \| auto p = std::minmax(_interval, other._interval, [&cmp] (auto&& a, auto&& b) { \| ^~~~~~~~~~~ \| fminmag ``` it turns out we failed to include the used header. in this change, we include `<algorithm>` so that this header is self-contained. after this change, the build passes. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21168	2024-10-18 08:26:27 +03:00
Kefu Chai	e73b0c942f	build: cmake: use GENERATOR_IS_MULTI_CONFIG property to detect multi-config this is a more reliable way to check if we are configured to use a multi-config generator. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-18 08:36:52 +08:00
Kefu Chai	415c83fa67	build: cmake: consume Seastar using its .pc files before this change, scylla's CMake-based system consumes Seastar library by including it directly. but this failed to address the needs of linking against Seastar shared libraries in Debug and Dev builds, while linking against the static libraries in other builds. because Seastar uses `BUILD_SHARED_LIBS` CMake variable to determine if it builds shared libraries. and we cannot assign different values to this CMake variable based on current configure type -- CMake does not support this. see https://gitlab.kitware.com/cmake/cmake/-/issues/19467 in order to address this problem, we have a couple of possible solutions: - to enable Seastar to build both shared and static libraries in a pass. without sacrificing the performance, we have to build all object files twice: once with -fPIC, once without. in order to accomplish this goal, we need to develop a machinery to populate the same settings to these two builds. this would complicate the design of Seastar's building system further. - to build Seastar libraries twice in scylla, we could use the ExternalProject module to implement this. but it'd be complicated to extract the compile options, and link options previously populated by Seastar's targets with CMake -- we would have to replicate all of them in scylla. this is out of the question. - to build Seastar libraries twice before building scylla, and let scylla consume them using CMake config files or .pc files. this is a compromise. it enables scylla to drive the build of Seastar libraries and to consume the compile options and link options. the downside is: * the generated compilation database (compile_commands.json) does not include the commands building Seastar anymore. * the building system of scylla does not have finer-grained control over the building process of seastar. 
for instance, we cannot specify the build dependency to a certain seastar library, and just build it instead of building the whole seastar project. turns out the last approach is the best one we can have at this moment. this is also the approach used by the existing `configure.py`. in this change, we - add FindSeastar.cmake to * detect the preconfigured Seastar builds, and * extract the build options from .pc files * expose library targets to be consumed by parent project - add Seastar as an external project, so we can build it from the parent project. BUILD_ALWAYS is set to ensure that Seastar is rebuilt, as scylla developers are expected to modify Seastar occasionally. since the change in Seastar's SOURCE_DIR is not detectable via the ExternalProject, we have to rebuild it. this is atypical compared to standard ExternalProject usage: - Seastar's build system should already be configured at this point. - We maintain separate project variants for each configuration type. Benefits of this approach: - Allows the parent project to consume the compile options exposed by .pc file. as the compile options vary from one config to another. - Allows application of config-specific settings - Enables building Seastar within the parent project's build system - Facilitates linking of artifacts with the external project target, establishing proper dependencies between them - preserve the existing machinery of including Seastar only when building without multi-config generator. this allows users who don't use multi-config generator to build Seastar in-the-tree. the typical use case is the CI workflows performing the static analysis. we will update `configure.py` to merge the compilation database of scylla and seastar. Refs scylladb/scylladb#2717 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-18 08:36:52 +08:00
Kefu Chai	7cb74df323	build: do not use `mode` as the index into `modes` before this change, in `configure_seastar()`, we use `mode` as a component in the build directory, and use it as the index into `modes` dict. but in a succeeding commit, we will reuse `configure_seastar()` when preparing for the CMake-based building system, in which, `mode` will be the CMake configure type, like "Debug" instead of scylla's build mode, like "debug". to be prepared for this change, let's use `mode_config` directly. it's identical to `modes[mode]`. this also improves the readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-18 08:36:52 +08:00
Kefu Chai	1bd2ed7826	build: cmake: detect and link against GnuTLS library before this change, in the CMake-based building system, we rely on Seastar to provide this linkage, but this is wrong and fragile. as Seastar is not supposed to expose and provide GnuTLS symbols. that's why we have following build failure: ``` : && /home/kefu/.local/bin/clang++ -g -Og -g -gz -Xlinker --build-id=sha1 --ld-path=ld.lld -dynamic-linker=/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////lib64/ld-linux-x86-64.so.2 /home/kefu/dev/scylladb/build/Debug/seastar/libseastar.so -fsanitize=address -fsanitize=undefined /usr/lib64/libboost_program_options.so /usr/lib64/libboost_thread.so /usr/lib64/libcares.so /usr/lib64/libfmt.so.11.0.2 -L/usr/lib64 -llz4 CMakeFiles/scylla_version.dir/Debug/release.cc.o CMakeFiles/scylla.dir/Debug/main.cc.o -o Debug/scylla -L/home/kefu/dev/scylladb/idl/absl::headers -Wl,-rpath,/home/kefu/dev/scylladb/idl/absl::headers:/home/kefu/dev/scylladb/build/Debug/seastar Debug/libscylla-main.a api/Debug/libapi.a alternator/Debug/libalternator.a db/Debug/libdb.a cdc/Debug/libcdc.a compaction/Debug/libcompaction.a cql3/Debug/libcql3.a data_dictionary/Debug/libdata_dictionary.a gms/Debug/libgms.a index/Debug/libindex.a lang/Debug/liblang.a message/Debug/libmessage.a mutation/Debug/libmutation.a mutation_writer/Debug/libmutation_writer.a raft/Debug/libraft.a readers/Debug/libreaders.a redis/Debug/libredis.a repair/Debug/librepair.a replica/Debug/libreplica.a schema/Debug/libschema.a service/Debug/libservice.a sstables/Debug/libsstables.a 
streaming/Debug/libstreaming.a test/perf/Debug/libtest-perf.a tools/Debug/libtools.a transport/Debug/libtransport.a types/Debug/libtypes.a utils/Debug/libutils.a Debug/seastar/libseastar.so /usr/lib64/libyaml-cpp.so /usr/lib64/libboost_program_options.so.1.83.0 test/lib/Debug/libtest-lib.a -Xlinker --push-state -Xlinker --whole-archive auth/Debug/libscylla_auth.a -Xlinker --pop-state /usr/lib64/libcrypt.so cdc/Debug/libcdc.a compaction/Debug/libcompaction.a mutation_writer/Debug/libmutation_writer.a -Xlinker --push-state -Xlinker --whole-archive dht/Debug/libscylla_dht.a -Xlinker --pop-state index/Debug/libindex.a -Xlinker --push-state -Xlinker --whole-archive locator/Debug/libscylla_locator.a -Xlinker --pop-state message/Debug/libmessage.a gms/Debug/libgms.a sstables/Debug/libsstables.a readers/Debug/libreaders.a schema/Debug/libschema.a -Xlinker --push-state -Xlinker --whole-archive tracing/Debug/libscylla_tracing.a -Xlinker --pop-state Debug/libscylla-main.a -Xlinker --push-state -Xlinker --whole-archive Debug/libscylla-zstd.a -Xlinker --pop-state /usr/lib64/libzstd.so abseil/absl/strings/Debug/libabsl_cord.a abseil/absl/strings/Debug/libabsl_cordz_info.a abseil/absl/strings/Debug/libabsl_cord_internal.a abseil/absl/strings/Debug/libabsl_cordz_functions.a abseil/absl/strings/Debug/libabsl_cordz_handle.a abseil/absl/crc/Debug/libabsl_crc_cord_state.a abseil/absl/crc/Debug/libabsl_crc32c.a abseil/absl/crc/Debug/libabsl_crc_internal.a abseil/absl/crc/Debug/libabsl_crc_cpu_detect.a abseil/absl/strings/Debug/libabsl_str_format_internal.a service/Debug/libservice.a node_ops/Debug/libnode_ops.a service/Debug/libservice.a node_ops/Debug/libnode_ops.a raft/Debug/libraft.a repair/Debug/librepair.a streaming/Debug/libstreaming.a replica/Debug/libreplica.a abseil/absl/container/Debug/libabsl_raw_hash_set.a abseil/absl/hash/Debug/libabsl_hash.a abseil/absl/hash/Debug/libabsl_city.a abseil/absl/types/Debug/libabsl_bad_variant_access.a 
abseil/absl/hash/Debug/libabsl_low_level_hash.a abseil/absl/types/Debug/libabsl_bad_optional_access.a abseil/absl/container/Debug/libabsl_hashtablez_sampler.a abseil/absl/profiling/Debug/libabsl_exponential_biased.a abseil/absl/synchronization/Debug/libabsl_synchronization.a abseil/absl/debugging/Debug/libabsl_stacktrace.a abseil/absl/synchronization/Debug/libabsl_graphcycles_internal.a abseil/absl/synchronization/Debug/libabsl_kernel_timeout_internal.a abseil/absl/debugging/Debug/libabsl_symbolize.a abseil/absl/debugging/Debug/libabsl_debugging_internal.a abseil/absl/base/Debug/libabsl_malloc_internal.a abseil/absl/debugging/Debug/libabsl_demangle_internal.a abseil/absl/time/Debug/libabsl_time.a abseil/absl/strings/Debug/libabsl_strings.a abseil/absl/strings/Debug/libabsl_strings_internal.a abseil/absl/strings/Debug/libabsl_string_view.a abseil/absl/base/Debug/libabsl_throw_delegate.a abseil/absl/numeric/Debug/libabsl_int128.a abseil/absl/base/Debug/libabsl_base.a abseil/absl/base/Debug/libabsl_raw_logging_internal.a abseil/absl/base/Debug/libabsl_log_severity.a abseil/absl/base/Debug/libabsl_spinlock_wait.a -lrt abseil/absl/time/Debug/libabsl_civil_time.a abseil/absl/time/Debug/libabsl_time_zone.a -lsystemd /usr/lib64/libz.so /usr/lib64/libdeflate.so types/Debug/libtypes.a utils/Debug/libutils.a /usr/lib64/libyaml-cpp.so /usr/lib64/libcryptopp.so /usr/lib64/libboost_regex.so.1.83.0 /usr/lib64/libicui18n.so /usr/lib64/libicuuc.so -ldl /usr/lib64/libboost_unit_test_framework.so.1.83.0 Debug/seastar/libseastar_perf_testing.so /usr/lib64/libjsoncpp.so.1.9.5 db/Debug/libdb.a data_dictionary/Debug/libdata_dictionary.a cql3/Debug/libcql3.a transport/Debug/libtransport.a cql3/Debug/libcql3.a transport/Debug/libtransport.a lang/Debug/liblang.a /usr/lib64/liblua-5.4.so -lm rust/Debug/libwasmtime_bindings.a rust/librust_combined.a /usr/lib64/libsnappy.so.1.2.1 mutation/Debug/libmutation.a Debug/seastar/libseastar.so /usr/lib64/liblz4.so /usr/lib64/libxxhash.so && : ld.lld: 
error: undefined symbol: gnutls_hmac_fast >>> referenced by aws_sigv4.cc:21 (/home/kefu/dev/scylladb/utils/aws_sigv4.cc:21) >>> aws_sigv4.cc.o:(utils::aws::hmac_sha256(std::basic_string_view<char, std::char_traits<char>>, std::basic_string_view<char, std::char_traits<char>>)) in archive utils/Debug/libutils.a ld.lld: error: undefined symbol: gnutls_strerror >>> referenced by aws_sigv4.cc:23 (/home/kefu/dev/scylladb/utils/aws_sigv4.cc:23) >>> aws_sigv4.cc.o:(utils::aws::hmac_sha256(std::basic_string_view<char, std::char_traits<char>>, std::basic_string_view<char, std::char_traits<char>>)) in archive utils/Debug/libutils.a ``` in this change, we detect this library, and link its caller against it. this addresses the link failure. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-18 08:36:52 +08:00
Kefu Chai	fc8212483e	build: cmake: detect and link against yaml-cpp in main.cc, we use yaml-cpp library directly. so we are obliged to detect this library in scylla and link against it instead of relying on another library to do this. currently, Seastar detects it and pulls in yaml-cpp for us, but we should not take this for granted and rely on this. in this change, we detect and link against yaml-cpp to make this dependency explicit. the same applies to the "utils" library. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-18 08:36:52 +08:00
Kefu Chai	2e4be56112	build: cmake: link Seastar with Seastar::<COMPONENT> before this change, we link against the targets defined in Seastar's source tree. but these targets are not part of Seastar's public interface -- they are not exposed by Seastar's CMake config files. so, let's link against the target names qualified by the library module name. this also prepares for the transition to using Seastar without including it directly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-18 08:36:52 +08:00
Kefu Chai	b2dc261841	build: cmake: define CMake generate helper funcs in scylla before this change, we assume that scylla's CMake script includes Seastar's CMake script. but we are going to consume Seastar using its .pc files or its CMake config files instead of including it directly. moreover, these helper functions are not part of Seastar's public interface. actually the same applies to the `check_headers()` helper, which was adapted from seastar's CheckHeaders.cmake. so to be prepared for this change, let's define these generate helper functions in scylla. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-18 08:36:52 +08:00
Avi Kivity	f4acaa5473	cql3: index_target: forward declare boost::regex No need to burden everyone with the full boost::regex code. Closes scylladb/scylladb#21148	2024-10-17 19:14:40 +02:00
Botond Dénes	e1d8cddd09	test/boost/mutation_test: add test for multishard permit safety Add a test checking that the multishard reader will not deadlock, when created with an admitted permit, on a semaphore with a single count resource.	2024-10-17 08:47:50 -04:00
Botond Dénes	5a3fd69374	test/lib/reader_lifecycle_policy: add semaphore factory to constructor Allowing callers to specify how the semaphore is created and stopped, instead of doing so via boolean flags like it is done currently. This method doesn't scale, so use a factory instead.	2024-10-17 08:47:50 -04:00
Botond Dénes	c8598e21e8	test/lib/reader_lifecycle_policy: rename factory_function To reader_factor_function. We are about to add a new factory function parameter, so the current factory_function has to be renamed to something more specific.	2024-10-17 08:47:50 -04:00
Botond Dénes	76a5ba2342	repair/row_level: drop now unneeded release_base_resource() calls The multishard reader now does this itself, no need to do it here.	2024-10-17 08:47:50 -04:00
Botond Dénes	218ea449a5	readers/multishard: make multishard reader safe to create with admitted permits Passing an admitted permit -- i.e. one with count resources on it -- to the multishard reader, will possibly result in a deadlock, because the permit of the multishard reader is destroyed after the permits of its child readers. Therefore its semaphore resources won't be automatically released until children acquire their own resources. This creates a dependency (an edge in the "resource allocation graph"), where the semaphore used by the multishard reader depends on the semaphores used by children. When such dependencies create a cycle, and permits are acquired by different reads in just the right order, a deadlock will happen. Users of the multishard reader have to be aware of this gotcha -- and of course they aren't. This is small wonder, considering that not even the documentation on the multishard reader mentions this problem. To work around this, the user has to call `reader_permit::release_base_resources()` on the permit, before passing it to the multishard reader. On multiple occasions, developers (including the very author of the multishard reader), forgot or didn't know about this and this resulted in deadlocks down the line. This is a design-flaw of the multishard reader, which is addressed in this patch, after which, it is safe to pass admitted or not admitted permits to the multishard reader, it will handle the call to `release_base_resources()` if needed.	2024-10-17 08:45:21 -04:00
Raphael S. Carvalho	f3ab5e1f1e	tests: Fix perf test for load balancer Broken after introduction of zero-token nodes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#21156	2024-10-17 14:02:31 +02:00
Kamil Braun	f02afefd34	Merge 'raft: consider the gossiper state when sending the group0 state id' from Emil Maskovsky Skip the advertisement of the group0 state id in case the gossiper is not active (ready). Sending the application state when the gossiper is not active caused a warning to be shown in the log about the local endpoint not being found in the gossiper endpoint state map on a (graceful) node restart. The local endpoint is initialized on the gossiper startup, so we skip the state id advertisement until the startup is finished. Fixes: scylladb/scylladb#21117 No backport: Fixes an issue that is currently only present in master Closes scylladb/scylladb#21119 * github.com:scylladb/scylladb: raft: consider the gossiper state when sending the group0 state id raft: add the test for GROUP0_STATE_ID gossip application state	2024-10-17 13:41:15 +03:00
Kefu Chai	5ef0cbb693	tools/scylla-nodetool: s/vm.count()/vm.contains()/ this change is created in the same spirit of `0104c7d3`, which used `std::map::contains()` in the place of `std::map::count()` when checking for the existence of a parameter with given name for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21158	2024-10-17 13:41:15 +03:00
Alexey Novikov	b965729f0a	replica: implement memtable_flush_period_in_ms schema option implement Cassandra's original schema option memtable_flush_period_in_ms: Milliseconds before memtables associated with the table are flushed. there are a few things concerning this patch: * milliseconds look strange and scary for this option. Unlike Cassandra we use a 60000ms (1min) minimum value for this option. * This is a limitation of Cassandra, but it is impossible to set this option for system tables. However sometimes it could be very useful to use automatic flushing for such tables: some system tables have small traffic and as a result prevent tombstone garbage collection. Fixes #20270 Closes scylladb/scylladb#20999	2024-10-17 13:41:15 +03:00
Anna Stuchlik	b54ce3b0c0	doc: remove the redundant raw:: html directive This commit removes the raw:: html directive (with the exception of an embedded animation) because: - It is not supported by the dark theme and looks bad. - It's a legacy directive, and we no longer need it on index pages. Fixes https://github.com/scylladb/scylladb/issues/20881 Closes scylladb/scylladb#21062	2024-10-17 13:41:15 +03:00
Kefu Chai	d7f315ef63	tool/scylla-nodetool: check for positional argument passed to "restore" before this change, if no positional arguments are passed to "restore" subcommand, the tool fails with the following error message: ``` error running operation: boost::wrapexcept<boost::bad_any_cast> (boost::bad_any_cast: failed conversion using boost::any_cast) ``` this is difficult to digest. after this change, if no sstables are specified: ``` error processing arguments: missing required parameter: sstables ``` this is slightly better from user experience's perspective. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21136	2024-10-17 13:41:15 +03:00
Kefu Chai	9355a32b5c	utils/loading_cache: s/typeof/decltype/ `typeof` is a GNU extension, and is part of C23, but it is not included by C++23. if we compile the tree with c++23 instead of gnu++23, the compilation fails like: ``` FAILED: repair/CMakeFiles/repair.dir/RelWithDebInfo/repair.cc.o /home/kefu/.local/bin/clang++ -DSCYLLA_BUILD_MODE=release -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/seastar/include -isystem /home/kefu/dev/scylladb/build/RelWithDebInfo/seastar/gen/include -isystem /usr/include/p11-kit-1 -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb/build=. 
-march=westmere -Xclang -fexperimental-assignment-tracking=disabled -mllvm -inline-threshold=2500 -fno-slp-vectorize -std=c++23 -Werror=unused-result -DSEASTAR_API_LEVEL=7 -DSEASTAR_SSTRING -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_LOGGER_TYPE_STDOUT -DFMT_SHARED -DWITH_GZFILEOP -MD -MT repair/CMakeFiles/repair.dir/RelWithDebInfo/repair.cc.o -MF repair/CMakeFiles/repair.dir/RelWithDebInfo/repair.cc.o.d -o repair/CMakeFiles/repair.dir/RelWithDebInfo/repair.cc.o -c /home/kefu/dev/scylladb/repair/repair.cc In file included from /home/kefu/dev/scylladb/repair/repair.cc:21: In file included from /home/kefu/dev/scylladb/service/storage_service.hh:19: In file included from /home/kefu/dev/scylladb/service/qos/service_level_controller.hh:19: In file included from /home/kefu/dev/scylladb/auth/service.hh:23: In file included from /home/kefu/dev/scylladb/auth/permissions_cache.hh:22: /home/kefu/dev/scylladb/utils/loading_cache.hh:754:66: error: use of undeclared identifier 'typeof'; did you mean 'typeid'? 754 \| static_assert(SectionHitThreshold <= std::numeric_limits<typeof(_touch_count)>::max() / 2, "SectionHitThreshold value is too big"); \| ^ /home/kefu/dev/scylladb/utils/loading_cache.hh:754:66: error: template argument for template type parameter must be a type 754 \| static_assert(SectionHitThreshold <= std::numeric_limits<typeof(_touch_count)>::max() / 2, "SectionHitThreshold value is too big"); \| ^~~~~~~~~~~~~~~~~~~~ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/limits:311:21: note: template parameter is declared here 311 \| template<typename _Tp> \| ^ 2 errors generated. ``` in this change, we trade `typeof` for a more standard compliant `decltype`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21116	2024-10-17 13:41:15 +03:00
Pavel Emelyanov	df83fe2dae	Merge 'interval: replace boost ranges with std ranges' from Avi Kivity To reduce dependency load, replace use of boost ranges with std ranges. Since std ranges are more particular about what iterators they accept, a custom iterator in size_estimates_virtual_reader has to be fixed first. No backport; code cleanup. Closes scylladb/scylladb#21143 * github.com:scylladb/scylladb: interval: change boost ranges to std ranges size_estimates_virtual_reader: make virtual_row_iterator more conforming	2024-10-17 13:41:15 +03:00
Avi Kivity	6fd219d982	sstables: generation_type: deinline from_string() This is not performance sensitive and penalizes everyone by including boost/regex.hpp. Fix by deinlining. Closes scylladb/scylladb#21147	2024-10-17 13:41:15 +03:00
Emil Maskovsky	e082fef32c	raft: remove the group0 state id handler stop check The stop assertion check in the group0 state id handler was triggering under some circumstances (stopping server during restart). In that case it might be that the stop is initiated before the server is fully initialized, and then the handler destructor is being called without calling to the `stop()` method first. This is a valid scenario. The whole `stop()` in the group0 state id handler is not necessary, as the only operation being done is cancelling the timer which is done by the timer destructor automatically anyway. There is the concern of a currently running timer callback, but it doesn't preempt (not async) so the timer shouldn't be destroyed before the callback finishes. Fixes: scylladb/scylladb#21074 Closes scylladb/scylladb#21127	2024-10-17 13:41:15 +03:00
Emil Maskovsky	3f1af268c2	raft: consider the gossiper state when sending the group0 state id Skip the advertisement of the group0 state id in case the gossiper is not active (ready). Sending the application state when the gossiper is not active caused a warning to be shown in the log about the local endpoint not being found in the gossiper endpoint state map on a (graceful) node restart. The local endpoint is initialized on the gossiper startup, so we skip the state id advertisement until the startup is finished. Fixes: scylladb/scylladb#21117	2024-10-16 19:26:25 +02:00
Emil Maskovsky	65d3d4fd93	raft: add the test for GROUP0_STATE_ID gossip application state Test that the GROUP0_STATE_ID gossip application state is not causing the "endpoint_state_map does not contain endpoint" error. Refs: scylladb/scylladb#21117	2024-10-16 19:21:14 +02:00
Calle Wilund	f2ef75c3da	commitlog_test: Up timeout for large entry tests Fixes #21150 Apparently, on some CI, in debug, these tests can time out (large alloc) without actually failing what they do. Up the timeout (could consider removing as well, but...) so they hopefully pass. Closes scylladb/scylladb#21151	2024-10-16 18:13:04 +03:00
Avi Kivity	f799234c82	Update tools/java submodule (deprecation notice) * tools/java b2d025fd6b...807e991de7 (1): > README.md: add deprecation notice for java tools	2024-10-16 17:09:48 +03:00
Avi Kivity	b73f0197a8	Merge 'micro-updates to documentation development, on python-poetry' from Laszlo Ersek - `docs/Makefile`: work around python-poetry issue https://github.com/python-poetry/poetry/issues/8761 - `docs/README.md`: fix minimum poetry version No backporting needed (docs development). Closes scylladb/scylladb#21118 * github.com:scylladb/scylladb: docs/README.md: fix minimum poetry version docs/Makefile: work around python-poetry issue #8761	2024-10-16 14:16:29 +03:00
Nadav Har'El	ee0e7a7adf	mv: test that operations that should not be allowed on a view, aren't This patch adds test/cql-pytest tests which verify that all CQL operations that shouldn't be allowed on a materialized view, actually aren't: * All operations writing to a table - INSERT, UPDATE, BATCH, DELETE, and TRUNCATE - should be rejected when asked to operate on a view. * All operations with "TABLE" in their name (DROP TABLE, ALTER TABLE, DESC TABLE) should be rejected on a view - the ".. MATERIALIZED VIEW" operation should be used instead. * A materialized view cannot get materialized views or indexes of its own. All tests pass on Cassandra (Cassandra 4 or above is needed for the "DESC" test), and all but one pass on Scylla - Scylla does allow "DESC TABLE" on a materialized view, unlike Cassandra. I opened an issue to track that difference: Refs #21026 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#21028	2024-10-16 13:43:36 +03:00
Avi Kivity	d58cd262ca	interval: change boost ranges to std ranges Reduce dependency load. size_estimates_virtual_reader is adjusted due to poor boost ranges and std ranges interoperability.	2024-10-16 13:21:43 +03:00
Avi Kivity	3a75efd6d4	size_estimates_virtual_reader: make virtual_row_iterator more conforming To work with std::ranges, an iterator has to have a default constructor, and be assignable. Add the default constructor and convert references to pointers to support this.	2024-10-16 13:21:25 +03:00
Pavel Emelyanov	4a8ab9b3bc	s3/client: Restore indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-16 12:27:29 +03:00
Pavel Emelyanov	a15dfe0154	s3/client: Catch do_upload_file::upload_part() exceptions This method spawns part uploading in the background, but still may throw, e.g. preparing http request or claiming memory. In this case any outstanding part upload fibers are not waited on, and the whole do_upload_file object can be freed from under their feet. Also, the multipart upload is not aborted, thus losing track of it until g.c. happens. To fix it, catch any exception from upload_part() too, and if it happens, do what the regular upload_sink would do -- close the gate thus picking up any outstanding activity that may happen there and abort the multipart upload. Indentation is deliberately left broken Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-16 12:23:31 +03:00
Nadav Har'El	210d53070e	docs/alternator: explain service discovery HTTP requests In this patch we add to docs/new-apis.md (Alternator-specific API) a description of the service discovery HTTP requests - `/` and `/localnodes` that was previously not documented except in a design document that is unfortunately no longer available publicly. The description also includes the recently added `dc` and `rack` parameters for the `/localnodes` request. Fixes #20989 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-10-16 10:15:04 +03:00
Nadav Har'El	367e18ed4a	docs/alternator: split Alternator-specific APIs from alternator.md Before this patch, the documentation of Alternator-specific APIs (APIs which are unique to Alternator and don't exist in DynamoDB) appear as a section of the main document alternator.md. In the next patch we want to describe yet another Alternator feature and make this section even longer. But there is growing sentiment that the Alternator documentation should be split into more, shorter, pages (Refs #19822) so this patch splits the Alternator-specific API documentation into a new file, new-apis.md. There is no new content in the patch - just movement of existing content plus a reference to the new page. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-10-16 10:14:31 +03:00
Kefu Chai	32f508d450	raft: fix typo in logging message s/miminum/minimum/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21073	2024-10-16 06:33:43 +03:00
Avi Kivity	d59038fa93	storage_proxy: convert boost range algorithms to std::ranges Standardize on a single range library. The changes are mostly mechanical. The only exception is boost::join, which has no analog in std::ranges (rightly so, since it cannot be implemented efficiently). A variety of tricks were used to convert it: - use std::ranges::join() on an std::array of std::span (when the inputs were all contiguous) - copy to a utils::small_vector (when it is expected that there will be no allocation) - use a small_vector of pointers and iterate+dereference that Closes scylladb/scylladb#21082	2024-10-15 16:52:27 +02:00
Avi Kivity	820509026f	schema: replace boost ranges with std ranges To reduce dependency load, use std ranges instead of boost ranges. The std::ranges::{lower,upper}_bound don't support heterogeneous lookup, but a more natural solution is to use a projection to search for the name, so we use that and the custom comparator is removed. Many callers are converted as well due to poor interoperability between boost ranges and std ranges.	2024-10-15 16:42:54 +03:00
Piotr Dulikowski	a380a2efd9	test/test_view_build_status: properly wait for v2 in migration test The test_view_build_status_migration_to_v2 test case creates a new view (vt2) after performing the view_build_status -> view_build_status_v2 migration and waits until it is built by `wait_for_view_v2` function. It works by waiting until a SELECT from view_build_status_v2 will return the expected number of rows for a given view. However, if the host parameter is unspecified, it will query only one node on each attempt. Because `view_build_status_v2` is managed via raft, queries always return data from the queried node only. It might happen that `wait_for_view_v2` fetches expected results from one node while a different node might be lagging behind the group0 coordinator and might not have all data yet. In case of test_view_build_status_migration_to_v2 this is a problem - it first uses `wait_for_view_v2` to wait for view, later it queries `view_build_status_v2` on a random node and asserts its state - and might fail because that node didn't have the newest state yet. Fix the issue by issuing `wait_for_view_v2` in parallel for all nodes in the cluster and waiting until all nodes have the most recent state. Fixes: scylladb/scylladb#21060 Closes scylladb/scylladb#21091	2024-10-15 14:57:47 +03:00
Pavel Emelyanov	63725b10a8	Merge 'cql: create default superuser if it doesn't exist' from Paweł Zakrzewski This change reorganizes the way standard_role_manager startup is handled: role_manager::ensure_superuser_is_created() is added, which returns a future that resolves once the superuser is available. We wait for this future before starting the CQL server. There is a change in behavior: auth::do_after_system_ready is potentially an infinite loop, and we await its result. Fixes #10481 Reason for no backports: it's not a regression and it's an issue that may only affect a tiny time window during the cluster startup. Closes scylladb/scylladb#20137 * github.com:scylladb/scylladb: test: test_restart_cluster: create the test auth: standard_role_manager allows awaiting superuser creation auth: coroutinize the standard_role_manager start() function auth: don't start server until the superuser is created	2024-10-15 14:56:04 +03:00
Avi Kivity	a5c37a110f	schema: precompute all_columns_in_select_order() all_columns_in_select_order() returns a complicated boost range type that has no analog in std::ranges. To ease the transition to std::ranges, precompute most of the work done in that function, and only convert pointers to references in the function itself. Since boost ranges and std::ranges don't fully interoperate, one of the users has to be adjusted.	2024-10-15 14:04:12 +03:00
Pavel Emelyanov	6e0899c2b4	data_dictionary: Replace boost ranges with std ranges Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#21105	2024-10-15 13:22:08 +03:00
Laszlo Ersek	a0ffbd5bcf	docs/README.md: fix minimum poetry version Commit `2a3012db7f` ("docs/README.md: expand prerequisites list", 2022-08-31) referenced poetry release 1.12, which does not exist even today (as of this writing, the latest release is 1.8.4). The intent was probably 1.1.12. Copy the minimum version from "sphinx-scylladb-theme": 1.8.1 (see "docs/source/getting-started/installation.rst" and "docs/source/getting-started/quickstart.rst" at commit f7c26b422572). Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-10-15 12:19:21 +02:00
Laszlo Ersek	e5c2d4bd1d	docs/Makefile: work around python-poetry issue #8761 Python-poetry is affected by bug <https://github.com/python-poetry/poetry/issues/8761>. Namely, if you have "keyring" <https://pypi.org/project/keyring/> installed, poetry will try to gain access to the Default collection in the (ex. GNOME) keyring, even if poetry only needs read-only access to package repositories, and even if those repos are public. Consequently, you either unlock your Default collection for poetry (unjustifiedly), or your GUI session gets effectively locked up, because any time you hit Cancel on the keyring unlock dialog, poetry immediately pops up another, and this dialog grabs the keyboard -- you cannot even switch to a character VT, for killing poetry; you have to log in via ssh for that. This issue is not visible to users who don't use "keyring" (GNOME or otherwise). For those who do, work around the problem by selecting the "null" keyring back-end, in the environment of every poetry invocation. Note: I have not regression-tested the workaround in a desktop environment where "keyring" is unavailable to begin with. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-10-15 12:07:00 +02:00
Botond Dénes	f93abebbb9	Merge 'Sanitize compaction manager API endpoints' from Pavel Emelyanov Endpoints are registered next to the service they use, and the unregistration deferred action is created right after it. When registered, the service in question is passed as argument and then captured by endpoint lambdas. This makes sure that service is not used by endpoints after being stopped. That's not quite the case for compaction manager. Its endpoints can be registered in several places, and compaction_manager "function" is not unregistered on stop. This patch fixes some of this misbehavior, in particular: - adds unregistration of compaction_manager API function - uses sharded<compaction_manager>& argument in endpoints instead of ctx.db.local().get_compaction_manager() chain - moves some endpoints from storage_service.cc to compaction_manager.cc Closes scylladb/scylladb#20962 * github.com:scylladb/scylladb: api: Use captured compaction_manager in get_cm_stats() helper api: Use captured compaction_manager in endpoints api: Add sharded<compaction_manager> argument to compaction_manager API reg/unreg api: Move some endpoints from storage_service.cc to compaction_manager.cc api: Unset compaction_manager endpoints api: Use shorter registration method for compaction_manager function	2024-10-15 10:20:58 +03:00
Daniel Reis	28a265ccd8	docs: fix redirect from cert-based auth to security/enable-auth page Closes scylladb/scylladb#19943	2024-10-15 09:29:05 +03:00
Kefu Chai	a82706eb8f	Update seastar submodule * seastar 3c9c2696...abd20efd (44): > Revert "build: enable Seastar to build shared and static libs in a single build" > dns: Support c-ares before 1.22 > build: improve c-ares version extraction method > Minor typos fix in doc: reference_wrapper.hh > build: enable Seastar to build shared and static libs in a single build > build: include -fno-semantic-interposition in CXXFLAGS > loop: add Sentinel iterator support to parallel_for_each() > dns: use ARES_LIB_INIT_NONE instead of a magic number > dns: use struct typedef for `_channel` > doc/testing.md: explain seastar + boost test colocation > build: extract c-ares version from header file > dns: replace deprecated ares_process() with ares_process_fd() > build: do not support c-ares >= 1.33 > http: fix indentation > http: Add non-owning `make_request` to http client > treewide: replace boost::irange with std::views::iota where possible > Added unit test for "http_content_length_data_sink_impl" > sharded.hh: migrate to concepts > file, scheduling: remove non-unified I/O and CPU scheduling > http: Add more HTTP response codes > http: add constness to `response_line` > http: refactor response_line to use `seastar::format` > httpd/file_handler: Always close stream > build: add compiler and C++ standard compatibility checks > rpc: rpc_types: replace boost::any with std::any > tls: drop dependency on boost::any > rpc: drop unnecessaty includes to boost libraries > rpc: compressor factory: deinline some boost-using functions > sharded: replace boost ranges with <ranges> > scheduling_specific: drop dependency on boost range adaptors > prefetch: drop dependency on boost::mpl > resource: drop unused dependency on boost::any > smp: drop dependency on boost ranges > reactor: remove unnecessary boost includes > execution_stage: remove unnecessary boost includes > sharded.hh: add invoke_on variant for a shard range > shared_ptr: remove deprecated lw_shared_ptr assignment operator > 
seastar-addr2line: add --debug arg > addr2line: add type checking > warnings: fix unused result warnings > thread_pool: fix includes > signal: remove trailing spaces > tests/unit: chmod -x signal_test.cc > iostream/http: Fix output_stream::write(temporary_buffer) overload Closes scylladb/scylladb#21109	2024-10-15 09:09:29 +03:00
Tomasz Grabiec	3e438d23e1	Merge 'Check system.tablets update before putting it into the table' from Pavel Emelyanov Having tablet metadata with more than 1 pending replica will prevent this metadata from being (re)loaded due to sanity check on load. This patch fails the operation which tries to save the wrong metadata with a similar sanity check. For that, changes submitted to raft are validated, and if it's topology_change that affects system.tablets, the new "replicas" and "new_replicas" values are checked similarly to how they will be on (re)load. fixes #20043 Closes scylladb/scylladb#21020 * github.com:scylladb/scylladb: tablets: Validate system.tablets update group0_client: Introduce change validation group0_client: Add shared_token_metadata dependency	2024-10-15 00:38:59 +02:00
Piotr Smaron	3969ffb39f	test: fix flaky `test_multidc_alter_tablets_rf` The testcase is flaky due to a known python driver issue: https://github.com/scylladb/python-driver/issues/317. This issue causes the `CREATE KEYSPACE` statement to be sometimes executed twice in a row, and the 2nd CREATE statement causes the test to fail. In order to work around it, it's enough to add `if not exists` when creating a ks. Fixes: scylladb/scylladb#21034 Needs to be backported to all 6.x branches, as the PR introducing this flakiness is backported to every 6.x branch. Closes scylladb/scylladb#21056	2024-10-14 16:18:44 +02:00
Avi Kivity	c286ddab38	test: lib: rest_client: use 'http' scheme even when connecting via a unix socket aiohttp 3.10.5 complains when 'unix+http' is used for a unix-domain socket. Use 'http', which works with 3.10.5 and the toolchain's 3.9.5. Closes scylladb/scylladb#21080	2024-10-14 15:32:56 +02:00
Piotr Dulikowski	48d75818fd	SCYLLA-VERSION-GEN: correct the logic for skipping SCYLLA-*-FILE The SCYLLA-VERSION-GEN file skips updating the SCYLLA-*-FILE files if the commit hash from SCYLLA-RELEASE-FILE is the same. The original reason for this was to prevent the date in the version string from changing if multiple modes are built across midnight (scylladb/scylla-pkg#826). However - intentionally or not - it serves another purpose: it prevents an infinite loop in the build process. If the build.ninja file needs to be rebuilt, the configure.py script unconditionally calls ./SCYLLA-VERSION-GEN. On the other hand, if one of the SCYLLA-*-FILE files is updated then this triggers rebuild of build.ninja. Apparently, this is sufficient for ninja to enter an infinite loop. However, the check assumes that the RELEASE is in the format <build identifier>.<date>.<commit hash> and assumes that none of the components have a dot inside - otherwise it breaks and just works incorrectly. Specifically, when building a private version, it is recommended to set the build identifier to `count.yourname`. Previously, before `85219e9`, this problem wasn't noticed most likely because reconfigure process was broken and stopped overwriting the build.ninja file after the first iteration. Fix the problem by fixing the logic that extracts the commit hash - instead of looking at the third dot-separated field counting from the left side, look at the last field. Fixes: scylladb/scylladb#21027 Closes scylladb/scylladb#21049	2024-10-14 13:49:15 +03:00
Calle Wilund	8eaf00ff11	test::topology: Add test for TLS upgrade and downgrade of internode encryption Test a rolling upgrade of cluster while active. Note: This is a unit test version of dtest test. Has the big drawback of not being able to use cassandra-stress to work and verify the cluster and results Test moves from none to all to none encryption while writing and then checking written data.	2024-10-13 23:54:06 +00:00
Calle Wilund	a557f699a2	docs: Add internode_encryption=transitional documentation Describing upgrading cluster(s) without downtime.	2024-10-13 23:54:06 +00:00
Calle Wilund	390b9759b6	messaging_service: Add "transitional" internode encryption mode Fixes #18903 Adds a "transitional" internode encryption mode, under which all _outgoing_ RPC connections will use TLS, but we will still accept any incoming non-tls connection. This allows an operator to perform a move to TLS RPC without cluster downtime: 1. For each server, add certificate etc options to server_encryption_options + internode_encryption=none + set ssl_storage_port + restart (rolling) 2. For each server, set internode_encryption=transitional + RR 3. For each server, set internode_encryption=all + RR	2024-10-13 23:54:06 +00:00
Calle Wilund	503a71f9b8	messaging_service: Create TLS connector even if internode_enc=none when certs set Refs #18903 If ssl_storage_port is non-zero _and_ actual certificates are specified and exist, create a TLS connector for RPC regardless of whether internode encryption is enabled. I.e. potentially unused. For transitioning cluster to TLS.	2024-10-13 23:54:05 +00:00
Avi Kivity	db14a01901	Merge 'Use table id as system.sstables partition key' from Pavel Emelyanov The system.sstables (a.k.a. sstables registry) primary key is "string location" as partition key and "uuid generation" as clustering one. The "location" part was taken from table.config.datadir value which, in turn, a string containing path to on-disk files if the table was located locally, e.g. /var/lib/scylla/data/ks/cf-abc123 one. Recently [1] the datadir was moved from table config onto storage options, but this string is still used as registry key. Other than being owned by a table with ID, sstables are accessed by restore-from-object-storage code [2]. To make it work, both storage driver and sstable_directory helper class maintain two formats of object prefixes for sstables components. For S3-backed sstables having a record in registry, the path used is s3://bucket/generation/component. For restore code there are user-provided prefixes that do not match the aforementioned pattern. The selection between those two is now made by checking sstable state, which is not obvious and may cause troubles for tiered storage driver. This patch changes the registry schema so that partition key becomes "uuid owner" and is set to be table.id() value. This is to stop using the local path by S3 backed sstables. Also this change makes it possible for storage driver and sstable directory to rely on the storage options only to tell different bucket prefixes formats from each other. As a side effect, the make_s3_object_name() helper, that generates the proper object name, becomes explicit for restore-from-S3 usage. Now it relies on the sstable::filename() calling this->prefix() behind the scenes and the latter to return the user-provided prefix, which is pretty fragile construction. 
No need to backport (and it's not going to be easy to do it), storage options feature is still experimental Refs #20675 [1] Refs #20305 [2] Closes scylladb/scylladb#20998 * github.com:scylladb/scylladb: sstables: Flatten S3 object name making sstable_directory: Flatten directory lister creation treewide: Rename sstable registry location field to be owner system_keyspace: Change sstables registry partition key type sstables: Keep location variant on s3 backend too storage_options: Use variant on S3 options sstables: Split sstable::filename() helper sstables: Add s3_storage::owner() helper	2024-10-13 20:08:43 +03:00
Kefu Chai	7d2d44883b	install.sh: install seastar/scripts/addr2line.py as well seastar extracted `addr2line` python module out back in e078d7877273e4a6698071dc10902945f175e8bc. but `install.sh` was not updated accordingly. it still installs `seastar-addr2line` without installing its new dependency. this leaves us with a broken `seastar-addr2line` in the relocatable tarball. ```console $ /opt/scylladb/scripts/seastar-addr2line Traceback (most recent call last): File "/opt/scylladb/scripts/libexec/seastar-addr2line", line 26, in <module> from addr2line import BacktraceResolver ModuleNotFoundError: No module named 'addr2line' ``` in this change, we redistribute `addr2line.py` as well. this should address the issue above. Fixes scylladb/scylladb#21077 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21078	2024-10-13 19:35:14 +03:00
Kefu Chai	519b4a2934	utils/s3: include used header when building the tree with clang-19 and libstdc++ shipped along with GCC 14.2.1, we have ``` clang++ -MD -MT build/release/utils/s3/aws_error.o -MF build/release/utils/s3/aws_error.o.d -std=c++23 -I/home/kefu/dev/scylladb/master/seastar/include -I/home/kefu/dev/scylladb/master/build/release/seastar/gen/include -Werror=unused-result -DSEASTAR_API_LEVEL=7 -DSEASTAR_SSTRING -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_LOGGER_TYPE_STDOUT -DFMT_SHARED -I/usr/include/p11-kit-1 -DWITH_GZFILEOP -ffile-prefix-map=/home/kefu/dev/scylladb/master=. -march=westmere -ffunction-sections -fdata-sections -O3 -mllvm -inline-threshold=2500 -fno-slp-vectorize -DSCYLLA_BUILD_MODE=release -g -gz -Xclang -fexperimental-assignment-tracking=disabled -iquote. -iquote build/release/gen -std=gnu++23 -ffile-prefix-map=/home/kefu/dev/scylladb/master=. -march=westmere -DBOOST_ALL_DYN_LINK -fvisibility=hidden -isystem abseil -Wall -Werror -Wextra -Wimplicit-fallthrough -Wno-mismatched-tags -Wno-c++11-narrowing -Wno-overloaded-virtual -Wno-unused-parameter -Wno-unsupported-friend -Wno-missing-field-initializers -Wno-deprecated-copy -Wno-psabi -Wno-error=deprecated-declarations -DXXH_PRIVATE_API -DSEASTAR_TESTING_MAIN -c -o build/release/utils/s3/aws_error.o utils/s3/aws_error.cc utils/s3/aws_error.cc:33:21: error: no member named 'make_unique' in namespace 'std' 33 \| auto doc = std::make_unique<rapidxml::xml_document<>>(); \| ~~~~~^ utils/s3/aws_error.cc:33:57: error: expected '(' for function-style cast or type construction 33 \| auto doc = std::make_unique<rapidxml::xml_document<>>(); \| ~~~~~~~~~~~~~~~~~~~~~~~~^ utils/s3/aws_error.cc:33:59: error: expected expression 33 \| auto doc = std::make_unique<rapidxml::xml_document<>>(); \| ^ 3 errors generated. ninja: build stopped: subcommand failed. ``` in order to address the build failure, let's include the used header. 
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21064	2024-10-13 18:32:34 +03:00
Patryk Jędrzejczak	18d3a6480d	test: test_read_required_hosts: run with the raft-based topology When we made the raft-based topology mandatory, all boost test tests started using it. Then, `test_read_required_hosts` started failing. We left investigating it for later and started running it with `force-gossip-topology-changes` to make it pass. Currently, the test doesn't fail with the raft-based topology anymore. Hence, we remove the FIXME and run the test with a normal config. We don't know when and why the test stopped failing. Investigating it wouldn't be easy, since we don't even know why it failed in the first place. We suspect that there was some bug that is now fixed. This patch only fixes a test, there is no need to backport it. Fixes scylladb/scylladb#18463 Closes scylladb/scylladb#20960	2024-10-11 17:01:20 +02:00
Kamil Braun	96070bb5b3	Merge 'storage_proxy: Add conditions checking to avoid UB in speculating read executors.' from Sergey Zolotukhin During the investigation of scylladb/scylladb#20282, it was discovered that implementations of speculating read executors have undefined behavior when called with an incorrect number of read replicas. This PR introduces two levels of condition checking: - Condition checking in speculating read executors for the number of replicas. - Checking the consistency of the Effective Replication Map in filter_for_query(): the map is considered incorrect if the list of replicas contains a node from a data center whose replication factor is 0. Please note: This PR does not fix the issue found in scylladb/scylladb#20282; it only adds condition checks to prevent undefined behavior in cases of inconsistent inputs. Refs scylladb/scylladb#20625 As this issue applies to the releases versions and can affect clients, we need backports to 6.0, 6.1, 6.2. Closes scylladb/scylladb#20851 * github.com:scylladb/scylladb: Add conditions checking for get_read_executor Avoid an extra call to block_for in db::filter_for_query. Improve code readability in consistency_level.cc and storage_proxy.cc tools: Add build_info header with functions providing build type information tests: Add tests for alter table with RF=1 to RF=0	2024-10-11 15:02:02 +02:00
Paweł Zakrzewski	900a6706b8	test: test_restart_cluster: create the test The purpose of this test is to verify that the cluster is able to boot up again after a full cluster shutdown, thus exhibiting no issues when connecting to raft group 0 that is larger than one.	2024-10-11 13:25:07 +02:00
Paweł Zakrzewski	7008b71acc	auth: standard_role_manager allows awaiting superuser creation This change implements the ability to await superuser creation in the function ensure_superuser_is_created(). This means that Scylla will not be serving CQL connections until the superuser is created. Fixes #10481	2024-10-11 13:25:07 +02:00
Paweł Zakrzewski	04fc82620b	auth: coroutinize the standard_role_manager start() function This change is a preparation for the next change. Moving to coroutines makes the code more readable and easier to process.	2024-10-11 13:25:07 +02:00
Paweł Zakrzewski	f525d4b0c1	auth: don't start server until the superuser is created This change reorganizes the way standard_role_manager startup is handled: now the future returned by its start() function can be used to determine when startup has finished. We use this future to ensure the startup is finished prior to starting the CQL server. Some clusters are created without auth, and auth is added later. The first node to recognize that auth is needed must create the superuser. Currently this is always on restart, but if we were to ever make it LiveUpdate then it would not be on restart. This suggests that we don't really need to wait during restart. This is a preparatory commit, laying ground for implementation of a start() function that waits for the superuser to be created. The default implementation returns a ready future, which makes no change in the code behavior.	2024-10-11 13:25:07 +02:00
Pavel Emelyanov	a7042d66e3	sstables: Flatten S3 object name making The s3_storage backend driver has a method that generates object path within the bucket. Depending on options alternative it picks one of two formats: - for string prefix, it uses it implicitly via sstable::filename() call that calls storage->prefix() which, in turn, returns prefix value - for registry-backed sstables, the /bucket/generation/component path is generated This patch brushes this place up. Similarly to previous patch, this change also makes the selection based on the location alternative, not on the sstable state. As well it's idempotent change, as S3 sstables with 'upload' state only appear when restoring from object store, and in this case the string location is in use. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-11 14:11:28 +03:00
Pavel Emelyanov	8d5537a439	sstable_directory: Flatten directory lister creation After previous patching, the way components lister is created for S3 storage options became quite hairy. This patch brushes things up to be easier to read. The only "functional" change here, is that selection between registry lister and S3 lister is made based on options' location held alternative, not on the sstable state value. That's in fact idempotent change, the only caller that provides string location on options is the "restore from object store" code that also sets state to be 'upload'. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-11 14:11:28 +03:00
Pavel Emelyanov	031893259a	treewide: Rename sstable registry location field to be owner This is sort of continuation of the previous patch. The partition key in the registry is now table_id, not string, and is better called "owner", not "location". This patch is s/location/owner/ over specific places that include field name in the schema, argument names in registry maintenance classes and tests accessing the selected row fields by name. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-11 14:11:28 +03:00
Pavel Emelyanov	3315e3a2a9	system_keyspace: Change sstables registry partition key type Today, the system.sstables schema uses string as partition key. Callers, in turn, use table's datadir value to reference entries in it. That's wrong, S3-backed sstables don't have any local paths to work with. The table's ID is better in this role. This patch only changes the field type to be table_id and fixes the callers to provide one. In particular, see init_table_storage() change -- instead of generating a datadir string, it sets table.id() as the options' location. Other fixed places are tests. Internally, this id value is propagated via s3_storage::owner() method, that's fixed as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-11 13:48:09 +03:00
pehala	a2f9136e36	test.py: Use "python -m pytest" for pytest invocation for PythonTest Enables debugging inside pytest subprocesses as well. It seems that pydev automatically attaches itself also to all python subprocesses. Since we used to call "pytest" wrapper it was deemed a different program, and we could not debug individual tests. Closes scylladb/scylladb#21050	2024-10-11 13:38:47 +03:00
Pavel Emelyanov	bb13b7bf72	sstables: Keep location variant on s3 backend too Previous patch put variant<string, table_id> as location of S3 options. This patch makes the S3 sstables backend driver keep variant as sstable location. As with the previous patch, driver only keeps variant, but continues using its string alternative internally. This will be changed later on. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-11 13:09:47 +03:00
Pavel Emelyanov	1181b6b082	storage_options: Use variant on S3 options Describing S3 storage for sstables nowadays has two options -- via sstables registry entry and by using the direct prefix string. The former is used when putting a keyspace on S3. In this case each sstable has the corresponding entry in the system.sstables table. The latter is used by "restore from object storage" code. In that case, sstables don't have entries in the registry, but are accessed by a specific S3 object path. This patch reflects this difference by making s3_options::location be variant of string prefix and table_id owner. The owner needs more explanation, here it is. Today, the system.sstables schema defines partition key to be "string location" and clustering key to be "UUID generation". The partition key is table's datadir string, but it's wrong to use it this way. Next patches will change the partition key to be table's ID (there's table_id type for it), and before doing it storage options must be prepared to carry it onboard. This patch does it, but the table_id alternative of the location is still unused, the rest of the code keeps using the string location to reference a row in the registry table. Next patches will eventually make use of the table_id value. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-11 13:04:52 +03:00
Kamil Braun	4d99cd2055	Merge 'raft: fast tombstone GC for group0-managed tables' from Emil Maskovsky Add the gossip state for broadcasting the nodes state_id. Implemented the Group0 state broadcaster (based on the gossip) that will broadcast the state id of each node and check the minimal state id for the tombstone GC. When there is a change in the tombstone GC minimal state id, the state broadcaster will update the tombstone GC time for the group0-managed tables. The main component of the change is the newly added `group0_state_id_handler` that keeps track, broadcasts and receives the last group0 state_ids across all nodes and sets the tombstone GC deletion time accordingly: * on each group0 change applied, the state_id handler broadcasts the state_id as a gossip state (only if the value has changed) * the handler checks for the node state ids every refresh period (configurable, 1h by default) * on every check, the handler figures out the lowest state_id (timeuuid), which is state_id that all of the nodes already have * the timestamp of this minimum state_id is then used to set the tombstone GC deletion time * the tombstone GC calculation then uses that deletion time to provide the GC time back to the callers, e.g. when doing the compaction * (as the time for tombstone GC calculation has the 1s granularity we actually deduce 1s from the determined timestamp, because it can happen that there were some newer mutations received in the same second that were not distributed across the nodes yet) This change introduces a new flag to the static schema descriptor (`is_group0_table`) that is being checked for this newly added mode in the tombstone GC. We also add a check (in non-release builds only) on every group0 modification that the table has this flag set. 
The group0 tombstone GC handling is similar to the "repair" tombstone GC mode in a sense (that the tombstone GC time is determined according to a reconciliation action), however it is not explicitly visible to (nor editable by) the user. And also the tombstone GC calculation is much simpler than the "repair" mode calculation - for example, we always use the whole range (as opposed to the "repair" mode that can have specific repair times set for specific ranges). We use the group0 configuration to determine the set of nodes (both current and previous in case of joint configuration) - we need to make sure that we account for all the group0 nodes (if any node didn't provide the state_id yet, the current check round will be skipped, i.e. no GC will be done until all known nodes provide their state_id timestamp value). Also note that the group0 state_id handling works on all nodes independently, i.e. each node might have its own (possibly different) state depending on the gossip application state propagation. This is however not a problem, as some nodes might be behind, but they will catch up eventually, and this solution has the benefit of being distributed (as opposed to having a central point to handle the state, like for example the topology coordinator that has been considered in the early stages of the design). Fixes: scylladb/scylla#15607 New feature, should not be backported. Closes scylladb/scylladb#20394 * github.com:scylladb/scylladb: raft: add the check for the group0 tables raft: fast tombstone GC for group0-managed tables tombstone_gc: refactor the repair map raft: flag the group0-managed tables gossip: broadcast the group0 state id raft/test: add test for the group0 tombstone GC treewide: code cleanup and refactoring	2024-10-11 11:52:27 +02:00
Pavel Emelyanov	ba97072709	sstables: Split sstable::filename() helper To have the filename(type, prefix) one, next patches will provide prefix on their own, to avoid storage->prefix() call. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-11 12:47:13 +03:00
Pavel Emelyanov	6f9cb51259	sstables: Add s3_storage::owner() helper This driver uses sstring _location as part of the lookup key in the sstables registry. Next patches will need to change that and put more checks on the registry access, so introduce a helper method beforehand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-11 12:47:12 +03:00
Sergey Zolotukhin	c373edab2d	Add conditions checking for get_read_executor During the investigation of scylladb/scylladb#20282, it was discovered that implementations of speculating read executors have undefined behavior when called with an incorrect number of read replicas. This PR introduces two levels of condition checking: - Condition checking in speculating read executors for the number of replicas. - Checking the consistency of the Effective Replication Map in get_endpoints_for_reading(): the map is considered incorrect if the number of read replica nodes is higher than replication factor. The check is applied only when built in non release mode. Please note: This PR does not fix the issue found in scylladb/scylladb#20282; it only adds condition checks to prevent undefined behavior in cases of inconsistent inputs. Refs scylladb/scylladb#20625	2024-10-11 09:38:25 +02:00
Sergey Zolotukhin	8db6d6bd57	Avoid an extra call to block_for in db::filter_for_query.	2024-10-11 09:38:25 +02:00
Sergey Zolotukhin	ad93cf5753	Improve code readability in consistency_level.cc and storage_proxy.cc Add const correctness and rename some variables to improve code readability.	2024-10-11 09:38:25 +02:00
Sergey Zolotukhin	ae23d42889	tools: Add build_info header with functions providing build type information A new header provides `constexpr` functions to retrieve build type information: `get_build_type()`, `is_release_build()`, and `is_debug_build()`. These functions are useful when adding changes that should be enabled at compile time only for specific build types.	2024-10-11 09:38:24 +02:00
Sergey Zolotukhin	132358dc92	tests: Add tests for alter table with RF=1 to RF=0 Adding Vnodes and Tablets tests for alter keyspace operation that decreases replication factor from 1 to 0 for one of two data centers. Tablet version fails due to issue described in scylladb/scylladb#20625. Test for scylladb/scylladb#20625	2024-10-11 09:38:24 +02:00
Pavel Emelyanov	77eb9ddb0f	sstable_set: Reserve vector of readers When generating readers for the set of sstables, the end size of this vector is known in advance and its storage can be reserved. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#21055	2024-10-11 09:56:17 +03:00
Pavel Emelyanov	551da72492	api: Use captured database, not the one from ctx Continuation of the previous patch -- not commitlog-related endpoints can use provided database reference, that was captured from main. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-10 17:53:30 +03:00
Pavel Emelyanov	74f7071db8	api: Pass sharded<database> to commitlog endpoints registration This is to make registered endpoints work with the database without grabbing one from ctx. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-10 17:52:56 +03:00
Pavel Emelyanov	14ab6d2615	api: Move commitlog-related from storage_service.cc It registers itself in /storage_service function, but works with commitlog, so should be located next to commitlog endpoints. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-10 17:52:25 +03:00
Pavel Emelyanov	ba73704774	api: Unset commitlog API endpoints Most of the other set_...()-s have the unset_...() scheduled right afterwards, so here's one for set_server_commitlog(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-10 17:50:37 +03:00
Pavel Emelyanov	44ec6d36f3	api: Extract set_server_commitlog() from set_server_done() The latter collects a bunch of endpoints including commitlog ones. Extract it as standalone call in main. It's currently not located next to "commitlog server" as it should, because there's no standalone commitlog service in main. It will be addressed as a followup together with other endpoints that work with sharded<database>. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-10 17:49:10 +03:00
Pavel Emelyanov	1863ccd900	tablets: Validate system.tablets update Implement change validation for raft topology_change command. For now the only check is that the "pending replicas" contains at most one entry. The check mirrors similar one in `process_one_row` function. If not passed, this prevents system.tablets from being updated with the mutation(s) that will not be loaded later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-10 12:39:58 +03:00
Pavel Emelyanov	e5bf376cbc	group0_client: Introduce change validation Add validate_change() methods (well, a template and an overload) that are called by prepare_command() and are supposed to validate the proposed change before it hits persistent storage Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-10 12:31:52 +03:00
Pavel Emelyanov	f09fe4f351	group0_client: Add shared_token_metadata dependency It will be needed later to get tablet_metadata from. The dependency is "OK", shared_token_metadata is low-level sharded service. Client already references db::system_keyspace, which in turn references replica::database which, finally, references token_metadata Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-10 12:27:46 +03:00
Botond Dénes	86fd9ce8fd	schema/schema: break circular dependency with replica::database The schema module (everything in schema/) is supposed to be towards the leafs in the ScyllaDB inter-module dependency graph. In other words, it should not depend on many other modules. On the other hand, almost the entire codebase depends on the schema module itself. Currently there is a circular dependency between schema and replica::database, as the latter is a required argument for schema::describe(). This is bad, not just because of the dependency mess it introduces, but also because now schema::describe() can only be used by code which has a reference to the database handy. This patch breaks this circular dependency, by introducing the schema_describe_helper interface and providing an implementation for it in database.hh. There is another circular dependency: schema <-> replica::table. This is not addressed by this patch. Closes scylladb/scylladb#20893	2024-10-10 10:07:26 +03:00
Botond Dénes	81423e8e76	Merge 'repair: Fix stall in repair_get_row_diff_with_rpc_stream_process_op_slow_path' from Asias He Use clear_gently to avoid the following stalls. ``` ~frozen_mutation_fragment at ././frozen_mutation.hh:268 std::destroy_at<frozen_mutation_fragment> at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_construct.h:88 std::allocator_traits<std::allocator<std::_List_node<frozen_mutation_fragment> > >::destroy<frozen_mutation_fragment> at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/alloc_traits.h:537 std::__cxx11::_List_base<frozen_mutation_fragment, std::allocator<frozen_mutation_fragment> >::_M_clear at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/list.tcc:77 ~_List_base at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_list.h:499 ~partition_key_and_mutation_fragments at ././repair/repair.hh:298 ~repair_row_on_wire_with_cmd at ././repair/repair.hh:335 operator() at ./repair/row_level.cc:1881 ``` Fixes #21016 Performance improvement only. No backport. Closes scylladb/scylladb#21017 * github.com:scylladb/scylladb: repair: Fix stall in repair_get_row_diff_with_rpc_stream_process_op_slow_path repair: Add clear_gently for partition_key_and_mutation_fragments	2024-10-10 09:27:27 +03:00
Benny Halevy	3a12ad96c7	sstables: scylla_metadata: add sstable identifier Keep a copy of the sstable uuid generation in a new scylla_metadata sstable_identifier attribute. If the SSTable happens to have a numerical generation just create a new time-uuid and log a message about that. Dump this new attribute in scylla sstable dump tool. And add a unit test to verify that the written (and then loaded) sstable identifier matches the sstable's generation. The motivation for this change stems from backup deduplication. In essence, an sstable may already have been backed up in a previous snapshot, and we don't want to back it up again if it's already present on external storage. Today this is based on rclone that compares files checksums, but once scylla will backup the sstables using the native object-storage stack (#19890), we would like to use the sstable globally-unique identifier for deduplication. Although the uuid-generation is encoded in the sstable path, the latter may change, e.g. due to intra-node migration, so keep a copy of the original unique identifier in scylla-metadata, and that attribute would survive file-based or intra-node migrations. Fixes scylladb/scylladb#20459 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#21002	2024-10-10 08:52:46 +03:00
Avi Kivity	b66479ea98	Merge 'compaction: fix potential data resurrection with file-based migration' from Ferenc Szili When tablets are migrated with file-based streaming, we can have a situation where a tombstone is garbage collected before the data it shadows lands. For instance, if we have a tablet replica with 3 sstables: 1. sstable containing an expired tombstone 2. sstable with additional data 3. sstable containing data which is shadowed by the expired tombstone in sstable 1 If this tablet is migrated, and the sstables are streamed in the order listed above, the first two sstables can be compacted before the third sstable arrives. In that case, the expired tombstone will be garbage collected, and data in the third sstable will be resurrected after it arrives to the pending replica. This change fixes this problem by disabling tombstone garbage collection for pending replicas. This fixes a problem in Enterprise, but the change is in OSS in order to have as few differences between OSS and Enterprise and to have a common infrastructure for disabling tombstone GC on pending replicas. This change has to be backported to all active versions: 6.0, 6.1 and 6.2, as well as Enterprise 2024.2 Closes scylladb/scylladb#20788 * github.com:scylladb/scylladb: test: test tombstone GC disabled on pending replica tablet_storage_group_manager: update tombstone_gc_enabled in compaction group database::table: add tombstone_gc_enabled(locator::tablet_id)	2024-10-09 21:49:49 +03:00
Avi Kivity	bb1867c7c7	Merge 'sstables: Add digest checking in the validation path of the sstable layer' from Nikos Dragazis This PR builds upon the PR for checksum validation (#20207) to further enhance scrub's corruption detection capabilities by validating digests as well. The digest (full checksum) is the checksum over the entire data, as opposed to per-chunk checksums which apply to individual chunks. Until now, digests were not examined on any code paths. This PR integrates digest checking into the compressed/checksummed data sources as an optional feature and enables it only through the validation path of the sstable layer (`sstable::validate()`). The validation path is used by the following tools: * scrub in validate mode * `sstable validate` All other reads, including normal user reads, are unaffected by this change. The PR consists of: * Extensions to the compressed and checksummed data sources to support digest checking. The data sources receive the expected digest as a parameter and calculate the actual digest incrementally across multiple get() calls. The check happens on the get() call that reaches EOF and results to an exception if the digest is invalid. A digest check requires reading the whole file range. Therefore, a partial read or skip() is treated as an internal error. * A new shareable digest component loaded on demand by the validation code. No lifecycle management. * Grouping of old scrub/validate tests for compressed and uncompressed SSTables to reduce code duplication. * scrub/validate tests for SSTables with valid checksums but invalid digests, and SSTables with no digests at all. * scrub/validate tests with 3.x Cassandra SSTables to ensure compatibility. Refs #19058. New feature, no backport is needed. 
Closes scylladb/scylladb#20720 * github.com:scylladb/scylladb: test: Test scrub/validate with SSTables from Cassandra compaction: Make quarantine optional for perform_sstable_scrub() test: Make random schema optional in scrub_test_framework test: Add tests for invalid digests test: Merge scrub/validate tests for compressed and uncompressed cases sstables: Verify digests on validation path sstables: Check if digest component exists sstables: Add digest in the SSTable components sstables: Add digest check in compressed data source sstables: Add digest check in checksummed data source	2024-10-09 21:33:08 +03:00
Benny Halevy	d34878e96c	view: check_needs_view_update_path: get token_metadata_ptr check_needs_view_update_path is async and might yield so the token_metadata reference passed to it must be kept alive throughout the call. Fixes scylladb/scylladb#20979 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#20980	2024-10-09 20:56:21 +03:00
Nadav Har'El	a1999cd5d5	cql-pytest: fix run-cassandra on systems with default Java 8 The test/cql-pytest/run-cassandra script prefers to use Java 11 if installed on the system because this is the only version of Java that all modern versions of Cassandra run on (Cassandra 3 and 4 can run on Java 8 and 11, Cassandra 5 can run on Java 11 and 17). However, in our search order we tried the "java" in the user's path first, before trying Java 11. This means that a user who for some reason had the ancient Java 8 (which is now a decade old) as the default "java" got that instead of Java 11, and couldn't run Cassandra 5. While at it, update the comments to reflect the new reality that Cassandra 5 needs Java 17 or 11 - not 11 or 8 as the older Cassandra. We should eventually change the code logic as well (searching for versions that depend on the Cassandra version - not always Java 8 and 11), but let's do it later. This patch already fixes a real bug for developers that did install Java 11 but their default "java" pointed to Java 8. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#21001	2024-10-09 20:51:56 +03:00
David Garcia	2247bdbc8c	docs: Fix confgroup links It was not possible to link to configuration parameters groups in docs/reference/configuration-parameters.rst if they contained a space. Closes scylladb/scylladb#21018	2024-10-09 20:16:15 +03:00
Pavel Emelyanov	3dcf3d65d7	replica: Use substract_sets() helper The process_one_row() evaluates pending_replica by subtracting replicas from new_replicas. There's a convenience helper for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#21019	2024-10-09 20:02:16 +03:00
Gleb Natapov	f7e7e61fa7	raft: add more information to start_read_barrier error Add requester info to the error about requester being out of config. Also fix a typo while we are at it. Message-ID: <ZwVDtOty2cWy3vqD@scylladb.com>	2024-10-09 16:24:34 +02:00
Pavel Emelyanov	7163fbcef5	Merge 'utils: replace dependency on boost ranges with <ranges>' from Avi Kivity To avoid depending on two similar libraries (boost ranges and std \<ranges>), replace uses of the former with the latter. This series tackles the utils/ directory. Code cleanup, no backport. Closes scylladb/scylladb#20997 * github.com:scylladb/scylladb: utils: logalloc: replace boost with std utils: lsa: chunked_managed_vector: replace boost with std utils: config_file: replace boost with std utils: loading_cache: replace boost with std utils: fragment_range: replace boost with std utils: error_injector: replace boost with std utils: crc: replace boost for_each with built-in range for utils: class_registrator: replace boost with std utils: chunked_vector: replace boost with std utils: observable: replace boost with std	2024-10-09 16:04:48 +03:00
Botond Dénes	3e468608e7	Merge 'Collect sstables on boot from all datadirs (and don't collect from S3 twice)' from Pavel Emelyanov There's a long-pending issue in distributed loader. When it populates sstables on boot it loops over table.config.all_datadirs, but ignores the loop cursor (the datadir itself), instead loading sstables from table.config.dir, which is 0th element of all_datadirs. There's a test for that, but it's also broken. Effectively collection happens from table.config.dir several times. For local sstables that's just wasted work and potentially lost sstables (but nobody seems to configure more than 1 datadir anyway). For S3 sstables it's also wasted work and incorrectness. The fix is for both -- populator and test. The former is to use all_datadirs to construct sstable_directory. To make it happen, creation of sstable_directory now depends on the storage options, the loop is moved into the branch that creates sstable_directory for local storage type. The test fix is to make sure that some sstables exist in a non-default datadir before running population code. 
Closes scylladb/scylladb#20819 * github.com:scylladb/scylladb: test: Fix test_multiple_data_dirs distributed_loader: Indentation fix after previous patch distributed_loader: Use correct datadir to collect local sstable distributed_loader: Move all-datadirs loop to local storage collecting distributed_loader: Collect table subdirs based on its storage options distributed_loader: Indentation fix after previous patch distributed_loader: Squash loop of collect_subdir into one method distributed_loader: Convert map of directories into a vector distributed_loader: Make start_subdir() method work with directory distributed_loader: Drop local reference variable distributed_loader: Split start_subdir() distributed_loader: Remove allow-offstrategy argument distributed_loader: Make populate() method work with directory distributed_loader: Remove check for sstable_directory presense distributed_loader: Out-line table_populator() methods distributed_loader: Print storage options, not datadir distributed_loader: Print prepared message sstable_directory: Add sstable_state argument ot one of constructors sstable_directory: Add state() method	2024-10-09 14:43:34 +03:00
Michał Chojnowski	c2ba300f1c	reader_concurrency_semaphore: in stats, fix swapped count_resources and memory_resources can_admit_read() returns reason::memory_resources when the permit is queued due to lack of count resources, and it returns reason::count_resources when the permit is queued due to lack of memory resources. It's supposed to be the other way around. This bug is causing the two counts to be swapped in the stat dumps printed to the logs when semaphores time out. Closes scylladb/scylladb#20714	2024-10-09 14:12:01 +03:00
Lakshmi Narayanan Sreethar	69c385f540	compaction: make drain wait for compactions to stop during shutdown During shutdown, the compaction_manager starts stopping ongoing compaction tasks through `really_do_stop()` method as soon as it receives a signal from the abort source. Later, when the database object shuts down, it calls `compaction_manager::drain` to ensure that all compaction tasks have stopped. However, `compaction_manager::drain` is currently implemented in such a way that, during shutdown, it effectively becomes a no-op because the compaction_manager has already initiated the stopping of tasks. As a result the caller assumes that all the compaction tasks have stopped and proceeds to close all the tables. This can lead to race conditions where table closures overlap with compaction tasks that are still running, resulting in exceptions like : ``` exception during mutation write to 127.0.0.1: utils::internal::nested_exception<std::runtime_error> (Could not write mutation system:compaction_history (pk{0010b70d31705e0411efb2edf6467f094c8b}) to commitlog): seastar::gate_closed_exception (gate closed) ``` This commit fixes the issue by updating `compaction_manager::drain` to invoke `stop_ongoing_compactions` even during shutdown to ensure that it waits for the ongoing compaction tasks to complete. The `stop_ongoing_compactions` method will also send a stop request to these tasks before waiting, but the request will be ignored by the tasks as they would have already received one earlier from `really_do_stop()`. Fixes #20197 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#20715	2024-10-09 12:08:32 +03:00
Pavel Emelyanov	17ec416178	Merge 'Make sure S3 upload completion parses possible error' from Ernest Zaslavsky fixes #20517 Adds `aws_error` which possibly can contain errors from the S3 response body. Adds to the multipart upload completion a check for possible error and issues a retry if the error is retryable Closes scylladb/scylladb#20518 * github.com:scylladb/scylladb: test: add complete_multipart_upload completion tests code: s3 client error handling code: add response parsing and error handling to the complete_multipart_upload code: Introduce AWS errors parsing	2024-10-09 12:01:27 +03:00
Piotr Smaron	e0c1a51642	cql/tablets: handle MVs in ALTER tablets KEYSPACE ALTERing tablets-enabled KEYSPACES (KS) didn't account for materialized views (MV), and only produced tablets mutations changing tables. With this patch we're producing tablets mutations for both tables and MVs, hence when e.g. we change the replication factor (RF) of a KS, both the tables' RFs and MVs' RFs are updated along with tablets replicas. The `test_tablet_rf_change` testcase has been extended to also verify that MVs' tablets replicas are updated when RF changes. Fixes: #20240 Closes scylladb/scylladb#21007	2024-10-09 10:51:18 +02:00
Pavel Emelyanov	0bc8d0c620	Merge 'utils: unconst: wean away from boost range library' from Avi Kivity As part of the effort to standardize on a single range library, convert the unconst helper and its only user to \<ranges>. The only user, mutation_partitions, happens to use intrusive_btree::iterator as the payload. That iterator didn't fully conform to iterator requirements, so it's fixed in a preliminary patch. Code cleanup; no backport. Closes scylladb/scylladb#20986 * github.com:scylladb/scylladb: utils/unconst, mutation_partition: switch to ranges utils: intrusive_btree: improve conformity with iterator requirements	2024-10-09 10:06:52 +03:00
Yuao Ma	1cc7821d12	tools: fix typos in the code This patch corrects a minor typo without any functional changes. Signed-off-by: Yuao Ma <c8ef@outlook.com> Closes scylladb/scylladb#20975	2024-10-09 08:18:36 +03:00
Asias He	2d8442f663	repair: Fix stall in repair_get_row_diff_with_rpc_stream_process_op_slow_path Use clear_gently to avoid the following stalls. ``` ~frozen_mutation_fragment at ././frozen_mutation.hh:268 std::destroy_at<frozen_mutation_fragment> at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_construct.h:88 std::allocator_traits<std::allocator<std::_List_node<frozen_mutation_fragment> > >::destroy<frozen_mutation_fragment> at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/alloc_traits.h:537 std::__cxx11::_List_base<frozen_mutation_fragment, std::allocator<frozen_mutation_fragment> >::_M_clear at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/list.tcc:77 ~_List_base at /usr/lib/gcc/x86_64-redhat-linux/11/../../../../include/c++/11/bits/stl_list.h:499 ~partition_key_and_mutation_fragments at ././repair/repair.hh:298 ~repair_row_on_wire_with_cmd at ././repair/repair.hh:335 operator() at ./repair/row_level.cc:1881 ``` Fixes #21016	2024-10-09 09:37:49 +08:00
Asias He	f5f26e1bba	repair: Add clear_gently for partition_key_and_mutation_fragments It is used to clear mutation_fragments to avoid stalls.	2024-10-09 09:31:14 +08:00
Emil Maskovsky	0c9308cf48	raft: add the check for the group0 tables Added the runtime check to ensure that all the tables that are used with the group0 commands are marked as group0 tables.	2024-10-08 21:08:11 +02:00
Emil Maskovsky	a03e98d6e8	raft: fast tombstone GC for group0-managed tables Set the tombstone GC time for group0-managed tables to the minimal state id of the group0 nodes. The check is being done based on a timer, iterating through each node (according to the group0 topology configuration) and taking the minimum across all nodes. This minimum timestamp is then used to set the tombstone GC time for the tombstone GC of all the group0-managed tables. Fixes: scylladb/scylla#15607	2024-10-08 21:07:30 +02:00
Emil Maskovsky	74bd79bbb3	tombstone_gc: refactor the repair map Move the repair_map definition to the tombstone_gc file where it is mostly being used. Refactor and add the accessors and setters for the group0 tombstone GC time.	2024-10-08 20:53:54 +02:00
Emil Maskovsky	22471410e7	raft: flag the group0-managed tables Add the schema flag to indicate the group0-managed tables. This is to be used to identify and list the group0-managed tables.	2024-10-08 20:53:54 +02:00
Emil Maskovsky	baea9cfa67	gossip: broadcast the group0 state id Implemented the group0 state_id handler (based on the gossip) that will broadcast the group0 state id of each node. This will be used to set the tombstone GC time for the group0 tables.	2024-10-08 20:53:54 +02:00
Emil Maskovsky	fa45fdf5f7	raft/test: add test for the group0 tombstone GC Test that the group0 fast tombstone GC works correctly.	2024-10-08 20:53:54 +02:00
Emil Maskovsky	a840949ea0	treewide: code cleanup and refactoring Fix the clang-tidy warnings, code cleanup and improvements. Applied the clang format to the updated places.	2024-10-08 20:53:54 +02:00
Nadav Har'El	b4df07df71	Merge 'cql3: Print arguments and return type without frozen when describing UDF' from Dawid Mędrek Scylla doesn't allow for the types of arguments or the return type of a UDF to be frozen. As a result, before these changes, create statements produced to restore UDFs as part of `DESCRIBE` statements could not be executed. Fixes scylladb/scylladb#20256 Backport: necessary as the restore process may not work correctly without these changes. The affected versions span from 5.2 to the current master, but we only want to apply the fix to the live versions, so 6.0, 6.1, and 6.2. Closes scylladb/scylladb#20816 * github.com:scylladb/scylladb: cql3/functions/user_function: Print arguments and return type without frozen cql3/functions/user_function: Use fmt to format create statement	2024-10-08 16:05:28 +03:00
Kamil Braun	2d9b8f269f	Merge 'cql: improve validating RF's change in ALTER tablets KS' from Piotr Smaron This patch series fixes a couple of bugs around validating if RF is not changed by too much when performing ALTER tablets KS. RF cannot change by more than 1 in total, because tablets load balancer cannot handle more work at once. Fixes: #20039 Should be backported to 6.0 & 6.1 (wherever tablets feature is present), as this bug may break the cluster. Closes scylladb/scylladb#20208 * github.com:scylladb/scylladb: cql: sum of abs RFs diffs cannot exceed 1 in ALTER tablets KS cql: join new and old KS options in ALTER tablets KS cql: fix validation of ALTERing RFs in tablets KS cql: harden `alter_keyspace_statement.cc::validate_rf_difference` cql: validate RF change for new DCs in ALTER tablets KS cql: extend test_alter_tablet_keyspace_rf cql: refactor test_tablets::test_alter_tablet_keyspace cql: remove unused helper function from test_tablets	2024-10-08 14:33:45 +02:00
Kamil Braun	1b9337bf99	Merge 'Wait for all users of group0 server to complete before destroying it' from Gleb Natapov Group0 server is often used in asynchronous context, but we do not wait for those users to complete before destroying the server. We already have a shutdown gate for it, so let's use it in those async functions. Also make sure to signal the group0 abort source if initialization fails. Fixes scylladb/scylladb#20701 Backport to 6.2 since it contains `af83c5e53e` and it made the race easier to hit, so tests became flaky. Closes scylladb/scylladb#20891 * github.com:scylladb/scylladb: group: hold group0 shutdown gate during async operations group0: Stop group0 if node initialization fails	2024-10-08 13:46:54 +02:00
Avi Kivity	48ea51029f	Merge 'time_window_compaction_strategy: estimated_pending_compactions: reestimate compactions rather than using cached value' from Benny Halevy Currently, `estimated_pending_compactions` uses a precalculated value calculated by `update_estimated_compaction_by_tasks`, which, in turn, is called by `get_compaction_candidates`. That means that, if `estimated_pending_compactions` is called, e.g. right after major compaction, it will return an outdated value that was calculated prior to major compaction, and so, it is no longer relevant. Instead, just recalculate the value in `estimated_pending_compactions` and drop `update_estimated_compaction_by_tasks`. * Enhancement, no backport required Closes scylladb/scylladb#20892 * github.com:scylladb/scylladb: test: cql-pytest: test_compaction: add test_compactionstats_after_major_compaction test/cql-pytest: rename test_compaction{_tombstone_gc,} time_window_compaction_strategy: estimated_pending_compactions: reestimate compactions rather than using cached value	2024-10-08 13:29:51 +03:00
Gleb Natapov	d62fbd795b	storage_proxy: make sure there is no end iterator in _live_iterators array storage_proxy::cancellable_write_handlers_list::update_live_iterators assumes that iterators in _live_iterators can be dereferenced, but the code does not make any attempt to make sure this is the case. The iterator can be the end iterator which cannot be dereferenced. The patch makes sure that there is no end iterator in _live_iterators. Fixes scylladb/scylladb#20874 Closes scylladb/scylladb#20977	2024-10-08 13:16:27 +03:00
Avi Kivity	656dc438ab	utils: logalloc: replace boost with std	2024-10-08 12:07:14 +03:00
Avi Kivity	84b25a51f5	utils: lsa: chunked_managed_vector: replace boost with std	2024-10-08 12:03:30 +03:00
Avi Kivity	fa772701be	utils: config_file: replace boost with std	2024-10-08 12:03:15 +03:00
Avi Kivity	b62fadae5f	utils: loading_cache: replace boost with std Unfortunately, the replacement for boost::range::join(), std::views::concat(), is in C++26 (and not implemented in libstdc++ 14). We use array/transform/join to simulate it.	2024-10-08 11:54:34 +03:00
Laszlo Ersek	934b42c6a8	cmake/check_headers: correct typos Commit `efd65aebb2` ("build: cmake: add check-header target", 2023-11-13) introduced three typos: - In "cmake/check_headers.cmake", it checked whether the "parsed_args_GLOB_RECURSE" argument was defined, but then it referenced the same under the wrong name "parsed_args_RECURSIVE". - The above error masked two further typos; namely the duplicate use of "api" and "streaming" each, as targets. With "parsed_args_GLOB_RECURSE" above fixed, CMake now reports these conflicting arguments (target names). They should have been "node_ops" and "sstables", respectively. Correct the typos. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> Closes scylladb/scylladb#20992	2024-10-08 09:38:16 +03:00
Dawid Mędrek	8582ed513b	cql3/functions/user_function: Print arguments and return type without frozen Scylla doesn't allow for the types of arguments or the return type to be frozen. As a result, before these changes, create statements produced to restore UDFs as part of `DESCRIBE` statements could not be executed. We fix that and add a reproducer test and another one to verify that the implementation is correct.	2024-10-07 20:53:10 +02:00
Avi Kivity	72a39b84b0	utils: fragment_range: replace boost with std	2024-10-07 21:32:16 +03:00
Avi Kivity	c8a68c4cf7	utils: error_injector: replace boost with std	2024-10-07 21:28:36 +03:00
Avi Kivity	c560686d92	utils: crc: replace boost for_each with built-in range for Simpler.	2024-10-07 21:19:14 +03:00
Avi Kivity	44419fc5ec	utils: class_registrator: replace boost with std	2024-10-07 21:16:03 +03:00
Avi Kivity	adb92a6c16	utils: chunked_vector: replace boost with std	2024-10-07 21:11:23 +03:00
Avi Kivity	b259389a3e	utils: observable: replace boost with std	2024-10-07 21:11:07 +03:00
Nadav Har'El	45ccceb137	alternator: add "dc" and "rack" options to "/localnodes" request Before this patch, the "/localnodes" HTTP request to the Alternator server lists all the live nodes of the current DC. This patch adds two optional parameters to this query: dc: lists the live nodes of a specific named DC instead of the current DC of the server. rack: restricts the results to just the nodes belonging to a specific named rack. For both options, if no live node exists in the given dc or rack (in particular, if such a dc or rack doesn't even exist), an empty list is returned - it's not an error. The default, if dc or rack is not specified, remains exactly as it is today: look at the current DC (the one of the node being requested), and do not restrict the list to any specific rack. We expect the new options that we added here to be useful for two use cases: 1. A client that knows of some Scylla node (belonging to an unknown DC), but wants to list the nodes in its DC, which it knows by name. 2. A client in a multi-rack DC (e.g., multi-AZ region in AWS) that wants to send requests to nodes in its own rack (which it knows by name), to avoid cross-rack networking costs. Note that in both cases, this requires clients to know the names of DCs and AZs via some out-of-band means. The client can also get a list of DCs and racks using the system.local system table, as the tests included in this patch demonstrate. This patch includes two sets of tests for these new options: One in the single-node test/alternator framework that has a single dc and rack but can still check the case of an unknown dc or rack (in which case an empty list is returned). The second test is in the topology framework, and runs an 8-node cluster with two DCs, two racks, and two nodes in each, and checks all the combinations of "/localnodes" requests with and without dc and rack options. 
This test also resolves a longstanding TODO that asked for such a multi-DC test for "/localnodes" to be written. Fixes #12147 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#20915	2024-10-07 20:53:47 +03:00
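The selection rules spelled out in 45ccceb137 can be sketched as a small filter. This is an illustration only: the `Node` record and the `local_nodes` function are hypothetical names, not Alternator code, and the sketch models just the documented semantics (default to the server's DC, optional rack restriction, empty list for unknown names).

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Node:
    # Hypothetical record of what the server knows about a cluster member.
    address: str
    dc: str
    rack: str
    alive: bool = True


def local_nodes(
    nodes: Iterable[Node],
    server_dc: str,
    dc: Optional[str] = None,
    rack: Optional[str] = None,
) -> List[str]:
    """Return addresses of live nodes in the requested DC and rack.

    With no options the server's own DC is used and no rack filter applies.
    An unknown dc or rack simply matches nothing, so the result is an empty
    list rather than an error.
    """
    wanted_dc = dc if dc is not None else server_dc
    return [
        n.address
        for n in nodes
        if n.alive and n.dc == wanted_dc and (rack is None or n.rack == rack)
    ]
```

A client in the second use case from the commit message would pass its own rack name and fall back to the unrestricted call if the returned list is empty.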
Pavel Emelyanov	8bfbc563cc	test: Remove sstable factory from test_min_max_clustering_key() The helper makes sstables from env directly. Callers need not create the factory after that. The less code the better. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20983	2024-10-07 20:08:05 +03:00
Kefu Chai	a6ec6d32ab	auth: add "IWYU pragma: keep" to keep boost/regex_fwd.hpp clang-include-cleaner is not able to tell that the header provides the template parameter of `std::vector<std::pair<query_source, boost::regex>>`. and suggest us to remove this include. but it's wrong. so, in this change we apply the "pragma" to keep it. see https://github.com/include-what-you-use/include-what-you-use/blob/master/docs/IWYUPragmas.md for the explanations on what this pragma is for. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-07 20:08:05 +03:00
Kefu Chai	3d31835949	auth: include boost/regex_fwd.hpp in header since we only need the full definition of boost::regex in the .cc file, where we - define the constructor and destructor - and actually use the regex. there is no need to include boost/regex.hpp in the header, in order to keep the preprocessed header smaller. let's use a header only contains forward declarations in header, and include the full definition in the .cc file. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-07 20:08:05 +03:00
Piotr Smaron	ee56bbfe61	cql: sum of abs RFs diffs cannot exceed 1 in ALTER tablets KS Tablets load balancer is unable to process more than a single pending replica, thus ALTER tablets KS cannot accept an ALTER statement which would result in creating 2+ pending replicas, hence it has to validate that the sum of absolute differences of RFs specified in the statement is not greater than 1.	2024-10-07 17:02:50 +02:00
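The constraint from ee56bbfe61 can be sketched as follows. The function name and the plain-dict representation of per-DC replication factors are hypothetical; the sketch also assumes that DCs omitted from the statement keep their old RF and that a DC without replicas counts as RF 0, as the neighbouring commits in this series describe.

```python
from typing import Mapping


def validate_rf_change(old_rf: Mapping[str, int], new_rf: Mapping[str, int]) -> None:
    """Reject an ALTER whose total RF change across all DCs exceeds 1.

    old_rf: current RF per DC.
    new_rf: RF per DC as given in the ALTER statement (may list a subset).
    """
    # DCs not mentioned in the statement keep their current RF.
    merged = {**old_rf, **new_rf}
    # A DC that is new to the keyspace starts from RF 0.
    total_change = sum(abs(rf - old_rf.get(dc, 0)) for dc, rf in merged.items())
    if total_change > 1:
        raise ValueError(
            f"RF may change by at most 1 in total, requested change: {total_change}"
        )
```

Summing the absolute differences, rather than checking each DC separately, is what rejects a statement that raises one DC by 1 and lowers another by 1 in the same step.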
Piotr Smaron	2aabe7f09c	cql: join new and old KS options in ALTER tablets KS A bug has been discovered while trying to ALTER tablets KS and specifying only 1 out of 2 DCs - the unspecified DC's RF was zeroed. This is because ALTER tablets KS updated the KS only with the RF-per-DC mapping specified in the ALTER tablets KS statement, so if a DC was omitted, it was assigned a value of RF=0. This commit fixes that and additionally passes all the KS options, not only the replication options, to the topology coordinator, where the KS update is performed. `initial_tablets` is a special case, which requires special handling in the source code, as we cannot simply update old initial_tablet's settings with the new ones, because if only ` and TABLETS = {'enabled': true}` is specified in the ALTER tablets KS statement, we should not zero the `initial_tablets`, but rather keep the old value - this is tested by the `test_alter_preserves_tablets_if_initial_tablets_skipped` testcase. Other than that, the above-mentioned testcase started to fail with these changes, and it appeared to be an issue with the test not waiting until ALTER is completed, and thus reading the old value, hence the test's body has been modified to wait for ALTER to complete before performing validation.	2024-10-07 17:02:45 +02:00
Avi Kivity	d12ba753e0	utils/unconst, mutation_partition: switch to ranges unconst is a small helper that converts a const iterator to a non-const iterator with the help of the container. Currently it is using the boost iterator/range libraries. Convert it to <ranges> as part of an effort to standardize on a single range library. Its only user in mutation_partition is converted as well. Due to more interoperability problems between <ranges> and boost, some calls to boost::adaptors::reversed have to be converted as well.	2024-10-07 17:30:12 +03:00
Avi Kivity	75f4ea1b68	utils: intrusive_btree: improve conformity with iterator requirements The <ranges> library checks that an iterator's operator++() returns a reference to the same type. intrusive_btree's iterators do not; instead they return some base type and rely on implicit conversion to the real iterator type. This causes interoperability problems with <ranges>. Fix by using the CRTP pattern to inform iterator_base about what type we really are, and cast to it. Enforce it with static_assert. Note we can't static_assert in class scope since it is checked too early and fails. Checking in function scope delays the check.	2024-10-07 17:26:01 +03:00
Piotr Smaron	6676e47371	cql: fix validation of ALTERing RFs in tablets KS The validation has been corrected with: 1. Checking if a DC specified in ALTER exists. 2. Removing `REPLICATION_STRATEGY_CLASS_KEY` key from a map of RFs that needs their RFs to be validated.	2024-10-07 16:02:01 +02:00
Piotr Smaron	93d61d7031	cql: harden `alter_keyspace_statement.cc::validate_rf_difference` This function assumed that strings passed as arguments would hold integers, but that wasn't the case, and we missed that because this function didn't have any validation, so this change adds proper validation and error logging. Arguments passed to this function were forwarded from a call to `ks_prop_defs::get_replication_options`, which, along with the rf-per-dc mapping, also returns a `class:replication_strategy` pair. The second member of that pair was cast to an `int` and somehow the code was still running fine; only extra testing added later discovered the bug here.	2024-10-07 16:02:01 +02:00
Piotr Smaron	47acdc1f98	cql: validate RF change for new DCs in ALTER tablets KS ALTER tablets KS validated if RF is not changed by more than 1 for DCs that already had replicas, but not for DCs that didn't have them yet, so specifying an RF jump from 0 to 2 was possible when listing a new DC in ALTER tablets KS statement, which violated internal invariants of tablets load balancer. This PR fixes that bug and adds a multi-dc testcases to check if adding replicas to a new DC and removing replicas from a DC is honoring the RF change constraints. Refs: #20039	2024-10-07 16:02:01 +02:00
Piotr Smaron	9c5950533f	cql: extend test_alter_tablet_keyspace_rf Added cases to also test decreasing RF and setting the same RF. Also added extra explanatory comments.	2024-10-07 16:02:00 +02:00
Piotr Smaron	adf453af3f	cql: refactor test_tablets::test_alter_tablet_keyspace 1. Renamed the testcase to emphasize that it only focuses on testing changing RF - there are other tests that test ALTER tablets KS in general. 2. Fixed whitespaces according to PEP8	2024-10-07 16:02:00 +02:00
Piotr Smaron	042825247f	cql: remove unused helper function from test_tablets `change_default_rf` is not used anywhere, moreover it uses `replication_factor` tag, which is forbidden in ALTER tablets KS statement.	2024-10-07 16:02:00 +02:00
Nikos Dragazis	7a1ec3aa41	test: Test scrub/validate with SSTables from Cassandra All current unit tests for scrub in validate mode generate random SSTables on the fly. Add some more tests with frozen Cassandra SSTables from the source tree to verify compatibility with Cassandra. Use some of the existing 3.x Cassandra SSTables to test the valid case, and use the same schema to generate some corrupted SSTables for the invalid case. Overall, the new tests cover the following scenarios: * valid compressed/uncompressed * compressed/uncompressed with invalid checksums * compressed/uncompressed with invalid digest For the compressed SSTable with invalid checksums, a small chunk length was used (4KiB) to have more chunks with less disk space. For uncompressed SSTables the chunk length is not configurable. Finally, since the SSTables live in the source tree, the quarantine mechanism was disabled. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-10-07 15:21:38 +03:00
Nikos Dragazis	7090e2597f	compaction: Make quarantine optional for perform_sstable_scrub() Allow `perform_sstable_scrub()` to disable quarantine for invalid SSTables detected by scrub in validate mode. This is already supported by the lower-level function `scrub_sstables_validate_mode()` via the flag `quarantine_sstables` and is being used by sstable-scrub. Propagate the flag up to `perform_sstable_scrub()`. This will allow to test scrub/validate against read-only SSTables from the source tree. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-10-07 15:21:38 +03:00
Nikos Dragazis	5f2be2924e	test: Make random schema optional in scrub_test_framework The scrub_test_framework, which is the foundation for all scrub-related tests, always generates a random schema upon initialization and makes it available to the user. This is useful for running tests with ephemeral SSTables, but is redundant when the creation of the SSTable predates the test (e.g., it lives in the source tree). Turn scrub_test_framework into a template with a boolean parameter to optionally switch off the random schema generation. Also, add an overload for run() to support passing a ready-to-use SSTable instead of mutation fragments. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-10-07 15:21:38 +03:00
Nikos Dragazis	07ed0a48aa	test: Add tests for invalid digests In a previous patch we extended the validation path of the SSTable layer to validate the digests along with the checksums. Add two tests for compressed and uncompressed SSTables to test the validation API against SSTables with valid checksums but corrupted digests. Add two more tests to ensure that the absence of digest does not affect checksum validation. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-10-07 15:21:38 +03:00
Nikos Dragazis	39a74fb692	test: Merge scrub/validate tests for compressed and uncompressed cases Currently, every scrub/validate test is duplicated to cover both compressed and uncompressed SSTables. However, except for the compression type, the tests are identical. This leads to some code bloat. Introduce common functions parameterized by the compression type to reduce code duplication. Also, group together the compressed and uncompressed variants into one compression-agnostic test. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-10-07 15:21:38 +03:00
Nikos Dragazis	3a3783ee23	sstables: Verify digests on validation path Extend the validation path to perform digest checking on all SSTables. This is achieved by loading the digest component on demand and passing it to the underlying data sources only during validation. The data sources for compressed and uncompressed SSTables were modified in previous patches to support digest checking. Consider digest checking as part of the integrity checking mechanism (i.e., requires `integrity_check::yes`) to ensure it remains disabled for all reads happening outside of the validation path (i.e., `sstable::validate()`). This practically means that digest checking is enabled only for: * scrub in validate mode * sstable validate Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-10-07 15:21:09 +03:00
Avi Kivity	7dad248ac7	Merge 'Fix sstables registry mock' from Pavel Emelyanov There are two issues in it. First, listing the registry with a consumer callback passes wrong argument to the consumer. Second, the primary key of the registry is wrong. Both issues don't show up, because existing tests that use mock don't read from it, only write. Tests that read from registry are python tests that start scylla and thus use real registry. Closes scylladb/scylladb#20946 * github.com:scylladb/scylladb: test: Use correct key in sstables registry mock test: Pass entry status to mock registry consumer	2024-10-07 13:56:26 +03:00
Anna Stuchlik	a601845780	doc: remove outdated JMX references This commit removes references to JMX from the docs. Context: The JMX server has been dropped and removed from installation. The user can install it manually if needed, as documented with https://github.com/scylladb/scylladb/issues/18687. This commit removes the outdated information about JMX from other pages in the documentation, including the docs for nodetool, the list of ports, and the admin section. Also, the no longer relevant JMX information is removed from the Docker Hub docs. Fixes https://github.com/scylladb/scylladb/issues/18687 Fixes https://github.com/scylladb/scylladb/issues/19575 Closes scylladb/scylladb#20917	2024-10-07 13:55:15 +03:00
Nadav Har'El	987042be68	mv, test: reproduce missing validation for view name This patch adds reproducer tests (still failing) for issue #20755, which is about missing validation of materialized view names: 1. Unlike table and keyspace names which are limited to 48 characters, we forgot to limit view name length, and an excessively long name can cause Scylla to shut down :-( 2. Unlike table and keyspace names which only allow alphanumeric characters, view names are missing this check and can include any characters. 3. Luckily, even though we are missing the alphanumeric check, we at least don't allow "/" in view names (if we allowed them, it could allow users to write in any directory in the filesystem!). But when this happens, we get an internal error instead of the expected errors. The first test also fails on Cassandra (it doesn't crash it, but leaves the table in a strange state), but the other two pass. Refs #20755 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#20761	2024-10-07 13:49:58 +03:00
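As an illustration of the checks that 987042be68 reports as missing for view names, here is a sketch of the two rules the message cites for table and keyspace names: a 48-character limit and an alphanumeric-only character set. The function name is hypothetical, and accepting the underscore as part of "alphanumeric" is an assumption of this sketch, not something the commit message states.

```python
import re

# Limit quoted in the commit message for table and keyspace names.
MAX_NAME_LENGTH = 48

# Assumption: ASCII letters, digits and underscore are the allowed characters.
_NAME_RE = re.compile(r"[A-Za-z0-9_]+")


def is_valid_schema_name(name: str) -> bool:
    """Return True if the name passes both the length and the character check."""
    return len(name) <= MAX_NAME_LENGTH and _NAME_RE.fullmatch(name) is not None
```

Under these rules a name containing "/" fails the character check up front, instead of surfacing later as an internal error.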
Avi Kivity	73eeb6d274	Merge 'clang-format: adjustments to avoid unwanted refactors and better match the Seastar coding style' from Emil Maskovsky Some adjustments to the `.clang-format` options to better match the current code: * don't sort the include headers: causes large diffs especially in files with a lot of includes, and the `#include` ordering is not prescribed by the Seastar coding style * binpack the arguments in function declarations and calls: allow binpacking (as opposed to forcing each parameter on a separate line if they don't fit into the line length) * indented parameter continuation (as opposed to aligning to the open parenthesis) - aligning to the open parenthesis causes alignment issues especially with lambdas Fixes: scylladb/scylladb#20951 No backport: Not a product issue, just applies to master. Closes scylladb/scylladb#20968 * github.com:scylladb/scylladb: clang-format: argument and function packing clang-format: don't sort the include headers	2024-10-07 13:21:31 +03:00
Pavel Emelyanov	1870873538	test: Fix test_multiple_data_dirs The test was broken from the very beginning. It only checked that after creating a table, its directory is created in all datadirs. But it didn't check that after restart populating happens from all of them. That's because all directories but the 0th were always empty, so not populating from them didn't skip any data. Fix it by moving all sstables from datadirs[0] to datadirs[1] before restart. With that update, not populating data from datadirs[1] will be noticed instantly. Fortunately, previous patches fixed that, so the test still passes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	aa0c20a0e7	distributed_loader: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	792c0060c7	distributed_loader: Use correct datadir to collect local sstable Current code uses datadir it gets from table itself, which is the 0th element in the all-datadirs config. So populating local sstables happens several times from the same directory. Fix it by starting sstable directory with correct datadir -- the one obtained from the all-datadirs loop. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	bf654f45bd	distributed_loader: Move all-datadirs loop to local storage collecting It now happens in the outer loop, but it's not correct for S3 storage, which is thus asked to collect its data twice. Also it's broken for local storage as well, because the datadir argument is ignored. Next patch will fix it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	fe779ab1a2	distributed_loader: Collect table subdirs based on its storage options Collecting sstables for local storage and for S3 storage differs. First, the populator collects sstables for each datadir configured in scylla.yaml, but S3 storage doesn't care, so it's effectively asked to collect the same data twice. Second, S3 collector code uses sstable_directory simply because that class is used by reshape and reshard code, but in fact collecting of S3 sstable can be made much simpler (but that's for later). Having said that, split preparation of sstables population for local and S3 storage types. Indentation is deliberately left broken for local storage collecting method. That's because otherwise next patch will need to move it back anyway. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	ee91cae5b9	distributed_loader: Indentation fix after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	6ae486100c	distributed_loader: Squash loop of collect_subdir into one method This prepares the ground for the next patch. Indentation is left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	4db2929afd	distributed_loader: Convert map of directories into a vector Knowledge of sstable state is no longer needed in the table_populator start/stop methods, so the map<state, directory> can be converted into vector<directory>. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	89e9231653	distributed_loader: Make start_subdir() method work with directory Similarly to populate_subdir(), it also accepts state and gets directory out of it. Patch it the same way -- caller now passes it the reference to directory and doesn't care about the state (in fact, the start_subdir() doesn't care about the state either). While at it -- rename the method to reflect what it does. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	999ec88765	distributed_loader: Drop local reference variable Cleanup after previous patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	50024ef62a	distributed_loader: Split start_subdir() It does two things -- starts sstable_directory and prepares it. Split it accordingly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	abdc0bb02d	distributed_loader: Remove allow-offstrategy argument This is to make populate_subdir() be self-contained in a way it uses passed sstable_directory and make caller not care about the state. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	3b583b9d9f	distributed_loader: Make populate() method work with directory The populate_subdir() accepts sstable_state argument and picks the corresponding sstable_directory object from the map. Patch it so that caller passes it the sstable_directory reference. For now it makes things more complicated, but next patches will simplify it back. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	051ac3a737	distributed_loader: Remove check for sstable_directory presence In the old days the set of sstable_directory-s used by populator could skip some of them. Now they are all present and the check is always false. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	4752a504cd	distributed_loader: Out-line table_populator() methods To make further patching with less indentation level. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	617b0e3ce3	distributed_loader: Print storage options, not datadir Tables not necessarily have data in a directory, so it's more correct to show storage options in logs, not some directory path. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	31b2271f07	distributed_loader: Print prepared message When population throws, the catch block prepares a message to re-throw another exception and prints the same message into logs. Presumably the intent was to print the prepared message as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:04:23 +03:00
Pavel Emelyanov	87d392d071	sstable_directory: Add sstable_state argument to one of the constructors There's one constructor that became unused after `787ea4b1`. Modify it with the 'state' argument so that it could be used later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 12:03:36 +03:00
Pavel Emelyanov	b56483ab67	sstable_directory: Add state() method The one will expose sstables state the directory works with. For convenience. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-07 11:23:50 +03:00
Kefu Chai	abda779a5b	compaction: return created sst without using a temporary variable `sst` does not help with the readability or performance, so let's drop it. simpler this way. also, remove the unused parameter. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20961	2024-10-07 10:56:25 +03:00
Pavel Emelyanov	8ccb4a1045	Merge 'db: remove unused includes ' from Kefu Chai these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. please note, since we have `using seastar::shared_ptr` in `seastarx.h`, this renders `#include <seastar/core/shared_ptr.hh>` unnecessary if we don't need the full definition of `seastar::shared_ptr`. so, in this change, all the unused includes are removed. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#20963 * github.com:scylladb/scylladb: .github: add db to iwyu's CLEANER_DIR db: remove unused includes	2024-10-07 10:55:48 +03:00
Kefu Chai	cd05f61607	api/storage_service: use ranges when handling restore API this change is a follow up of `787ea4b1`, to modernize the code base. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20972	2024-10-07 10:54:37 +03:00
Avi Kivity	946bb870f3	utils: hashers: include <memory> hashers.hh uses std::unique_ptr, so include its header. Closes scylladb/scylladb#20974	2024-10-07 10:52:36 +03:00
Kefu Chai	c6bc5b2706	sstable_loader: Remove unused _snapshot_name from download_task_impl in `787ea4b1`, we introduced `_prefix` and `_sstables` member variables to `sstables_loader::download_task_impl`, replacing the functionality of `_snapshot_name`. However, we overlooked removing the now-obsolete `_snapshot_name` variable. this commit removes the unused `_snapshot_name` member variable to improve code cleanliness and prevent potential confusion. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20969	2024-10-07 10:43:13 +03:00
Benny Halevy	fa8fe62e90	test: cql-pytest: test_compaction: add test_compactionstats_after_major_compaction Test that compactionstats are empty, i.e. there are no required compactions following major compaction. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-10-07 10:24:06 +03:00
Benny Halevy	630b792bd0	test/cql-pytest: rename test_compaction{_tombstone_gc,} Prepare to add more tests related to compaction to this test suite. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-10-07 10:18:30 +03:00
Benny Halevy	284dbc51c3	time_window_compaction_strategy: estimated_pending_compactions: reestimate compactions rather than using cached value Currently, `estimated_pending_compactions` uses a precalculated value calculated by `update_estimated_compaction_by_tasks`, which, in turn, is called by `get_compaction_candidates`. That means that, if `estimated_pending_compactions` is called, e.g. right after major compaction, it will return an outdated value that was calculated prior to major compaction, and so, it is no longer relevant. Instead, just recalculate the value in `estimated_pending_compactions` and drop `update_estimated_compaction_by_tasks`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-10-07 10:15:19 +03:00
Gleb Natapov	e642f0a86d	group: hold group0 shutdown gate during async operations Wait for all outstanding async work that uses group0 to complete before destroying group0 server. Fixes scylladb/scylladb#20701	2024-10-06 17:20:52 +03:00
Gleb Natapov	ba22493a69	group0: Stop group0 if node initialization fails Commit `af83c5e53e` moved aborting of group0 into the storage service drain function. But it is not called if the node fails during initialization (if it failed to join the cluster, for instance). So let's abort on both paths (but only once).	2024-10-06 17:20:52 +03:00
Kefu Chai	960aa38cf3	utils/i_filter: include used header when compiling with clang-19 and the standard library from GCC-14.2, we have: ``` /usr/bin/cmake -E __run_co_compile --tidy="clang-tidy;--checks=-*,bugprone-use-after-move;--extra-arg-before=--driver-mode=g++" --source=/__w/scylladb/scylladb/utils/bloom_filter.cc -- /usr/bin/clang++ -DBOOST_REGEX_DYN_LINK -DBOOST_REGEX_NO_LIB -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -I/__w/scylladb/scylladb -I/__w/scylladb/scylladb/seastar/include -I/__w/scylladb/scylladb/build/seastar/gen/include -I/__w/scylladb/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/__w/scylladb/scylladb/build=. 
-march=wes Error: /__w/scylladb/scylladb/utils/bloom_filter.cc:81:1: error: unknown type name 'filter_ptr' [clang-diagnostic-error] 81 \| filter_ptr create_filter(int hash, large_bitset&& bitset, filter_format format) { \| ^ Error: /__w/scylladb/scylladb/utils/bloom_filter.cc:82:12: error: no viable conversion from returned value of type '__detail::__unique_ptr_t<murmur3_bloom_filter>' (aka 'unique_ptr<utils::filter::murmur3_bloom_filter>') to function return type 'int' [clang-diagnostic-error] 82 \| return std::make_unique<murmur3_bloom_filter>(hash, std::move(bitset), format); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Error: /__w/scylladb/scylladb/utils/bloom_filter.cc:85:1: error: unknown type name 'filter_ptr' [clang-diagnostic-error] 85 \| filter_ptr create_filter(int hash, int64_t num_elements, int buckets_per, filter_format format) { \| ^ Error: /__w/scylladb/scylladb/utils/bloom_filter.cc:86:12: error: no viable conversion from returned value of type '__detail::__unique_ptr_t<murmur3_bloom_filter>' (aka 'unique_ptr<utils::filter::murmur3_bloom_filter>') to function return type 'int' [clang-diagnostic-error] 86 \| return std::make_unique<murmur3_bloom_filter>(hash, large_bitset(get_bitset_size(num_elements, buckets_per)), format); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Error: /__w/scylladb/scylladb/utils/bloom_filter.hh:93:1: error: unknown type name 'filter_ptr' [clang-diagnostic-error] 93 \| filter_ptr create_filter(int hash, large_bitset&& bitset, filter_format format); \| ^ Error: /__w/scylladb/scylladb/utils/bloom_filter.hh:94:1: error: unknown type name 'filter_ptr' [clang-diagnostic-error] 94 \| filter_ptr create_filter(int hash, int64_t num_elements, int buckets_per, filter_format format); \| ^ Error: /__w/scylladb/scylladb/utils/i_filter.hh:17:25: error: no template named 'unique_ptr' in namespace 'std' [clang-diagnostic-error] 17 \| using 
filter_ptr = std::unique_ptr<i_filter>; \| ~~~~~^ Error: /__w/scylladb/scylladb/utils/i_filter.hh:54:12: error: unknown type name 'filter_ptr' [clang-diagnostic-error] 54 \| static filter_ptr get_filter(int64_t num_elements, double max_false_pos_prob, filter_format format); \| ^ 4 warnings and 8 errors generated. ``` apparently, the definition of `std::unique_ptr` is missing where it is used. so let's include `<memory>`, so that `i_filter.hh` is more self-contained. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20971	2024-10-06 14:20:41 +03:00
Michał Chojnowski	5884c9d2fc	utils/rjson.cc: correct a comment about assert() Commit `aa1270a00c` changed most uses of `assert` in the codebase to `SCYLLA_ASSERT`. But the comment fixed in this patch is talking specifically about `assert`, and shouldn't have been changed. It doesn't make sense after the change. Closes scylladb/scylladb#20967	2024-10-06 12:47:51 +03:00
Michał Chojnowski	882a3c60e4	utils/cached_file: reduce latency (and increase overhead) of partially-cached reads Currently, `cached_file::stream` (currently used only by index_reader, to read index pages) works as follows. Assume that the caller requested a read of the range [pos, pos + size). Then: - If the first page of the requested range is uncached, the entire [pos, pos + size) range is read from disk (even if some later pieces of it are cached), the resulting pages are added to the cache, and the read completes (most likely) from the cached pages. - If the first page of the read is cached, then the rest of the read is handled page-by-page, in a sequential loop, serving each page either from cache (if present) or from disk. For example, assume that pages 0, 1, 2, 3, 4 are requested. If exactly pages 1, 2 are cached, then `stream` will read the entire [0, 4] range from disk and insert the missing 0, 3, 4, and then it will continue serving the read from cache. If exactly pages 0 and 3 are cached, then it will serve 0 from cache, then it will read 1 from disk and insert it into cache, then it will read 2 from disk and insert it into cache, then it will serve 3 from cache, then it will read 4 from disk and insert it into cache. If exactly the first page is cached, a 128 KiB read turns into 31 sequential I/O read ops. This is weird, and doesn't look intended. In one case, we are reading even pages we already have, just to avoid fragmenting the read, and in the other case we are reading pages one-by-one (sequentially!) even if they are neighbours. I'm not sure if cached_file should minimize IOPS or byte throughput, but the current state is surely suboptimal. Even if its read strategy is somehow optimal, it should still at least coalesce contiguous reads and perform the non-contiguous reads in parallel. This patch leans into minimizing IOPS. 
After the patch, we serve as many front pages from the cache as we can, but when we see an uncached page, we read the entire remainder of the read from disk. It is as if we trimmed the read request by the longest cached prefix, and then performed the rest using the logic from before the patch. For example, if exactly pages 0 and 3 are cached, then we serve 0 from cache, then we read [1, 4] from disk and insert everything into cache. For partially-cached files, this will result in more bytes read from disk, but fewer IOPS. This might be a bad thing. But if so, then we should lean the other way in a more explicit and efficient way than we currently do. Closes scylladb/scylladb#20935	2024-10-04 17:39:38 +02:00
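The two read strategies described in `882a3c60e4` can be compared with a small cost model. This is only a sketch under stated assumptions: `io_cost`, `cost_before_patch` and `cost_after_patch` are hypothetical names that do not exist in `cached_file`, and the model counts disk operations and pages read for one request, given which of the requested pages are already cached.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cost summary for one read request (not a ScyllaDB type).
struct io_cost {
    std::size_t disk_ops = 0;    // number of separate disk reads issued
    std::size_t pages_read = 0;  // total pages fetched from disk
};

// Model of the old behaviour: cached[i] tells whether requested page i is cached.
inline io_cost cost_before_patch(const std::vector<bool>& cached) {
    io_cost c;
    if (cached.empty()) {
        return c;
    }
    if (!cached[0]) {
        // First page uncached: the whole range is read in one operation,
        // including pages that were already cached.
        c.disk_ops = 1;
        c.pages_read = cached.size();
        return c;
    }
    // First page cached: every uncached page costs its own sequential read.
    for (bool hit : cached) {
        if (!hit) {
            ++c.disk_ops;
            ++c.pages_read;
        }
    }
    return c;
}

// Model of the new behaviour: serve the longest cached prefix from cache,
// then read the entire remainder in one disk operation.
inline io_cost cost_after_patch(const std::vector<bool>& cached) {
    io_cost c;
    std::size_t prefix = 0;
    while (prefix < cached.size() && cached[prefix]) {
        ++prefix;
    }
    if (prefix < cached.size()) {
        c.disk_ops = 1;
        c.pages_read = cached.size() - prefix;
    }
    return c;
}
```

With pages 0 and 3 cached out of 0..4, the model gives three single-page disk reads before the patch and one four-page read after it, which matches the trade-off stated in the commit message: more bytes read, fewer I/O operations.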
Emil Maskovsky	a11ede758e	clang-format: argument and function packing Changes to better match the Seastar code style and the current codebase. Allow parameter binpacking and continuation indenting. Refs: scylladb/scylladb#20951	2024-10-04 14:52:41 +02:00
Emil Maskovsky	b4f28b3e0e	clang-format: don't sort the include headers Sorting the include headers causes reordering of all headers and thus large diffs, especially in the files that include a lot of headers that have not been sorted before. This makes it harder to review the changes and to understand the history of the file. The Seastar code style doesn't prescribe any include headers ordering. Refs: scylladb/scylladb#20951	2024-10-04 14:51:54 +02:00
Kefu Chai	d72c8fc047	.github: add db to iwyu's CLEANER_DIR to avoid future violations of include-what-you-use. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-04 20:48:18 +08:00
Kefu Chai	ee36358a60	db: remove unused includes these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. please note, since we have `using seastar::shared_ptr` in `seastarx.h`, this renders `#include <seastar/core/shared_ptr.hh>` unnecessary if we don't need the full definition of `seastar::shared_ptr`. so, in this change, all the unused includes are removed. but there are some headers which are actually used, while still being identified by this tool. these includes are marked with "IWYU pragma: keep". Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-04 20:48:18 +08:00
Botond Dénes	af124993a4	Merge 'Do not remove objects from backup storage after restore' from Pavel Emelyanov The restore-from-s3 task uses load-and-stream internally which, in turn, unlinks loaded sstables on success. That's not what user expects when it restores from backup, objects should remain in bucket afterwards. Closes scylladb/scylladb#20947 * github.com:scylladb/scylladb: test: Add check that restored-from objects are not removed sstables_loader: Dont unlink sstables when restoring from S3 sstables_loader: Make primary_replica_only bool_class RAII field	2024-10-04 14:59:40 +03:00
Nikita Kurashkin	874cafefab	SStables: replace assertion with malformed_sstable_exception for invalid chunk_size This will allow to see underlying sstable file Fixes #20277 Closes scylladb/scylladb#20784	2024-10-04 14:48:35 +03:00
Pavel Emelyanov	6b480589fe	Merge 'treewide: accept list of sstables in "restore" API ' from Kefu Chai before this change, we enumerate the sstables tracked by the system.sstables table, and restore them when serving requests to "storage_service/restore" API. this works fine with "storage_service/backup" API. but this "restore" API cannot be used as a drop-in replacement of the rclone based API currently used by scylla-manager. in order to fill the gap, in this change: * add the "prefix" parameter for specifying the shared prefix of sstables * add the "sstables" parameter for specifying the list of TOC components of sstables * remove the "snapshot" parameter, as we don't encode the prefix on scylla's end anymore. * make the "table" parameter mandatory. Fixes https://github.com/scylladb/scylladb/issues/20461 ---- this change is a part of the efforts to bring the native backup/restore to scylla, no need to backport. Closes scylladb/scylladb#20685 * github.com:scylladb/scylladb: treewide: accept list of sstables in "restore" API sstable: pass get_storage_option to sstable_directory::load_sstable() test/nodetool: add body parameter to `expected_request` tools/scylla-nodetool: enable nodetool to write HTTP body	2024-10-04 12:38:08 +03:00
Pavel Emelyanov	0f6e76f92f	api: Use captured compaction_manager in get_cm_stats() helper This is continuation of the previous patch that also need to touch the helper function argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-04 12:28:15 +03:00
Pavel Emelyanov	f99d8e07ae	api: Use captured compaction_manager in endpoints Instead of getting via ctx -> database chain. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-04 12:28:15 +03:00
Pavel Emelyanov	05b4a8e710	api: Add sharded<compaction_manager> argument to compaction_manager API reg/unreg To be used by next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-04 12:28:15 +03:00
Pavel Emelyanov	58c4c21581	api: Move some endpoints from storage_service.cc to compaction_manager.cc Those setting and getting bandwidth need compaction manager to work with and thus should sit next to other endpoints working with it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-04 11:36:07 +03:00
Pavel Emelyanov	43fe482204	api: Unset compaction_manager endpoints Similarly to other .cc files, compaction manager should have its endpoints unset. For now, no batch unsetting exists, so need to do it one-by-one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-04 11:35:19 +03:00
Pavel Emelyanov	aa13be15b0	api: Use shorter registration method for compaction_manager function The register_api() helper does exactly what's needed here -- registers function and calls a method to set routes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-04 11:34:33 +03:00
Botond Dénes	07094c3e44	Merge 'replica: Fix tombstone GC during tablet split preparation' from Raphael "Raph" Carvalho During split prepare phase, there will be more than 1 compaction group with overlapping token range for a given replica. Assume tablet 1 has sstable A containing deleted data, and sstable B containing a tombstone that shadows data in A. Then split starts: 1) sstable B is split first, and moved from main (unsplit) group to a split-ready group 2) now compaction runs in split-ready group before sstable A is split tombstone GC logic today only looks at underlying group, so compaction in step 2 will discard the deleted data in A, since it belongs to another group (the unsplit one), and so the tombstone can be purged incorrectly. To fix it, compaction will now work with all uncompacting sstables that belong to the same replica, since tombstone GC requires all sstables that possibly contain shadowed data to be available for correct decision to be made. Fixes https://github.com/scylladb/scylladb/issues/20044. Branches 6.0, 6.1 and 6.2 are vulnerable, so backport is needed. Closes scylladb/scylladb#20939 * github.com:scylladb/scylladb: replica: Fix tombstone GC during tablet split preparation service: Improve error handling for split	2024-10-04 10:29:42 +03:00
Nikos Dragazis	347f5ee166	sstables: Check if digest component exists Extend `read_digest()` to first check if the digest component exists before attempting to load it from disk. Make `validate_checksums()` throw an error if the component does not exist to preserve its current behavior. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-10-03 18:09:05 +03:00
Nikos Dragazis	7e738bcd2d	sstables: Add digest in the SSTable components SSTables store their digest in a Digest file. Add this in the list of SSTable components. In a follow-up patch we will use this component to enable digest checking in the validation path. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-10-03 18:09:05 +03:00
Nikos Dragazis	c893f06409	sstables: Add digest check in compressed data source Following the addition of digest check in the checksummed data source, add the same feature to the compressed data source as well. This ensures consistent behavior across any type of SSTable. This is added as an optional feature so that we can preserve the current behavior, that is verify only the per-chunk checksums during normal user reads. To ensure zero cost at runtime when disabled, we introduce the on/off switch as a template parameter. The digest calculation for compressed SSTables depends on the SSTable format, hence the new template argument for the checksum mode. This is consistent with the compressed data sink. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-10-03 18:09:01 +03:00
Nikos Dragazis	0df1c01759	sstables: Add digest check in checksummed data source The checksummed data source verifies the checksum of each chunk in the data files of uncompressed SSTables. This is being leveraged by scrub in validation mode. Extend the data source to check the digest (full checksum) as well. Unlike checksums, this is added as an optional feature so that SSTables without a digest can still be validated in a per-chunk basis. To enable this, the caller needs to set the template parameter `check_digest` to true, and provide the expected digest. The data source calculates the digest incrementally through multiple get() calls and compares against the expected digest after reading the whole file range. If there is a mismatch, it throws an exception. Checking the digest requires reading the whole data file. If this cannot be satisfied (e.g., due to partial read or skip()), the data source fails immediately. If the user has successfully read the whole file range, it can be safely assumed that the digest is valid. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-10-03 18:08:56 +03:00
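The incremental digest check described in `0df1c01759` can be illustrated with a toy data consumer. This is a sketch under loud assumptions: `toy_adler32`, `toy_digest` and `checking_source` are hypothetical names, Adler-32 is used only as a stand-in checksum (the commit does not say which algorithm the real data source uses), and only the `check_digest` template parameter name is taken from the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string_view>

// Stand-in incremental checksum (Adler-32); an assumption for illustration only.
class toy_adler32 {
    std::uint32_t _a = 1;
    std::uint32_t _b = 0;
public:
    void update(std::string_view data) {
        for (unsigned char c : data) {
            _a = (_a + c) % 65521;
            _b = (_b + _a) % 65521;
        }
    }
    std::uint32_t value() const { return (_b << 16) | _a; }
};

// One-shot helper computing the digest of a whole buffer.
inline std::uint32_t toy_digest(std::string_view data) {
    toy_adler32 h;
    h.update(data);
    return h.value();
}

// Hypothetical source: accumulates the digest chunk by chunk and compares it
// with the expected value once the whole declared range has been consumed.
// With check_digest == false the digest code is compiled out entirely.
template <bool check_digest>
class checking_source {
    toy_adler32 _hash;
    std::uint32_t _expected;
    std::size_t _remaining;
public:
    checking_source(std::size_t total_size, std::uint32_t expected)
        : _expected(expected), _remaining(total_size) {}

    void consume(std::string_view chunk) {
        if constexpr (check_digest) {
            if (chunk.size() > _remaining) {
                throw std::out_of_range("read past the declared file size");
            }
            _hash.update(chunk);
            _remaining -= chunk.size();
            if (_remaining == 0 && _hash.value() != _expected) {
                throw std::runtime_error("digest mismatch");
            }
        }
    }
};
```

Making the switch a template parameter, as the commit describes, lets the compiler discard the digest path when it is disabled, so ordinary reads pay nothing for the feature.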
Tomasz Grabiec	62f3d9e173	perf: perf_fast_forward: Add test case for querying missing rows	2024-10-03 16:26:41 +02:00
Tomasz Grabiec	4602ba90df	perf-fast-forward: Allow overriding promoted index block size For testing dense clustering index.	2024-10-03 16:26:41 +02:00
Tomasz Grabiec	1782456a52	perf-fast-forward: Test subsequent key reads from the middle in test_large_partition_select_few_rows	2024-10-03 16:26:41 +02:00
Tomasz Grabiec	751fa10de8	perf-fast-forward: Allow adding key offset in test_large_partition_select_few_rows	2024-10-03 16:26:41 +02:00
Tomasz Grabiec	10c6990e41	perf-fast-forward: Use single-partition reads in test_large_partition_select_few_rows It's a more realistic scenario than a full scan.	2024-10-03 16:26:28 +02:00
Tomasz Grabiec	753f6a61fd	sstables: bsearch_clustered_cursor: Add more tracing points	2024-10-03 16:24:18 +02:00
Botond Dénes	38088daa1f	scylla-gdb.py: drop compatibility code for EOL releases Any release < 6.0 or < 2023.1 is EOL and need not be supported by scylla-gdb.py anymore. Remove compatibility code for these releases. Closes scylladb/scylladb#20918	2024-10-03 15:42:08 +03:00
Avi Kivity	494561c4f3	cql3: expr: drop boost usage Replace boost usage with <ranges>, modernizing the code a little and reducing dependencies on a redundant library. Closes scylladb/scylladb#20919	2024-10-03 15:39:40 +03:00
Kefu Chai	7b82f3a375	test/lib: remove redundant fmt::to_string() in seastar::format() previously, the implementation was unnecessarily verbose and less efficient, as it created and immediately discarded temporary strings. remove unnecessary use of `fmt::to_string()` when arguments are already being formatted by `seastar::format()`. this change: - eliminates creation of temporary `std::string` instances - reduces memory allocations and copies - improves performance - simplifies the code Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20923	2024-10-03 15:36:55 +03:00
Tomasz Grabiec	95b864497a	sstables: reader: Log data file range	2024-10-03 14:16:05 +02:00
Tomasz Grabiec	41d3ae5e81	sstables: bsearch_clustered_cursor: Unify skip_info logging Now all exit paths which return skip_info will print it in the same way which makes for easier log parsing.	2024-10-03 14:16:05 +02:00
Tomasz Grabiec	1b82d5117a	sstables: bsearch_clustered_cursor: Narrow down range using "end" position of the block This is an optimization. Example: block0: start=aaa, end=aaA block1: start=bbb, end=bbB block2: whatever Before the patch, advance_to("aAA") would skip to block0, and upper bound probe would skip to block1. This way, the reader would read the range of block0 from the data file. After the patch, "end" position is taken into account, so advance_to("aAA") will notice that block0 doesn't contain the position and will skip to block1. This is especially important for dense indexes, as it allows us to skip accessing data file if the search key is missing. It also solves the edge case problem related to the fact that single row reads are using a range with positions which are not equal to the key, but are before(key) and after(key) for the lower bound and upper bound respectively. Before the patch, advance_to(before("bbb")) would skip to block0, because the position is before block1's start. And upper bound probe for after("bbb") would point to block2. This way the read would scan block0 needlessly. After the patch, advance_to(before("bbb")) will skip to block1 because we notice based on "end" that block0 doesn't contain the position. This change also ensures that the start position of the upper bound entry of the after_key(pos), where pos is the last advance_to() position, is warm in cache. This is needed to optimize single-row reads with a dense index so that they always read exactly one promoted index block. For this to work, probe_upper_bound() for the after_key(row) always needs to find the upper bound block in cache.	2024-10-03 14:16:05 +02:00
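The narrowing described in `1b82d5117a` can be sketched with integer positions standing in for clustering positions. The names `block`, `find_by_start` and `find_by_start_and_end` are hypothetical and are not the cursor's API; the sketch only shows how consulting a block's "end" lets a lookup skip a block that cannot contain the requested position.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical promoted-index block with inclusive start and end positions.
struct block {
    int start;
    int end;
};

// Old-style lookup: index of the last block whose start is <= pos
// (0 if pos lies before every block). Ignores the block's end.
inline std::size_t find_by_start(const std::vector<block>& blocks, int pos) {
    std::size_t idx = 0;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        if (blocks[i].start <= pos) {
            idx = i;
        }
    }
    return idx;
}

// New-style lookup: same search, but if pos lies past the chosen block's end,
// the block cannot contain it, so move on to the next block. The result may
// equal blocks.size(), meaning "past the last block".
inline std::size_t find_by_start_and_end(const std::vector<block>& blocks, int pos) {
    if (blocks.empty()) {
        return 0;
    }
    std::size_t idx = find_by_start(blocks, pos);
    if (pos > blocks[idx].end) {
        ++idx;
    }
    return idx;
}
```

For blocks covering [10, 19], [30, 39] and [50, 59], a lookup for position 25 lands on the first block when only starts are compared and on the second block once ends are considered, so the gap between blocks no longer causes a read of the first block.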
Tomasz Grabiec	b03f23a09b	sstables: bsearch_clustered_cursor: Skip even to the first block It was unnecessary to emit a skip info for the first block since it follows immediately the partition start, but it is relevant to the optimization of avoiding data reads for missing keys. This optimization relies on the fact that lower bound position equals upper bound position. If the reader's key is before the first key in the partition and we don't arm the skip info for the first block, lower bound would be equal to the partition start, and upper bound would be equal to the first row's position, which are not equal.	2024-10-03 14:16:05 +02:00
Tomasz Grabiec	c905554121	test: sstables: sstable_3_x_test: Improve failure message	2024-10-03 14:16:05 +02:00
Tomasz Grabiec	7f077893ed	sstables: mx: writer: Never include partition_end marker in promoted index block width Currently, it may happen that the last promoted index block includes the partition_end marker. That's because we first write the partition end marker and then emit the unclosed block. This behavior matches Cassandra (checked in 3.x and 5.0.1). This is problematic for ruling out data file reads based on index. The width field is currently unused, but it will be used later where the width of the last block is used to compute the skip position past the last block for lookups which land after all keys in the partition. If width includes the marker then such a skip would land in the next partition, which is incorrect, as the reader context expects a cell element. Even if that was recognized, it's wrong - if this is not a single partition read (so upper bound is not at the next partition too), then we would read from the wrong (next) partition. We want to be able to make such skips in order to avoid unnecessary data file IO for reads of missing rows. Currently, we would always read the last block even if the key is past its "end" position. Another way to solve this would be to propagate the "past the last block" condition from the index cursor to the reader and let it deal with it, but the logic for that would be complicated. With this fix, there is no special logic required.	2024-10-03 14:09:57 +02:00
Pavel Emelyanov	4465bd9e5e	test: Add check that restored-from objects are not removed Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-03 13:37:04 +03:00
Botond Dénes	3ebb124eb2	repair/row_level: remove reader timeout This timeout was added to catch reader related deadlocks. We have not seen such deadlocks for a long time, but we did see false-timeouts caused by this, see explanation below. Since the cost now outweighs the benefit, remove the timeout altogether. The false timeout happens during mixed-shard repair. The `reader_permit::set_timeout()` call is called on the top-level permit which repair has a handle on. In the case of the mixed-shard repair, this belongs to the multishard reader. Calling set_timeout() on the multishard reader has no effect on the actual shard readers, except in one case: when the shard reader is created, it inherits the multishard reader's current timeout. As the shard reader can be alive for a long time, this timeout is not refreshed and ultimately causes a timeout and fails the repair. Refs: #18269 Closes scylladb/scylladb#20703	2024-10-03 11:26:29 +02:00
Kamil Braun	e67016540c	Merge 'Node replace and remove operations: Add deprecate IP addresses usage warning.' from Sergey Zolotukhin - As part of deprecation of IP address usage, warning messages were added when IP addresses are specified in the `ignore-dead-nodes` and `--ignore-dead-nodes-for-replace` options for scylla and nodetool. - Slight optimizations for `utils::split_comma_separated_list`, `host_id_or_endpoint` lists and `storage_service` remove node operations, replacing `std::list` usage with `std::vector`. Fixes scylladb/scylladb#19218 Backport: 6.2 as it's not yet released. Closes scylladb/scylladb#20756 * github.com:scylladb/scylladb: config: Add a warning about use of IP address for join topology and replace operations. nodetool: Add IP address usage warning for 'ignore-dead-nodes'. tests: Fix incorrect UUIDs in test_nodeops utils: Optimizations for utils::split_comma_separated_list and usage of host_id_or_endpoint lists	2024-10-03 11:08:28 +02:00
Kamil Braun	d2233d4400	Merge 'test: update cql/ tests to work with tablets enabled by default' from Konstantin Osipov Explicitly disable tablets for features which still don't work with tablets: cdc, lwt, counters. Closes scylladb/scylladb#20858 * github.com:scylladb/scylladb: test: make cdc tests pass with tablets on by default test: make cql/counters* pass with and without tablets test: make cql/lwt_* pass with and without tablets test: rename cql/list_test to cql/lwt_list_test	2024-10-03 10:53:17 +02:00
Kefu Chai	f9091066b7	treewide: replace boost::irange with std::views::iota where possible when building scylla with the standard library from GCC-14.2, shipped by fedora 41, we have following build failure: ``` /home/kefu/.local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb/build=. 
-march=x86-64-v3 -mpclmul -Xclang -fexperimental-assignment-tracking=disabled -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -MD -MT CMakeFiles/scylla-main.dir/Debug/init.cc.o -MF CMakeFiles/scylla-main.dir/Debug/init.cc.o.d -o CMakeFiles/scylla-main.dir/Debug/init.cc.o -c /home/kefu/dev/scylladb/init.cc In file included from /home/kefu/dev/scylladb/init.cc:12: In file included from /home/kefu/dev/scylladb/db/config.hh:20: In file included from /home/kefu/dev/scylladb/locator/abstract_replication_strategy.hh:26: /home/kefu/dev/scylladb/locator/tablets.hh:410:30: error: unexpected type name 'size_t': expected expression 410 \| return boost::irange<size_t>(0, tablet_count()) \| boost::adaptors::transformed([] (size_t i) { \| ^ /home/kefu/dev/scylladb/locator/tablets.hh:410:23: error: no member named 'irange' in namespace 'boost' 410 \| return boost::irange<size_t>(0, tablet_count()) \| boost::adaptors::transformed([] (size_t i) { \| ~~~~~~~^ /home/kefu/dev/scylladb/locator/tablets.hh:410:38: error: left operand of comma operator has no effect [-Werror,-Wunused-value] 410 \| return boost::irange<size_t>(0, tablet_count()) \| boost::adaptors::transformed([] (size_t i) { \| ^ 3 errors generated. 
[16/782] Building CXX object CMakeFiles/scylla-main.dir/Debug/keys.cc.o [17/782] Building CXX object CMakeFiles/scylla-main.dir/Debug/counters.cc.o [18/782] Building CXX object CMakeFiles/scylla-main.dir/Debug/partition_slice_builder.cc.o [19/782] Building CXX object CMakeFiles/scylla-main.dir/Debug/mutation_query.cc.o FAILED: CMakeFiles/scylla-main.dir/Debug/mutation_query.cc.o /home/kefu/.local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb/build=. 
-march=x86-64-v3 -mpclmul -Xclang -fexperimental-assignment-tracking=disabled -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -MD -MT CMakeFiles/scylla-main.dir/Debug/mutation_query.cc.o -MF CMakeFiles/scylla-main.dir/Debug/mutation_query.cc.o.d -o CMakeFiles/scylla-main.dir/Debug/mutation_query.cc.o -c /home/kefu/dev/scylladb/mutation_query.cc In file included from /home/kefu/dev/scylladb/mutation_query.cc:12: In file included from /home/kefu/dev/scylladb/schema/schema_registry.hh:17: In file included from /home/kefu/dev/scylladb/replica/database.hh:11: In file included from /home/kefu/dev/scylladb/locator/abstract_replication_strategy.hh:26: /home/kefu/dev/scylladb/locator/tablets.hh:410:30: error: unexpected type name 'size_t': expected expression 410 \| return boost::irange<size_t>(0, tablet_count()) \| boost::adaptors::transformed([] (size_t i) { \| ^ /home/kefu/dev/scylladb/locator/tablets.hh:410:23: error: no member named 'irange' in namespace 'boost' 410 \| return boost::irange<size_t>(0, tablet_count()) \| boost::adaptors::transformed([] (size_t i) { \| ~~~~~~~^ /home/kefu/dev/scylladb/locator/tablets.hh:410:38: error: left operand of comma operator has no effect [-Werror,-Wunused-value] 410 \| return boost::irange<size_t>(0, tablet_count()) \| boost::adaptors::transformed([] (size_t i) { \| ^ In file included from /home/kefu/dev/scylladb/mutation_query.cc:12: In file included from /home/kefu/dev/scylladb/schema/schema_registry.hh:17: In file included from /home/kefu/dev/scylladb/replica/database.hh:37: In file included from /home/kefu/dev/scylladb/db/snapshot-ctl.hh:20: /home/kefu/dev/scylladb/tasks/task_manager.hh:403:54: error: no member named 'irange' in namespace 'boost' 403 \| co_await coroutine::parallel_for_each(boost::irange(0u, smp::count), [&tm, id, &res, &func] (unsigned shard) -> future<> { \| ~~~~~~~^ 4 errors generated. 
``` so let's take the opportunity to switch from `boost::irange` to `std::views::iota`. in this change, we: - switch from boost::irange to std::views::iota for better standard library compatibility - retain boost::irange where step parameter is used, as std::views::iota doesn't support it - this change partially modernizes our range usage while maintaining - existing functionality Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20924	2024-10-03 10:33:33 +03:00
Pavel Emelyanov	7389f4275d	sstables_loader: Dont unlink sstables when restoring from S3 When load_and_stream() completes, all sstables that were loaded (and streamed) are unlinked. This is wrong for the restore-from-s3 task, as removing objects from backup storage is not what user expects. Fix it by adding a boolean to streamer class, and set it to false (well, bool_class<>::no) for restore task. fixes: #20938 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-03 10:15:22 +03:00
Pavel Emelyanov	7eb48358e9	sstables_loader: Make primary_replica_only bool_class RAII field This boolean is currently passed all the way around as pure bool argument. And it's only needed in a single get_endpoints() method that calculates the target endpoints. This patch places this bool on class streamer, so that the call chain arguments are not polluted, and converts it to bool_class. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-03 10:13:37 +03:00
Pavel Emelyanov	1da1d131b2	test: Use correct key in sstables registry mock The "real" registry defines its primary key as (location, generation) pair, where location is the partition key and generation is clustering key. The registry mock uses only location part as primary key, while it must use both. The buggy mock works simply because the listing API is in fact not used by unit tests. Those tests that do need it are python tests that start scylla and thus implicitly use real registry. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-03 10:07:11 +03:00
Pavel Emelyanov	a503e2ab10	test: Pass entry status to mock registry consumer When sstables registry is listed, the passed consumer accepts entry status as its first argument, not its location (location is passed as a search key) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-03 10:06:28 +03:00
Kefu Chai	c7eafc4dc1	auth: capture boost::regex_error not std::regex_error in `a3db5401`, we introduced the TLS certificate authenticator, which is configured using the `auth_certificate_role_queries` option. the value of this option contains a regular expression. so there are chances the regular expression is malformed. in that case, when converting its value representing the regular expression to an instance of `boost::regex`, Boost.Regex throws a `boost::regex_error` exception, not `std::regex_error`. since we decided to use Boost.Regex, let's catch `boost::regex_error`. Refs `a3db5401` Fixes scylladb/scylladb#20941 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20942	2024-10-03 09:57:15 +03:00
Piotr Dulikowski	6778001313	Merge 'cql3: Make creating MV respect ID option' from Dawid Mędrek Before these changes, we could create a materialized view specifying its ID, but the option was ignored. This commit makes Scylla respect the option. Now specifying the ID results in the MV being created with that specific ID. This way, Scylla's behavior is consistent with Cassandra's. Because Cassandra doesn't mention the option in its user documentation, we don't update it either in case the semantics of it changes in the future -- we want to have an open door for any modifications. Note that Cassandra returns a server error if the provided ID is already in use, both in the case of regular tables and MVs. That's most likely a bug. Instead of following that behavior, we stay consistent with the current semantics of creating a regular table in Scylla: if the provided ID is already used, return an InvalidRequest. The last thing worth pointing out is Cassandra handles `WITH ID = null` as a special case; normally, specifying an invalid ID results in a ConfigurationException, but a null is treated as a syntax error. As in the previous paragraph, we stay consistent with the semantics of regular tables and all invalid IDs, null included, lead to a ConfigurationException. We also add a few short tests verifying that the implementation works as intended. Fixes scylladb/scylladb#20616 Backport not needed: the semantics of the option was never documented in either Cassandra, or Scylla. Closes scylladb/scylladb#20773 * github.com:scylladb/scylladb: test/cql-pytest: Get rid of unnecessary processing describe statements cql3: Make creating MV respect ID option	2024-10-03 08:31:07 +02:00
Dawid Mędrek	1f1b201fd8	cql3/functions/user_function: Use fmt to format create statement We replace `std::ostringstream` with views and formatting using fmt to improve readability of the code.	2024-10-02 19:17:35 +02:00
Ferenc Szili	cdf775d3cc	test: test tombstone GC disabled on pending replica This tests if tombstone GC is disabled on pending replicas	2024-10-02 16:37:57 +02:00
Ferenc Szili	ba6707506d	tablet_storage_group_manager: update tombstone_gc_enabled in compaction group In order to avoid cases during tablet migrations where we garbage collect tombstones before the data they shadow arrives, we will disable tombstone GC on pending replicas. To achieve this we added a tombstone_gc_enabled flag to compaction_group. This flag is updated from the update_effective_replication_map method of the tablet_storage_group_manager class.	2024-10-02 16:31:33 +02:00
Raphael S. Carvalho	93815e0649	replica: Fix tombstone GC during tablet split preparation During split prepare phase, there will be more than 1 compaction group with overlapping token range for a given replica. Assume tablet 1 has sstable A containing deleted data, and sstable B containing a tombstone that shadows data in A. Then split starts: 1) sstable B is split first, and moved from main (unsplit) group to a split-ready group 2) now compaction runs in split-ready group before sstable A is split tombstone GC logic today only looks at underlying group, so compaction in step 2 will discard the deleted data in A, since it belongs to another group (the unsplit one), and so the tombstone can be purged incorrectly. To fix it, compaction will now work with all uncompacting sstables that belong to the same replica, since tombstone GC requires all sstables that possibly contain shadowed data to be available for correct decision to be made. Fixes #20044. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-10-02 11:26:13 -03:00
Ferenc Szili	e472844a78	database::table: add tombstone_gc_enabled(locator::tablet_id) This change adds the flag tombstone_gc_enabled to compaction_group. The value of this flag will be set in tablet_storage_group_manager::update_effective_replication_map().	2024-10-02 16:24:45 +02:00
Raphael S. Carvalho	bcd358595f	service: Improve error handling for split Retry wasn't really happening since the loop was broken and sleep part was skipped on error. Also, we were treating abort of split during shutdown as if it were an actual error and that confused longevity tests that parse for logs with error level. The fix is about demoting the level of logs when we know the exception comes from shutdown. Fixes #20890.	2024-10-02 11:23:44 -03:00
Konstantin Osipov	1d1777b13a	test: make cdc tests pass with tablets on by default CDC is not supported with tablets, explicitly disable tablets in CDC keyspace definition.	2024-10-02 06:37:14 -04:00
Konstantin Osipov	0e3dbec277	test: make cql/counters* pass with and without tablets Counters are not supported with tablets, make sure the test works in any ScyllaDB configuration.	2024-10-02 06:37:14 -04:00
Konstantin Osipov	92aca17bc5	test: make cql/lwt_* pass with and without tablets Lightweight transactions don't support tablets, so let's explicitly disable tablets in LWT tests.	2024-10-02 06:37:14 -04:00
Konstantin Osipov	7c64fc0c4f	test: rename cql/list_test to cql/lwt_list_test This test is actually testing lists with LWT, so should have the corresponding name. Going forward we'll patch CQL LWT tests for tablets, so let's group them together.	2024-10-02 06:37:14 -04:00
Sergey Zolotukhin	6398b7548c	config: Add a warning about use of IP address for join topology and replace operations. When the '--ignore-dead-nodes-for-replace' config option contains IP addresses, a warning will be logged, notifying the user that using IP addresses with this option is deprecated and will no longer be supported in the next release. Fixes scylladb/scylladb#19218	2024-10-02 11:56:59 +02:00
Sergey Zolotukhin	9c692438e9	nodetool: Add IP address usage warning for 'ignore-dead-nodes'. Since we are deprecating the use of IP addresses, a warning message will be printed if 'nodetool removenode --ignore-dead-nodes' is used with IP addresses.	2024-10-02 11:56:59 +02:00
Sergey Zolotukhin	a871321ecf	tests: Fix incorrect UUIDs in test_nodeops It was found that the UUIDs used in test_nodeops were invalid. This update replaces those UUIDs with newly generated random UUIDs.	2024-10-02 11:56:59 +02:00
Sergey Zolotukhin	3b9033423d	utils: Optimizations for utils::split_comma_separated_list and usage of host_id_or_endpoint lists - utils::split_comma_separated_list now accepts a reference to sstring instead of a copy to avoid extra memory allocations. Additionally, the results of trimming are moved to the resulting vector instead of being copied. - service/storage_service removenode, raft_removenode, find_raft_nodes_from_hoeps, parse_node_list and api/storage_service::set_storage_service were changed to use std::vector<host_id_or_endpoint> instead of std::list<host_id_or_endpoint> as std::vector is a more cache-friendly structure, resulting in better performance.	2024-10-02 11:56:59 +02:00
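The first optimization above (taking the input by reference and building each trimmed token directly in the result vector) might look roughly like this. It is a sketch under assumptions: `split_comma_separated` is a hypothetical stand-in for `utils::split_comma_separated_list`, it uses `std::string` instead of `sstring`, and dropping empty tokens is an assumed behavior rather than something the commit message states.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for utils::split_comma_separated_list (assumed behavior:
// tokens are trimmed of spaces and tabs, and empty tokens are dropped).
inline std::vector<std::string> split_comma_separated(const std::string& input) {
    std::vector<std::string> out;
    std::string_view rest{input};  // a view: the input string itself is never copied
    while (true) {
        const auto comma = rest.find(',');
        const std::string_view token = rest.substr(0, comma);
        const auto begin = token.find_first_not_of(" \t");
        if (begin != std::string_view::npos) {
            const auto end = token.find_last_not_of(" \t");
            assert(end >= begin);  // holds because a non-blank character exists
            // Construct the trimmed token in place inside the vector.
            out.emplace_back(token.substr(begin, end - begin + 1));
        }
        if (comma == std::string_view::npos) {
            break;
        }
        rest.remove_prefix(comma + 1);
    }
    return out;
}
```

Returning a `std::vector` rather than a `std::list` keeps the tokens contiguous in memory, which is the cache-friendliness argument the commit makes for the `host_id_or_endpoint` lists.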
Dawid Mędrek	7a7a1e3558	treewide: Prefer bytes_fwd.hh over bytes.hh CI started reporting warnings about including `bytes.hh` in several files. The reason is they actually only use code introduced in `bytes_fwd.hh` (which is also included by `bytes.hh`). Clang-include-cleaner suggests that we get rid of that indirection and only include `bytes_fwd.hh`. That's what happens in this commit. We include `bytes.hh` in `exceptions/exceptions.cc` because it relies on the formatting utilities declared and defined in `bytes.hh`. Closes scylladb/scylladb#20842	2024-10-02 07:29:30 +02:00
Dawid Mędrek	de88c150f6	test/cql-pytest: Get rid of unnecessary processing describe statements As part of scylladb/scylladb@d42f160, we added a test verifying that restoring the schema works as intended. Unfortunately, because of scylladb/scylladb#20616, we had to manually process the results of `DESCRIBE SCHEMA` to exclude the ID parameter and be able to compare restore statements corresponding to the same view. Now that materialized views respect the ID parameter, we can get rid of that logic.	2024-10-01 22:04:05 +02:00
Dawid Mędrek	552c752005	cql3: Make creating MV respect ID option Before these changes, we could create a materialized view specifying its ID, but the option was ignored. This commit makes Scylla respect the option. Now specifying the ID results in the MV being created with that specific ID. This way, Scylla's behavior is consistent with Cassandra's. Because Cassandra doesn't mention the option in its user documentation, we don't update it either in case the semantics of it changes in the future -- we want to have an open door for any modifications. Note that Cassandra returns a server error if the provided ID is already in use, both in the case of regular tables and MVs. That's most likely a bug. Instead of following that behavior, we stay consistent with the current semantics of creating a regular table in Scylla: if the provided ID is already used, return an InvalidRequest. The last thing worth pointing out is Cassandra handles `WITH ID = null` as a special case; normally, specifying an invalid ID results in a ConfigurationException, but a null is treated as a syntax error. As in the previous paragraph, we stay consistent with the semantics of regular tables and all invalid IDs, null included, lead to a ConfigurationException. We also add a few short tests verifying that the implementation works as intended.	2024-10-01 22:03:58 +02:00
Kefu Chai	9b5eab0dde	test/lib: include <fmt/std.h> for formatting std::optional before this change, when compiling with fmtlib v11.0.2 and clang v19.1.0, the compiler fails like: ``` /usr/bin/clang++ -DBOOST_REGEX_DYN_LINK -DBOOST_REGEX_NO_LIB -DBOOST_UNIT_TEST_FRAMEWORK_DYN_LINK -DBOOST_UNIT_TEST_FRAMEWORK_NO_LIB -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -I/home/kefu/dev/scylladb/build -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb/build=. 
-march=x86-64-v3 -mpclmul -Xclang -fexperimental-assignment-tracking=disabled -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -MD -MT test/lib/CMakeFiles/test-lib.dir/Debug/cql_assertions.cc.o -MF test/lib/CMakeFiles/test-lib.dir/Debug/cql_assertions.cc.o.d -o test/lib/CMakeFiles/test-lib.dir/Debug/cql_assertions.cc.o -c /home/kefu/dev/scylladb/test/lib/cql_assertions.cc In file included from /home/kefu/dev/scylladb/test/lib/cql_assertions.cc:12: In file included from /usr/include/fmt/ranges.h:20: In file included from /usr/include/fmt/format.h:41: /usr/include/fmt/base.h:2673:45: error: implicit instantiation of undefined template 'fmt::detail::type_is_unformattable_for<std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>>, char>' 2673 \| type_is_unformattable_for<T, char_type> _; \| ^ /usr/include/fmt/base.h:2735:23: note: in instantiation of function template specialization 'fmt::detail::parse_format_specs<std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>>, fmt::detail::compile_parse_context<char>>' requested here 2735 \| parse_funcs_{&parse_format_specs<Args, parse_context_type>...} {} \| ^ /usr/include/fmt/base.h:2884:47: note: in instantiation of member function 'fmt::detail::format_string_checker<char, int, std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>>, std::vector<std::optional<managed_bytes>>>::format_string_checker' requested here 2884 \| detail::parse_format_string<true>(str_, checker(s)); \| ^ /home/kefu/dev/scylladb/test/lib/cql_assertions.cc:132:34: note: in instantiation of function template specialization 'fmt::basic_format_string<char, int &, std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &, const std::vector<std::optional<managed_bytes>> &>::basic_format_string<char[35], 0>' requested here 132 \| fail(seastar::format("row {} differs, expected {} got {}", 
row_nr, row, actual)); \| ^ /usr/include/fmt/base.h:1616:8: note: template is declared here 1616 \| struct type_is_unformattable_for; \| ^ /home/kefu/dev/scylladb/test/lib/cql_assertions.cc:132:34: error: call to consteval function 'fmt::basic_format_string<char, int &, std::vector<std::optional<seastar::basic_sstring<signed char, unsigned int, 31, false>>> &, const std::vector<std::optional<managed_bytes>> &>::basic_format_string<char[35], 0>' is not a constant expression 132 \| fail(seastar::format("row {} differs, expected {} got {}", row_nr, row, actual)); \| ^ ``` because the formatter for `std::optional<>` is defined in fmt/std.h. so, in this change, we include the used header. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20922	2024-10-01 22:32:16 +03:00
Tomasz Grabiec	a29501ed67	sstables: Reduce amount of I/O for clustering-key-bounded reads from large partitions Single-row reads from large partition issue 64 KiB reads to the data file, which is equal to the default span of the promoted index block in the data file. If users would want to reduce selectivity of the index to speed up single-row reads, this won't be effective. The reason is that the reader uses promoted index to look up the start position in the data file of the read, but end position will in practice extend to the next partition, and amount of I/O will be determined by the underlying file input stream implementation and its read-ahead heuristics. By default, that results in at least 2 IOs 32KB each. There is already infrastructure to lookup end position based on upper bound of the read, but it's not effective because it's a non-populating lookup and the upper bound cursor has its own private cached_promoted_index, which is cold when positions are computed. It's non-populating on purpose, to avoid extra index file IO to read upper bound. In case upper bound is far-enough from the lower bound, this will only increase the cost of the read. The solution employed here is to warm up the lower bound cursor's cache before positions are computed, and use that cursor for non-populating lookup of the upper bound. We use the lower bound cursor and the slice's lower bound so that we read the same blocks as later lower-bound slicing would, so that we don't incur extra IO for cases where looking up upper bound is not worth it, that is when upper bound is far from the lower bound. If upper bound is near lower bound, then warming up using lower bound will populate cached_promoted_index with blocks which will allow us to locate the upper bound block accurately. This is especially important for single-row reads, where the bounds are around the same key. In this case we want to read the data file range which belongs to a single promoted index block.
It doesn't matter that the upper bound is not exactly the same. They both will likely lie in the same block, and if not, binary search will bring adjacent blocks into cache. Even if upper bound is not near, the binary search will populate the cache with blocks which can be used to narrow down the data file range somewhat. Fixes #10030. The change was tested with perf-fast-forward. I populated the data set with `column_index_size_in_kb` set to 1 scylla perf-fast-forward --populate --run-tests=large-partition-slicing --column-index-size-in-kb=1 Test run: build/release/scylla perf-fast-forward --run-tests=large-partition-select-few-rows -c1 --keep-cache-across-test-cases --test-case-duration=0 This test reads two rows from the middle of a large partition (1M rows), of subsequent keys. The first read will miss in the index file page cache, the second read will hit. Notice that before the change, the second read issued 2 aio requests worth of 64KiB in total. After the change, the second read issued 1 aio worth of 2 KiB. That's because promoted index block is larger than 1 KiB. I verified using logging that the data file range matches a single promoted index block. Also, the first read which misses in cache is still faster after the change. 
Before: running: large-partition-select-few-rows on dataset large-part-ds1 Testing selecting few rows from a large partition: stride rows time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk allocs tasks insns/f cpu 500000 1 0.009802 1 1 102 0 102 102 21.0 21 196 2 1 0 1 1 0 0 0 568 269 4716050 53.4% 500001 1 0.000321 1 1 3113 0 3113 3113 2.0 2 64 1 0 1 0 0 0 0 0 116 26 555110 45.0% After: running: large-partition-select-few-rows on dataset large-part-ds1 Testing selecting few rows from a large partition: stride rows time (s) iterations frags frag/s mad f/s max f/s min f/s avg aio aio (KiB) blocked dropped idx hit idx miss idx blk c hit c miss c blk allocs tasks insns/f cpu 500000 1 0.009609 1 1 104 0 104 104 20.0 20 137 2 1 0 1 1 0 0 0 561 268 4633407 43.1% 500001 1 0.000217 1 1 4602 0 4602 4602 1.0 1 2 1 0 1 0 0 0 0 0 110 26 313882 64.1% (cherry picked from commit dfb339376aff1ed961b26c4759b1604f7df35e54)	2024-10-01 18:40:34 +02:00
Tomasz Grabiec	41be5d1daf	sstables: clustered_cursor: Track current block Will be needed by the reader to jump to the current block even if we already advanced to it before, when setting up the reader context. We want to advance to lower bound earlier, before the parser skips to the lower bound. We want that in order to set input stream data file range based on index. If we didn't have access to the current block and used the result from advance_to(), the parser will think we're already in the block which has lower_bound when it attempts to skip, and will not skip, falling back to scanning.	2024-10-01 18:40:34 +02:00
Kefu Chai	787ea4b1d4	treewide: accept list of sstables in "restore" API before this change, we enumerate the sstables tracked by the system.sstables table, and restore them when serving requests to "storage_service/restore" API. this works fine with "storage_service/backup" API. but this "restore" API cannot be used as a drop-in replacement of the rclone based API currently used by scylla-manager. in order to fill the gap, in this change: * add the "prefix" parameter for specifying the shared prefix of sstables * add the "sstables" parameter for specifying the list of TOC components of sstables * remove the "snapshot" parameter, as we don't encode the prefix on scylla's end anymore. * make the "table" parameter mandatory. Fixes scylladb/scylladb#20461 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-01 23:24:56 +08:00
Kefu Chai	17181c2eca	sstable: pass get_storage_option to sstable_directory::load_sstable() before this change, we always pass `sstable_directory::_storage_opts` to `_manager.make_sstable()` in `sstable_directory::load_sstable()`. but when loading from object storage, we need to customize the storage_options on a per-sstable basis. the way to address this is to allow the caller of `sstable_directory::process_descriptor()` to pass a functor which return the `storage_options` to be used when creating the sstable. so, in this change, we update - sstable_directory::load_sstable() - sstable_directory::process_descriptor() so that they accept another parameter to create the storage_options. in the next commit we will pass a different functor for customizing the storage_options on a per-sstable basis when loading sstables. Refs scylladb/scylladb#20461 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-01 23:24:56 +08:00
Kefu Chai	283697e316	test/nodetool: add body parameter to `expected_request` before this change, `expected_request` only includes query strings for the parameters of requests. but we will add an API ("storage_service/restore") which accepts its parameters in HTTP body as well. in this change, we add an optional `body` member to `expected_request`, so that we can mock the APIs which pass the parameters with the HTTP body. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-01 23:24:56 +08:00
Kefu Chai	3c19cc9aec	tools/scylla-nodetool: enable nodetool to write HTTP body before this change, we always send the parameters with query strings, but we will add an API ("storage_service/restore") which accepts its parameters in HTTP body as well. in this change, we add an optional parameter to `do_request()` and `post()`, so that we can send HTTP body when using "POST" method in nodetool implementation. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-01 23:24:56 +08:00
Pavel Emelyanov	7f71371de1	distributed_loader: Get token metadata from e.r.m., not database Though database can be used to get relevant token metadata, it's better not to use one service (database) as a proxy to get another one (token metadata). In case of tokens, there's effective replication map at hand, which is a more correct source of such topology information. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20894	2024-10-01 14:59:35 +03:00
Anna Stuchlik	7eb1dc2ae5	doc: document the option to run ScyllaDB in Docker on macOS This commit adds a description of a workaround to create a multi-node ScyllaDB cluster with Docker on macOS. Refs https://github.com/scylladb/scylladb/issues/16806 See https://forum.scylladb.com/t/running-3-node-scylladb-in-docker/1057/4 Closes scylladb/scylladb#20857	2024-10-01 14:58:58 +03:00
Botond Dénes	6535283881	.github/CODEOWNERS: add code owners for tools/* Closes scylladb/scylladb#20702	2024-10-01 14:52:26 +03:00
Yaron Kaikov	ab964bcd5a	[script/pull_github_pr.sh] Check Gating status before merging Maintainers use scripts/pull_github_pr.sh from scylladb.git when merging PRs and before pushing to the next. We want to prevent merges from piling up on top of unstable builds. This change will check Gating's current status and notify the maintainers Related to scylladb/scylla-pkg#3644 Closes scylladb/scylladb#20742	2024-10-01 14:46:29 +03:00
Anna Stuchlik	a97db03448	doc: add metric updates from 6.1 to 6.2 This commit specifies metrics that are new in version 6.2 compared to 6.1, as specified in https://github.com/scylladb/scylladb/issues/20176. Fixes https://github.com/scylladb/scylladb/issues/20176 Closes scylladb/scylladb#20896	2024-10-01 14:41:37 +03:00
muthu90tech	1204d54c5c	transport: Dont bypass seastar API when making syscalls The transport/controller.cc bypasses seastar API when making a few syscalls, this PR will use the right seastar API to make the syscall and libc calls this PR relies on few new APIs introduced in seastar commit : cd7f3b8e8850cd80a4f6899cedc726e576c51abe Closes scylladb/scylladb#17443 Closes scylladb/scylladb#19565	2024-10-01 14:29:24 +03:00
Benny Halevy	5a0f3889e0	treewide: use std::ranges sort functions rather than boost Using the standard library is preferred over boost. In cql3/expr/expression.cc to_sorted_vector got more of a face-lift and was modernized to use also std::unique and while at it, to move its input range in the uniquely sorted result vector. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-10-01 14:19:05 +03:00
Avi Kivity	e99426df60	treewide: de-static namespace scope functions in headers 'static inline' is always wrong in headers - if the same header is included multiple times, and the function happens not to be inlined, then multiple copies of it will be generated. Fix by mechanically changing '^static inline' to 'inline'.	2024-10-01 14:02:50 +03:00
Avi Kivity	e9425e15b2	treewide: remove dependency on boost asio address_v4 It's not used. There's a comment mentioning it prevents some type conflict, but apparently that was fixed some time ago. Closes scylladb/scylladb#20883	2024-10-01 14:00:50 +03:00
Pavel Emelyanov	24598848a9	Merge 'virtual_tables: snapshots: include all snapshots' from Benny Halevy Use database::get_snapshot_details to get the details of all snapshots on disk, in particular those of deleted tables. Add test_snapshots_dropped_table to test listing of snapshots of a deleted table. And harden the existing test cases to use a unique snapshot tag and to delete it when the test ends. Fixes #18313 * No backport required at this time since this is a rather minor UX issue that wasn't hit in the field AFAIK Closes scylladb/scylladb#20869 * github.com:scylladb/scylladb: cql-pytest: test_virtual_tables: add test_snapshots_multiple_keyspaces virtual_tables: snapshots: include all snapshots	2024-10-01 13:56:13 +03:00
Gleb Natapov' via ScyllaDB development	22368b13f2	api: introduce raft stepdown REST API Also provide test.py util function to trigger it. Can be useful for testing.	2024-10-01 12:18:49 +02:00
Avi Kivity	f5628be597	Update tools/java submodule * tools/java 5b0e274f12...b2d025fd6b (1): > build.xml: update scylla-tools license	2024-10-01 12:48:45 +03:00
Pavel Emelyanov	1dfe780457	cql: Check that CREATEing tablets/vnodes is consistent with the CLI There are two bits that control whether replication strategy for a keyspace will use tablets or not -- the configuration option and CQL parameter. This patch tunes its parsing to implement the logic shown below: if (strategy.supports_tablets) { if (cql.with_tablets) { if (cfg.enable_tablets) { return create_keyspace_with_tablets(); } else { throw "tablets are not enabled"; } } else if (cql.with_tablets == off) { return create_keyspace_without_tablets(); } else { // cql.with_tablets is not specified if (cfg.enable_tablets) { return create_keyspace_with_tablets(); } else { return create_keyspace_without_tablets(); } } } else { // strategy doesn't support tablets if (cql.with_tablets == on) { throw "invalid cql parameter"; } else if (cql.with_tablets == off) { return create_keyspace_without_tablets(); } else { // cql.with_tablets is not specified return create_keyspace_without_tablets(); } } closes: #20088 In order to enable tablets "by default" for NetworkTopologyStrategy there's explicit check near ks_prop_defs::get_initial_tablets(), that's not very nice. It needs more care to fix it, e.g. provide feature service reference to abstract_replication_strategy constructor. But since ks_prop_defs code already hijacks options specifically for that strategy type (see prepare_options() helper), it's OK for now. There's also #20768 misbehavior that's preserved in this patch, but should be fixed eventually as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20779	2024-10-01 10:54:29 +02:00
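The decision logic in the commit message above can be restated as a small compilable function. This is a sketch only: `keyspace_layout`, `choose_layout` and the parameter names are hypothetical, the tri-state CQL parameter is modeled as `std::optional<bool>` (empty meaning "not specified"), and exceptions stand in for the errors the real code reports.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

enum class keyspace_layout { tablets, vnodes };

// Hypothetical restatement of the pseudocode from the commit message.
// cql_with_tablets: true = on, false = off, std::nullopt = not specified.
inline keyspace_layout choose_layout(bool strategy_supports_tablets,
                                     std::optional<bool> cql_with_tablets,
                                     bool cfg_enable_tablets) {
    if (!strategy_supports_tablets) {
        // Only an explicit "on" is an error here; "off" and "unspecified" mean vnodes.
        if (cql_with_tablets.value_or(false)) {
            throw std::invalid_argument("invalid cql parameter");
        }
        return keyspace_layout::vnodes;
    }
    if (!cql_with_tablets.has_value()) {
        // Nothing said in CQL: the configuration option decides.
        return cfg_enable_tablets ? keyspace_layout::tablets : keyspace_layout::vnodes;
    }
    if (!*cql_with_tablets) {
        return keyspace_layout::vnodes;
    }
    if (!cfg_enable_tablets) {
        throw std::invalid_argument("tablets are not enabled");
    }
    return keyspace_layout::tablets;
}

// Usage check: with nothing specified in CQL, the config option decides.
inline void check_unspecified_follows_config() {
    assert(choose_layout(true, std::nullopt, true) == keyspace_layout::tablets);
    assert(choose_layout(true, std::nullopt, false) == keyspace_layout::vnodes);
}
```

Flattening the nested branches into early returns makes the two error cases easy to spot: explicitly requesting tablets on a strategy that cannot use them, and explicitly requesting tablets while the configuration option disables them.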
Botond Dénes	e780a3f168	Merge 'fix regressions of building tests with cmake' from Laszlo Ersek Fix two recent regressions of the cmake build -- found this time in the test suite. We (presumably) don't build stable releases (and their tests) with CMake, so backporting these fixes appears unnecessary, even if the regressions have been ported to stable branches. @xemul @dawmd @tchaikov @tgrabiec @scylladb/scylla-maint Closes scylladb/scylladb#20854 * github.com:scylladb/scylladb: test/boost/bptree_test: fix the CMake build test/boost/auth_test: fix the CMake build	2024-10-01 11:14:19 +03:00
Kefu Chai	d484121cc8	github: add a trigger to retrigger clang-tidy with comment before this change, clang-tidy is triggered by a pull request. but there are chances that user wants to retrigger it. for jenkins jobs, user can rebuild a job manually. but for workflow, only the developers with write permission can retrigger a workflow. this is not convenient to regular contributors. so, in this change, another trigger is added, so that user can trigger the clang-tidy workflow with "/clang-tidy" command. the syntax is inspired by IRC commands. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20841	2024-10-01 11:10:54 +03:00
Ernest Zaslavsky	5a96549c86	test: add complete_multipart_upload completion tests A primitive python http server is processing s3 client requests and issues either success or error. A multipart uploader should fail or succeed (with or without retries) depending on aforementioned server response	2024-10-01 09:06:24 +03:00
Ernest Zaslavsky	3be6052786	code: s3 client error handling Handle the `finalize_upload` possible exception to abort the upload (which also can throw) and show the right error originated from the `finalize_upload`	2024-10-01 09:06:24 +03:00
Ernest Zaslavsky	6be2433b5a	code: add response parsing and error handling to the complete_multipart_upload Instead of ignoring the response for multipart upload completion, start parsing it and look for possible errors in the response body. If an error is found, throw an exception	2024-10-01 09:06:24 +03:00
Ernest Zaslavsky	826cf5cd4a	code: Introduce AWS errors parsing Add a simple utility class to parse a (possible) error response from AWS S3. Stay as close as possible to the aws-sdk-cpp AWSErrorMarshaller https://github.com/aws/aws-sdk-cpp/blob/main/src/aws-cpp-sdk-core/source/client/AWSErrorMarshaller.cpp logic. Also, add a test for this new class	2024-10-01 09:06:24 +03:00
Michał Chojnowski	c77d00fd8d	index_reader: remove a piece of misguided code involved in single-partition reads This patch removes a piece of code which, according to the comment, allows for forwarding the index reader even if it was created as a single-partition reader. For single-partition reads, the input_stream used by the reader is limited to the single index page containing the partition, since reading the index file past that point would be a waste. Because of this limit, such an index reader can't be forwarded/advanced. The dubious piece of code gets around that by unsetting the stream and ensuring it will be re-created, this time without the limit, if the index is advanced. But there is no use for this. The idea of a "single-partition reader" exists as an optimization. It's illegal to forward single-partition readers, and it doesn't make sense to attempt that. (If there's a need for forwarding, just don't create a single-partition reader). I suspect this piece of code was written due to a misunderstanding. Before the previous patch in this series, when the searched partition key was the first key in its page, the index reader would scan the preceding page first, realize it made a mistake, and advance to the next, correct page. I suspect this piece of code was written to make this work. But this is, in fact, undesirable. The fact that the index reader was working like this was a performance bug. In the single-partition case there's never an inherent reason to start with the wrong page. The index logic can be corrected to always start with the right page, and that's what the previous patch in this series does. And with that, there is no need to support advancing anymore, and the dubious piece of code can be erased. We also add an assert to emphasize that advancing a single-partition reader is illegal.	2024-09-30 23:43:36 +02:00
Michał Chojnowski	bc30509523	index_reader: in single-partition reads, don't read more than one page When looking for a partition key in the index, we scan the index from the first index page which can possibly contain the key. In a single-partition read, there is never a reason to read beyond that page. After the previous patch in this series, it's guaranteed that the first key in the next page is strictly greater than the searched key. So if the searched key is greater than the last key in the first page, then it is neither in the first nor the second page -- it must be absent from the sstable. But with the current logic, we read the second index page anyway, and the realization that the key is absent happens higher in the call chain. This patch optimizes that inefficiency by immediately returning EOF if a single-partition read doesn't find the key in the first page. Returning "end of file" even though we didn't actually go beyond the end of file is hacky, but I don't see any other non-invasive way of communicating to the caller that the partition is absent. Some caller of the index could possibly assume that returning EOF proves that the searched key is greater than all keys in the sstable. I don't think any such caller exists today, but it's a possible place for confusion. Together with the previous patch in this series, this patch guarantees that a single-partition read only accesses a single index page. This fixes a weird secondary performance bug. Due to some misunderstanding in the logic, when during a single-partition read we scan two index pages, the second index page is scanned via an input_stream created without an upper I/O limit, which means that we additionally read a full read-ahead (currently: 64 kiB) past the second index page for no reason whatsoever. After this and the previous patch, a single-partition read always reads exactly one index page, so the above problem cannot occur.	2024-09-30 23:43:36 +02:00
Michał Chojnowski	6b8b7d962c	index_reader: fix unnecessary reads of preceding index pages When setting the index to position X, we first look for the first summary entry N such that N >= X. Then we load the index page preceding N and scan it for the first partition key P such that P >= X. If there is no such key in this page, then we scan the next page (starting with N) for such a key. (In this case it's always the first key). For example, assume we have: summary: A C E index: A B C D E F If we look up "B" in the index, then we first locate summary entry "C", then we scan the index for B, starting from "A". This is all fine. But when we look for "C" in the index, we do exactly the same -- we scan the index for "C" starting from "A". This is wasteful, because we can start scanning from "C". To avoid this inefficiency, we should be looking for N > X, not N >= X. This patch fixes that. In addition, this fixes a second, weirder performance bug. Due to some misunderstanding in the logic, when during a single-partition read we scan two index pages, the second index page is scanned via an input_stream created without an upper I/O limit, which means that we additionally read a full read-ahead (currently: 64 kiB) past the second index page for no reason whatsoever. After this patch, a single-partition read always reads exactly one index page, so the above problem cannot occur.	2024-09-30 23:43:36 +02:00
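The summary lookup rule described in the commit above can be illustrated with a toy model. This is only a sketch: it assumes each summary entry is the first key of one index page, as in the message's A..F example, and the names `first_page_old`, `first_page_new` and `pages_scanned` are hypothetical, not functions from the ScyllaDB source.

```python
from bisect import bisect_left, bisect_right

# Toy layout from the commit message: each summary entry is the
# first key of one index page (an assumption of this sketch).
summary = ["A", "C", "E"]
index_pages = [["A", "B"], ["C", "D"], ["E", "F"]]

def first_page_old(key):
    # Old rule: locate the first summary entry N >= key,
    # then start from the page preceding N.
    n = bisect_left(summary, key)
    return max(n - 1, 0)

def first_page_new(key):
    # New rule: locate the first summary entry N > key,
    # then start from the page preceding N.
    n = bisect_right(summary, key)
    return max(n - 1, 0)

def pages_scanned(key, first_page):
    # Scan forward from the starting page until some key P >= key is found.
    scanned = 0
    for page in index_pages[first_page(key):]:
        scanned += 1
        if any(p >= key for p in page):
            break
    return scanned

print(pages_scanned("B", first_page_old), pages_scanned("B", first_page_new))
print(pages_scanned("C", first_page_old), pages_scanned("C", first_page_new))
```

In this model, a lookup of "B" touches one page under either rule, while a lookup of "C" touches two pages under the old rule and one under the new rule.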
Avi Kivity	fb8743b2d6	Merge 'sstables: Fix use-after-free on page cache buffer when parsing promoted index entries across pages' from Tomasz Grabiec This fixes a use-after-free bug when parsing clustering key across pages. Also includes a fix for allocating section retry, which is potentially not safe (not in practice yet). Details of the first problem: Clustering key index lookup is based on the index file page cache. We do a binary search within the index, which involves parsing index blocks touched by the algorithm. Index file pages are 4 KB chunks which are stored in LSA. To parse the first key of the block, we reuse clustering_parser, which is also used when parsing the data file. The parser is stateful and accepts consecutive chunks as temporary_buffers. The parser is supposed to keep its state across chunks. In `93482439`, the promoted index cursor was optimized to avoid fully page copy when parsing index blocks. Instead, parser is given a temporary_buffer which is a view on the page. A bit earlier, in `b1b5bda`, the parser was changed to keep shared fragments of the buffer passed to the parser in its internal state (across pages) rather than copy the fragments into a new buffer. This is problematic when buffers come from page cache because LSA buffers may be moved around or evicted. So the temporary_buffer which is a view on the LSA buffer is valid only around the duration of a single consume() call to the parser. If the blob which is parsed (e.g. variable-length clustering key component) spans pages, the fragments stored in the parser may be invalidated before the component is fully parsed. As a result, the parsed clustering key may have incorrect component values. This never causes parsing errors because the "length" field is always parsed from the current buffer, which is valid, and component parsing will end at the right place in the next (valid) buffer. 
The problematic path for clustering_key parsing is the one which calls primitive_consumer::read_bytes(), which is called for example for text components. Fixed-size components are not parsed like this, they store the intermediate state by copying data. This may cause incorrect clustering keys to be parsed when doing binary search in the index, diverting the search to an incorrect block. Details of the solution: We adapt page_view to a temporary_buffer-like API. For this, a new concept is introduced called ContiguousSharedBuffer. We also change parsers so that they can be templated on the type of the buffer they work with (page_view vs temporary_buffer). This way we don't introduce indirection to existing algorithms. We use page_view instead of temporary_buffer in the promoted index parser which works with page cache buffers. page_view can be safely shared via share() and stored across allocating sections. It keeps hold to the LSA buffer even across allocating sections by the means of cached_file::page_ptr. Fixes #20766 Closes scylladb/scylladb#20837 * github.com:scylladb/scylladb: sstables: bsearch_clustered_cursor: Add trace-level logging sstables: bsearch_clustered_cursor: Move definitions out of line test, sstables: Verify parsing stability when allocating section is retried test, sstables: Verify parsing stability when buffers cross page boundary sstables: bsearch_clustered_cursor: Switch parsers to work with page_view cached_file: Adapt page_view to ContiguousSharedBuffer cached_file: Change meaning of page_view::_size to be relative to _offset rather than page start sstables, utils: Allow parsers to work with different buffer types sstables: promoted_index_block_parser: Make reset() always bring parser to initial state sstables: bsearch_clustered_cursor: Switch read_block_offset() to use the read() method sstables: bsearch_clustered_cursor: Fix parsing when allocating section is retried	2024-10-01 00:02:55 +03:00
Calle Wilund	b5d167699c	commitlog: Fix buffer_list_bytes not updated correctly Fixes #20862 With the change in `60af2f3cb2`, the bookkeeping for buffer memory was changed subtly. The problem is that we would shrink the buffer size before using that size, after the flush, to decrement the buffer_list_bytes value, which had previously been incremented by the full allocated size. I.e. we would slowly grow this value instead of adjusting it properly to the actual used bytes. Test included. Closes scylladb/scylladb#20886	2024-09-30 18:04:00 +03:00
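The accounting drift described in the commit above can be reproduced in a small model. Everything here is hypothetical: `BufferAccounting`, its methods and the sizes are illustration names, not the commitlog's real types. `release_balanced` shows just one way to keep the counter balanced and is an assumption, since the message does not spell out how the actual fix adjusts the value.

```python
class BufferAccounting:
    """Hypothetical stand-in for the counter the commit calls buffer_list_bytes."""

    def __init__(self):
        self.buffer_list_bytes = 0

    def allocate(self, allocated):
        # The counter is incremented by the full allocated size.
        self.buffer_list_bytes += allocated
        return {"allocated": allocated, "size": allocated}

    def release_buggy(self, buf, used):
        # The buffer is shrunk first, then its (now smaller) size is used
        # for the decrement, so the counter never returns to zero.
        buf["size"] = used
        self.buffer_list_bytes -= buf["size"]

    def release_balanced(self, buf, used):
        # Assumed remedy for this sketch: subtract what was added.
        buf["size"] = used
        self.buffer_list_bytes -= buf["allocated"]

def drift(release_name, rounds=3, allocated=4096, used=1024):
    # Run a few allocate/release cycles and report the leftover count.
    acct = BufferAccounting()
    for _ in range(rounds):
        buf = acct.allocate(allocated)
        getattr(acct, release_name)(buf, used)
    return acct.buffer_list_bytes

print(drift("release_buggy"))
print(drift("release_balanced"))
```

With the buggy release, each cycle leaves `allocated - used` bytes on the counter, which matches the "slowly grow" symptom in the message.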
Raphael S. Carvalho	cf58674029	replica: Fix schema change during migration cleanup During migration cleanup, there's a small window in which the storage group was stopped but not yet removed from the list. So concurrent operations traversing the list could work with stopped groups. During a test which emitted schema changes during migrations, a failure happened when updating the compaction strategy of a table, but since the group was stopped, the compaction manager was unable to find the state for that group. In order to fix it, we'll skip stopped groups when traversing the list since they're unused at this stage of migration and going away soon. Fixes #20699. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#20798	2024-09-30 17:30:38 +03:00
David Garcia	b94fbbf30c	docs: update command Removes the update command from the setup command. This is required because versions now are not strictly pinned in the poetry.lock file since Sphinx ScyllaDB Theme 1.8. Closes scylladb/scylladb#20876	2024-09-30 17:06:07 +03:00
Andrei Chekun	cdd0c0b7fc	test.py: Do not attach logs for passed tests To reduce the amount of space needed for reports, this PR modifies log attachment in allure so that logs are attached only for tests with a status other than PASSED. To keep the solution simple, the current approach offers no way to switch these logs off completely. Closes scylladb/scylladb#20786	2024-09-30 14:55:55 +02:00
Kefu Chai	1c8100d3f1	test/unit: remove unused #include following headers are no longer used by this compilation unit: - "utils/managed_ref.hh" - "test/perf/perf.hh" this was identified by clang-include-cleaner. As the code is audited, we can safely remove the #include directive. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20850	2024-09-30 14:46:39 +03:00
Kefu Chai	c3be4a36af	test.py: pass "count" to re.sub() with kwarg since Python 3.13, passing count to `re.sub()` as positional argument has been deprecated. and when running `test.py` with Python 3.13, we get the following warning: ``` /home/kefu/dev/scylladb/./test.py:1477: DeprecationWarning: 'count' is passed as positional argument args.tests = set(re.sub(r'.* List configured unit tests\n(.*)\n', r'\1', out, 1, re.DOTALL).split("\n")) ``` see also https://github.com/python/cpython/issues/56166 in order to silence this distracting warning, let's pass `count` using kwarg. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20859	2024-09-30 13:57:02 +03:00
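A minimal sketch of the keyword form of the call quoted in the commit above. The sample `out` string is invented for illustration. Passing `flags` by keyword as well is an assumption of this sketch, since the message only mentions `count`.

```python
import re

# Invented sample of the output being parsed; the real text comes
# from the test binary and will look different.
out = "noise\n List configured unit tests\nfoo\nbar\n"

# count and flags are passed as keyword arguments, not positionally.
tests = set(
    re.sub(
        r'.* List configured unit tests\n(.*)\n',
        r'\1',
        out,
        count=1,
        flags=re.DOTALL,
    ).split("\n")
)
print(sorted(tests))
```

The keyword form also makes it harder to confuse `count` with `flags`, which sit next to each other in the positional signature.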
Kefu Chai	947d9d5a97	scylla_coredump_setup: fix typos in comment these typos were identified by the codespell workflow. and fixed a syntax error along the way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20877	2024-09-30 13:29:34 +03:00
Aleksandra Martyniuk	efc7ad8547	node_ops: fix task_manager_module::get_nodes() Currently, node ops virtual task gathers its children from all nodes contained in a sum of service::topology::normal_nodes and service::topology::transition_nodes. The maps may contain nodes that are down but weren't removed yet. So, if a user requests the status of a node ops virtual task, the task's attempt to retrieve its children list may fail with seastar::rpc::closed_error. Filter out the nodes that are down in node_ops::task_manager_module::get_nodes. Fixes: #20843. Closes scylladb/scylladb#20856	2024-09-30 12:32:23 +03:00
Pavel Emelyanov	423b5a3ba7	Merge 'directories: cleanups to silence clang-tidy false alarms' from Kefu Chai clang-tidy warns: ``` Warning: /__w/scylladb/scylladb/utils/directories.cc:132:52: warning: 'path' used after it was moved [bugprone-use-after-move] 132 \| bool can_access = co_await file_accessible(path.string(), access_flags::read \| access_flags::write \| access_flags::execute); \| ^ /__w/scylladb/scylladb/utils/directories.cc:121:28: note: move occurred here 121 \| verification_error(std::move(path), "File not owned by current euid: {}. Owner is: {}", geteuid(), sd.uid); \| ^ ``` because we pass `std::move(path)` to `verification_error()`, and "then" use this variable again in this same function. this is a false alarm, but we could make it very clear to convince this tool that it's safe to do so. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#20875 * github.com:scylladb/scylladb: directories: mark verification_error() with [[noreturn]] directories: pass const ref of path to verification_error()	2024-09-30 12:02:39 +03:00
Kamil Braun	322efb54c2	Merge 'raft_group0_client: place on a #include diet' from Avi Kivity Reduce compile time and unnecessary compilations by reducing #include load. Minor refactoring, no backport. Closes scylladb/scylladb#20864 * github.com:scylladb/scylladb: raft_group0_client: uninclude "raft_group0_registry.hh" raft_group_registry: extract raft_timeout raft_group0_client: uninclude "mutation/mutation.hh" raft_group0_client: uninclude "db/system_keyspace.hh" db: system_keyspace: extract auth_version_t into its own header	2024-09-30 10:43:44 +02:00
Kefu Chai	faec71e666	directories: mark verification_error() with [[noreturn]] this helps the compiler or static analyzers make the right decision. for instance, clang-tidy thinks a parameter like `std::move(path)` could be reused after being moved away. with this attribute, this tool should be able to tell that this never happens. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-30 12:07:15 +08:00
Kefu Chai	0ef72475fc	directories: pass const ref of path to verification_error() before this change, we pass a `path` to `verification_error()` by moving away from the original `path`. this works fine in the sense that it is correct and does not incur potential performance issues. but clang-tidy considers it a used-after-move, because it cannot tell `verification_error()` does not return at all, and believes that `path` could be accessed again after being moved away. so it warns like: ``` Warning: /__w/scylladb/scylladb/utils/directories.cc:132:52: warning: 'path' used after it was moved [bugprone-use-after-move] 132 \| bool can_access = co_await file_accessible(path.string(), access_flags::read \| access_flags::write \| access_flags::execute); \| ^ /__w/scylladb/scylladb/utils/directories.cc:121:28: note: move occurred here 121 \| verification_error(std::move(path), "File not owned by current euid: {}. Owner is: {}", geteuid(), sd.uid); \| ^ ``` in this change, instead of passing `fs::path` to `verification_error()`, we pass a `const fs::path&` to this function. because `verification_error()` is not a coroutine, nor does it pass `path` to another continuation to be scheduled. so it's perfectly fine to pass `path` to it. this change addresses the false alarms from clang-tidy. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-30 12:07:15 +08:00
Nadav Har'El	64c0540d02	cql-pytest: test a few small materialized views syntax issue While documenting materialized view in a new document (Refs #16569) I encountered a few questions and this patch contains tests that clarify their answer - and can later guarantee that the answer doesn't unintentionally change in the future. The questions that these tests answer are: 1. It is not allowed to filter a view on a static column (a comment on the test explains why). 2. We already tested that it's not allowed to SELECT a static column into a view. Here we add the check that "SELECT *" is also not allowed if a static column exists in the base table. 3. We check that CREATE MATERIALIZED VIEW ... WITH COMMENT='..' works. 4. We check that CREATE MATERIALIZED VIEW ... WITH COMPACT STORAGE is forbidden. 5. We check that CREATE MATERIALIZED VIEW ... WITH garbage=.. fails with a clean InvalidRequest. All these tests pass on both Scylla and Cassandra. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#20873	2024-09-29 21:34:24 +03:00
Nadav Har'El	b008dabee5	test/cql-pytest: fix support for Cassandra 3 One of the design goals of the test/cql-pytest frameworks was to be able to run these tests against Cassandra. Preferably, we should be able to run most of the tests against any popular version of Cassandra, including Cassandra 3. This is admittedly a very old version, but was still maintained until just a year ago, it's the version that Scylla is most compatible with, and we can still be curious about how it worked. Until recently cql-pytest indeed worked on Cassandra 3, but it broke on some change related to tablet detection that caused our most basic fixture - "text_keyspace" - to use the Cassandra 4 feature of "auto expand". This is trivial to fix - we should just use the this_dc fixture that we already had exactly for this purpose. Fixes #20781 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#20782	2024-09-29 19:36:33 +03:00
Benny Halevy	946f21bbd3	cql-pytest: test_virtual_tables: add test_snapshots_multiple_keyspaces Test snapshots listing in system.snapshots using multiple keyspaces and multiple snapshots. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-29 14:36:18 +03:00
Benny Halevy	906de3444b	virtual_tables: snapshots: include all snapshots Use database::get_snapshot_details to get the details of all snapshots on disk, in particular those of deleted tables. Add test_snapshots_dropped_table to test listing of snapshots of a deleted table. And harden the existing test cases to use a unique snapshot tag and to delete it when the test ends. Fixes scylladb/scylladb#18313 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-29 14:16:11 +03:00
Botond Dénes	a4c41755de	Update seastar submodule * ./seastar 69f88e2f...3c9c2696 (14): > core/reactor: don't check AIO block count when they are not needed > build: do not print the default value of --c++-standard in help output > json_formatter: Add tests for formatter::write > Add APIs to get group details and to change ownership of file. > scripts/perftune.py: improve a dry-run printout > build: drop the workaround for a GCC bug > cmake: Depend on libbsd if DPDK depends on it > http: clarify the ownership in the router's doxygen comment > build: check for P2582R1 support > python: introduce a python formatting CI check > addr2line: reformat with black > scripts: add pyproject.toml > json_formatter: Make formatter::write work for std::pair > README.md: use the github homepage of Ceph for Crimson Closes scylladb/scylladb#20836	2024-09-29 13:47:40 +03:00
Avi Kivity	5a470b2bfb	Merge 'scylla_raid_setup: configure SELinux file context' from Takuya ASADA On RHEL9, systemd-coredump fails to coredump on /var/lib/scylla/coredump because the service only has write access with systemd_coredump_var_lib_t. To make it writable, we need to add a file context rule for /var/lib/scylla/coredump, and run restorecon on /var/lib/scylla. Fixes #19325 Closes scylladb/scylladb#20528 * github.com:scylladb/scylladb: scylla_raid_setup: configure SELinux file context scylla_coredump_setup: fix SELinux configuration for RHEL9	2024-09-29 12:53:00 +03:00
Avi Kivity	884297ae2e	raft_group0_client: uninclude "raft_group0_registry.hh" Reduce unnecessary recompilations.	2024-09-28 17:25:11 +03:00
Avi Kivity	67cdd0d389	raft_group_registry: extract raft_timeout It is a vocabulary term that shouldn't need the registry to be visible. Extract it to a new header.	2024-09-28 17:25:03 +03:00
Avi Kivity	93afc77307	raft_group0_client: uninclude "mutation/mutation.hh" Lighten the dependency load. Some constructors and destructors are uninlined to avoid the header depending on the mutation class.	2024-09-28 16:31:53 +03:00
Avi Kivity	5d68efe0bd	raft_group0_client: uninclude "db/system_keyspace.hh" It doesn't need it apart from a forward declaration. Files that lost necessary includes are adjusted, and some users of auth_version_t are redirected to the definition outside system_keyspace.	2024-09-28 16:31:53 +03:00
Avi Kivity	df3ee94467	db: system_keyspace: extract auth_version_t into its own header Users of auth_version_t shouldn't need to include the heavyweight system_keyspace.hh.	2024-09-28 16:31:50 +03:00
Pavel Emelyanov	c17d353718	Revert "[script/pull_github_pr.sh] Check Gating status before merging" This reverts commit `fac682df7e`. Again, this patch broke maintainer workflows, it needs even more care.	2024-09-27 19:12:18 +03:00
Benny Halevy	23d6b996b8	test/pylib: scylla_cluster: set endpoint_snitch in scylla conf When `property_file` is provided, we generate a `cassandra-rackdc.properties` file, but to actually use it, `endpoint_snitch` must be set to `GossipingPropertyFileSnitch`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#20730	2024-09-27 16:46:54 +03:00
David Garcia	4900e4b1ac	docs: update theme 1.8.1 chore: update README Closes scylladb/scylladb#20832	2024-09-27 14:35:39 +02:00
Laszlo Ersek	153279dbfa	test/boost/bptree_test: fix the CMake build Commit `4cf4b7d4ef` ("test: Move B+tree compactiont test from unit to boost", 2024-09-24) introduced the first SEASTAR_THREAD_TEST_CASE to "test/boost/bptree_test.cc" (alongside the prior BOOST_AUTO_TEST_CASEs), but missed changing the KIND of the test from BOOST to SEASTAR. Therefore we get a linker failure: > : && /usr/bin/clang++ -O2 -Xlinker --build-id=sha1 --ld-path=ld.lld > -dynamic-linker=/.../lib64/ld-linux-x86-64.so.2 > test/boost/CMakeFiles/bptree_test.dir/Dev/bptree_test.cc.o -o > test/boost/Dev/bptree_test -L$srcdir/idl/absl::headers > -Wl,-rpath,$srcdir/idl/absl::headers test/lib/Dev/libtest-lib.a > seastar/Dev/libseastar.a /usr/lib64/libxxhash.so > /usr/lib64/libboost_unit_test_framework.so.1.83.0 utils/Dev/libutils.a > -Xlinker --push-state -Xlinker --whole-archive auth/Dev/libscylla_auth.a > -Xlinker --pop-state /usr/lib64/libcrypt.so cdc/Dev/libcdc.a > compaction/Dev/libcompaction.a mutation_writer/Dev/libmutation_writer.a > -Xlinker --push-state -Xlinker --whole-archive dht/Dev/libscylla_dht.a > -Xlinker --pop-state types/Dev/libtypes.a index/Dev/libindex.a -Xlinker > --push-state -Xlinker --whole-archive locator/Dev/libscylla_locator.a > -Xlinker --pop-state message/Dev/libmessage.a gms/Dev/libgms.a > sstables/Dev/libsstables.a readers/Dev/libreaders.a > schema/Dev/libschema.a -Xlinker --push-state -Xlinker --whole-archive > tracing/Dev/libscylla_tracing.a -Xlinker --pop-state > Dev/libscylla-main.a -Xlinker --push-state -Xlinker --whole-archive > Dev/libscylla-zstd.a -Xlinker --pop-state /usr/lib64/libzstd.so > abseil/absl/strings/Dev/libabsl_cord.a > abseil/absl/strings/Dev/libabsl_cordz_info.a > abseil/absl/strings/Dev/libabsl_cord_internal.a > abseil/absl/strings/Dev/libabsl_cordz_functions.a > abseil/absl/strings/Dev/libabsl_cordz_handle.a > abseil/absl/crc/Dev/libabsl_crc_cord_state.a > abseil/absl/crc/Dev/libabsl_crc32c.a > abseil/absl/crc/Dev/libabsl_crc_internal.a > 
abseil/absl/crc/Dev/libabsl_crc_cpu_detect.a > abseil/absl/strings/Dev/libabsl_str_format_internal.a /usr/lib64/libz.so > service/Dev/libservice.a node_ops/Dev/libnode_ops.a > service/Dev/libservice.a node_ops/Dev/libnode_ops.a -lsystemd > raft/Dev/libraft.a repair/Dev/librepair.a streaming/Dev/libstreaming.a > replica/Dev/libreplica.a db/Dev/libdb.a mutation/Dev/libmutation.a > data_dictionary/Dev/libdata_dictionary.a cql3/Dev/libcql3.a > transport/Dev/libtransport.a cql3/Dev/libcql3.a > transport/Dev/libtransport.a lang/Dev/liblang.a > /usr/lib64/liblua-5.4.so -lm /usr/lib64/libsnappy.so.1.1.10 > abseil/absl/container/Dev/libabsl_raw_hash_set.a > abseil/absl/hash/Dev/libabsl_hash.a abseil/absl/hash/Dev/libabsl_city.a > abseil/absl/types/Dev/libabsl_bad_variant_access.a > abseil/absl/hash/Dev/libabsl_low_level_hash.a > abseil/absl/types/Dev/libabsl_bad_optional_access.a > abseil/absl/container/Dev/libabsl_hashtablez_sampler.a > abseil/absl/profiling/Dev/libabsl_exponential_biased.a > abseil/absl/synchronization/Dev/libabsl_synchronization.a > abseil/absl/debugging/Dev/libabsl_stacktrace.a > abseil/absl/synchronization/Dev/libabsl_graphcycles_internal.a > abseil/absl/synchronization/Dev/libabsl_kernel_timeout_internal.a > abseil/absl/debugging/Dev/libabsl_symbolize.a > abseil/absl/debugging/Dev/libabsl_debugging_internal.a > abseil/absl/base/Dev/libabsl_malloc_internal.a > abseil/absl/debugging/Dev/libabsl_demangle_internal.a > abseil/absl/time/Dev/libabsl_time.a > abseil/absl/strings/Dev/libabsl_strings.a > abseil/absl/strings/Dev/libabsl_strings_internal.a > abseil/absl/strings/Dev/libabsl_string_view.a > abseil/absl/base/Dev/libabsl_throw_delegate.a > abseil/absl/numeric/Dev/libabsl_int128.a > abseil/absl/base/Dev/libabsl_base.a > abseil/absl/base/Dev/libabsl_raw_logging_internal.a > abseil/absl/base/Dev/libabsl_log_severity.a > abseil/absl/base/Dev/libabsl_spinlock_wait.a -lrt > abseil/absl/time/Dev/libabsl_civil_time.a > 
abseil/absl/time/Dev/libabsl_time_zone.a rust/Dev/libwasmtime_bindings.a > rust/librust_combined.a utils/Dev/libutils.a seastar/Dev/libseastar.a > /usr/lib64/libboost_program_options.so /usr/lib64/libboost_thread.so > /usr/lib64/libboost_chrono.so /usr/lib64/libboost_atomic.so > /usr/lib64/libcares.so /usr/lib64/libfmt.so.10.2.1 /usr/lib64/liblz4.so > /usr/lib64/libgnutls.so -latomic /usr/lib64/libsctp.so > /usr/lib64/libprotobuf.so /usr/lib64/libyaml-cpp.so > /usr/lib64/libhwloc.so /usr/lib64/libnuma.so /usr/lib64/libxxhash.so > /usr/lib64/libcryptopp.so /usr/lib64/libdeflate.so > /usr/lib64/libboost_regex.so.1.83.0 /usr/lib64/libicui18n.so > /usr/lib64/libicuuc.so -ldl && : > ld.lld: error: undefined symbol: main > >>> referenced by > /usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../lib64/crt1.o:(_start) > > ld.lld: error: undefined symbol: > seastar::testing::seastar_test::seastar_test(char const, char const, > int, boost::unit_test::decorator::collector_t&) > ooo referenced by bptree_test.cc > >>> > test/boost/CMakeFiles/bptree_test.dir/Dev/bptree_test.cc.o:(_GLOBAL__sub_I_bptree_test.cc) > clang++: error: linker command failed with exit code 1 (use -v to see invocation) Fix the KIND now. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-09-27 12:21:17 +02:00
Laszlo Ersek	5fa87cb1c6	test/boost/auth_test: fix the CMake build Commit `78ab1ee8b7` ("test: Add tests for `CREATE ROLE WITH SALTED HASH`", 2024-09-20) made test/boost/auth_test dependent on cql3, but didn't encode the dependency in "CMakeLists.txt": > FAILED: > test/boost/CMakeFiles/auth_test.dir/RelWithDebInfo/auth_test.cc.o > /usr/bin/clang++ -DBOOST_ALL_DYN_LINK -DFMT_SHARED > -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 > -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT > -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING > -DSEASTAR_TESTING_MAIN -DXXH_PRIVATE_API > -DCMAKE_INTDIR=\"RelWithDebInfo\" -I$srcdir -I$srcdir/build/gen > -I$srcdir/seastar/include -I$srcdir/build/seastar/gen/include > -I$srcdir/build/seastar/gen/src -isystem $srcdir/abseil -isystem > $srcdir/build/rust -ffunction-sections -fdata-sections -O3 -g -gz > -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra > -Wno-error=deprecated-declarations -Wimplicit-fallthrough > -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags > -Wno-missing-field-initializers -Wno-overloaded-virtual > -Wno-unsupported-friend -Wno-enum-constexpr-conversion > -Wno-unused-parameter -ffile-prefix-map=$srcdir/build=. -march=westmere > -Xclang -fexperimental-assignment-tracking=disabled -mllvm > -inline-threshold=2500 -fno-slp-vectorize -Werror=unused-result -MD -MT > test/boost/CMakeFiles/auth_test.dir/RelWithDebInfo/auth_test.cc.o -MF > test/boost/CMakeFiles/auth_test.dir/RelWithDebInfo/auth_test.cc.o.d -o > test/boost/CMakeFiles/auth_test.dir/RelWithDebInfo/auth_test.cc.o -c > $srcdir/test/boost/auth_test.cc > $srcdir/test/boost/auth_test.cc:22:10: fatal error: 'cql3/CqlParser.hpp' > file not found > 22 \| #include "cql3/CqlParser.hpp" > \| ^~~~~~~~~~~~~~~~~~~~ > 1 error generated. State the dependency now. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-09-27 11:38:03 +02:00
Tomasz Grabiec	b5ae7da9d2	sstables: bsearch_clustered_cursor: Add trace-level logging	2024-09-27 01:25:15 +02:00
Tomasz Grabiec	8e54ecd38e	sstables: bsearch_clustered_cursor: Move definitions out of line In order to later use the formatter for the inner class promoted_index_block, which is defined out of line after cached_promoted_index class definition.	2024-09-27 01:25:15 +02:00
Tomasz Grabiec	0279ac5faa	test, sstables: Verify parsing stability when allocating section is retried	2024-09-27 01:25:15 +02:00
Tomasz Grabiec	c09fa0cb98	test, sstables: Verify parsing stability when buffers cross page boundary	2024-09-27 01:25:15 +02:00
Tomasz Grabiec	7670ee701a	sstables: bsearch_clustered_cursor: Switch parsers to work with page_view This fixes a use-after-free bug when parsing clustering key across pages. Clustering key index lookup is based on the index file page cache. We do a binary search within the index, which involves parsing index blocks touched by the algorithm. Index file pages are 4 KB chunks which are stored in LSA. To parse the first key of the block, we reuse clustering_parser, which is also used when parsing the data file. The parser is stateful and accepts consecutive chunks as temporary_buffers. The parser is supposed to keep its state across chunks. In `b1b5bda`, the parser was changed to keep shared fragments of the buffer passed to the parser in its internal state (across pages) rather than copy the fragments into a new buffer. This is problematic when buffers come from page cache because LSA buffers may be moved around or evicted. So the temporary_buffer which is a view on the LSA buffer is valid only around the duration of a single consume() call to the parser. If the blob which is parsed (e.g. variable-length clustering key component) spans pages, the fragments stored in the parser may be invalidated before the component is fully parsed. As a result, the parsed clustering key may have incorrect component values. This never causes parsing errors because the "length" field is always parsed from the current buffer, which is valid, and component parsing will end at the right place in the next (valid) buffer. The problematic path for clustering_key parsing is the one which calls primitive_consumer::read_bytes(), which is called for example for text components. Fixed-size components are not parsed like this, they store the intermediate state by copying data. This may cause incorrect clustering keys to be parsed when doing binary search in the index, diverting the search to an incorrect block. 
The solution is to use page_view instead of temporary_buffer, which can be safely shared via share() and stored across allocating section. The page_view maintains its hold to the LSA buffer even across allocating sections. Fixes #20766	2024-09-27 01:25:15 +02:00
Tomasz Grabiec	c15145b71d	cached_file: Adapt page_view to ContiguousSharedBuffer	2024-09-27 01:25:15 +02:00
Tomasz Grabiec	29498a97ae	cached_file: Change meaning of page_view::_size to be relative to _offset rather than page start Will be easier to implement ContiguousSharedBuffer API as the buffer size will be equal to _size.	2024-09-27 01:25:15 +02:00
Tomasz Grabiec	c0fa49bab5	sstables, utils: Allow parsers to work with different buffer types Currently, parsers work with temporary_buffer<char>. This is unsafe when invoked by bsearch_clustered_cursor, which reuses some of the parsers, and passes temporary_buffer<char> which is a view onto LSA buffer which comes from the index file page cache. This view is stable only around consume(). If parsing requires more than one page, it will continue with a different input buffer. The old buffer will be invalid, and it's unsafe for the parser to store and access it. Unfortunately, the temporary_buffer API allows sharing the buffer via the share() method, which shares the underlying memory area. This is not correct when the underlying memory is managed by LSA, because storage may move. Parser uses this sharing when parsing blobs, e.g. clustering key components. When parsing resumes in the next page, parser will try to access the stored shared buffers pointing to the previous page, which may result in use-after-free on the memory area. In preparation for fixing the problem, parametrize parsers to work with different kinds of buffers. This will allow us to instantiate them with a buffer kind which supports sharing of LSA buffers properly in a safe way. It's not purely mechanical work. Some parts of the parsing state machine still work with temporary_buffer<char>, and allocate buffers internally, when reading into linearized destination buffer. They used to store this destination in _read_bytes vector, same field which is used to store the shared buffers. Now it's not possible, since shared buffer type may be different than temporary_buffer<char>. So those paths were changed to use a new field: _read_bytes_buf.	2024-09-27 01:24:54 +02:00
Tomasz Grabiec	93bfaf4282	sstables: promoted_index_block_parser: Make reset() always bring parser to initial state When reset() is done due to allocating section retry, it can be theoretically in an arbitrary point. So we should not assume that it finished parsing and state was reset by previous parsing. We should reset all the fields.	2024-09-27 01:23:43 +02:00
Tomasz Grabiec	ac823b1050	sstables: bsearch_clustered_cursor: Switch read_block_offset() to use the read() method To unify logic which handles allocating section retry, and thus improve safety.	2024-09-27 01:22:35 +02:00
Nadav Har'El	9af43dcd06	Merge 'Move collections stress tests from unit/ to boost/' from Pavel Emelyanov Collection stress tests include testing of B- B+- and radix trees, and those tests live in unit/ suite. There are also small corner-case tests for those collections in boost/ suite. There's an attempt to get rid of unit suite in favor of boost one, and this PR moves the collections stress testing from unit suite into their boost counterparts. refs: scylladb/qa-tasks#1655 Closes scylladb/scylladb#20475 * github.com:scylladb/scylladb: test: Move other collection-testing headers from unit to boost test: Move stress-collecton header from unit to boost test: Move B+tree compactiont test from unit to boost test: Move radix tree compactiont test from unit to boost test: Move B-tree compactiont test from unit to boost test: Move radix tree stress test from unit to boost test: Move B-tree stress test from unit to boost test: Move b+tree stress test from unit to boost test: Add bool in_thread argument to stress_collection function	2024-09-26 18:11:23 +03:00
Botond Dénes	9fe64b5d70	Merge 'Remove datadir string from table::config' from Pavel Emelyanov The datadir keeps path to directory where local sstables can be. The very same information is now kept in table's storage options (#20542). This set fixes the remaining places that still use table::config::datadir and table::dir() and removes the datadir field. Closes scylladb/scylladb#20675 * github.com:scylladb/scylladb: treewide: Remove table::config::datadir distributed_loader: Print storage options, not datadir data_dictionary: Add formatter for storage_options test: Construct table_for_tests with table storage options test: Generalize pair of make_table_for_tests helpers tests: Add helper to get snapshot directory from storage options table: snapshot_exists: Get directory from storage options table: snapshot_on_all_shards: Get directory from storage options	2024-09-26 15:26:45 +03:00
Kamil Braun	9224e48d6b	Merge 'Populate raft address map from gossiper on raft configuration change' from Gleb Natapov For each new node added to the raft config populate its ID to IP mapping in raft address map from the gossiper. The mapping may have expired if a node is added to the raft configuration long after it first appears in the gossiper. Fixes scylladb/scylladb#20600 Backport to all supported versions since the bug may cause bootstrapping failure. Closes scylladb/scylladb#20601 * github.com:scylladb/scylladb: test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join group0: make sure that address map has an entry for each new node in the raft configuration	2024-09-26 12:41:25 +02:00
Tomasz Grabiec	8aca93b3ec	sstables: bsearch_clustered_cursor: Fix parsing when allocating section is retried Parser's state was not reset when allocating section was retried. This doesn't cause problems in practice, because reserves are enough to cover allocation demands of parsing clustering keys, which are at most 64K in size. But it's still potentially unsafe and needs fixing.	2024-09-26 12:34:41 +02:00
Laszlo Ersek	ed91d35171	sstables: coroutinize sstable::load() Best viewed with "git show -b -W". Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> Closes scylladb/scylladb#20822	2024-09-26 13:26:22 +03:00
Lakshmi Narayanan Sreethar	7beea03196	build: cmake: link cql3 library to the service library After commit `d16ea0af`, compiling the server using cmake fails with the following error : ``` FAILED: service/CMakeFiles/service.dir/Dev/qos/service_level_controller.cc.o ... /home/Scylla/scylladb/cql3/util.hh:21:10: fatal error: 'cql3/CqlParser.hpp' file not found 21 \| #include "cql3/CqlParser.hpp" \| ^~~~~~~~~~~~~~~~~~~~ 1 error generated. ``` Fix it by linking the cql3 to the service library. Closes scylladb/scylladb#20805	2024-09-26 09:17:30 +03:00
Yaron Kaikov	fac682df7e	[script/pull_github_pr.sh] Check Gating status before merging Maintainers use scripts/pull_github_pr.sh from scylladb.git when merging PRs and before pushing to the next. We want to prevent merges from piling up on top of unstable builds. This change will check Gating's current status and notify the maintainers Related to scylladb/scylla-pkg#3644 Closes scylladb/scylladb#20742	2024-09-26 08:44:06 +03:00
Nadav Har'El	7715abfc56	Merge 'Alternator store ProvisionedThroughput' from Amnon Heiman When users create a table using the Alternator API, they can decide if the billing is PROVISIONED or PAY_PER_REQUEST. If the billing is set to PROVISIONED, they need to set the ProvisionedThroughput ReadCapacityUnits (RCU) and WriteCapacityUnits (WCU). This series adds support for getting and setting the ProvisionedThroughput. The values will be stored as table extension tags. Following how TTL is stored within the Alternator, we will use ```system:rcu_attribute``` and ```system:wcu_attribute``` for the labels. The series adds a test that sets ProvisionedThroughput and validates that it gets the value back. It was tested with both Alternator and AWS. This series is part of the effort to monitor, limit, and bill Alternator operations. New code, no need to backport. Closes scylladb/scylladb#20056 * github.com:scylladb/scylladb: docs/alternator/compatibility.md: explain the consumed capacity provisioned Add test/alternator/test_provisioned_throughput.py test/alternator/util.py: Allow override BillingMode alternator/executor.cc: Store ProvisionedThroughput	2024-09-26 01:23:17 +03:00
Avi Kivity	357168114b	cql3: statement_restrictions: use the evaluator to calculate token for constrained global index query A global index has a primary key of the form (indexed_column, token, partition_key_column..., clustering_key_column...). The primary key columns are used to point at the base table row, and the token (computed as token(partition_key_column...)) is used to maintain sort order. The query planner has an optimization: if the partition key is fully constrained to a unique value, then we compute the token from the partition key and use that to seek directly into the clustering row range for that base table partition. If the clustering key is also partially constrained, it is used to refine the index clustering key. Currently, this optimization is implemented as a hack: the partition key is extracted from the prepared statement + query options in get_global_index_token_clustering_ranges(), then used to calculate the token, which is then substituted in the expression passed to get_single_column_clustering_bounds() (the expression is shared across all running queries, so this is quite dangerous). We simplify the whole thing: - Let prepare_index_global() recognize that if the partition key is not fully constrained, then there is no way that we'll be able to compute the token (as it needs all partition key columns). Since the token is the first clustering key column of the index table, we can truncate it to length zero and bail out. - Otherwise, the partition key is fully constrained. We refactor the predicate (pk1 = :a AND pk2 = :b) to (pk1, pk2) := (:a, :b). We then pass expressions representing the partition key to the token function, ending up with token(:a, :b). We then substitute this expression into (*_idx_tbl_ck_prefix)[0], which computes the first clustering key column for the index table. - Remove the runtime component in get_global_index_clustering_ranges(). 
Note this includes the early return if the partition key wasn't fully constrained (though the comment only mentions over-constraining), and the token computation, which is now done by evaluate(). Closes scylladb/scylladb#20733	2024-09-25 22:48:16 +03:00
Yaron Kaikov	d164fd45bc	install-dependencies.sh: update node_exporter to 1.8.2 Update node_exporter to 1.8.2 Fixes: #18493 Closes scylladb/scylladb#20254 [avi: regenerate frozen toolchain, with new clang in https://devpkg.scylladb.com/clang/clang-18.1.8-Fedora-40-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-18.1.8-Fedora-40-x86_64.tar.gz new clang regenerated due to new packaging format (`f6fe4d9e73`) and some other minor changes.]	2024-09-25 18:42:25 +03:00
Gleb Natapov	9e4cd32096	test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join	2024-09-25 17:10:09 +03:00
Kamil Braun	7d8f1d251a	Merge 'Mark node as being replaced earlier' from Gleb Natapov Before `17f4a151ce` the node was marked as being replaced in join_group0 state, before it actually joins the group0, so by the time it actually joins and starts transferring snapshot/log no traffic is sent to it. The commit changed this to mark the node as being replaced after the snapshot/log is already transferred, so we can get the traffic to the node while it still has not caught up with a leader, and this may cause problems since the state is not complete. Mark the node as being replaced earlier, but still add the new node to the topology later as the commit above intended. Fixes: scylladb/scylladb#20629 Needs to be backported since this is a regression Closes scylladb/scylladb#20743 * github.com:scylladb/scylladb: test: amend test_replace_reuse_ip test to check that there is no stale writes after snapshot transfer starts topology coordinator:: mark node as being replaced earlier topology coordinator: do metadata barrier before calling finish_accepting_node() during replace	2024-09-25 15:46:12 +02:00
Kamil Braun	09c68c0731	service: raft: fix rpc error message What it called "leader" is actually the destination of the RPC. Trivial fix, should be backported to all affected versions. Closes scylladb/scylladb#20789	2024-09-25 15:46:37 +03:00
Kefu Chai	d5b348460f	config: do not provide default value for set_value() and friends before this change, `config_file::set_value()` and `config_file::set_value_on_all_shards()` provide default value for `config_source`. but the default value is never used -- we always specify the `config_source` when calling `set_value_on_all_shards()`. so in hope to improve the readability, the default value is removed. so, for example, one can figure out when `config_source::Internal` is used with less effort. although `config_file::set_value()` is not used in the tree, for the sake of completeness, its default value is also dropped. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20728	2024-09-25 15:45:42 +03:00
Anna Stuchlik	8145109120	doc: add OS support for version 6.2 This commit adds the OS support for version 6.2. In addition, it removes support for 6.0, as the policy is only to include information for the supported versions, i.e., the two latest versions. Fixes https://github.com/scylladb/scylladb/issues/20804 Closes scylladb/scylladb#20806	2024-09-25 15:39:23 +03:00
Pavel Emelyanov	ae76481444	Merge 'treewide: add "table" parameter to "backup" API ' from Kefu Chai with this parameter, "backup" API can backup the given table, this enables it to be a drop-in replacement of existing rclone API used by scylla manager. Fixes https://github.com/scylladb/scylladb/issues/20636 --- this change is a part of the efforts to bring the native backup/restore to scylla, no need to backport. Closes scylladb/scylladb#20661 * github.com:scylladb/scylladb: backup_task: fix the indent treewide: add "table" parameter to "backup" API	2024-09-25 10:53:38 +03:00
Takuya ASADA	f6fe4d9e73	toolchain: fix broken INSTALL_FROM mode We found that --clang-build-mode INSTALL_FROM tries to rebuild clang even when we use an archive of a prebuilt image. It seems this is because ninja detected changes in the standard library headers, which are updated when we build a new frozen toolchain container image. To avoid such an unnecessary rebuild, we should stop archiving the whole clang build directory and archive the install image instead. To do so, we can use "DESTDIR=<sysroot dir> ninja install-distribution-stripped", and archive the sysroot dir as the clang archive. Fixes #20421 Closes scylladb/scylladb#20422	2024-09-25 10:48:56 +03:00
Anna Stuchlik	da8047a834	doc: add an intro to the Features page This commit modifies the Features page in the following way: - It adds a short introduction and descriptions to each listed feature. - It hides the ToC (required to control and modify the information on the page, e.g., to add descriptions, have full control over what is displayed, etc.) - Removes the info about Enterprise features (following the request not to include Enterprise info in the OSS docs) Fixes https://github.com/scylladb/scylladb/issues/20617 Blocks https://github.com/scylladb/scylla-enterprise/pull/4711 Closes scylladb/scylladb#20635	2024-09-25 08:50:21 +03:00
Aleksandra Martyniuk	3195ebd04e	node_ops: make node_ops tasks type more human-friendly Currently, node ops tasks type is retrieved from topology_request without any change. Use respective node operation name instead. Closes scylladb/scylladb#20671	2024-09-25 08:49:34 +03:00
Kamil Braun	69b4769418	test: fix `topology_custom/test_raft_recovery_stuck` flakiness The test performs consecutive schema changes in RECOVERY mode. The second change relies on the first. However the driver might route the changes to different servers and we don't have group 0 to guarantee linearizability. We must rely on the first change coordinator to push the schema mutations to other servers before returning, but that only happens when it sees other servers as alive when doing the schema change. It wasn't guaranteed in the test. Fix this. Fixes scylladb/scylladb#20791 Should be backported to all branches containing this test to reduce flakiness. Closes scylladb/scylladb#20792	2024-09-25 08:45:37 +03:00
Kefu Chai	54858b8242	backup_task: fix the indent Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-25 09:11:26 +08:00
Kefu Chai	d663b6c13b	treewide: add "table" parameter to "backup" API with this parameter, "backup" API can backup the given table, this enables it to be a drop-in replacement of existing rclone API used by scylla manager. in this change: * api/storage_service: add "table" parameter to "backup" API. * snapshot_ctl: compose the full path of the snapshot directory in `snapshot_ctl::start_backup`. since we have all the information for composing the snapshot directory, and what the `backup_task_impl` class is interested is but the snapshot directory, we just pass the path to it instead the individual components of the directory. * backup_task_impl: instead of scan the whole keyspace recursively, only scan the specified snapshot directory. Fixes scylladb/scylladb#20636 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-25 09:11:26 +08:00
Avi Kivity	d16ea0afd6	Merge 'cql3: Extend DESC SCHEMA by auth and service levels' from Dawid Mędrek Auth has been managed via Raft since Scylla 6.0. Restoring data following the usual procedure (1) is error-prone and so a safer method had to be designed and implemented. That's what happens in this PR. We want to extend `DESC SCHEMA` by auth and service levels to provide a safe way to backup and restore those two components. To realize that, we change the meaning of `DESC SCHEMA WITH INTERNALS` and add a new "tier": `DESC SCHEMA WITH INTERNALS AND PASSWORDS`. * `DESC SCHEMA` -- no change, i.e. the statement describes the current schema items such as keyspaces, tables, views, UDTs, etc. * `DESC SCHEMA WITH INTERNALS` -- does the same as the previous tier and also describes auth and service levels. No information about passwords is returned. * `DESC SCHEMA WITH INTERNALS AND PASSWORDS` -- does the same as the previous tier and also includes information about the salted hashes corresponding to the passwords of roles. To restore existing roles, we extend the `CREATE ROLE` statement by allowing the use of the option `WITH SALTED HASH = '[...]'`. --- Implementation strategy: * Add missing things/adjust existing ones that will be used later. * Implement creating a role with salted hash. * Add tests for creating a role with salted hash. * Prepare for implementing describe functionality of auth and service levels. * Implement describe functionality for elements of auth and service levels. * Extend the grammar. * Add tests for describe auth and service levels. * Add/update documentation. --- (1): https://opensource.docs.scylladb.com/stable/operating-scylla/procedures/backup-restore/restore.html In case the link stops working, restoring a schema was realised by managing raw files on disk. 
Fixes scylladb/scylladb#18750 Fixes scylladb/scylladb#18751 Fixes scylladb/scylladb#20711 Closes scylladb/scylladb#20168 * github.com:scylladb/scylladb: docs: Update user documentation for backup and restore docs/dev: Add documentation for DESC SCHEMA test: Add tests for describing auth and service levels cql3/functions/user_function: Remove newline character before and after UDF body cql3: Implement DESCRIBE SCHEMA WITH INTERNALS AND PASSWORDS auth: Implement describing auth auth/authenticator: Add member functions for querying password hash service/qos/service_level_controller: Describe service levels data_dictionary: Remove keyspace_element.hh treewide: Start using new overloads of describe treewide: Fix indentation in describe functions treewide: Return create statement optionally in describe functions treewide: Add new describe overloads to implementations of data_dictionary::keyspace_element treewide: Start using schema::ks_name() instead of schema::keyspace_name() cql3: Refactor `description` cql3: Move description to dedicated files test: Add tests for `CREATE ROLE WITH SALTED HASH` cql3/statements: Restrict CREATE ROLE WITH SALTED HASH auth: Allow for creating roles with SALTED HASH types: Introduce a function `cql3_type_name_without_frozen()` cql3/util: Accept std::string_view rather than const sstring&	2024-09-24 21:44:32 +03:00
Tomasz Grabiec	bca8258150	Merge 'tablet: Fix single-sstable split when attaching new unsplit sstables' from Raphael "Raph" Carvalho To fix a race between split and repair here `c1de4859d8`, a new sstable generated during streaming can be split before being attached to the sstable set. That's to prevent an unsplit sstable from reaching the set after the tablet map is resized. So we can think this split is an extension of the sstable writer. A failure during split means the new sstable won't be added. Also, the duration of split is also adding to the time erm is held. For example, repair writer will only release its erm once the split sstable is added into the set. This single-sstable split is going through run_custom_job(), which serializes with other maintenance tasks. That was a terrible decision, since the split may have to wait for ongoing maintenance task to finish, which means holding erm for longer. Additionally, if split monitor decides to run split on the entire compaction group, it can cause single-sstable split to be aborted since the former wants to select all sstables, propagating a failure to the streaming writer. That results in new sstable being leaked and may cause problems on restart, since the underlying tablet may have moved elsewhere or multiple splits may have happened. We have some fragility today in cleaning up leaked sstables on streaming failure, but this single-sstable split made it worse since the failure can happen during normal operation, when there's e.g. no I/O error. It makes sense to kill run_custom_job() usage, since the single-sstable split is offline and an extension of sstable writing, therefore it makes no sense to serialize with maintenance tasks. It must also inherit the sched group of the process writing the new sstable. The inheritance happens today, but is fragile. Fixes #20626. 
Closes scylladb/scylladb#20737 * github.com:scylladb/scylladb: tablet: Fix single-sstable split when attaching new unsplit sstables replica: Fix tablet split execute after restart	2024-09-24 19:46:11 +02:00
Abhinav	36d68ec955	raft topology: add error for removal of non-normal nodes In the current scenario, we check if a node being removed is normal on the node initiating the removenode request. However, we don't have a similar check on the topology coordinator. The node being removed could be normal when we initiate the request, but it doesn't have to be normal when the topology coordinator starts handling the request. For example, the topology coordinator could have removed this node while handling another removenode request that was added to the request queue earlier. This commit intends to fix this issue by adding more checks in the enqueuing phase and returning errors for duplicate requests for node removal. This PR fixes a bug. Hence we need to backport it. Fixes: scylladb/scylladb#20271 Closes scylladb/scylladb#20500	2024-09-24 16:11:19 +02:00
Botond Dénes	24ac408a08	Revert "[script/pull_github_pr.sh] Check Gating status before merging" This reverts commit `ec0bb42b45`. This patch broke maintainer workflows, it needs more work before it can land.	2024-09-24 16:53:02 +03:00
Artsiom Mishuta	c07306582b	test.py: deselect remove_data_dir_of_dead_node event Deselect remove_data_dir_of_dead_node event from test_random_failures due to issue scylladb/scylladb#20751 Closes scylladb/scylladb#20790	2024-09-24 14:49:00 +02:00
Dawid Mędrek	1ef51be1d7	docs: Update user documentation for backup and restore We update the relevant articles addressing backing-up and restoring the schema by specifying that the user performing it must be a superuser. We also update the required version of cqlsh. Additionally, we add an article covering the fundamental information on `DESCRIBE SCHEMA`.	2024-09-24 14:21:15 +02:00
Dawid Mędrek	5e1d7f109a	docs/dev: Add documentation for DESC SCHEMA We add documentation for developers addressing `DESCRIBE SCHEMA`. It covers the following aspects of it: * motivation, * synopsis of the solution, * implementation of the solution, as well as a few subsections explaining the details: * restoring process and its side effects, * restoring roles with passwords, * list of statements generated by `DESC SCHEMA` with examples, * implementation details.	2024-09-24 14:18:01 +02:00
Dawid Mędrek	d42f1604ad	test: Add tests for describing auth and service levels We add tests verifying the following features work correctly: * describing auth: roles, role grants, granting permissions on resources, * describing service levels: creating them and attaching to roles.	2024-09-24 14:18:01 +02:00
Dawid Mędrek	10d13f541b	cql3/functions/user_function: Remove newline character before and after UDF body We remove newline characters that are printed before and after a UDF's body. This way, we want to keep the create statement as close to what was actually provided as possible. Although there should be no semantic differences with or without the newline characters, it's a lot more convenient in testing when they're not present. Fixes scylladb/scylladb#20711	2024-09-24 14:18:01 +02:00
Dawid Mędrek	be851cef10	cql3: Implement DESCRIBE SCHEMA WITH INTERNALS AND PASSWORDS When executing `DESC SCHEMA WITH INTERNALS`, Scylla now also returns statements that can be used to recreate service levels and restore the state of auth. That encompasses granting roles and permissions as well as attaching service levels to roles. If the additional parameter `WITH PASSWORDS` is provided, the statements corresponding to recreating roles in the system will also contain the stored salted hashes.	2024-09-24 14:18:01 +02:00
Dawid Mędrek	2a27d4b4d6	auth: Implement describing auth We introduce a function `describe_auth()` in `auth::service` responsible for producing a sequence of descriptions whose corresponding CQL statement can be used to restore the state of auth.	2024-09-24 14:17:58 +02:00
Nadav Har'El	b70ab7bd64	test/boost: add README.md Add a README.md in test/boost, giving a short introduction to what this directory is and what kind of tests it contains, and how to run individual tests. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#20550	2024-09-24 15:16:55 +03:00
Pavel Emelyanov	39dc340424	test: Move other collection-testing headers from unit to boost Simple and straightforward. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-24 13:42:13 +03:00
Pavel Emelyanov	f0d60c2b4d	test: Move stress-collecton header from unit to boost Now all its users are in boost suite. Once moved, the stress_collection() function no longer runs in seastar thread, and the in_thread argument is removed while the function is moved. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-24 13:42:13 +03:00
Pavel Emelyanov	4cf4b7d4ef	test: Move B+tree compactiont test from unit to boost This time the boost test needs to stop being pure-boost test, since bptree compaction test case needs to run in seastar thread. Other collection tests are already such, now bptree_test joins the party. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-24 13:42:13 +03:00
Pavel Emelyanov	d1f727669c	test: Move radix tree compactiont test from unit to boost No surprises here, just move the code and hard-code default args. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-24 13:42:13 +03:00
Pavel Emelyanov	bdcf965318	test: Move B-tree compactiont test from unit to boost This test must run in seastar thread, so put it in seastar-thread test case, fortunately btree test allows that. Just like its stress peer, this test also has two invocations from suite, so make it two distinct test cases as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-24 13:42:13 +03:00
Pavel Emelyanov	328b5b71d7	test: Move radix tree stress test from unit to boost Just move the code. Test "scale" is also taken from default unit test arguments. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-24 13:42:13 +03:00
Pavel Emelyanov	023cc99514	test: Move B-tree stress test from unit to boost This also moves the code, but takes into account the stress test had two invocations with suite options -- small and large. Inherit both with two distinct test cases. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-24 13:42:12 +03:00
Pavel Emelyanov	72cb835c1e	test: Move b+tree stress test from unit to boost Just move the code. And hard-code the "scale" (i.e. -- number of keys and iterations) from default arguments of the unit test. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-24 13:31:33 +03:00
Pavel Emelyanov	f0526bf6a4	test: Add bool in_thread argument to stress_collection function This code is going to be shared between seastar thread and boost tests, temporarily. So not to yield in pure boost test, add the switch. It will be removed really soon. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-24 13:31:33 +03:00
Tomasz Grabiec	bd6eeb4730	Merge 'Separate schema merging logic' from Marcin Maliszkiewicz This patch doesn't yet change how schema merging works but it prepares the ground for it by simplifying the code and separating merging logic into its own unit. It consists of: - minor cleanups of unused code - moving code into separate file - simplifying merge_keyspaces code More detailed explanation in per commit messages. Relates scylladb/scylladb#19153 Closes scylladb/scylladb#19687 * github.com:scylladb/scylladb: db: schema_applier: simplify merge_keyspaces function db: schema_applier: remove unnecessary read in merge_keyspaces db: schema_tables: move scylla specific code into create keyspace function db: move schema merging code into a separate unit db: schema_tables: export some schema management functions replica: remove unused table_selector forward declaration db: remove unused flush arg from do_merge_schema func db: remove unused read_arg_values function	2024-09-24 11:43:06 +02:00
Michał Jadwiszczak	d7945eea2a	docs/dev/service_levels: replace `unspecified` workload type with `NULL` `unspecified` workload type is an internal value and it's not exposed to user via CQL. Default value for workload type from user's perspective is `NULL`. Fixes scylladb/scylladb#20780	2024-09-24 11:43:29 +03:00
Yaron Kaikov	ec0bb42b45	[script/pull_github_pr.sh] Check Gating status before merging Maintainers use scripts/pull_github_pr.sh from scylladb.git when merging PRs and before pushing to the next. We want to prevent merges from piling up on top of unstable builds. This change will check Gating's current status and notify the maintainers Related to https://github.com/scylladb/scylla-pkg/issues/3644 Closes scylladb/scylladb#20742	2024-09-24 08:39:47 +03:00
Pavel Emelyanov	9fd8eba3ec	proxy: Don't keep truncate timeout as optional argument Because it is never such -- the only caller of truncate_blocking() always knows the timeout it want this method to use. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20620	2024-09-24 08:25:54 +03:00
Pavel Emelyanov	d64529f370	Merge 'sstables/sstables.hh: Remove unused forward declarations' from Nikos Dragazis Code cleanup, no backport needed. Closes scylladb/scylladb#20767 * github.com:scylladb/scylladb: sstables: Remove forward declaration for random_access_reader sstables: Remove forward declaration for metadata_collector sstables: Remove forward declaration for sstables_manager sstables: Remove forward declaration for sstable_writer_v2 sstables: Remove forward declaration for key	2024-09-24 07:44:56 +03:00
Andrei Chekun	da2397005b	test.py: Remount cgroup before changing files ownership Change order of functions: firstly remount, then change ownership for cgroup. It was not failing before because with privileged mode, it will mount cgroups as RW, but it's better to have this check if behavior will change. Closes scylladb/scylladb#20676	2024-09-24 07:27:24 +03:00
Kefu Chai	657ea95f4c	main: coroutinize read_config() for better readability. read_config() is not on the critical path, so the performance degradation caused by C++20 coroutines is negligible. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20694	2024-09-24 06:30:34 +03:00
Gleb Natapov	1213f02a5a	test: skip test_lwt_semaphore::test_cas_semaphore in aarch64 debug mode The test configures write timeout to much smaller value to make the test run faster since for some writes sleep is inserted to hit the timeout, but it makes aarch64 debug flaky since timeout happens when it should not because of a natural slowness. Fixes scylladb/scylladb#20515 Closes scylladb/scylladb#20744	2024-09-23 20:46:55 +02:00
Avi Kivity	5c329e3db0	Merge 'Put sstables::test class on a diet' from Pavel Emelyanov This one is aimed at giving tests the ability to call private methods of class sstable. Some of the wrappers in the test class wrap public methods and can be removed. Closes scylladb/scylladb#20614 * github.com:scylladb/scylladb: test: Remove sstables::test::binary_search() test: Remove sstables::test::move_summary() test: Remove sstables::test::read_toc() test: Remove sstables::test::get_summary() test: Remove sstables::test::get_statistics() test: Remove sstables::test::data_read()	2024-09-23 21:40:40 +03:00
Yaniv Michael Kaul	26f2cbdfe2	optimized_clang.sh: compile with -march Add for both x86_64 compilation flags for clang, to get it compile with newer arch x86_64-v3 for x86 and ARM 8.2 level for aarch64. Tested to compile fine with both clang 18.1.6 and 18.1.8. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#20682	2024-09-23 17:40:20 +03:00
Paweł Zakrzewski	16dd58fb0d	cql3: respect the user-defined page size in aggregate queries This change allows the user to fully set the page size for the query. There's still an internal hard-limit of 1MB anyway, so there's no need to limit it to our default value (because using a larger page size might be a query optimization sometimes) Fixes #20612 Closes scylladb/scylladb#20692	2024-09-23 16:31:21 +03:00
Botond Dénes	64ed3f80c7	Merge 'Coroutinize sstable_directory::remove_unshared_sstables()' from Pavel Emelyanov This one is pretty simple ``` return do_with(std::move(data), [] { toss_data(data); return remove(std::move(data)); }); ``` it doesn't really need to do_with() since "toss_data" is non-preemptive. Still, convert it into ``` toss_data(data); co_await remove(std::move(data)); ``` Closes scylladb/scylladb#20479 * github.com:scylladb/scylladb: sstables: Restore indentation after previous patch sstables: Coroutinize remove_unshared_sstables()	2024-09-23 16:15:46 +03:00
Kefu Chai	1aa030a8cd	docs: explain precedence of configure options to explain for instance which setting takes effect if both command line options and `scylla.yaml` configures the same parameter. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20696	2024-09-23 16:12:44 +03:00
Yaniv Michael Kaul	85c0bb7ff4	optimized_clang.sh: add missing symbolic links to clang (for ccache) The removal of clang removes the symbolic links ccache uses to mask itself as clang/clang++. Manually add them back, so ccache can work. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Fixes: https://github.com/scylladb/scylladb/issues/20490 Closes scylladb/scylladb#20491	2024-09-23 15:55:22 +03:00
Nikos Dragazis	1e4b67dd8a	sstables: Remove forward declaration for random_access_reader The sstables header contains a forward declaration for `random_access_reader`. This was introduced in `75dc7b799e` for no obvious reason. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-23 15:28:43 +03:00
Nikos Dragazis	83ccd5bcca	sstables: Remove forward declaration for metadata_collector The sstables header contains a forward declaration for `metadata_collector`. This was introduced in `2d6608bb88` for the return value of the `sstable_writer::get_metadata_collector()`. This function was later removed in `9e7144f719` but the forward declaration was left behind. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-23 15:28:43 +03:00
Nikos Dragazis	90aff33cb0	sstables: Remove forward declaration for sstables_manager This is a duplicate. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-23 15:28:43 +03:00
Nikos Dragazis	4efca437c8	sstables: Remove forward declaration for sstable_writer_v2 The sstables header contains a forward declaration for `sstable_writer_v2`. This was introduced in `fed5b73147` but never used. It is probably a leftover from a previous revision of the patchset. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-23 15:28:43 +03:00
Nikos Dragazis	fda98ba9f6	sstables: Remove forward declaration for key The sstables header contains a forward declaration for `key`. This was introduced in `198f55dc5c` for a reference parameter in `binary_search()`. The function was eventually moved to a different header in `4ed7e529db` but the forward declaration was left behind. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-23 15:28:26 +03:00
Piotr Dulikowski	d1c7e2effa	configure.py: deduplicate --out-final-name arg added in build.ninja Every time the ninja buildfile decides it needs to be updated, it calls the configure.py script with roughly the same set of flags. However, the --out-final-name flag is improperly handled and, on each reconfigure, one more --out-final-name flag is appended to the rebuild command. This is harmless because each instance of the flag will specify the same parameter, but slightly annoying because it bloats the generated file and the duplicated flags show up in ninja's output when reconfigure runs. Fix the problem by stripping the --out-final-name flags from the set of the flags passed to the configure.py before forwarding them to the reconfigure rule. Closes scylladb/scylladb#20731	2024-09-23 15:05:13 +03:00
Dawid Mędrek	90ce86930a	auth/authenticator: Add member functions for querying password hash We add new member functions to the interface of `auth::authenticator` responsible for querying the password hash corresponding to a given role. One method indicates whether a given authenticator uses password hashes, while the other queries them or throws an exception if password hashes are not used. The rationale for extending the interface of authenticator is to be able to access salted hashes from other parts of auth. We will need them in an upcoming commit responsible for describing auth.	2024-09-23 13:55:52 +02:00
Dawid Mędrek	6517ca8920	service/qos/service_level_controller: Describe service levels We implement a member function responsible for producing instances of `cql3::description` that can be used to restore service levels.	2024-09-23 13:55:49 +02:00
Kefu Chai	40f2d4c988	build: cmake: drop scylla-jmx from the build in `3cd2a61736`, we dropped scylla-jmx from the build. but didn't update the CMake building system accordingly, this broke the CMake build, as the dependencies pointing to jmx cannot be found or fulfilled. in this change, we remove all references to jmx in the CMake build. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> Closes scylladb/scylladb#20736	2024-09-23 14:20:42 +03:00
Marcin Maliszkiewicz	2df8eefd67	db: schema_applier: simplify merge_keyspaces function - removes unnecessary temporary sets/vectors - removes auto&& - moves return value instead of copying - instead adds diff references to keep readability - create and alter logic is almost the same, which is now more visible	2024-09-23 12:01:36 +02:00
Marcin Maliszkiewicz	7225538845	db: schema_applier: remove unnecessary read in merge_keyspaces read_schema_partition_for_keyspace() is already called for every changing keyspace by get_schema_complete_view() and stored in _after field so we can reuse this data.	2024-09-23 12:01:36 +02:00
Marcin Maliszkiewicz	f49822f78d	db: schema_tables: move scylla specific code into create keyspace function Since extract_scylla_specific_keyspace_info() was always coupled with create_keyspace_from_schema_partition() there is no value in separating them. By moving first into the latter we: - reduce number of exported functions - simplify arguments of create_keyspace_from_schema_partition - simplify caller's code	2024-09-23 12:01:36 +02:00
Marcin Maliszkiewicz	9792d720c9	db: move schema merging code into a separate unit It's mostly self-contained and it's easier to maintain reasonably sized files. Also splitting better shows boundaries between schema and schema merging code.	2024-09-23 12:01:36 +02:00
Marcin Maliszkiewicz	208050f190	db: schema_tables: export some schema management functions In subsequent commits schema merging code will be separated from db/schema_tables.cc but code which manages schema will remain intact. So those two translation units will share some amount of code. It's a similar case to replica/database.cc, which creates schema on startup and calls functions from db/schema_tables.cc. Struct qualified_name got moved to header as it's used as read_table_mutations() argument.	2024-09-23 12:01:36 +02:00
Marcin Maliszkiewicz	258ffbd126	replica: remove unused table_selector forward declaration	2024-09-23 12:01:36 +02:00
Marcin Maliszkiewicz	4630864b58	db: remove unused flush arg from do_merge_schema func	2024-09-23 12:01:36 +02:00
Marcin Maliszkiewicz	4cce9c8b5a	db: remove unused read_arg_values function	2024-09-23 12:01:36 +02:00
Nadav Har'El	6496eab5ee	Merge 'Rename Alternator batch item count metrics' from Amnon Heiman This PR addresses multiple issues with alternator batch metrics: 1. Rename the metrics to scylla_alternator_batch_item_count with op=BatchGetItem/BatchWriteItem 2. The batch size calculation was wrong and didn't count all items in the batch. 3. Add a test to validate that the metrics values increase by the correct value (not just increase). This also requires an addition to the testing to validate ops of different metrics and an exact value change. Needs backporting to allow the monitoring to use the correct metrics names. Fixes #20571 Closes scylladb/scylladb#20646 * github.com:scylladb/scylladb: alternator:test_metrics test metrics for batch item count alternator:test_metrics Add validating the increased value alternator: Fix item counting in batch operations Alterntor rename batch item count metrics	2024-09-23 10:13:07 +03:00
Kefu Chai	2014d1c0cb	cql3: drop workaround for castas_fctn_simple() now that `e13a584ab7` has been merged, and our toolchain is based on the fedora 40 on 20240710, which should include this change. so let's drop the workaround from `51d09e6a` Refs #18508 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20750	2024-09-22 19:59:10 +03:00
Kefu Chai	fdc8773278	test/scylla_gdb: get table::_schema raw pointer with lw_shared_ptr This commit addresses an issue where accessing the raw pointer of the schema instance within `table::_schema` using `table.schema._p` was unreliable. before this change, `_p` was of type `lw_shared_ptr_counter_base`, a type-erased smart pointer, preventing direct casting to the underlying schema pointer. but we still cast it to `schema` anyway. this led to a gdb.MemoryError when dereferencing the deduced pointer: but the type of `_p` is `lw_shared_ptr_counter_base`, which is a type erased smart pointer, and it cannot be casted directly to the under pointer pointing to a `schema` instance. this results in: ``` Traceback (most recent call last): File "/home/avi/scylla/test/scylla_gdb/../../scylla-gdb.py", line 5554, in invoke self.print_key_type(seastar_lw_shared_ptr(schema['_clustering_key_type']).get().dereference(), 'clustering') File "/home/avi/scylla/test/scylla_gdb/../../scylla-gdb.py", line 5533, in print_key_type key_type = seastar_shared_ptr(key_type).get().dereference() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ gdb.MemoryError: Cannot access memory at address 0x4000079656b0078 ``` when we are dereferencing the raw pointer deduced this way. in this change, we use the wrapper of `seastar_lw_shared_ptr` to safely obtain the raw pointer. * reenable this test previously disabled by `3d781c4f` tested using ```console $ SCYLLA=/home/kefu/dev/scylladb/master/build/release/scylla \ test/scylla_gdb/run -o junit_suite_name=scylla_gdb test_misc.py::test_schema ``` on an up-to-date fedora 40 installation. Refs `3d781c4f` Fixes scylladb/scylladb#20741 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20746	2024-09-22 18:30:16 +03:00
Avi Kivity	657848dcbb	cql3: statement_restrictions, expr: move restrictions-related expression utilities out of expression.cc Move all of the blatantly restriction-related expression utilities to statement_restrictions.cc. Some are so blatant as to include the word "restriction" in their name. Others are just so specialized that they cannot be used for anything else. The motivation is that further refactoring will be simplified if it can happen within the same module, as there will not be a need to prove it has no effect elsewhere. Most of the declarations are made non-public (in .cc file) to limit proliferation. A few are needed for tests or in select_statement.cc and so are kept public. Other than that, the only changes are namespace qualifications and removal of a now-duplicate definition ("inclusive"). Closes scylladb/scylladb#20732	2024-09-22 11:00:51 +03:00
Avi Kivity	3d781c4fc8	Update frozen toolchain * tools/java e505a6d3bb...5b0e274f12 (1): > Merge 'build.xml: install and use java-11 when building' from Kefu Chai Updates to clang 18.1.8 + LLVM patch to match Fedora 40. New optimized clang build generated and stored in https://devpkg.scylladb.com/clang/clang-18.1.8-x86_64.tar.gz https://devpkg.scylladb.com/clang/clang-18.1.8-aarch64.tar.gz Due to the loss of the jmx submodule, we no longer install java-11-openjdk. We add it in install-dependencies.sh here to compensate, pending a better solution. tools/java submodule updated to remove build failure where Java 8 was selected instead of Java 11. The scylla_gdb test suite was disabled due to a regression in gdb 15, which is brought in by the toolchain update [1]. [1] https://github.com/scylladb/scylladb/issues/20741.	2024-09-21 20:07:28 +03:00
Raphael S. Carvalho	38ce2c605d	tablet: Fix single-sstable split when attaching new unsplit sstables To fix a race between split and repair here `c1de4859d8`, a new sstable generated during streaming can be split before being attached to the sstable set. That's to prevent an unsplit sstable from reaching the set after the tablet map is resized. So we can think this split is an extension of the sstable writer. A failure during split means the new sstable won't be added. Also, the duration of split is also adding to the time erm is held. For example, repair writer will only release its erm once the split sstable is added into the set. This single-sstable split is going through run_custom_job(), which serializes with other maintenance tasks. That was a terrible decision, since the split may have to wait for ongoing maintenance task to finish, which means holding erm for longer. Additionally, if split monitor decides to run split on the entire compaction group, it can cause single-sstable split to be aborted since the former wants to select all sstables, propagating a failure to the streaming writer. That results in new sstable being leaked and may cause problems on restart, since the underlying tablet may have moved elsewhere or multiple splits may have happened. We have some fragility today in cleaning up leaked sstables on streaming failure, but this single-sstable split made it worse since the failure can happen during normal operation, when there's e.g. no I/O error. It makes sense to kill run_custom_job() usage, since the single-sstable split is offline and an extension of sstable writing, therefore it makes no sense to serialize with maintenance tasks. It must also inherit the sched group of the process writing the new sstable. The inheritance happens today, but is fragile. Fixes #20626. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-09-20 23:03:01 -03:00
Raphael S. Carvalho	999f1f1318	replica: Fix tablet split execute after restart Let's assume there are 2 nodes, n1, n2. n1 is the coordinator. 1) n1 emits split 2) n1 and n2 complete split work 3) n1 becomes aware all replicas are ready for split 4) n2 restarts, but places split sstable into main group[1] 5) n1 executes split 6) n2 handles split completion, but sees the main group is not empty [1]: During split, main group should only contain unsplit sstables. If all sstables are split, main must be empty. This is a result of replica not setting storage group to split mode on restart (using tablet map) and therefore sstables are incorrectly placed on main group. The fix is about looking at tablet map and setting group to split mode before sstables are populated into it. Refs #20626. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-09-20 22:28:09 -03:00
Avi Kivity	cd861bc788	row_cache: coroutinize do_update() do_with() makes the change a no-brainer, and besides, it's called once per huge update. Closes scylladb/scylladb#20735	2024-09-21 00:07:02 +02:00
Botond Dénes	488a372fdc	tool/scylla-nodetool: status: reorder endpoint calls to match old nodetool Old nodetool requested `/storage_service/tokens_endpoint` first, then `/storage_service/host_id`, while the native nodetool did it in reverse order. Most of the time this is inconsequential but there is an edge case when a node's IP address is changed. This reversing of the order results in unexpected behavior for tests, causing noise via flaky tests. Match the order of the old nodetool so that the native nodetool exhibits the behavior expected by tests (and users too probably). Fixes: scylladb/scylladb#18693 Closes scylladb/scylladb#20615	2024-09-20 15:07:16 +02:00
Dawid Mędrek	b357307406	data_dictionary: Remove keyspace_element.hh The interface is not used anywhere anymore, so we can remove it safely. It has been replaced by custom functions for each keyspace element and `cql3::description`.	2024-09-20 14:24:54 +02:00
Dawid Mędrek	7b4f9c806c	treewide: Start using new overloads of describe We continue removing `data_dictionary::keyspace_element`. In this commit, we start using the overloads returning `cql3::description` in places where the methods specified by `data_dictionary::keyspace_element` were used.	2024-09-20 14:24:54 +02:00
Dawid Mędrek	df94e92b06	treewide: Fix indentation in describe functions After modifying new functions for generating `cql3::description`, we fix indentation in them in this commit.	2024-09-20 14:24:54 +02:00
Dawid Mędrek	86722e4cea	treewide: Return create statement optionally in describe functions We add a new parameter in functions used to generate instances of `cql3::description` for types related to situations where we might not need a create statement. An example of such a scenario could be `DESCRIBE TYPES`.	2024-09-20 14:24:54 +02:00
Dawid Mędrek	0702e93e32	treewide: Add new describe overloads to implementations of data_dictionary::keyspace_element We're removing `data_dictionary::keyspace_element`. Before we can do that, we need to substitute the existing methods used for describing keyspace elements with their new versions returning `cql3::description`. That's what happens in this commit.	2024-09-20 14:24:53 +02:00
Dawid Mędrek	39cf106151	treewide: Start using schema::ks_name() instead of schema::keyspace_name() We're going to remove the interface `data_dictionary::keyspace_element`. As `schema::keyspace_name()` is an implementation of one of the methods specified by that interface, we replace its uses by `schema::ks_name()`. `schema::keyspace_name()` was an alias for it, so no semantic change has occurred.	2024-09-20 14:24:53 +02:00
Dawid Mędrek	1844c71f9a	cql3: Refactor `description` In these changes, we describe the purpose of the type and make it reusable for other parts of the code. That includes ditching the existing constructors, leaving the formatting of its fields to the user of the interface. The removed constructors have been replaced by free functions so that existing code can still use them the way it did before.	2024-09-20 14:24:53 +02:00
Dawid Mędrek	05d6794e65	cql3: Move description to dedicated files We move the declaration of `description` to dedicated files to be able to create instances of it from other parts of the code. `describe_statement.cc` has been functioning as an intermediary between objects that can be described and the end user. It will still perform that duty, but we want to let other modules be able to generate descriptions on their own, without having to share an additional layer of abstraction in form of types inheriting from `data_dictionary::keyspace_element`. Those types may not perform any other function than that and thus may be redundant. Adjusting `description` to its new purpose will happen in an upcoming commit.	2024-09-20 14:24:53 +02:00
Dawid Mędrek	78ab1ee8b7	test: Add tests for `CREATE ROLE WITH SALTED HASH`	2024-09-20 14:24:53 +02:00
Dawid Mędrek	47a5469280	cql3/statements: Restrict CREATE ROLE WITH SALTED HASH We start requiring that the user issuing `CREATE ROLE WITH SALTED HASH` be a superuser. The rationale for that is the statement directly modifies a system table, circumventing the hashing algorithm. Additionally, we correct a possible existing problem. `_options.is_superuser` in `create_role_statement` may be an empty optional, so dereferencing it without a prior check could lead to undefined behavior in the future.	2024-09-20 14:24:53 +02:00
Dawid Mędrek	206fdf2848	auth: Allow for creating roles with SALTED HASH We introduce a way to create a role with an explicitly provided salted hash. The algorithm for creating a role with a password works like this: 1. The user issues a statement `CREATE ROLE <role> WITH PASSWORD = '<password>' <...>`. 2. Scylla produces a hash based on the value of `<password>`. 3. Scylla puts the produced hash in `system.roles`, in the column `salted_hash`. The newly introduced way to create a role is based on a new form of the create statement: `CREATE ROLE <role> WITH SALTED HASH = '<salted_hash>'` The difference in the algorithm used for processing this statement is that we insert `<salted_hash>` into `system.roles` directly, without hashing it. The rationale for introducing this new statement is that we want to be able to restore roles. The original password isn't stored anywhere in the database (as intended), so we need to rely on the column `salted_hash`.	2024-09-20 14:24:53 +02:00
Dawid Mędrek	35a92d189e	types: Introduce a function `cql3_type_name_without_frozen()` The introduced function returns the actual name of the type represented by `abstract_type`. It circumvents name processing like wrapping a type within `frozen<>` or using Cassandra's syntax. We add the function to be able to describe UDFs in the upcoming commits that require that their arguments not be `frozen<>`. We also test the implementation.	2024-09-20 14:24:53 +02:00
Dawid Mędrek	202d866892	cql3/util: Accept std::string_view rather than const sstring&	2024-09-20 14:24:53 +02:00
Avi Kivity	61d19e4464	Update tools/java submodule * tools/java 0b4accdd5e...e505a6d3bb (1): > [C-S] Make it use DCAwareRoundRobinPolicy unless rack is provided	2024-09-20 14:49:21 +03:00
Pavel Emelyanov	b45891acd7	sstables: storage: Don't keep base directory in base class This reverts commit `44bd183187` and moves the base directory back on filesystem_storage. The mentioned commit says > so we can use the base (table) directory for > e.g. pending_delete logs, in the next patch. but "next patch" doesn't use it outside of the filesystem-storage anyway. This field doesn't make sense for S3 backend. Its "location" is not location, but a key in the system.sstables, which should rather be schema ID, not /var/lib/.../keyspace/table-uuid string. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20642	2024-09-20 11:51:04 +03:00
Andrei Chekun	bd9a73c39b	Add .idea folder to .gitignore The .idea directory is used by JetBrains IDEs to store data about project config. Closes scylladb/scylladb#20718	2024-09-20 11:49:41 +03:00
Tomasz Grabiec	8e047e8fff	gdb: Add std::set wrapper Allows accessing std::set fields from gdb, e.g.: (gdb) python for e in std_set(_promoted_index._blocks): print(e) Closes scylladb/scylladb#20650	2024-09-20 08:24:15 +03:00
Anna Stuchlik	5da7894f70	doc: move the install-jmx instructions to a common folder This commit moves the install-jmx.rst file from the install-scylla folder to the installation-common folder. All the references to the moved document are updated. This is a follow-up to https://github.com/scylladb/scylladb/pull/17969/ Closes scylladb/scylladb#20712	2024-09-20 00:36:32 +03:00
Nadav Har'El	3499c407f7	test: avoid silly "no_mode.1" labels when running tests outside test.py For the benefit of running test.py inside CI, we recently added to test/cql-pytest and test/alternator the knowledge of which "Scylla mode" (--mode) and "run number" is running (--run_id), although these concepts are alien to these two test frameworks (remember that those test frameworks can also run tests against unknown versions of Scylla or even our competitors' implementations). One unfortunate result of this change is that now if you run a test by using pytest directly (or test/*/run) instead of test.py, for example: $ cd test/alternator $ pytest --aws test_item.py::test_basic_string_put_and_get The test's success or failure reports the ugly name test_item.py::test_basic_string_put_and_get.no_mode.1 This unnecessary "no_mode.1" comes from the default values for --mode and --run_id, respectively. But there is no reason for these silly defaults. In this patch we change these defaults to None, and when they are None, they aren't tacked onto the test's name. This patch shouldn't affect running tests through test.py, because test.py always sets the --mode and --run_id options, and doesn't leave them as the default. Fixes #20512 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#20513	2024-09-20 00:36:32 +03:00
Avi Kivity	b015c85d31	Merge 'gms: inet_address: drop unused raw_addr method and modernize comperators' from Benny Halevy Drop the unused `gms::inet_address::raw_addr` method and modernize operator== and operator< as class methods * Cleanup only, no backport needed Closes scylladb/scylladb#20681 * github.com:scylladb/scylladb: gms: inet_address: modernize comparison operators gms: inet_address: drop unused raw_addr method	2024-09-20 00:36:32 +03:00
Piotr Dulikowski	7e7701d436	Merge 'cql3/statements/select_statement: `SELECT ... USING SERVICE LEVEL`' from Michał Jadwiszczak Allow to specify service level used in select statement `SELECT ... USING SERVICE LEVEL sl_name`. In OSS, this only affects statement's timeout. In case both service level and timeout are specified `SELECT ... USING SERVICE LEVEL sl_name AND TIMEOUT 1h`, the timeout has higher priority as statement's timeout. Fixes scylladb/scylladb#18471 Closes scylladb/scylladb#20523 * github.com:scylladb/scylladb: test/cql-pytest: add test for `SELECT ... USING SERVICE LEVEL` cql3/Cql.g: extend grammar to allow `SELECT ... USING SERVICE LEVEL` cql3/statements/select_statement: use service level timeout cql3/attributes: add service level name field qos/service_level_controller: add method to check if service level exists in cache	2024-09-19 18:19:23 +02:00
Pavel Emelyanov	bd720dd2da	Merge 'cql3: statement_restrictions: adapt to functional style' from Avi Kivity The statement_restrictions class started life in the object-oriented style - an object that interacts with its environment via mutators and is observed via observers. This is however not suitable for its objective: to analyze the WHERE clause, select a query plan, and partition the WHERE clause atoms to the various parts demanded by the query plan (read_command and filters). Furthermore, the object oriented style makes it hard to work with as you can only call some observers after the related mutators were called. Fix this by transforming the code into a more functional style: we call a function that returns an immutable statement_restrictions object that can only be observed. This makes it easier to further change in the future, as changes will not have to consider interaction with the environment. No backport as this is a refactoring Closes scylladb/scylladb#20672 * github.com:scylladb/scylladb: cql3: statement_restrictions: use functional style cql3: statement_restrictions: calculate the index only once cql3: statement_restrictions: make it a const object	2024-09-19 18:18:28 +03:00
Kefu Chai	8cc9d783a0	sstables/sstable_directory: document components_lister::process() for better maintainability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20693	2024-09-19 18:11:31 +03:00
Kefu Chai	7985aa97b1	main, test: use seastar::handle_signal() instead use `seastar::handle_signal()` instead of `reactor::handle_signal()`. in a recent change in seastar (c3e826ad1197f2610138f3bcfaeb0b458f8fb799), the latter was marked as deprecated in favor of the former, so let's use the recommended API. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20695	2024-09-19 18:10:07 +03:00
Kefu Chai	1fd1698a90	test: btree: use BOOST_DATA_TEST_CASE() when appropriate instead of grouping tests with different parameters, let's parameterize them using `BOOST_DATA_TEST_CASE()`. it is simpler this way, and the tests can be more structured. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20697	2024-09-19 18:09:05 +03:00
Avi Kivity	6f7c2ce0aa	Merge 'cql_server::connection: Process rebounce message in case of multiple shard migrations' from Sergey Zolotukhin During a query execution, the query can be re-bounced to another shard if the requested data is located there. Previous implementation assumed that the shard cannot be changed after first re-bounce, however with the introduction of Tablets, data could be migrated to another shard after the query was already re-bounced, causing a failure of the query execution. To avoid this issue, the query is re-bounced as needed until it is executed on the correct shard. Fixes #15465 Closes scylladb/scylladb#20493 * github.com:scylladb/scylladb: cql_server: Add a test for multiple query msg rebounces. cql_server::connection: process: rebounce msg if needed cql_server::connection: process: co-routinize connection::process_on_shard cql_server: connection: process: fixup indentation cql_server: connection: process_on_shard: drop permit parameter transport: server: pass bounce_to_shard as foreign shared ptr cql_server: connection: process: add template concept for process_fn cql_server: move process_fn_return_type to class definition	2024-09-19 17:27:55 +03:00
Gleb Natapov	1b4c255ffd	test: amend test_replace_reuse_ip test to check that there is no stale writes after snapshot transfer starts	2024-09-19 15:24:59 +03:00
Gleb Natapov	c0939d86f9	topology coordinator: mark node as being replaced earlier Before `17f4a151ce` the node was marked as being replaced in join_group0 state, before it actually joins the group0, so by the time it actually joins and starts transferring snapshot/log no traffic is sent to it. The commit changed this to mark the node as being replaced after the snapshot/log is already transferred, so traffic can reach the node while it still has not caught up with a leader, and this may cause problems since the state is not complete. Mark the node as being replaced earlier, but still add the new node to the topology later as the commit above intended.	2024-09-19 15:23:48 +03:00
Gleb Natapov	644e7a2012	topology coordinator: do metadata barrier before calling finish_accepting_node() during replace During replace with the same IP a node may get queries that were intended for the node it was replacing since the new node declares itself UP before it advertises that it is a replacement. But after the node starts the replacing procedure the old node is marked as "being replaced" and queries are no longer sent there. It is important to do so before the new node starts to get the raft snapshot since the snapshot application is not atomic and queries that run in parallel with it may see partial state and fail in weird ways. Queries that are sent before that will fail because schema is empty, so they will not find any tables in the first place. This is pre-existing and not addressed by this patch.	2024-09-19 15:00:27 +03:00
Benny Halevy	574a08ed96	storage_service: rebuild: warn about tablets-enabled keyspaces Until we automatically support rebuild for tablets-enabled keyspaces, warn the user about them. The reason this is not an error, is that after increasing RF in a new datacenter, the current procedure is to run `nodetool rebuild` on all nodes in that dc to rebuild the new vnode replicas. This is not required for tablets, since the additional replicas are rebuilt automatically as part of ALTER KS. However, `nodetool rebuild` is also run after local data loss (e.g. due to corruption and removal of sstables). In this case, rebuild is not supported for tablets-enabled keyspaces, as tablet replicas that had lost data may have already been migrated to other nodes, and rebuilding the requested node will not know about it. It is advised to repair all nodes in the datacenter instead. Refs scylladb/scylladb#17575 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#20375	2024-09-19 14:25:46 +03:00
Pavel Emelyanov	8487f2fd93	treewide: Remove table::config::datadir It's write-only now; all the places that wanted to know where the table's storage is already use storage_options. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-19 13:06:39 +03:00
Pavel Emelyanov	350f64c38b	distributed_loader: Print storage options, not datadir When populating keyspace on boot the dist. loader prints a debugging message with ks:cf names, state and the directory from where it picks sstables. The last one is not extremely correct, as loading sstables from S3 happens from a bucket, not directory. So it's better to print the storage options, not the datadir string. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-19 13:06:39 +03:00
Pavel Emelyanov	b2fcfdcaa9	data_dictionary: Add formatter for storage_options Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-19 13:06:39 +03:00
Pavel Emelyanov	5046cfab4b	test: Construct table_for_tests with table storage options The only place that constructs table_for_tests is make_table_for_tests helper. It can and should prepare the correct storage options, because that's the last place where the target directory is still known. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-19 13:06:39 +03:00
Pavel Emelyanov	eaad4f348b	test: Generalize pair of make_table_for_tests helpers They only differ in the way they get the target directory: one via argument, the other from test_env. Accordingly, the latter can call the former. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-19 13:06:39 +03:00
Pavel Emelyanov	d9ef9bdd3b	tests: Add helper to get snapshot directory from storage options There's a bunch of tests that check the contents of snapshot directory after creating one. Add a helper for those that gets this directory via storage options, not table config. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-19 13:06:39 +03:00
Pavel Emelyanov	a734fd5c9c	table: snapshot_exists: Get directory from storage options Similarly to snapshot_on_all_shards, the way snapshot directory is evaluated is changed to rely on storage options. Two ... assumptions are that when asking for non-local snapshot existence or for a snapshot of a virtual table, it's correct to return false instead of throwing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-19 13:06:09 +03:00
Pavel Emelyanov	24589cf00c	table: snapshot_on_all_shards: Get directory from storage options There are several things that are changed here - The target directory for snapshot is evaluated using table directory taken from its storage options, not from config - If the storage options are not "local", snapshot_on_all_shards fails early, since it's impossible to snapshot sstables anyway - If the storage is not configured for the obtained local options, snapshotting is skipped, because it's a virtual table that's probably not supposed to have snapshots - The late failure to snapshot non-local sstables is converted into an internal error, as this functionality cannot be executed as per the previous change - The target path is created using the fs::path operator/ overload, not by concatenating strings (it's a minor change) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-19 13:05:16 +03:00
Anna Stuchlik	cdc69b4e06	doc: enable publishing docs for branch-6.2 This commit enables publishing documentation from branch-6.2. The docs will be published as UNSTABLE (the warning about version 6.1 being unstable will be displayed). Fixes https://github.com/scylladb/scylladb/issues/20643 No backport is required. Closes scylladb/scylladb#20647	2024-09-19 09:39:58 +03:00
Anna Stuchlik	400a14eefa	doc: update the unified installer instructions This commit updates the unified installer instructions to avoid specifying a given version. At the moment, we're technically unable to use variables in URLs, so we need to update the page each release. Fixes https://github.com/scylladb/scylladb/issues/20677 Closes scylladb/scylladb#20680	2024-09-19 09:28:44 +03:00
Anna Stuchlik	aa0c95c95c	doc: fix a broken link This commit fixes a link to the Manager by adding a missing underscore to the external link. Closes scylladb/scylladb#20656	2024-09-19 09:20:20 +03:00
Calle Wilund	60f8a9f39d	database: Also forced new schema commitlog segment on user initiated memtable flush Refs #20686 Refs #15607 In #15060 we added a forced new commitlog segment on user-initiated flush, mainly so that tests can verify tombstone gc and other compaction-related things without having to wait for "organic" segment deletion. Schema commitlog was not included, mainly because we did not have tests featuring compaction checks of schema-related tables, but also because it was assumed to have lower general throughput. There is however no real reason not to include it, and it will make some testing much quicker and more predictable. Closes scylladb/scylladb#20691	2024-09-19 09:00:33 +03:00
Benny Halevy	5ccdf1cf1c	gms: inet_address: modernize comparison operators Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-18 17:07:51 +03:00
Benny Halevy	38540d89a1	gms: inet_address: drop unused raw_addr method Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-18 14:21:18 +03:00
Kefu Chai	b0696bd842	test: btree: use BOOST_DATA_TEST_CASE to structure parameterized tests for better readability and more structured tests. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20516	2024-09-18 14:16:28 +03:00
Pavel Emelyanov	eb22c2a8c8	Merge 'reader_concurrency_semaphore: improve the diagnostics dump' from Botond Dénes * Also dump diagnostics when a read times out while active (not queued). * Add the "Trigger permit" line, containing the details of the permit which caused the diagnostics dump (by e.g. timing out). * Add the "Identified bottleneck(s)" line, containing the identified bottlenecks which lead to permits being queued. This line is missing if no such bottleneck can be identified. * Document the new features, as well as the stat dump, which was added some time ago. Example of the new dump format: ``` INFO 2024-09-12 08:09:48,046 [shard 0:main] reader_concurrency_semaphore - Semaphore reader_concurrency_semaphore_dump_reader_diganostics with 8/10 count and 106192275/32768 memory resources: timed out, dumping permit diagnostics: Trigger permit: count=0, memory=0, table=ks.tbl0, operation=mutation-query, state=waiting_for_admission Identified bottleneck(s): memory permits count memory table/operation/state 3 2 26M ./push-view-updates-2/active 3 2 16M ks.tbl1/push-view-updates-1/active 1 1 15M ks.tbl2/push-view-updates-1/active 1 0 13M ks.tbl1/multishard-mutation-query/active 1 0 12M ks.tbl0/push-view-updates-1/active 1 1 10M ks.tbl3/push-view-updates-2/active 1 1 6060K ks.tbl3/multishard-mutation-query/active 2 1 1930K ks.tbl0/push-view-updates-2/active 1 0 1216K ks.tbl0/multishard-mutation-query/active 6 0 0B ks.tbl1/shard-reader/waiting_for_admission 3 0 0B ./data-query/waiting_for_admission 9 0 0B ks.tbl0/mutation-query/waiting_for_admission 2 0 0B ks.tbl2/shard-reader/waiting_for_admission 4 0 0B ks.tbl0/shard-reader/waiting_for_admission 9 0 0B ks.tbl0/data-query/waiting_for_admission 7 0 0B ks.tbl3/mutation-query/waiting_for_admission 5 0 0B ks.tbl1/mutation-query/waiting_for_admission 2 0 0B ks.tbl2/mutation-query/waiting_for_admission 8 0 0B ks.tbl1/data-query/waiting_for_admission 1 0 0B ./mutation-query/waiting_for_admission 26 0 0B permits omitted for 
brevity 96 8 101M total Stats: permit_based_evictions: 0 time_based_evictions: 0 inactive_reads: 0 total_successful_reads: 0 total_failed_reads: 0 total_reads_shed_due_to_overload: 0 total_reads_killed_due_to_kill_limit: 0 reads_admitted: 1 reads_enqueued_for_admission: 82 reads_enqueued_for_memory: 0 reads_admitted_immediately: 1 reads_queued_because_ready_list: 0 reads_queued_because_need_cpu_permits: 82 reads_queued_because_memory_resources: 0 reads_queued_because_count_resources: 0 reads_queued_with_eviction: 0 total_permits: 97 current_permits: 96 need_cpu_permits: 0 awaits_permits: 0 disk_reads: 0 sstables_read: 0 ``` Fixes: https://github.com/scylladb/scylladb/issues/19535 Improvement, no backport needed. Closes scylladb/scylladb#20545 * github.com:scylladb/scylladb: docs/dev/reader-concurrency-semaphore.md: update the documentation on diagnostics dumps test/boost/reader_concurrency_semaphore_test: test the new diagnostics functionality reader_concurrency_semaphore: add bottleneck self-diagnosis to diagnosis dump reader_concurrency_semaphore: include trigger permit in diagnostic dump reader_concurrency_semaphore: propagate permit to do_dump_reader_permit_diagnostics() reader_concurrency_semaphore: use consistent exception type for timeout reader_concurrency_semaphore: dump diagnostics when non-waiting reader times out	2024-09-18 14:06:05 +03:00
Botond Dénes	1efda557b1	replica/table: query_mutations(): enter the table's async gate So the table is not dropped while the query is ongoing. query() already does this but using old-fashioned enter()+leave(), convert it to use the new RAII helper. Closes scylladb/scylladb#20583	2024-09-18 14:03:22 +03:00
Pavel Emelyanov	2f4f0eb060	Merge 'Alternator: a few RBAC fixes' from Nadav Har'El The main goal of this PR is to fix a bug (#20619) in the alternator_enforce_authorization=false setting - which didn't do its job (i.e., _don't_ check permissions) when authorization is configured in CQL but not wanted in Alternator. The series also fixes a few smaller bugs in the code that were discovered while debugging the main issue: 1. A potential use-after-free (that didn't seem to hit us in practice) is fixed. 2. A confusing error message (that was also reported in #20619) is improved. 3. Make the alternator_enforce_authorization live-updatable. There was no reason why it shouldn't be, and as this series needs to make this flag available to more code, let's just do it properly and assume the flag is live-updatable. Because the RBAC feature has not been backported to any open-source branches, neither should these fixes. But if some private branch received a backport of the RBAC feature, it should get these fixes too. Fixes #20619. Closes scylladb/scylladb#20640 * github.com:scylladb/scylladb: alternator: make alternator_enforce_authorization live-updateable alternator: fix alternator_enforce_authorization=false alternator: improve error message when unauthenticated alternator: avoid use-after-free in RBAC	2024-09-18 14:02:09 +03:00
Kefu Chai	cb1670b79b	Update seastar submodule * seastar ec5da7a6...69f88e2f (38): > build: s/Sanitizers_COMPILER_OPTIONS/Sanitizers_COMPILE_OPTIONS > test: Update httpd test with request/reply body writing sugar > http: Add sugar to request and response body writers > utils: Add util::write_to_stream() helper > seastar-addr2line: adjust llvm termination regex > README.md: add Crimson project > rpc: conditionally use fmt::runtime() based on SEASTAR_LOGGER_COMPILE_TIME_FMT > build: check the combination of Sanitizers > tls: clear session ticket before releasing > print: remove dead code > doc/lambda-coroutine-fiasco: reword for better readability > rpc: fix compilation error caused by fmt::runtime() > tutorial: explain the use case of rethrow_exception and coroutine::exception > reactor: print more informative error when io_submit fails > README.md: note GitHub discussions > prometheus: `fmt::print` to stringstream directly > doc: add document for testing with seastar > seastar/testing: only include used headers > test: Add abortable http client test cases > http/client: Add abortable make_request() API method > http/client: Abort established connections > http/client: Handle abort source in pool wait > http/client: Add abort source to factory::make() method > http/client: Pass abort_source here and there > http/client: Idnentation fix after previous patch > http/client: Merge some continuations explicitly > signal: add seastar signal api > httpd: remove unused prometheus structs > print: use fmtlib's fmt::format_string in format() > rpc: do not use seastar::format() in rpc logger > treewide: s/format/seastar::format/ > prometheus: sanitize label value for text protocol > tests: unit test prometheus wire format > io-tester: Introduce batches to rate-based submission > io-tester: Generalize issueing request and collecting its result > io-tester: Cancel intent once > io-tester: Dont carry rps/parallelism variables over lambdas > io-tester: Simplify in-flight management 
The breaking changes in the seastar submodule necessitate corresponding modifications in our code. These changes must be implemented together in a single commit to maintain consistency, so that each commit is buildable. The following changes are included in addition to the seastar submodule update: * instead of passing a `const char` for the format string, pass a templated `fmt::format_string<...>`. this depends on the `seastar::format()` change in seastar. * explicitly call `fmt::runtime()` if the format string is not a consteval expression. this depends on the `seastar::format()` change in seastar, as `seastar::format()` does not accept a plain `const char` which is not constexpr anymore. * pass abort_source to `dns_connection_factory::make()`. this depends on the change in seastar, which added an `abort_source` argument to the pure virtual member function `connection_factory::make()`. * call {fmt,seastar}::format() explicitly. this is a follow-up of `3e84d43f`, which takes care of all places where we should call `fmt::format()` and `seastar::format()` explicitly to disambiguate the `format()` call. but more `format()` calls made their way into the source tree after `3e84d43f`, so we need to fix them as well. * include used header in tests Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20649	2024-09-18 13:59:22 +03:00
Gleb Natapov	bddaf498df	group0: make sure that address map has an entry for each new node in the raft configuration ID->IP mapping is added to the raft address map when the mapping first appears in the gossiper, but it is added as expiring entry. It becomes non expiring when a node is added to raft configuration. But when a node joins those two events may be distant in time (since the node's request may sit in the topology coordinator queue for a while) and mappings may expire already from the map. This patch makes sure to transfer the mapping from the gossiper for a node that is added to the raft configuration instead of assuming that the mapping is already there.	2024-09-18 13:42:38 +03:00
Amnon Heiman	8dec292698	alternator:test_metrics test metrics for batch item count This patch adds tests for the batch operations item count. The tests validate that the metrics tracking the number of items processed in a batch increase by the correct amount. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-09-18 11:31:06 +03:00
Amnon Heiman	4d57a43815	alternator:test_metrics Add validating the increased value The `check_increases_operation` now allows overriding the checked metric. Additionally, a custom validation value can now be passed, which makes it possible to validate the amount by which a value has changed, rather than just validating that the value increased. The default behavior of validating that values have increased remains unchanged, ensuring backward compatibility. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-09-18 11:31:06 +03:00
Amnon Heiman	905408f764	alternator: Fix item counting in batch operations This patch fixes the logic for counting items in batch operations. Previously, the item count in requests was inaccurate: it counted the number of tables in get_item and the request_items in write_items. The new logic correctly counts each individual item in `BatchGetItem` and `BatchWriteItem` requests. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-09-18 11:30:59 +03:00
Amnon Heiman	515857a4a9	Alternator: rename batch item count metrics This patch renames metrics tracking the total number of items in a batch to `scylla_alternator_batch_item_count`. It uses the existing `op` label to differentiate between `BatchGetItem` and `BatchWriteItem` operations. This ensures better clarity and distinction for batch operations in monitoring. This is an example of what it looks like: # HELP scylla_alternator_batch_item_count The total number of items processed across all batches # TYPE scylla_alternator_batch_item_count counter scylla_alternator_batch_item_count{op="BatchGetItem",shard="0"} 4 scylla_alternator_batch_item_count{op="BatchWriteItem",shard="0"} 4	2024-09-18 11:20:07 +03:00
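The difference between counting tables and counting items, as described in the commit above, can be sketched roughly as follows. This is only an illustration: `batch_request`, `count_tables` and `count_items` are hypothetical names, and the real Alternator request is a JSON document rather than a `std::map`.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical simplified shape of a batch request: table name -> item keys.
using batch_request = std::map<std::string, std::vector<std::string>>;

// The kind of count the commit calls inaccurate: one per table in the request.
inline std::size_t count_tables(const batch_request& req) {
    return req.size();
}

// The intended count: every individual item, summed over all tables.
inline std::size_t count_items(const batch_request& req) {
    std::size_t n = 0;
    for (const auto& [table, items] : req) {
        n += items.size();
    }
    return n;
}
```

For a request touching two tables with three items and one item respectively, `count_tables` yields 2 while `count_items` yields 4.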
Anna Mikhlin	0c7ca284ad	mergify: add support for branch-6.2 branch-6.2 is already available, adding support for it in mergify to allow backport to this new branch. in addition, since branch 5.4 reached EOL - removing it Closes scylladb/scylladb#20669	2024-09-18 08:30:41 +03:00
Ernest Zaslavsky	924325fd25	treewide: add "prefix" parameter to backup API Allow the caller to pass the prefix when performing backup and restore Fixes scylladb/scylladb#20335 Closes scylladb/scylladb#20413	2024-09-18 08:25:00 +03:00
Calle Wilund	b789361091	commitlog: Fix assertion in oversized_alloc Fixes #20633 Cannot assert on actual request_controller when releasing permit, as the release, if we have waiters in queue, will subtract some units to hand to them. Instead assert on permit size + waiter status (and if zero, also controller value) * v2 - use SCYLLA_ASSERT Closes scylladb/scylladb#20654	2024-09-18 08:22:28 +03:00
Avi Kivity	57ab5ce313	repair: row_level: simplify repair_put_row_diff_with_rpc_stream_process_op() repair_put_row_diff_with_rpc_stream_process_op() always returns stop_iteration::no (or throws). Moreover, the return value is ignored by its only caller. Simplify by returning a plain future<>. Closes scylladb/scylladb#20610	2024-09-18 08:17:09 +03:00
Botond Dénes	d72fcb11f5	Merge 'Add new GDB commands to dump sstable index file from memory and print promoted index ' from Tomasz Grabiec Closes scylladb/scylladb#20648 * github.com:scylladb/scylladb: gdb: Introduce "scylla sstable-dump-cached-index" command gdb: Introduce "scylla sstable-promoted-index" command gdb: Fix range printer for singular ranges	2024-09-18 08:13:04 +03:00
Nadav Har'El	24fb92c8ba	Merge 'cql3: simplify runtime component of selection filtering' from Avi Kivity Most of the analysis of the WHERE clause is done in statement_restrictions. It determines what parts to use for the primary or secondary index, and what parts to use for filtering. The difficult part is that it has a very wide interface. After construction, the user must pick the correct bits from many public functions. There are subtle interactions between them that are hard to untangle. This series simplifies the interface as it is used for selection filtering. In the end, only two public functions are used, both returning expressions: one for the partition-level filtering, one for the clustering row level filtering. In the end, the WHERE clause is factored into three parts: - one part goes into the read_command of the primary or secondary index - another part (that references only partition key columns and static key columns) is used to filter entire partitions - another part (that currently references only clustering key columns and regular columns, but one day may reference other columns) is used to filter clustering rows Refactoring, no backport. 
Closes scylladb/scylladb#20487 * github.com:scylladb/scylladb: cql3: statement_restrictions: drop accessors for single-column key restrictions cql3: selection: adjust indentation cql3: selection: delete empty loop cql3: statement_restrictions, selection: fold multi-column restrictions into row-level filter cql3: statement_restrictions, selection: merge clustering key filter and regular columns filter cql3: statement_restrictions, selection: merge partition key filter and static columns filter cql3: selection: filter regular and static rows as a single expression each cql3: statement_restrictions: collect regular column and static column filters into single expressions cql3: selection: filter clustering key as a single expression cql3: statement_restrictions: expose filter for clustering key cql3: selection: filter partition key as a single expression cql3: statement_restrictions: expose filter for partition key cql3: statement_restrictions: remove relations used for indexing from filtering cql3: statement_restrictions: bail out of find_idx if !_uses_secondary_index cql3: statement_restrictions, modification_statement: pass correct value of check_indexes cql3: statement_restrictions: correct mismatched clustering/partition restrictions references cql3: statement_restrictions: precalculate get_column_defs_for_filtering() cql3: selection: do_filter(): push static/regular row glue to higher level	2024-09-17 22:58:24 +03:00
Piotr Dulikowski	cc5c3aaae7	Merge 'message/messaging_service: guard adding maintenance tenant under cluster feature' from Michał Jadwiszczak In https://github.com/scylladb/scylladb/pull/18729, we introduced a new statement tenant `$maintenance`, but the change wasn't protected by any cluster feature. This wasn't a problem for OSS, since an unknown isolation cookie just uses the default scheduling group. However, in enterprise that leads to creating a service level on not-upgraded nodes, which may end up in an error if the user creates the maximum number of service levels. This patch adds a cluster feature to guard adding the new tenant. It's done in a way that handles two upgrade scenarios: - version without `$maintenance` tenant -> version with `$maintenance` tenant guarded by a feature - version with `$maintenance` tenant but not guarded by a feature -> version with `$maintenance` tenant guarded by a feature The PR adds an `enabled` flag to statement tenants. This way, when the tenant is disabled, it cannot be used to create a connection, but it can be used to accept an incoming connection. The `$maintenance` tenant is added to the config as disabled and it gets enabled once the corresponding feature is enabled. Fixes scylladb/scylladb#20070 Refs scylladb/scylla-enterprise#4403 Closes scylladb/scylladb#19802 * github.com:scylladb/scylladb: message/messaging_service: guard adding maintenance tenant under cluster feature message/messaging_service: add feature_service dependency message/messaging_service: add `enabled` flag to statement tenants	2024-09-17 18:24:34 +02:00
Avi Kivity	1663fbe717	cql3: statement_restrictions: use functional style Instead of a constructor, use a new function analyze_statement_restrictions() as the entry point. It returns an immutable statement_restrictions object. This opens the door to returning a variant, with each arm of the variant corresponding to a different query plan.	2024-09-17 17:13:27 +03:00
Avi Kivity	3169b8e0ec	cql3: statement_restrictions: calculate the index only once find_idx() is called several times. Rename it do_find_idx(), call it just once, store the results, and make find_idx() return the stored results. This simplifies control flow and reduces the risk that successive calls of find_idx return different results.	2024-09-17 17:03:31 +03:00
Avi Kivity	d5c8083b76	cql3: statement_restrictions: make it a const object Make validate_secondary_index_selections() const (it trivially is), and call prepare_indexed_local() / prepare_indexed_global() at the end of the constructor. By making statement_restrictions a const object, reasoning about it can be local (looking at the source file) rather than global (looking at all the interactions of the class with its environment). In fact, we might make it a function one day. Since prepare_indexed_global()/prepare_indexed_local() only mutate _idx_tbl_ck_prefix, which isn't mutated by the rest of the code, the transformation is safe. The corresponding code is removed from select_statement. The removal isn't complete since it still uses some computation, but later deduplication is left for another day.	2024-09-17 17:03:27 +03:00
Sergey Zolotukhin	68740f57c2	cql_server: Add a test for multiple query msg rebounces. The test emulates several LWT (Lightweight Transaction) query rebounces. Currently, the code that processes queries does not expect that a query may be rebounced more than once. It was impossible with vnodes, but with the introduction of tablets, data can be moved between shards by the balancer, so a query can be rebounced to different shards multiple times.	2024-09-17 15:19:56 +02:00
Benny Halevy	65430b9e1b	cql_server::connection: process: rebounce msg if needed Rebounce the msg to another shard if needed, e.g. in the case of tablet migration. An example for that, as given by Tomasz Grabiec: > Bouncing happens when executing LWT statement in > modification_statement::execute_with_condition by returning a > special result message kind. The code assumes that after > jumping to the shard from the bounce request, the result > message is the regular one and not yet another bounce. > There is no problem with vnodes, because shards don't change. > With tablets, they can change at run time on migration. Fixes scylladb/scylladb#15465 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-17 15:09:43 +02:00
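The idea of re-bouncing until the request lands on the right shard can be sketched synchronously as below. This is a minimal illustration under stated assumptions, not the actual `cql_server::connection` code: `process_result`, `process_fn`, `process_with_rebounce` and the `max_bounces` cap are all hypothetical, and the real code is asynchronous.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <stdexcept>

// Hypothetical result of running a request on one shard: either a final
// value, or a request to bounce to another shard.
struct process_result {
    std::optional<int> value;           // set when the request completed
    std::optional<unsigned> bounce_to;  // set when the request must move
};

// Hypothetical stand-in for "run this request on the given shard".
using process_fn = std::function<process_result(unsigned shard)>;

// Keep following bounce requests until some shard produces a final result,
// instead of assuming that the first bounce is also the last one.
// max_bounces is an illustrative safety cap, not something the commit describes.
inline int process_with_rebounce(const process_fn& fn, unsigned start_shard,
                                 std::size_t max_bounces = 16) {
    unsigned shard = start_shard;
    for (std::size_t i = 0; i <= max_bounces; ++i) {
        process_result r = fn(shard);
        if (r.value) {
            return *r.value;
        }
        if (!r.bounce_to) {
            throw std::logic_error("result carries neither a value nor a bounce target");
        }
        shard = *r.bounce_to;
    }
    throw std::runtime_error("too many bounces");
}
```

If the data moves again while a bounce is in flight, the callback simply reports a new target shard and the loop follows it.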
Sergey Zolotukhin	f674f522aa	cql_server::connection: process: co-routinize connection::process_on_shard `cql_server::connection::process_on_shard` is made a co-routine to make sure captured objects' lifetime is managed by the source shard, avoiding error prone inter-shard objects transfers.	2024-09-17 14:54:42 +02:00
Nadav Har'El	17deaae463	alternator: make alternator_enforce_authorization live-updateable For no good reason, the "alternator_enforce_authorization" flag (which chooses whether to enable authentication and authorization checks in Alternator) was not live-updatable, so make it so. Both "server" and "executor" objects use this configuration flag, the former is fixed in this patch (to hold a live-updatable reference instead of a copy of a boolean), the latter was already prepared for this change and already held a live-updatable reference. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-17 15:51:16 +03:00
Nadav Har'El	00793059e1	alternator: fix alternator_enforce_authorization=false When the configuration has alternator_enforce_authorization=false, Alternator should not do authentication (check which user signed each request) nor authorization (check if that user has permissions to do each operation). Our implementation forgot to disable the authorization checks when it's configured to false. The (incorrect) assumption was that when alternator_enforce_authorization is configured to false, the CQL 'authenticator' and 'authorizer' configuration is also disabled - so the authorization checks will be no-ops. But we can't assume that: Users are free to configure 'authenticator' and 'authorizer' for use in CQL, and then set alternator_enforce_authorization=false just for Alternator. So this patch adds a new test for this case - when we have authenticator=PasswordAuthenticator, authorizer=CassandraAuthorizer but alternator_enforce_authorization=false, and fixes it to work correctly. The heart of the fix is trivial: the `verify_*_permission()` functions just need to check the alternator_enforce_authorization and return immediately when false. The bigger part of this change is to get the alternator_enforce_authorization into the "executor" object and then to pass it into the verify calls. Although alternator_enforce_authorization is not YET live updatable, this code is prepared for the future that it may become live updatable, so the executor object saves not the boolean value of this flag, but a live-updatable reference to it. Fixes #20619 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-17 15:50:00 +03:00
Nadav Har'El	76af7c0389	alternator: improve error message when unauthenticated When access-control checks report permission denied, we want to report the name of the authenticated role (the role signing the request) which didn't have the permission. When authentication was disabled, and there is no authenticated role, we printed the fake name "anonymous", but this can confuse users (it confused me!) to think there's an actual role named "anonymous". So let's change that string to "<anonymous>" with angle brackets - it makes it more obvious that this isn't a real role, but actually an anonymous request. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-17 15:44:29 +03:00
Tomasz Grabiec	e70ce4d6ed	gdb: Introduce "scylla sstable-dump-cached-index" command	2024-09-17 14:41:18 +02:00
Tomasz Grabiec	9f0eed263d	gdb: Introduce "scylla sstable-promoted-index" command	2024-09-17 14:41:13 +02:00
Nadav Har'El	3543bf14e9	alternator: avoid use-after-free in RBAC While auditing the code, I noticed that the current Alternator access control checks have code like: ``` return client_state.check_has_permission(auth::command_desc( permission_to_check, auth::make_data_resource(schema->ks_name(), schema->cf_name()))).then( ``` There's a problem here - it turns out that, unfortunately, command_desc holds a reference to the "resource" object - not a copy. So the temporary object returned by make_data_resource may be freed and then used... Curiously, we've not seen a bug caused by this in practice (not even in debug build mode), but better safe than sorry, so this patch changes the code in one of two ways: 1. Code using coroutines can keep the "resource" as a variable on the stack. 2. Code using continuations needs to hold the "resource" with do_with(), but since this already incurs the cost of an extra allocation (even in the successful case), might as well just switch to using coroutines and have less ugly code. This patch does not change any functionality, and all the tests seem to work before and after it the same. Signed-off-by: Nadav Har'El <nyh@scylladb.com> hello	2024-09-17 15:41:09 +03:00
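The first of the two fixes described above (keeping the resource in a named variable) can be illustrated with a simplified, synchronous sketch. The types here are hypothetical stand-ins, not the real `auth::` classes; they only reproduce the property the commit message mentions, namely that the descriptor holds a reference rather than a copy.

```cpp
#include <string>

// Hypothetical stand-in for the resource object.
struct resource {
    std::string name;
};

// Like the command_desc described above, this holds a reference, not a copy,
// so the referenced resource must outlive every use of the descriptor.
struct command_desc {
    int permission;
    const resource& res;
};

inline resource make_data_resource(const std::string& ks, const std::string& cf) {
    return resource{ks + "." + cf};
}

inline std::string describe(const command_desc& cmd) {
    return std::to_string(cmd.permission) + ":" + cmd.res.name;
}

// Safe pattern: give the resource a name so that it lives for the whole
// function body, which covers every use of the descriptor that refers to it.
inline std::string check_permission(int permission, const std::string& ks,
                                    const std::string& cf) {
    resource res = make_data_resource(ks, cf);
    command_desc cmd{permission, res};
    return describe(cmd);
}
```

In a coroutine the same named local would be kept in the coroutine frame across suspension points, which is presumably why the commit prefers coroutines over `do_with()`.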
Tomasz Grabiec	2c463ead59	gdb: Fix range printer for singular ranges Before, it printed [x, +inf) instead of {x}	2024-09-17 14:30:28 +02:00
Andrei Chekun	bbb6c3c2ff	test.py: Add resource consumption metrics This PR adds the possibility to gather resource consumption metrics. The collected metrics can be used to compare performance before and after specific changes aimed at increasing performance. Currently, this functionality works only in manual mode, and this is just raw data. Later on, these metrics can be used in Jupyter notebook to analyze and visualize how the resources are used and can provide the insight on how to improve it. This PR is a first insight after gathering these metrics. Add the possibility to gather resource consumption for the test.py execution. SQLite DB will be created with different performance metrics that will allow comparing the resource consumption between changes. The DB will be in the tmp directory that by default set to testlog. Across the runs, the DB will not be deleted, so each new run will just add information to the existing DB. Parameter --get-metrics was added to switch on or off the metrics gathering. By default, it's switched on. Closes: scylladb/qa-tasks#1666 Closes: scylladb/qa-tasks#1707 Closes scylladb/scylladb#19881	2024-09-17 15:22:34 +03:00
Benny Halevy	39ce358d82	time_window_compaction_strategy: get_reshaping_job: restrict sort of multi_window vector to its size Currently the function calls boost::partial_sort with a middle iterator that might be out of bounds and cause undefined behavior. Check the vector size, and do a partial sort only if it's longer than `max_sstables`, otherwise sort the whole vector. Fixes scylladb/scylladb#20608 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#20609	2024-09-17 15:05:37 +03:00
Tomasz Grabiec	adf99402c5	Merge 'readers/flat_mutation_reader_v2: call set_close_required() from consume()' from Botond Dénes The `consume()` variants just forward the call to the `_impl` method with the same name. The latter, being a member of `::impl`, will bypass the top level `fill_buffer()`, etc. methods and thus will never call `set_close_required()`. Do this in the top-level `consume()` methods instead, to ensure a reader, on which only `consume()` is called, and then is destroyed, will complain as it should (and abort). Only one place was found in core code, which didn't close the reader: `split_mutation()` in `mutation/mutation.cc` and this reader is the "from-mutation" one which has no real close routine. All other places were in tests. All this is to say, there were no real bugs uncovered by this PR. Fixes #16520 Improvement, no backport required. Closes scylladb/scylladb#16522 * github.com:scylladb/scylladb: readers/flat_mutation_reader_v2: call set_close_required() from consume*() test/boost/sstable_compaction_test: close reader after use test/boost/repair_test: close reader after use mutation/mutation: split_mutation(): close reader after use	2024-09-17 13:21:34 +02:00
Anna Mikhlin	66c0814c33	Update ScyllaDB version to: 6.3.0-dev	2024-09-17 13:43:04 +03:00
Botond Dénes	6250ff18eb	Merge 'sstable: s/crawling_sstable_mutation_reader/sstable_full_scan_reader' from Kefu Chai "crawling" is a little bit obscure in this context. so let's rename this class to reflect the fact that this reader only reads the entire content of the sstable. both crawling reader for kl and mx formats are renamed. also, in order to be consistent, all "crawling reader" in variable names are updated as well. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#20599 * github.com:scylladb/scylladb: sstable: s/crawling_sstable_mutation_reader/sstable_full_scan_reader sstable/mx/reader: add comment for mx_crawling_sstable_mutation_reader	2024-09-17 11:55:08 +03:00
Pavel Emelyanov	ebfa73e004	s3/client: Don't move file from write_body's lambda Requests sent by S3 are retriable, so when request.write_body() is called, it should keep everything intact in case http client will call it again. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20579	2024-09-17 09:48:09 +03:00
Tzach Livyatan	cb864b11d8	Update client-node-encryption: OpenSSL is FIPS enabled Closes scylladb/scylladb#19705	2024-09-17 09:47:07 +03:00
Botond Dénes	f32e67cb9e	Merge 'Make sstables without on-disk path' from Pavel Emelyanov New sstables for a table are created by the table::make_sstable() method. The method then calls sstables_manager::make_sstable() and passes there a path to component files which, in turn, sits on table::config. Since some time ago having an on-disk path for an sstable had become optional, as sstables could be put on S3 storage without local paths involved. In that case the aforementioned "path" is ~~ab~~used as a key in the system.sstables registry, that references a record with information used to retrieve URLs of sstables' objects. This PR removes the "path" argument from sstables_manager::make_sstable() and its sstable_directory peer. The details of sstables' location are moved onto storage_options and depend on storage type. For now in both storage types this location is still the good-old $datadir/$keyspace/$table-$uuid string. S3 storage needs to be patched more to use a more elegant "location" value. Eventually the `table::config::{datadir\|all_datadirs}` will be removed, this PR is a step towards it. closes: #12707 Closes scylladb/scylladb#20542 * github.com:scylladb/scylladb: table: Use storage options to clean the storage sstables/storage: Re-use locally generated vector of paths sstables/storage: Visit options once to initialize storage sstables_manager: Return table storage options when initializing storage sstables/storage: Fix indentation after previous patch table: Move datadirs initialization parallelism to storage level sstables/storage: Split the visitor's overloaded functor restore: Don't use table_dir to construct sstable_directory sstable_directory: Remove table_dir field sstable_directory: Use options details in lister sstables_manager: Remove table_dir from make_sstable() sstables: Remove table_dir from sstable constructor sstables/storage: Remove sstring dir from make_storage() sstables/storage: Use options to construct tests: Properly initialize storage options with "dir" distributed_loader: Create S3 options with prefix for restore storage_options: Add special-purpose local options maker storage_options: Keep local path / s3 prefix onboard table: Get another options when initializing storage	2024-09-17 09:41:21 +03:00
Botond Dénes	a4a8cad97f	Merge 'atomic_delete: allow deletion of sstables from several prefixes' from Benny Halevy Allow create_pending_deletion_log to delete a bunch of sstables that potentially reside in different prefixes (e.g. in the base directory and under staging/). The motivation arises from table::cleanup_tablet that calls compaction_group::cleanup on all cg:s via cleanup_compaction_groups. Cleanup, in turn, calls delete_sstables_atomically on all sstables in the compaction_group, in all states, including the normal state as well as staging - hence the requirement to support deleting sstables in different sub-directories. Also, apparently truncate calls delete_atomically for all sstables too, via table::discard_sstables, so if it happened to be executed during view update generation, i.e. when there are sstables in staging, it should hit the assertion failure reported in https://github.com/scylladb/scylladb/issues/18862 as well (although I haven't seen it yet, I see no reason why it wouldn't happen). So the issue was apparently present since the initial implementation of the pending_delete_log. It's just that with tablet migration it is more likely to be hit. Fixes scylladb/scylladb#18862 Needs backport to 6.0 since tablets require this capability Closes scylladb/scylladb#19555 * github.com:scylladb/scylladb: sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory sstables: storage: keep base directory in base class sstables: storage: define opened_directory in header file sstable_directory: use only dirlog	2024-09-17 08:30:40 +03:00
Kefu Chai	df7f332a58	sstable: s/crawling_sstable_mutation_reader/sstable_full_scan_reader "crawling" is a little bit obscure in this context. so let's rename this class to reflect the fact that this reader only reads the entire content of the sstable. both crawling reader for kl and mx formats are renamed. also, in order to be consistent, all "crawling reader" in variable names are updated as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-17 10:39:37 +08:00
Kefu Chai	c1ed2f0ea4	sstable/mx/reader: add comment for mx_crawling_sstable_mutation_reader to explain its typical usage. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-17 10:39:25 +08:00
Lakshmi Narayanan Sreethar	626f55a2ea	compaction: run cleanup under maintenance scheduling group The cleanup compaction task is a maintenance operation that runs after topology changes. So, run it under the maintenance scheduling group to avoid interference with regular compaction tasks. Also remove the share allocations done by the cleanup task, as they are unnecessary when running under the maintenance group. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#20582	2024-09-16 16:58:43 +03:00
Michał Jadwiszczak	b4b91ca364	message/messaging_service: guard adding maintenance tenant under cluster feature Set `enabled` flag for `$maintenance` tenant to false and enable it when `MAINTENANCE_TENANT` feature is enabled.	2024-09-16 15:34:36 +02:00
Michał Jadwiszczak	71a03ef6b0	message/messaging_service: add feature_service dependency	2024-09-16 15:33:40 +02:00
Michał Jadwiszczak	d44844241d	message/messaging_service: add `enabled` flag to statement tenants Adding a new tenant needs to be done under cluster feature protection. However it wasn't the case for adding `$maintenance` statement tenant and to fix it we need to support an upgrade from node which doesn't know about maintenance tenant at all and from one which uses it without any cluster feature protection. This commit adds `enabled` flag to statement tenants. This way, when the tenant is disabled, it cannot be used to create a connection, but it can be used to accept an incoming connection.	2024-09-16 15:31:04 +02:00
Michał Jadwiszczak	de7acbad8b	test/cql-pytest: add test for `SELECT ... USING SERVICE LEVEL`	2024-09-16 14:31:43 +02:00
Michał Jadwiszczak	8255c61f5f	cql3/Cql.g: extend grammar to allow `SELECT ... USING SERVICE LEVEL`	2024-09-16 14:31:32 +02:00
Michał Jadwiszczak	af6dc78025	cql3/statements/select_statement: use service level timeout Use service level timeout in select statement when specified. `USING TIMEOUT` has higher priority in timeout definition.	2024-09-16 13:48:48 +02:00
Michał Jadwiszczak	2e545c915b	cql3/attributes: add service level name field In the next patches, we will allow `SELECT ... USING SERVICE LEVEL sl_name`. To do so, we need to extend `cql3::attributes` with a service level name.	2024-09-16 13:48:43 +02:00
Michał Jadwiszczak	b9b326c2bb	qos/service_level_controller: add method to check if service level exists in cache There is a `service_level_controller::get_service_level()` method, which searches for a service level in the controller cache and returns the default service level if an SL with the given name doesn't exist. The added method allows checking whether a service level exists in the controller cache.	2024-09-16 12:41:15 +02:00
Pavel Emelyanov	bf5021e735	test: Remove sstables::test::binary_search() That's the most mysterious wrapper in this set as it doesn't need sstable itself at all, it just duplicates the existing non-class function out there. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-16 12:51:35 +03:00
Pavel Emelyanov	309d315af7	test: Remove sstables::test::move_summary() This one is a bit tricky, as it needs to modify the sstable's summary. However, the sstables::test::_summary() one returns a mutable reference and the only caller can use it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-16 12:50:48 +03:00
Pavel Emelyanov	deec952111	test: Remove sstables::test::read_toc() The sstable::read_toc() is public method, use it directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-16 12:50:19 +03:00
Pavel Emelyanov	25cd8ccdd8	test: Remove sstables::test::get_summary() Same as previous patch -- callers can come with const reference to summary, so they can live with existing public sstable::get_summary(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-16 12:49:39 +03:00
Pavel Emelyanov	f714ac9b48	test: Remove sstables::test::get_statistics() Just call the public sstable::get_statistics(). The callers would get const reference on it, but they don't need more than that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-16 12:48:43 +03:00
Pavel Emelyanov	53afa583e8	test: Remove sstables::test::data_read() The wrapper just changes the order of arguments for a public method. Drop it, and call the wrappee directly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-16 12:47:59 +03:00
Avi Kivity	e4cab3a5e9	cql3: statement_restrictions: drop accessors for single-column key restrictions No longer used.	2024-09-16 12:15:14 +03:00
Avi Kivity	626acf416e	cql3: selection: adjust indentation	2024-09-16 12:15:14 +03:00
Avi Kivity	c443d922ea	cql3: selection: delete empty loop Our refactoring left a loop with no body, delete it.	2024-09-16 12:15:14 +03:00
Avi Kivity	56e8a4c931	cql3: statement_restrictions, selection: fold multi-column restrictions into row-level filter When filtering, we apply single-column and multi-column filters separately. This is completely unnecessary. Find the multi-column filters during prepare time and append them to the row-level filter. This slightly changes the original: in the original, if we had a multi-column filter, we applied all of the restrictions. But hopefully if we check for multi-column filters, that's what we need.	2024-09-16 12:15:14 +03:00
Avi Kivity	a6d81806c0	cql3: statement_restrictions, selection: merge clustering key filter and regular columns filter The two filters are used in the same way: check the filter, return false if it matches. Unify the two filters into a clustering_row_level_filter. Since one of the two filters wasn't std::optional, we take the liberty of making the combined filter non-optional.	2024-09-16 12:15:03 +03:00
Avi Kivity	2933a2f118	cql3: statement_restrictions, selection: merge partition key filter and static columns filter The two filters are used in the same way: check the filter, set a boolean flag if it matches, return false. The two boolean flags are in turn checked in the same way. Unify the two filters into a partition_level_filter. Since one of the two filters wasn't std::optional, we take the liberty of making the combined filter non-optional.	2024-09-16 12:10:49 +03:00
Avi Kivity	870d1c16f7	scripts: fix bin/cqlsh shortcut Since `3c7af28725`, the cqlsh submodule no longer contains a bin/cqlsh shell script. This broke the supermodule's bin/cqlsh shortcut. Fix it by invoking cqlsh.py directly. Closes scylladb/scylladb#20591	2024-09-16 09:52:29 +03:00
Botond Dénes	ea29fe579b	Merge 'replica: ignore cleanup of deallocated storage group' from Aleksandra Martyniuk Cleanup of a deallocated tablet throws an exception. Since failed cleanup is retried, we end up in an infinite loop. Ignore cleanup of deallocated storage groups. Fixes: #19752. Needs to be backported to all branches with tablets (6.0 and later) Closes scylladb/scylladb#20584 * github.com:scylladb/scylladb: test: check if cleanup of deallocated sg is ignored replica: ignore cleanup of deallocated storage group	2024-09-16 09:22:56 +03:00
Gleb Natapov	695f112795	paxos_state: release semaphore units before checking if a semaphore can be dropped To drop a semaphore it should not be held by anyone, so we need to release our units before checking if a semaphore can be dropped. Fixes: scylladb/scylladb#20602 Closes scylladb/scylladb#20607	2024-09-15 21:21:03 +03:00
Kefu Chai	028410ba58	mutation_writer: use bucket parameter instead of using it->first as `_bucket` is an `unordered_map<bucket_id, timestamp_bucket_writer>`, when writing to a given bucket, we try to create a writer with the specified bucket id, so the returned iterator should point to a node whose `first` element is always the bucket id. so, there is no need to reference `it` for the bucket id, let's just reference the parameter. simpler this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20598	2024-09-15 20:05:12 +03:00
Kefu Chai	49f232f405	compaction: fix a typo in comment s/expection/exception/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20594	2024-09-15 16:09:01 +03:00
Avi Kivity	807153a9ed	cql3: selection: filter regular and static rows as a single expression each Instead of filtering regular and static columns column by column, call is_satisfied_by() for an expression containing all the static columns predicates, and one for all the regular column. We cannot have one expression, since the code sets _current_static_row_does_not_match only for static columns. Note the fix for #20485 is now implicit, since the evaluation machinery will treat missing regular columns as NULL.	2024-09-15 14:33:57 +03:00
Avi Kivity	3c71096479	cql3: statement_restrictions: collect regular column and static column filters into single expressions Similar to previous work with clustering and partition key, expose static and regular column filters as single expressions. Since we don't currently expose a boolean for whether those filters exist, we expose them now as non-optionals. In any case evaluating an empty conjunction is plenty fast.	2024-09-15 14:33:57 +03:00
Avi Kivity	ec2898afe9	cql3: selection: filter clustering key as a single expression Instead of filtering the clustering key column by column, call is_satisfied_by() for an expression containing all the clustering key predicates. The check for clustering_key.empty() is removed; the evaluation machinery is able to handle partial clustering keys. In fact if we add IS NULL, we have to evaluate as an empty clustering key should match.	2024-09-15 14:33:57 +03:00
Avi Kivity	318d653d80	cql3: statement_restrictions: expose filter for clustering key cql3::selection performs filtering by consulting ck_restrictions_need_filtering() and get_single_column_clustering_key_restrictions() (which is a map of column definition to expressions). Make them available in one nice package as an optional<expression>. When the optional is engaged, filtering is needed, and the expression is the equivalent of all of the map.	2024-09-15 14:33:57 +03:00
Avi Kivity	0bd2f12922	cql3: selection: filter partition key as a single expression Instead of filtering the partition key column by column, call is_satisfied_by() for an expression containing all the partition key predicates.	2024-09-15 14:33:56 +03:00
Avi Kivity	21cb91077f	cql3: statement_restrictions: expose filter for partition key cql3::selection performs filtering by consulting pk_restrictions_need_filtering() and get_single_column_partition_key_restrictions() (which is a map of column definition to expressions). Make them available in one nice package as an optional<expression>. When the optional is engaged, filtering is needed, and the expression is the equivalent of all of the map.	2024-09-15 14:33:56 +03:00
Avi Kivity	a453221314	cql3: statement_restrictions: remove relations used for indexing from filtering statement_restrictions does not name columns that were used for a secondary index for selection for filtering, since accessing the index "pre-filters" these columns. However, it keeps the relations that contain these columns. This makes it impossible (besides unnecessary) to evaluate the relations, as the columns they reference aren't selected. The reason this works now is that result_set_builder::restrictions_filter::do_filter() iterates on selected columns, matching them to relations, then executes the matched relation. A relation that references an unselected column is invisible to do_filter(). We wish to filter using complete expressions, rather than fragments, so as a first step remove these unnecessary and unusable relations while we choose which columns are necessary for filtering. calculate_column_defs_for_filtering is renamed to remind us of the extra work done.	2024-09-15 14:33:56 +03:00
Avi Kivity	ba8c2014bf	cql3: statement_restrictions: bail out of find_idx if !_uses_secondary_index The condition seems trivial, but wasn't implemented, without ill effects so far. With the following patches, calculate_column_defs_for_filtering() becomes confused as it selects an indexing code path even when !_uses_secondary_index, triggered by the reproducer of #10300.	2024-09-15 14:33:56 +03:00
Avi Kivity	65ba19323c	cql3: statement_restrictions, modification_statement: pass correct value of check_indexes Our UPDATE/INSERT/DELETE statements require a full primary/partition key and therefore never use indexes; fix the check_index parameter passed from modification_statement. So far the bug is benign as we did not take any action on the value. Make the parameter non-default to avoid such confusion in the future.	2024-09-15 14:33:56 +03:00
Avi Kivity	71ea3200ba	cql3: statement_restrictions: correct mismatched clustering/partition restrictions references The second loop of calculate_column_defs_for_filtering() finds clustering keys that are used for filtering, minus any clustering keys that happen to be used for secondary indexing. However, to check whether the clustering key is used for secondary indexing, it looks up in _single_column_partition_key_restrictions, which contains partition key restrictions. The end result is that we select a column which ends the partition key for the secondary index, and so is unnecessary. We do a little more work, but the bug is benign. Nevertheless, fix it, as it interferes with following work.	2024-09-15 14:33:56 +03:00
Avi Kivity	33db14e7d5	cql3: statement_restrictions: precalculate get_column_defs_for_filtering() get_column_defs_for_filtering() names all the columns that are required for filtering. While doing that, it skips over columns that participate in indexing (primary or secondary), since the index "pre-filters" the query. We wish to make use of this skipping. As a first step, call the calculation from the constructor, so we have control over when it is executed.	2024-09-15 14:33:56 +03:00
Avi Kivity	251ad4fcd0	cql3: selection: do_filter(): push static/regular row glue to higher level Currently, for each column we call get_non_pk_values() to transform the way we get the information (query::result_row_view) to the way the expression evaluation machinery wants it (vector<managed_bytes_opt>). Call it just once outside the loop.	2024-09-15 14:33:56 +03:00
Avi Kivity	b9bc783418	cql3: selection: don't ignore regular column restriction if a regular row is not present If a regular row isn't present, no regular column restriction (say, r=3) can pass since all regular columns are presented as NULL, and we don't have an IS NULL predicate. Yet we just ignore it. Handle the restriction on a missing column by returning false, signifying the row was filtered out. We have to move the check after the conditional checking whether there's any restriction at all, otherwise we exit early with a false failure. Unit tests marked xfail on this issue are now unmarked. A subtest of test_tombstone_limit is adjusted since it depended on this bug. It tested a regular column which wasn't there, and this bug caused the filter to be ignored. Change to test a static column that is there. A test for a bug found while developing the patch is also added. It is also tested by test_tombstone_limit, but better to have a dedicated test. Fixes #10357 Closes scylladb/scylladb#20486	2024-09-15 13:44:16 +03:00
Botond Dénes	6d8e9645ce	test/*/run: restore --vnodes into working order This option was silently broken when the default of --enable-tablets changed from false to true. The reason is that when --vnodes is passed, run only removes --enable-tablets=true from scylla's command line. With the new default this is not enough, we need to explicitly disable tablets to override the default. Closes scylladb/scylladb#20462	2024-09-13 17:10:09 +03:00
Pavel Emelyanov	f850681b14	table: Use storage options to clean the storage Like it was done for table::init_storage(), patch the table::destroy_storage() not to mess with datadir path and rely on storage options only. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	3aea7bebb7	sstables/storage: Re-use locally generated vector of paths A cleanup after previous patch -- in order to create storage options for table the local initialization code can re-use the vector of paths that it had generated in the same call to create table directory layout. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	7c34724509	sstables/storage: Visit options once to initialize storage The init_table_storage() method now does it twice -- one time to initialize the storage, another one to create new options for table. Both can be merged, thus making table storage options initialization better encapsulated for local/s3 cases. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	311fb906be	sstables_manager: Return table storage options when initializing storage Now the table::init_storage() calls sstables manager two times -- first, to get storage options, second, to initialize the storage with obtained options. Merge two calls into one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	918ec00c1d	sstables/storage: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	f1e4367439	table: Move datadirs initialization parallelism to storage level The table::init_table_storage() calls sstables_manager's storage initialization for each of the datadirs found on config. That's not great, it's sstables manager (and its storage) that know if table needs to mess with datadirs or not. This patch moves the loop to storage.cc. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	b6b3a477c5	sstables/storage: Split the visitor's overloaded functor The main goal is to have init_table_storage() overload for local options as standalone function. This makes next patching simpler. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	30c8d89f97	restore: Don't use table_dir to construct sstable_directory Continuation of the previous patch patching the special-purpose sstable directory constructor that's used by restore-from-s3-backup code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	af14408052	sstable_directory: Remove table_dir field It's no longer needed -- both, lister and making sstable, work with having storage options at hand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	f403728aa4	sstable_directory: Use options details in lister This class is very similar to sstables::storage one -- it also needs path or s3 prefix to construct. Now when this information is stored on storage_options, it's better to stick to it, not to the argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	36863d4ad0	sstables_manager: Remove table_dir from make_sstable() It used to be passed to sstable constructor, but now it doesn't need this argument. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	0764eca553	sstables: Remove table_dir from sstable constructor It used to be passed to storage constructor, now storage works with options only and this argument is no longer needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	d79ae1f02b	sstables/storage: Remove sstring dir from make_storage() Now the directory/s3 prefix is propagated via storage options. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	65a19df8ef	sstables/storage: Use options to construct All callers of make_sstable are now patched to provide correct storage options with path/prefix set. The make_storage() helper can switch to using it. Respectively, it's good to make sure that the storage is created with table options that have path/prefix. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	4425cf54c6	tests: Properly initialize storage options with "dir" Most of the tests work with local storage options. Some support S3 options as well. Whatever it is, when creating an sstable, tests need to put proper "dir" on the options, this patch does so. In fact, storage options for tests are created together with the test-env, and ideally this is the place where dir should be assigned on it. However, there are still places that explicitly specify path they want to see sstables at, for those the new temporary options should be constructed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:49:50 +03:00
Pavel Emelyanov	33bc9e7112	distributed_loader: Create S3 options with prefix for restore Restore-from-backup code wants to collect sstables from remote S3. For that it constructs S3 options, and now it needs to put prefix on it as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:32:39 +03:00
Pavel Emelyanov	56111a50cd	storage_options: Add special-purpose local options maker Lots of code (in tools and tests) explicitly deals with local sstables and needs to create options for it. Currently default-constructing options generates local ones, but without the directory path. Add a helper that creates local options with path and patch callers. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:32:39 +03:00
Pavel Emelyanov	95e60cde9f	storage_options: Keep local path / s3 prefix onboard Now when tables keep their own copy of storage options, it's possible for each table to add table-specific information on it. Namely -- path for local storage and prefix for S3 one (in fact, it's not a "prefix", but a key in sstables registry, but fixing it is beyond the scope of this set). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:32:32 +03:00
Pavel Emelyanov	14976fda73	table: Get another options when initializing storage Right now the table's storage_options life starts in cql, and shortly after the lw-shared-pointer to options is put on keyspace metadata. Later, when the table is created the pointer from keyspace is copied on the table via its constructor. Next patches will extend the options pointed to by a table, and the extension is going to be different for different tables. For that, each table needs to have its private options and this patch prepares for that. For now table directly calls sstables/storage code to get the options from, but it's temporary, soon the options will be created via sstables manager together with initialising the storage itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-13 16:32:32 +03:00
Nadav Har'El	f255391d52	cql-pytest: translate Cassandra's tests for arithmetic operators This is a translation of Cassandra's CQL unit test source file OperationFctsTest.java into our cql-pytest framework. This is a massive test suite (over 800 lines of code) for Cassandra's "arithmetic operators" CQL feature (CASSANDRA-11935), which was added to Cassandra almost 8 years ago (and reached Cassandra 4.0), but we never implemented it in Scylla. All of the tests in suite fail in ScyllaDB due to our lack of this feature: Refs #2693: Support arithmetic operators One test also discovered a new issue: Refs #20501: timestamp column doesn't allow "UTC" in string format All the tests pass on Cassandra. Some of the tests insist on specific error message strings and specific precision for decimal arithmetic operations - where we may not necessarily want to be 100% compatible with Cassandra in our eventual implementation. But at least the test will allow us to make deliberate - and not accidental - deviations from compatibility with Cassandra. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#20502	2024-09-13 14:52:59 +03:00
Botond Dénes	d3a9654fcc	Merge 'Make use of async() context in sstable_mutation_test' from Pavel Emelyanov This test runs all its cases in seastar thread, but still uses .then() continuations in some of them. This PR converts all continuations into plain .get()-s. Closes scylladb/scylladb#20457 * github.com:scylladb/scylladb: test: Restore indentation after previous changes test: Threadify tombstone_in_tombstone2() test: Threadify range_tombstone_reading() test: Threadify tombstone_in_tombstone() test: Threadify broken_ranges_collection() test: Threadify compact_storage_dense_read() test: Threadify compact_storage_simple_dense_read() test: Threadify compact_storage_sparse_read() test: Simplify test_range_reads() counting test: Simplify test_range_reads() inner loop test: Threadify test_range_reads() itself test: Threadify test_range_reads() callers test: Threadify generate_clustered() itself test: Threadify generate_clustered() callers test: Threadify test_no_clustered test test: Threadify nonexistent_key test	2024-09-13 14:09:53 +03:00
Aleksandra Martyniuk	2c4b1d6b45	test: check if cleanup of deallocated sg is ignored	2024-09-13 13:00:58 +02:00
Aleksandra Martyniuk	20d6cf55f2	replica: ignore cleanup of deallocated storage group Currently, an attempt to clean up a deallocated storage group throws an exception. The failed tablet cleanup is then retried, getting stuck in an endless loop. Ignore cleanup of a deallocated storage group.	2024-09-13 13:00:53 +02:00
Botond Dénes	cb30271d29	readers/flat_mutation_reader_v2: call set_close_required() from consume() The `consume()` variants just forward the call to the `_impl` method with the same name. The latter, being a member of `::impl`, will bypass the top level `fill_buffer()`, etc. methods and thus will never call `set_close_required()`. Do this in the top-level `consume()` methods instead, to ensure a reader, on which only `consume()` is called, and then is destroyed, will complain as it should (and abort). operator()() was also missing `set_close_required()`, fix that too.	2024-09-13 06:52:26 -04:00
Botond Dénes	fbed280cd5	test/boost/sstable_compaction_test: close reader after use	2024-09-13 06:52:26 -04:00
Botond Dénes	116b044fec	test/boost/repair_test: close reader after use	2024-09-13 06:52:26 -04:00
Botond Dénes	1a11f9cf95	mutation/mutation: split_mutation(): close reader after use	2024-09-13 06:52:26 -04:00
Andrei Chekun	bad7407718	test.py: Add support for BOOST_DATA_TEST_CASE Currently, test.py throws an error if a test uses BOOST_DATA_TEST_CASE. As a first step, test.py collects all test functions in the file, but when BOOST_DATA_TEST_CASE is used the output has additional lines indicating a parametrized test, which test.py cannot handle. This commit adds handling for this case. As a caveat, all test names should start with 'test' or they will be ignored. Closes: #20530 Closes scylladb/scylladb#20556	2024-09-13 13:44:26 +03:00
Botond Dénes	7cb8cab2ae	Merge 'Remove make_shared_schema() helper' from Pavel Emelyanov This function was obsoleted by schema_builder some time ago. To avoid patching all its callers, the helper became a wrapper around it. The remaining users are all in tests, and patching them to use the builder directly makes the code shorter in many cases. Closes scylladb/scylladb#20466 * github.com:scylladb/scylladb: schema: Ditch make_shared_schema() helper test: Tune up indentation in uncompressed_schema() test: Make tests use schema_builder instead of make_shared_schema	2024-09-13 12:25:10 +03:00
Pavel Emelyanov	730731da4a	test: Remove unused table config from max_ongoing_compaction_test The local config is unused since #15909, when the table creation was changed to use env's facilities. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20511	2024-09-13 12:21:56 +03:00
Pavel Emelyanov	4c77f474ed	test: Remove unused upload_path local variable Since #14152 creation of an sstable takes table dir and its state. The test in question wants to create an sstable in upload/ subdir and for that it used to maintain full "cf.dir/upload" path, which is not required any more. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20514	2024-09-13 12:21:00 +03:00
Pavel Emelyanov	e9a1c0716f	test: Use sstables::test_env to make sstables for directory test This is continuation of #20431 in another test. After #20395 it's also possible to remove unused local dir variables. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20541	2024-09-13 12:19:59 +03:00
Botond Dénes	4fb194117e	Merge 'Generalize multipart upload implementations in S3 client' from Pavel Emelyanov There are two currently -- upload_sink_base and do_upload_file. This PR merges as much code as possible (spoiler: it's already mostly copy-n-paste-d, so squashing is pretty straightforward) Closes scylladb/scylladb#20568 * github.com:scylladb/scylladb: s3/client: Reuse class multipart_upload in do_upload_file s3/client: Split upload_sink_base class into two	2024-09-13 10:35:10 +03:00
Kefu Chai	cf1f90fe0c	auth: remove unused #include the `seastar/core/print.hh` header is no longer required by `auth/resource.hh`. this was identified by clang-include-cleaner. As the code is audited, we can safely remove the #include directive. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20575	2024-09-13 09:49:05 +03:00
Botond Dénes	c7c5817808	Merge 'Improve timestamp heuristics for tombstone garbage collection' from Benny Halevy When purging regular tombstone consult the min_live_timestamp, if available. This is safe since we don't need to protect dead data from resurrection, as it is already dead. For shadowable_tombstones, consult the min_memtable_live_row_marker_timestamp, if available, otherwise fallback to the min_live_timestamp. If we see in a view table a shadowable tombstone with time T, then in any row where the row marker's timestamp is higher than T the shadowable tombstone is completely ignored and it doesn't hide any data in any column, so the shadowable tombstone can be safely purged without any effect or risk resurrecting any deleted data. In other words, rows which might cause problems for purging a shadowable tombstone with time T are rows with row markers older or equal T. So to know if a whole sstable can cause problems for shadowable tombstone of time T, we need to check if the sstable's oldest row marker (and not oldest column) is older or equal T. And the same check applies similarly to the memtable. If both extended timestamp statistics are missing, fallback to the legacy (and inaccurate) min_timestamp. 
Fixes scylladb/scylladb#20423 Fixes scylladb/scylladb#20424 > [!NOTE] > no backport needed at this time > We may consider backport later on after given some soak time in master/enterprise > since we do see tombstone accumulation in the field under some materialized views workloads Closes scylladb/scylladb#20446 * github.com:scylladb/scylladb: cql-pytest: add test_compaction_tombstone_gc sstable_compaction_test: add mv_tombstone_purge_test sstable_compaction_test: tombstone_purge_test: test that old deleted data do not inhibit tombstone garbage collection sstable_compaction_test: tombstone_purge_test: add testlog debugging sstable_compaction_test: tombstone_purge_test: make_expiring: use next_timestamp sstable, compaction: add debug logging for extended min timestamp stats compaction: get_max_purgeable_timestamp: use memtable and sstable extended timestamp stats compaction: define max_purgeable_fn tombstone: can_gc_fn: move declaration to compaction_garbage_collector.hh sstables: scylla_metadata: add ext_timestamp_stats compaction_group, storage_group, table_state: add extended timestamp stats getters sstables, memtable: track live timestamps memtable_encoding_stats_collector: update row_marker: do nothing if missing	2024-09-13 08:56:51 +03:00
Takuya ASADA	3cd2a61736	dist: drop scylla-jmx Since JMX server is deprecated, drop them from submodule, build system and package definition. Related scylladb/scylla-tools-java#370 Related #14856 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Closes scylladb/scylladb#17969	2024-09-13 07:59:45 +03:00
Botond Dénes	fc9804ec31	Update tools/java submodule * tools/java 0b4accdd...e505a6d3 (1): > [C-S] Make it use DCAwareRoundRobinPolicy unless rack is provided Closes scylladb/scylladb#20562	2024-09-13 06:30:04 +03:00
Takuya ASADA	0ac450de05	scylla_raid_setup: configure SELinux file context On RHEL9, systemd-coredump fails to coredump on /var/lib/scylla/coredump because the service only has write access with systemd_coredump_var_lib_t. To make it writable, we need to add a file context rule for /var/lib/scylla/coredump, and run restorecon on /var/lib/scylla. Fixes #20573	2024-09-13 04:31:52 +09:00
Takuya ASADA	56c971373c	scylla_coredump_setup: fix SELinux configuration for RHEL9 It seems that a specific version of the systemd package on RHEL9 has a bug in its SELinux configuration: it introduced the "systemd-container-coredump" module to provide rules for systemd-coredump, but the module is not enabled by default. We have to load it manually, otherwise it causes a permission error. Fixes #19325	2024-09-13 04:31:16 +09:00
Pavel Emelyanov	17e7d3145c	s3/client: Reuse class multipart_upload in do_upload_file Uploading a file is implemented by the do_upload_file class. This class re-implements a big portion of what's currently in multipart_upload one. This patch makes the former class inherit from the latter and removes all the duplication from it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-12 18:38:16 +03:00
Pavel Emelyanov	14b741afc9	s3/client: Split upload_sink_base class into two This class implements two facilities -- multipart upload protocol itself plus some common parts of upload_sink_impl (in fact -- only close() and plugs put(packet)). This patch splits those two facilities into two classes. One of them will be re-used later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-12 18:00:19 +03:00
Sergey Zolotukhin	612a141660	raft: Fix race condition on override_snapshot_thresholds. When the server_impl::applier_fiber is paused by a co_await at line raft/server.cc:1375: ``` co_await override_snapshot_thresholds(); ``` a new snapshot may be applied, which updates the actual values of the log's last applied and snapshot indexes. As a result, the new snapshot index could become higher than the old value stored in _applied_idx at line raft/server.cc:1365, leading to an assertion failure in log::last_conf_for(). Since error injection is disabled in release builds, this issue does not affect production releases. This issue was introduced in the following commit `9dfa041fe1`, when error injection was added to override the log snapshot configuration parameters. How to reproduce: 1. Build debug version of randomized_nemesis_test ``` ninja-build build/debug/test/raft/randomized_nemesis_test ``` 2. Run ``` parallel --halt now,fail=1 -j20 'build/debug/test/raft/randomized_nemesis_test \ --run_test=test_frequent_snapshotting -- -c2 -m2G --overprovisioned --unsafe-bypass-fsync 1 \ --kernel-page-cache 1 --blocked-reactor-notify-ms 2000000 --default-log-level \ trace > tmp/logs/eraseme_{}.log 2>&1 && rm tmp/logs/eraseme_{}.log' ::: {1..1000} ``` Fixes scylladb/scylladb#20363 Closes scylladb/scylladb#20555	2024-09-12 16:19:27 +02:00
Aleksandra Martyniuk	59fba9016f	docs: operating-scylla: add task manager docs Admin-facing documentation of task manager. Closes scylladb/scylladb#20209	2024-09-12 16:42:28 +03:00
Nadav Har'El	d49dbb944c	Merge 'doc: move Alternator in the page tree and remove its redundant ToC' from Anna Stuchlik This PR hides the ToC on the Alternator page, as we don't need it, especially at the end of the page. The ToC must be hidden rather than removed because removing it would, in turn, remove the "Getting Started With ScyllaDB Alternator" and "ScyllaDB Alternator for DynamoDB users" from the page tree and make them inaccessible. In addition, this PR moves Alternator higher in the page tree. Fixes https://github.com/scylladb/scylladb/issues/19823 Closes scylladb/scylladb#20565 * github.com:scylladb/scylladb: doc: move Alternator higher in the page tree doc: hide the redundant ToC on the Alternator page	2024-09-12 15:58:34 +03:00
Nadav Har'El	930accad12	alternator: return error on unused AttributeDefinitions A CreateTable request defines the KeySchema of the base table and each of its GSIs and LSIs. It also needs to give an AttributeDefinition for each attribute used in a KeySchema - which among other things specifies this attribute's type (e.g., S, N, etc.). Other, non-key, attributes do not have a specified type, and accordingly must not be mentioned in AttributeDefinitions. Before this patch, Alternator just ignored unused AttributeDefinitions entries, whereas DynamoDB throws an error in this case. This patch fixes Alternator's behavior to match DynamoDB's - and adds a test to verify this. Besides being more error-path-compatible with DynamoDB, this extra check can also help users: We already had one user complaining that an AttributeDefinitions setting he was using was ignored, not realizing that it wasn't used by any KeySchema. A clear error message would have saved this user hours of investigation. Fixes #19784. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#20378	2024-09-12 15:37:18 +03:00
Pavel Emelyanov	632a65bffa	Merge 'repair: row_level: coroutinize more functions' from Avi Kivity Coroutinize more functions in row-level repair to improve maintainability. The functions all deal with repair buffers, so coroutinization does not affect performance. Cleanup, no reason to backport Closes scylladb/scylladb#20464 * github.com:scylladb/scylladb: repair: row_level: restore indentation repair: row_level: coroutinize repair_service::insert_repair_meta() repair: row_level: coroutinize repair_meta::get_full_row_hashes() repair: row_level: coroutinize repair_meta::apply_rows_on_follower() repair: row_level: coroutinize repair_meta::clear_working_row_buf() repair: row_level: coroutinize get_common_diff_detect_algorithm() repair: row_level: coroutinize repair_service::remove_repair_meta() (non-selective overload) repair: row_level: coroutinize repair_service::remove_repair_meta() (by-address overload) repair: row_level: coroutinize repair_service::remove_repair_meta() (by-id overload) repair: row_level: row_level_repair::run() repair: row_level: row_level_repair::send_missing_rows_to_follower_nodes() repair: row_level: row_level_repair::get_missing_rows_from_follower_nodes() repair: row_level: row_level_repair::negotiate_sync_boundary() repair: row_level: coroutinize repair_put_row_diff_with_rpc_stream_process_op() repair: row_level: coroutinize repair_meta::get_sync_boundary_handler() repair: row_level: coroutinize repair_meta::get_sync_boundary() repair: row_level: coroutinize repair_meta::repair_set_estimated_partitions_handler() repair: row_level: coroutinize repair_meta::repair_set_estimated_partitions() repair: row_level: coroutinize repair_meta::repair_get_estimated_partitions_handler() repair: row_level: coroutinize repair_meta::repair_get_estimated_partitions() repair: row_level: coroutinize repair_meta::repair_row_level_stop_handler() repair: row_level: coroutinize repair_meta::repair_row_level_stop() repair: row_level: coroutinize 
repair_meta::repair_row_level_start_handler() repair: row_level: coroutinize repair_meta::repair_row_level_start() repair: row_level: coroutinize repair_meta::get_combined_row_hash_handler() repair: row_level: coroutinize repair_meta::get_combined_row_hash() repair: row_level: coroutinize repair_meta::get_full_row_hashes_handler() repair: row_level: coroutinize repair_meta::get_full_row_hashes_with_rpc_stream() repair: row_level: coroutinize repair_meta::request_row_hashes()	2024-09-12 15:35:57 +03:00
Botond Dénes	f834ad81e0	docs/dev/reader-concurrency-semaphore.md: update the documentation on diagnostics dumps The part of the document which explains diagnostics dumps was due for an update. It was missing an explanation on the dumped stats and it also needs to explain the "Problematic permit" and "Identified bottleneck(s)".	2024-09-12 08:31:25 -04:00
Botond Dénes	fdff4beb1f	test/boost/reader_concurrency_semaphore_test: test the new diagnostics functionality Adjust the test reader_concurrency_semaphore_dump_reader_diganostics to also cover the new diagnostics functionality. The test is not a correctness test -- the output has to be inspected by a human. But it is good enough to make sure the code paths do not have any memory errors.	2024-09-12 08:31:25 -04:00
Botond Dénes	40b6616d3d	reader_concurrency_semaphore: add bottleneck self-diagnosis to diagnosis dump There are a few typical cases of bottlenecks, which can be easily identified when dumping the semaphore diagnostics. Identify and print these to fast-track investigations.	2024-09-12 08:31:25 -04:00
Botond Dénes	7d2b931619	reader_concurrency_semaphore: include trigger permit in diagnostic dump In the previous patch, we provided an opportunity for callers to provide a trigger permit, when calling `maybe_dump_reader_permit_diagnostics()`. If the caller provided the trigger permit, include its details in the dump, allowing the identification of the table and code-path of the permit which triggered the dump.	2024-09-12 08:30:50 -04:00
Kefu Chai	197451f8c9	utils/rjson.cc: include the function name in exception message recently, we are observing errors like: ``` stderr: error running operation: rjson::error (JSON SCYLLA_ASSERT failed on condition 'false', at: 0x60d6c8e 0x4d853fd 0x50d3ac8 0x518f5cd 0x51c4a4b 0x5fad446) ``` we only passed `false` to the `RAPIDJSON_ASSERT()` macro, so all we have is the type of the error (rjson::error) and a backtrace. it would be better to have more information without recompiling or fetching the debug symbols to decipher the backtrace. Refs scylladb/scylladb#20533 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20539	2024-09-12 15:22:49 +03:00
Anna Stuchlik	851e903f46	doc: move Alternator higher in the page tree	2024-09-12 14:08:26 +02:00
Anna Stuchlik	a32ff55c66	doc: hide the redundant ToC on the Alternator page This commit hides the ToC, as we don't need it, especially at the end of the page. The ToC must be hidden rather than removed because removing it would, in turn, remove the "Getting Started With ScyllaDB Alternator" and "ScyllaDB Alternator for DynamoDB users" from the page tree and make them inaccessible.	2024-09-12 14:01:15 +02:00
Benny Halevy	0b93409b44	cql_server: connection: process: fixup indentation Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-12 11:32:17 +02:00
Benny Halevy	71052dca6a	cql_server: connection: process_on_shard: drop permit parameter It is currently unused in `process_on_shard`, which generates an empty service_permit. The next patch may call process_on_shard in a loop, so it can't simply move the permit to the callee and better hold on to it until processing completes. `cql_server::connection::process` was turned into a coroutine in this patch to hold on to the permit parameter in a simple way. This is a preliminary step to changing `if (bounce_msg)` to `while (bounce_msg)` that will allow rebouncing the message in case it moved yet again when yielding in `process_on_shard`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-12 11:32:17 +02:00
Benny Halevy	eb7fbdbed2	transport: server: pass bounce_to_shard as foreign shared ptr So it can be safely passed between shards, as will be needed in the following patch that handles a (re)bounce_to_shard result from process_fn that's called by `process_on_shard` on the `move_to_shard`. With that in mind, pass the `bounce_to_shard` payload to `process_on_shard` rather than the foreign shared ptr since the latter grabs what it needs from it on entry and the shared_ptr can be released on the calling shard. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-12 11:32:15 +02:00
Benny Halevy	0df6f55379	cql_server: connection: process: add template concept for process_fn Quoting Avi Kivity: > Out of scope: we should consider detemplating this. As a follow-up we should consider that and pass a function object as process_fn, just make sure there are no drawbacks. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-12 10:26:13 +02:00
Benny Halevy	150dce5de0	cql_server: move process_fn_return_type to class definition So it can be used for a template concept in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-12 10:26:13 +02:00
Alexey Novikov	8b6e987a99	test: add test_pinned_cl_segment_doesnt_resurrect_data Add a test for the issue where writes in commitlog segments pinned to another table can be resurrected. This test is based on dtest code published in #14870 and adapted for the community version. It's a regression test for the #15060 fix and should fail before this patch and succeed afterwards. Refs #14870, #15060 Closes scylladb/scylladb#20331	2024-09-12 10:58:22 +03:00
Takuya ASADA	90ab2a24df	toolchain: restore multiarch build When we introduced optimized clang at `6e487a4`, we dropped multiarch build on frozen toolchain, because building clang on QEMU emulation is too heavy. Actually, even after that patch was merged, there are two modes which do not build clang: --clang-build-mode INSTALL_FROM and --clang-build-mode SKIP. So we should restore multiarch build only in these modes, and keep skipping it in INSTALL mode since it builds clang. Since we apply multiarch in INSTALL_FROM mode, --clang-archive is replaced by --clang-archive-x86_64 and --clang-archive-aarch64. Note that this breaks compatibility of existing clang archive, since it changes clang root directory name from llvm-project to llvm-project-$ARCH. Closes #20442 Closes scylladb/scylladb#20444	2024-09-12 10:44:45 +03:00
Botond Dénes	c044904f07	reader_concurrency_semaphore: propagate permit to do_dump_reader_permit_diagnostics() Will be used in the next patch.	2024-09-12 00:51:56 -04:00
Botond Dénes	67565a5eee	reader_concurrency_semaphore: use consistent exception type for timeout When a read times out, we use different exception types for the permit's future (if the permit is waiting), or the permit's abort exception _ex (which is used to abort ongoing reads). This patch changes both to use named_semaphore_timed_out, which is the more verbose of the two.	2024-09-12 00:51:03 -04:00
Botond Dénes	036d27dc1b	reader_concurrency_semaphore: dump diagnostics when non-waiting reader times out Currently the semaphore only dumps diagnostics when a waiting reader times out. The diagnostics are also useful when a non-waiting reader (which is in the process of reading) times out, so also dump diagnostics in this case. Change the code to use a switch statement, so future addition of states don't miss updating this logic.	2024-09-12 00:51:03 -04:00
Kefu Chai	3e84d43f93	treewide: use seastar::format() or fmt::format() explicitly before this change, we rely on `using namespace seastar` to use `seastar::format()` without qualifying the `format()` with its namespace. this works fine until we changed the parameter type of format string `seastar::format()` from `const char*` to `fmt::format_string<...>`. this change practically invited `seastar::format()` to the club of `std::format()` and `fmt::format()`, where all members accept a templated parameter as its `fmt` parameter. and `seastar::format()` is not the best candidate anymore. although argument-dependent lookup (ADL for short) favors the function which is in the same namespace as its parameter, `using namespace` makes `seastar::format()` more competitive, so both `std::format()` and `seastar::format()` are considered as the candidates. that is what is happening in scylladb at quite a few call sites of `format()`, hence ADL is not able to tell which function is the winner in the name lookup: ``` /__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous 265 \| return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id()); \| ^~~~~~ /usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>] 4290 \| format(format_string<_Args...> __fmt, _Args&&... __args) \| ^ /__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>] 143 \| format(fmt::format_string<A...> fmt, A&&... 
a) { \| ^ ``` in this change, we change all `format()` to either `fmt::format()` or `seastar::format()` with the following rules: - if the caller expects an `sstring` or `std::string_view`, change to `seastar::format()` - if the caller expects an `std::string`, change to `fmt::format()`. because, `sstring::operator std::basic_string` would incur a deep copy. we will need another change to enable scylladb to compile with the latest seastar. namely, to pass the format string as a templated parameter down to helper functions which format their parameters. to minimize the scope of this change, let's include that change when bumping up the seastar submodule. as that change will depend on the seastar change. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-11 23:21:40 +03:00
Pavel Emelyanov	f227f4332c	test: Remove unused path local variable Left after #20499 :( Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20540	2024-09-11 23:10:25 +03:00
Avi Kivity	ed7d352e7d	Merge 'Validate checksums for uncompressed SSTables' from Nikos Dragazis This PR introduces a new file data source implementation for uncompressed SSTables that will be validating the checksum of each chunk that is being read. Unlike for compressed SSTables, checksum validation for uncompressed SSTables will be active for scrub/validate reads but not for normal user reads to ensure we will not have any performance regression. It consists of: * A new file data source for uncompressed SSTables. * Integration of checksums into SSTable's shareable components. The validation code loads the component on demand and manages its lifecycle with shared pointers. * A new `integrity_check` flag to enable the new file data source for uncompressed SSTables. The flag is currently enabled only through the validation path, i.e., it does not affect normal user reads. * New scrub tests for both compressed and uncompressed SSTables, as well as improvements in the existing ones. * A change in JSON response of `scylla validate-checksums` to report if an uncompressed SSTable cannot be validated due to lack of checksums (no `CRC.db` in `TOC.txt`). Refs #19058. New feature, no backport is needed. 
Closes scylladb/scylladb#20207 * github.com:scylladb/scylladb: test: Add test to validate SSTables with no checksums tools: Fix typo in help message of scylla validate-checksums sstables: Allow validate_checksums() to report missing checksums test: Add test for concurrent scrub/validate operations test: Add scrub/validate tests for uncompressed SSTables test/lib: Add option to create uncompressed random schemas test: Add test for scrub/validate with file-level corruption test: Check validation errors in scrub tests sstables: Enable checksum validation for uncompressed SSTables sstables: Expose integrity option via crawling mutation readers sstables: Expose integrity option via data_consume_rows() sstables: Add option for integrity check in data streams sstables: Remove unused variable sstables: Add checksum in the SSTable components sstables: Introduce checksummed file data source implementation sstables: Replace assert with on_internal_error	2024-09-11 23:09:45 +03:00
Calle Wilund	b7839ec5d0	cql_test_env: Use temp socket + retry to ensure usable port for message_service if listen is enabled Fixes #20543 In cql_test_env, if cfg_in.ms_listen is set, we try to get a free port for the current test on which message service rpc can bind. This is to allow multiple tests to run in parallel. However, we just do this by using random and getting a number, not actually verifying it against host ports in use. This is complicated further by the fact that port reuse is effectively disabled in seastar (see reactor::posix_reuseport_detect()). Due to this, the solution applied here is a combo of * Create temp socket with port = 0 to get a previously free port * Close socket right before listen (to handle reuse not working) * Retry on EADDRINUSE Closes scylladb/scylladb#20547	2024-09-11 23:02:41 +03:00
Aleksandra Martyniuk	31ea74b96e	db: system_keyspace: change version of topology_requests schema In `880058073b` a new column (request_type) was added to topology_requests table, but the table's schema version wasn't changed. As a result, during a cluster upgrade both the old and the new versions occur, but they are not distinguishable. Add offset to schema version of topology_requests table if it contains request_type column. Fixes: #20299. Closes scylladb/scylladb#20402	2024-09-11 16:36:35 +03:00
Piotr Dulikowski	d98708013c	Merge 'view: move view_build_status to group0' from Michael Litvak Migrate the `system_distributed.view_build_status` table to `system.view_build_status_v2`. The writes to the v2 table are done via raft group0 operations. The new parameter `view_builder_version` stored in `scylla_local` indicates whether nodes should use the old or the new table. New clusters use v2. Otherwise, the migration to v2 is initiated by the topology coordinator when the feature is enabled. It reads all the rows from the old table and writes them to the new table, and sets `view_builder_version` to v2. When the change is applied, all view_builder services are updated to write and read from the v2 table. The old table `system_distributed.view_build_status` is set to read virtually from the new table in order to maintain compatibility. When removing a node from the cluster, we remove its rows from the table atomically (fixes https://github.com/scylladb/scylladb/issues/11836). Also, during the migration, we remove all invalid rows. 
Fixes scylladb/scylladb#15329 dtest https://github.com/scylladb/scylla-dtest/pull/4827 Closes scylladb/scylladb#19745 * github.com:scylladb/scylladb: view: test view_build_status table with node replace test/pylib: use view_build_status_v2 table in wait_for_view view_builder: common write view_build_status function view_builder: improve migration to v2 with intermediate phase view: delete node rows from view_build_status on node removal view: sanitize view_build_status during migration view: make old view_build_status table a virtual table replica: move streaming_reader_lifecycle_policy to header file view_builder: test view_build_status_v2 storage_service: add view_build_status to raft snapshot view_builder: migration to v2 db:system_keyspace: add view_builder_version to scylla_local view_builder: read view status from v2 table view_builder: introduce writing status mutations via raft view_builder: pass group0_client and qp to view_builder view_builder: extract sys_dist status operations to functions db:system_keyspace: add view_build_status_v2 table	2024-09-11 13:02:58 +02:00
Nikos Dragazis	d1152a200f	test: Add test to validate SSTables with no checksums In a previous patch we extended the return status of `sstables::validate_checksums()` to report if an SSTable cannot be validated due to a missing CRC component (i.e., CRC.db does not appear in TOC.txt). Add a test case for this. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 13:12:40 +03:00
Nikos Dragazis	1f275c71b1	tools: Fix typo in help message of scylla validate-checksums Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 13:12:39 +03:00
Nikos Dragazis	5c0a7f706b	sstables: Allow validate_checksums() to report missing checksums Change the return type of `sstable::validate_checksums()` from binary (valid/invalid) to a ternary (valid/invalid/no_checksums). The third status represents uncompressed SSTables without a CRC component (no entry for CRC.db in the TOC). Also, change the JSON response of `sstable validate-checksums` to expose the new status. Replace the boolean value for valid/invalid checksums with an object that contains two boolean keys: one that indicates if the SSTable has checksums, and one that indicates if the checksums are valid or not. The second key is optional and appears only if the SSTable has checksums. Finally, update the documentation to reflect the changes in the API. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 13:12:39 +03:00
Nikos Dragazis	5a284f4a9d	test: Add test for concurrent scrub/validate operations Theoretically it is possible to launch more than one scrub instance simultaneously. Since the checksum component is a shared resource, accesses have to be synchronized. Add a test that launches two scrub operations in validate mode and ensures that the checksum component is loaded once, referenced by all scrub instances via shared pointers, and deleted once the scrub operations finish. Introduce an injection point to achieve concurrent execution of scrubs. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 13:12:39 +03:00
Nikos Dragazis	e2353f3b3e	test: Add scrub/validate tests for uncompressed SSTables Currently the unit tests check scrub in validate mode against compressed SSTables only. Mirror the tests for uncompressed SSTables as well. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 13:12:39 +03:00
Nikos Dragazis	2991b09c8e	test/lib: Add option to create uncompressed random schemas Extend the `random_schema_specification` to support creating both compressed and uncompressed schemas. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 13:12:32 +03:00
Nikos Dragazis	4f56c587f6	test: Add test for scrub/validate with file-level corruption Currently, we test scrub/validate only against a corrupted SSTable with content-level corruption (out-of-order partition key). Add a test for file-level corruption as well. This should trigger the checksum check in the underlying compressed file data source implementation. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 12:28:59 +03:00
Nikos Dragazis	cc10a5f287	test: Check validation errors in scrub tests Scrub was extended in PR #11074 to report validation errors but the unit tests were not updated. Update the tests to check the validation errors reported by scrub. Validation errors must be zero for valid SSTables and non-zero for invalid SSTables. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 12:28:59 +03:00
Nikos Dragazis	719757fba9	sstables: Enable checksum validation for uncompressed SSTables Extend the `sstable::validate()` to validate the checksums of uncompressed SSTables. Given that this is already supported for compressed SSTables, this allows us to provide consistent behavior across any type of SSTable, be it either compressed or uncompressed. The most prominent use case for this is scrub/validate, which is now able to detect file-level corruption in uncompressed SSTables as well. Note that this change will not affect normal user reads which skip checksum validation altogether. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 12:28:59 +03:00
Nikos Dragazis	716fc487fd	sstables: Expose integrity option via crawling mutation readers Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 12:28:59 +03:00
Nikos Dragazis	1d2dc9f2e1	sstables: Expose integrity option via data_consume_rows() Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 12:28:59 +03:00
Nikos Dragazis	2feced32f7	sstables: Add option for integrity check in data streams Add a new boolean parameter in `sstable::data_stream()` to enable/disable integrity mechanisms in the underlying data streams. Currently, this only affects uncompressed SSTables and it allows to enable/disable checksum validation on each chunk. The validation happens transparently via the checksummed data source implementation. The reason we need this option is to allow differentiating the behavior between normal user reads and scrub/validate reads. We would like to enable scrub to verify checksums for uncompressed SSTables, while leaving normal user reads unchanged for performance reasons (read amplification due to round up of reads to chunk size and loading of the CRC component). Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 12:27:54 +03:00
Nikos Dragazis	d5bd40ad2c	sstables: Remove unused variable Remove unused stream variable from `sstable::data_stream()`. This was introduced in commit `47e07b787e` but never used. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 12:27:54 +03:00
Nikos Dragazis	2575d20f41	sstables: Add checksum in the SSTable components Uncompressed SSTables store their checksums in a separate CRC.db file. Add this in the list of SSTable components. Since this component is used only for validation, load the component on-demand for validation tasks and delete it when all validation tasks finish. In more detail: - Make the checksum component shareable and weakly referencable. Also, add a constructor since it is no longer an aggregate. - Use a weak pointer to store a non-owning reference in the components and a shared pointer to keep the object alive while validation runs. Once validation finishes, the component should be cleaned up automatically. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 12:27:38 +03:00
Nikos Dragazis	b7dfba4c18	sstables: Introduce checksummed file data source implementation Introduce a new data source implementation for uncompressed SSTables. This is just a thin wrapper for a raw data source that also performs checksum validation for each chunk. This way we can have consistent behavior for compressed and uncompressed SSTables. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-11 12:26:18 +03:00
Botond Dénes	0e5b444777	Merge 'database::get_all_tables_flushed_at: fix return value' from Lakshmi Narayanan Sreethar The `database::get_all_tables_flushed_at` method returns a variable without setting the computed all_tables_flushed_at value. This causes its caller, `maybe_flush_all_tables`, to flush all the tables every time regardless of when they were last flushed. Fix this by returning the computed value from `database::get_all_tables_flushed_at`. Fixes #20301 Requires a backport to 6.0 and 6.1 as they have the same issue. Closes scylladb/scylladb#20471 * github.com:scylladb/scylladb: cql-pytest: add test to verify compaction_flush_all_tables_before_major_seconds config database::get_all_tables_flushed_at: fix return value	2024-09-11 11:43:45 +03:00
Amnon Heiman	46792bd04f	docs/alternator/compatibility.md: explain the consumed capacity provisioned This patch changes the Alternator documentation to state that the provisioned units are stored and returned, but Alternator ignores them. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-09-10 17:28:31 -04:00
Amnon Heiman	3726c20564	Add test/alternator/test_provisioned_throughput.py The test_provisioned_throughput.py file tests ProvisionedThroughput support. The first test checks that ProvisionedThroughput can be set and then read back using describe table. The second test checks that a missing read or write throws an exception. The third test checks that with billing mode PAY_PER_REQUEST, zero is returned for the read and write units. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-09-10 17:27:19 -04:00
Amnon Heiman	9b5f29b6bc	test/alternator/util.py: Allow override BillingMode This patch adds the ability to override the BillingMode. If a BillingMode is provided to the create_test_table function, it will override the default BillingMode. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-09-10 17:06:47 -04:00
Amnon Heiman	c76347032d	alternator/executor.cc: Store ProvisionedThroughput This patch adds the ability to store and retrieve the ProvisionedThroughput in a table. The information is stored in the table tags. We use the TTL convention used in alternator, and the tags will be: system:provisioned_rcu and system:provisioned_wcu. The verify_billing_mode function now returns a struct with the billing mode information. The code of describe_table now checks if the provision tags exist and returns the RCU and WCU accordingly. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-09-10 17:06:40 -04:00
Benny Halevy	4e8f3f4cdd	cql-pytest: add test_compaction_tombstone_gc Test tombstone garbage collection with: 1. conflicting live data in memtable (verifying there is no regression in this area) 2. deletion in memtable (reproducing scylladb/scylladb#20423) 3. materialized view update in memtable (reproducing scylladb/scylladb#20424) in materialized_views Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:06:23 +03:00
Benny Halevy	9270348c38	sstable_compaction_test: add mv_tombstone_purge_test Simulate view updates pattern and verify that they don't inhibit tombstone garbage collection. Verify fix for scylladb/scylladb#20424 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:06:23 +03:00
Benny Halevy	0407e50aa4	sstable_compaction_test: tombstone_purge_test: test that old deleted data do not inhibit tombstone garbage collection Tests fix for scylladb/scylladb#20423 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:06:06 +03:00
Benny Halevy	a7caa79df7	sstable_compaction_test: tombstone_purge_test: add testlog debugging Add some testlog debug printouts for the make_* helpers. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:05:58 +03:00
Benny Halevy	470d301fe3	sstable_compaction_test: tombstone_purge_test: make_expiring: use next_timestamp Rather than forging a timestamp from the gc_clock, just use `next_timestamp` so it can be considered for tombstone purging purposes. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:05:58 +03:00
Benny Halevy	5849ba83e0	sstable, compaction: add debug logging for extended min timestamp stats Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:05:57 +03:00
Benny Halevy	7d893a5ed9	compaction: get_max_purgeable_timestamp: use memtable and sstable extended timestamp stats When purging regular tombstone consult the min_live_timestamp, if available. For shadowable_tombstones, consult the min_memtable_live_row_marker_timestamp, if available, otherwise fallback to the min_live_timestamp. If both are missing, fallback to the legacy (and inaccurate) min_timestamp. Fixes scylladb/scylladb#20423 Fixes scylladb/scylladb#20424 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:05:57 +03:00
Benny Halevy	57e9e9c369	compaction: define max_purgeable_fn Before we add a new, is_shadowable, parameter to it. And define global `can_always_purge` and `can_never_purge` functions, a-la `always_gc` and `never_gc`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:05:57 +03:00
Benny Halevy	b6fabd98c6	tombstone: can_gc_fn: move declaration to compaction_garbage_collector.hh And define `never_gc` globally, same as `always_gc` Before adding a new, is_shadowable parameter to it. Since it is used in the context of compaction it better fits compaction_garbage_collector header rather than tombstone.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:05:57 +03:00
Benny Halevy	4de4af954f	sstables: scylla_metadata: add ext_timestamp_stats Store and retrieve the optional extended timestamp statistics (min_live_timestamp and min_live_row_marker_timestamp) in the scylla_metadata component. Note that there is no need for a cluster feature to store those attributes since the scylla_metadata on-disk format is extensible so that old sstables can be read by new versions, seeing that the extra stats are missing, and new sstables can be read by old versions that ignore unknown scylla metadata section types. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:05:57 +03:00
Benny Halevy	6f202cf48b	compaction_group, storage_group, table_state: add extended timestamp stats getters To return the minimum live timestamp and live row-marker timestamp across a compaction_group, storage_group, or table_state. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:05:57 +03:00
Benny Halevy	14d86a3a12	sstables, memtable: track live timestamps When garbage collecting tombstones, we care only about shadowing of live data. However, currently we track min/max timestamp of both live and dead data, but there is no problem with purging tombstones that shadow dead data (expired or shadowed by other tombstones in the sstable/memtable). Also, for shadowable tombstones, we track live row marker timestamps separately since, if the live row marker timestamp is greater than a shadowable tombstone timestamp, then the row marker would shadow the shadowable tombstone thus exposing the cells in that row, even if their timestamp may be smaller than the shadowable tombstone's. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 19:05:49 +03:00
Abhi	9b09439065	raft: Add descriptions for requested abort errors Fixes: scylladb/scylladb#18902 Closes scylladb/scylladb#20291	2024-09-10 17:56:29 +02:00
Botond Dénes	de81388edb	Merge 'commitlog: Handle oversized entries' from Calle Wilund Refs #18161 Yet another approach to dealing with large commitlog submissions. We handle an oversized single mutation by adding yet another entry type: fragmented. In this case we only add a fragment (aha) of the data that needs storing into each entry, along with metadata to correlate and reconstruct the full entry on replay. Because these fragmented entries are spread over N segments, we also need to add references from the first segment in a chain to the subsequent ones. These are released once we clear the relevant cf_id count in the base. * This approach has the downside that due to how serialization etc works w.r.t. mutations, we need to create an intermediate buffer to hold the full serialized target entry. This is then incrementally written into entries of < max_mutation_size, successively requesting more segments. On replay, when encountering a fragment chain, the fragment is added to a "state", i.e. a mapping of currently processing frag chains. Once we've found all fragments and concatenated the buffers into a single fragmented one, we can issue a replay callback as usual. Note that a replay caller will need to create and provide such a state object. Old signature replay function remains for tests and such. This approach bumps the file format (docs to come). To ensure "atomicity" we both force synchronization, and should the whole op fail, we restore segment state (rewinding), thus discarding all the data we wrote. 
Closes scylladb/scylladb#19472 * github.com:scylladb/scylladb: commitlog/database: Make some commitlog options updatable + add feature listener features/config: Add feature for fragmented commitlog entries docs: Add entry on commitlog file format v4 commitlog_test: Add more oversized cases commitlog_replayer: Replay segments in order created commitlog_replayer: Use replay state to support fragmented entries commitlog_replayer: coroutinize partly commitlog: Handle oversized entries	2024-09-10 17:15:46 +03:00
Benny Halevy	8d67357c42	memtable_encoding_stats_collector: update row_marker: do nothing if missing If the row_marker is missing then its timestamp is missing as well, so there's no point calling update_timestamp for it. Better return early. This should cause no functional change. The following patch will add more logic for tracking extended timestamp stats. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 16:46:34 +03:00
Pavel Emelyanov	b6f662417c	table: Remove unused database& argument from take_snapshot() method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20496	2024-09-10 14:53:06 +03:00
Gleb Natapov	af83c5e53e	group0: stop group0 before draining storage service during shutdown Currently storage service is drained while group0 is still active. The draining stops commitlogs, so after this point no more writes are possible, but if group0 is still active it may try to apply commands which will try to do writes and they will fail causing group0 state machine errors. This is benign since we are shutting down anyway, but better to fix shutdown order to keep logs clean. Fixes scylladb/scylladb#19665	2024-09-10 13:15:56 +02:00
Lakshmi Narayanan Sreethar	a0f4fe3fc4	cql-pytest: add test to verify compaction_flush_all_tables_before_major_seconds config Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-09-10 16:39:05 +05:30
Lakshmi Narayanan Sreethar	4ca720f0bd	database::get_all_tables_flushed_at: fix return value The `database::get_all_tables_flushed_at` method returns a variable without setting the computed all_tables_flushed_at value. This causes its caller, `maybe_flush_all_tables`, to flush all the tables every time regardless of when they were last flushed. Fix this by returning the computed value from `database::get_all_tables_flushed_at`. Fixes #20301 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-09-10 16:35:47 +05:30
Yaniv Michael Kaul	a4ff0aae47	HACKING.md: clarify the use of dbuild when running test.py If you are using dbuild, that's where test.py needs to run. Also, replace 'Docker image' with the more generic 'container' term. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#20336	2024-09-10 13:40:45 +03:00
Botond Dénes	08f109724b	docs/cql/ddl.rst: fix description of sstable_compression ScyllaDB doesn't support custom compressors. The available compressors are the only available ones, not the default ones. Adjust the text to reflect this. Closes scylladb/scylladb#20225	2024-09-10 13:39:24 +03:00
Pavel Emelyanov	cfa59ab73d	test: Use single temp dir for sharded<sstables::test_env> The test-env in question is mostly started in one-shard mode. Also there are several boost tests that start sharded<> environment. In that case instances on different shards live in different temp dirs. That's not critical yet, but better to have single directory for the whole test. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20412	2024-09-10 11:25:04 +03:00
Artsiom Mishuta	f95c257a1e	[test.py]: Fail test teardown in case of task leakage In test.py every asyncio task spawned during the test must be finished before the next test, otherwise tests might affect each other's results. The developers are responsible for writing asyncio code in a way that doesn’t leave task objects unfinished. Test.py has a mechanism that helps test writers avoid such tasks. At the end of each test case, it verifies that the test did not produce/leave any tasks and sets an event object that fails the next test at the start if this is the case (issue https://github.com/scylladb/scylladb/issues/16472). The problem with this was that breaking the next test was counterintuitive, and the logging for this situation was insufficient and unobvious. Notes: Task.cancel() is not an option to avoid task leakage. 1) Calling cancel() does not cancel the task: the cancel() method just requests that the target task cancel. 2) Calling cancel() does not block until the task is cancelled: if the caller needs to know the task is cancelled and done, it could await the target. 3) In this particular PR, task.cancel() cancels the task on the client (ManagerClient) but not on the HTTP server (ScyllaManager), so "await" is needed. Closes scylladb/scylladb#20012	2024-09-10 10:51:45 +03:00
Pavel Emelyanov	ac2127a640	test: Call table::make_sstable() directly in compaction test The test in question generates a bunch of table_for_tests objects and creates sstables for each. For that it calls test_env::make_sstable(), but it can be made shorter, by calling table method directly. The hidden goal of this change is to remove the explicit caller of table::dir() method. The latter is going away. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20451	2024-09-10 10:19:20 +03:00
Botond Dénes	76bb22664a	Merge 'Sanitize open_sstables() helper in compaction test' from Pavel Emelyanov This includes - coroutinization - elimination of unused overload Closes scylladb/scylladb#20456 * github.com:scylladb/scylladb: test: Squash two open_sstables() helper together test: Coroutinize open_sstables() helper	2024-09-10 10:18:33 +03:00
Botond Dénes	a4a4797e27	Merge 'Alternator: tests and other preparation towards allowing adding a GSI to an existing table' from Nadav Har'El This series prepares us for working on #11567 - allow adding a GSI to a pre-existing table. This will require changing the implementation of GSIs in Alternator to not use real columns in the schema for the materialized view, and instead of a computed column - a function which extracts the desired member from the `:attrs` map and de-serializes it. This series does not contain the GSI re-implementation itself. Rather it contains a few small cleanups and mostly - new regression tests that cover this area, of adding and removing a GSI, and using a GSI, in more details than the tests we already had. I developed most of these tests while working on buggy fixes for #11567; The bugs in those implementations were exposed by the tests added here - they exposed bugs both in the new feature of adding or removing a GSI, and also regressions to the ordinary operation of GSI. So these tests should be helpful for whoever ends up fixing #11567, be it me based on my buggy implementation (which is _not_ included in this patch series), or someone else. No backports needed - this is part of a new feature, which we don't usually backport. 
Closes scylladb/scylladb#20383 * github.com:scylladb/scylladb: test/alternator: more extensive tests for GSI with two new key attributes test/alternator: test invalid key types for GSI test/alternator: test combination of LSI and GSI test/alternator: expand another test to use different write operations test/alternator: test GSIs with different key types alternator: better error message in some cases of key type mismatch test/alternator: test for more elaborate GSI updates test/alternator: strengthen tests for empty attribute values test/alternator: fix typo in test_batch.py test/alternator: more checks for GSI-key attribute validation Alternator: drop unneeded "IS NOT NULL" clauses in MV of GSI/LSI test/alternator: add more checks for adding/deleting a GSI test/alternator: ensure table deletions in test_gsi.py	2024-09-10 10:13:52 +03:00
Pavel Emelyanov	42f8d06a17	test: Use correct schema in directory tests with created table There are some test cases in sstable_directory_test that actually create a table with CQL and then try to manipulate its sstables with the help of sstable_directory. Those tests use an existing local helper that starts sharded<sstable_directory>, and this helper passes a test-local static schema to the sstable_directory constructor. As a result, the schema of the table that the test case created and the schema that sstable_directory works with are different. They match in the column layout, which helps the test cases pass, but otherwise are two different schema objects with different IDs. It's more correct to use the table schema for those runs. The fix introduces another helper to start sharded<sstable_directory>, and the older wrapper around cql_test_env becomes unused. Drop it too, so as not to encourage future tests to use it and re-introduce the schema mismatch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20499	2024-09-10 09:56:26 +03:00
Benny Halevy	f47b5e60bc	sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory To be able to atomically delete sstables both in the base table directory and in its sub-directories, like `staging/`, use a shared pending_delete_dir under the base directory. Note that this requires loading and processing the base directory first. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 09:28:13 +03:00
Benny Halevy	44bd183187	sstables: storage: keep base directory in base class so we can use the base (table) directory for e.g. pending_delete logs, in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 09:28:13 +03:00
Benny Halevy	027e64876a	sstables: storage: define opened_directory in header file So it can be used outside the storage module in the following patches. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 09:28:13 +03:00
Benny Halevy	a7b92d7b6f	sstable_directory: use only dirlog Currently, there are leftover log messages using sstlog rather than dirlog, that was introduced in `aebd965f0e`, and that makes debugging harder. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-09-10 09:28:11 +03:00
Botond Dénes	fc690a60d8	Update tools/cqlsh submodule * tools/cqlsh 86a280a1...b09bc793 (6): > build(deps): bump actions/download-artifact in /.github/workflows > cqlshlib/test: Add test_formatting.py > cqlshlib/test: Use assertEqual instead of assertEquals > cqlsh.py: Send DESCRIBE statement to server before parsing > cqlsh.py: Fix indentation > cqlsh.py: change shebang to /usr/bin/env python3	2024-09-10 08:11:40 +03:00
Lakshmi Narayanan Sreethar	2148e33d37	compaction: remove unnecessary share bump for split, scrub, and upgrade When split, scrub, and upgrade compactions ran under the compaction group, they had to bump up their shares to a minimum of 200 to prevent slow progress as they neared completion, especially in workloads with inconsistent ingestion rates. Since commit `e86965c2` moved these compactions to the maintenance group, this share bump is no longer necessary. This patch removes the unnecessary share allocation. Fixes #20224 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#20495	2024-09-09 22:03:38 +03:00
Avi Kivity	9448260b30	Merge 'major compaction: check only sstables being compacted for tombstone garbage collection' from Lakshmi Narayanan Sreethar Any expired tombstone can be garbage collected if it doesn't shadow data in the commit log, memtable, or uncompacting SSTables. This PR introduces a new mode to major compaction, enabled by the `consider_only_existing_data` flag that bypasses these checks. When enabled, memtables and old commitlog segments are cleared with a system-wide flush and all the sstables (after flush) are included in the compaction, so that it works with all data generated up to a given time point. This new mode works with the assumption that newly written data will not be shadowed by expired tombstones. So it ignores new sstables (and new data written to memtable) created after compaction started. Since there was a system wide flush, commitlog checks can also be skipped when garbage collecting tombstones. Introducing data shadowed by a tombstone during compaction can lead to undefined behavior, even without this PR, as the tombstone may or may not have already been garbage collected. Fixes #19728 Closes scylladb/scylladb#20031 * github.com:scylladb/scylladb: cql-pytest: add test to verify consider_only_existing_data compaction option tools/scylla-nodetool: add consider-only-existing-data option to compact command api: compaction: add `consider_only_existing_data` option compaction: consider gc_check_only_compacting_sstables when deducing max purgeable timestamp compaction: do not check commitlog if gc_check_only_compacting_sstables is enabled tombstone_gc_state: introduce with_commitlog_check_disabled() compaction: introduce new option to check only compacting sstables for gc compaction: rename maybe_flush_all_tables to maybe_flush_commitlog compaction: maybe_flush_all_tables: add new force_flush param	2024-09-09 20:45:41 +03:00
Avi Kivity	894b85ce95	Merge 'hints: send hints with CL=ALL if target is leaving' from Piotr Dulikowski Currently, when attempting to send a hint, we might choose its recipients in one of two ways: - If the original destination is a natural endpoint of the hint, we only send the hint to that node and none other, - Otherwise, we send the hint to all current replicas of the mutation. There is a problem when we decommission a node: while data is streamed away from that node, it is still considered to be a natural endpoint of the data that it used to own. Because of that, it might happen that a hint is sent directly to it but streaming will miss it, effectively resulting in the hint being discarded. As sending the hint _only_ to the leaving replica is a rather bad idea, send the hint to all replicas also in the case when the original destination of the hint is leaving. Note that this is a conservative fix written only with the decommission + vnode-based keyspaces combo in mind. In general, such "data loss" can occur in other situations where the replica set is changing and we go through a streaming phase, i.e. other topology operations in case of vnodes and tablet load balancing. However, the consistency guarantees of hinted handoff in the face of topology changes are not defined and it is not clear what they should be, if there should be any at all. The picture is further complicated by the fact that hints are used by materialized views, and sending view updates to more replicas than necessary can introduce inconsistencies in the form of "ghost rows". This fix was developed in response to a failing test which checked the hint replay + decommission scenario, and it makes it work again. Fixes scylladb/scylla-dtest#4582 Refs scylladb/scylladb#19835 Should be backported to 6.0 and 6.1; the dtest started failing due to topology on raft, which sped up execution of the test and exposed the preexisting problem. 
Closes scylladb/scylladb#20488 * github.com:scylladb/scylladb: test: topology_custom/test_hints: consistency test for decommission test: topology_custom/test_hints: move sync point helpers to top level test: topology/util: extract find_server_by_host_id hints: send hints with CL=ALL if target is leaving hints: inline do_send_one_mutation	2024-09-09 18:23:13 +03:00
Avi Kivity	c3e19425bd	Merge 'docs/dev/docker-hub.md: refresh aio-max-nr calculation' from Laszlo Ersek ~~~ What we have today in "docs/dev/docker-hub.md" on "aio-max-nr" dates back to scylla commit `f4412029f4` ("docs/docker-hub.md: add quickstart section with --smp 1", 2020-09-22). Problems with the current language: - The "65K" claim as default value on non-production systems is wrong; "fs/aio.c" in Linux initializes "aio_max_nr" to 0x10000, which is 64K. - The section in question uses equal signs (=) incorrectly. The intent was probably to say "which means the same as", but that's not what equality means. - In the same section, the relational operator "<" is bogus. The available AIO count must be at least as high (>=) as the requested AIO count. - Clearer names should be used; adjust_max_networking_aio_io_control_blocks() in "src/core/reactor.cc" sets a great example: - "reactor::max_aio" should be called "storage_iocbs", - "detect_aio_poll" should be called "preempt_iocbs", - "reactor_backend_aio::max_polls" should be called "network_iocbs". - The specific value 10000 for the last one ("network_iocbs") is not correct in scylla's context. It is correct as the Seastar default, but scylla has used 50000 since commit `2cfc517874` ("main, test: adjust number of networking iocbs", 2021-07-18). Rewrite the section to address these problems. See also: - https://github.com/scylladb/scylladb/issues/5981 - https://github.com/scylladb/seastar/pull/2396 - https://github.com/scylladb/scylladb/pull/19921 Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> ~~~ No need for backporting; the documentation being refreshed targets developers as audience, not end-users. Closes scylladb/scylladb#20398 * github.com:scylladb/scylladb: docs/dev/docker-hub.md: refresh aio-max-nr calculation docs/dev/docker-hub.md: strip trailing whitespace	2024-09-09 15:04:38 +03:00
Botond Dénes	3e0bff161c	Merge 'Use yielding directory lister in sstable_directory' from Pavel Emelyanov The yielding lister is considered to be a better replacement than the scan_dir(lambda) one. Also, the sstable directory will be patched to scan the contents of an S3 bucket, and the yielding lister fits better for generalization. Closes scylladb/scylladb#20114 * github.com:scylladb/scylladb: sstable_directory: Fix indentation after previous patches sstable_directory: Use yielding lister in .handle_sstables_pending_delete() sstable_directory: Use yielding lister in .cleanup_column_family_temp_sst_dirs() sstable_directory: Use yielding lister in .prepare() sstable_directory: Shorten lister loop sstable_directory: Use with_closeable() in .process() directory_lister: Add noexcept default move-constructor	2024-09-09 14:35:51 +03:00
Pavel Emelyanov	0f48847d02	test: Use shorter with_sstable_directory overload() In sstable directory test there are two of those -- one that works on path, state, env and callback, and the other one that just needs env and callback, getting path from env and assuming state is normal. Two test cases in this test can enjoy the shorter one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20395	2024-09-09 14:25:24 +03:00
Pavel Emelyanov	2bfbbaffac	test: Use sstables::test_env to make sstables for schema loader test This test calls manager directly, but it's shorter to ask test_env for that Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20431	2024-09-09 14:22:58 +03:00
Takuya ASADA	e36c939505	dist: tune LimitNOFILES for large nodes On very large nodes, LimitNOFILES=80000 may not be large enough, which can cause a "Too many files" error. To avoid that, increase LimitNOFILES at the scylla_setup stage, generating an optimal value calculated from the memory size and the number of CPUs. Closes scylladb/scylla-enterprise#4304 Closes scylladb/scylladb#20443	2024-09-09 14:13:49 +03:00
Piotr Smaron	60af48f5fd	cql: fix exception when validating KS in CREATE TABLE `c70f321c6f` added an extra check if KS exists. This check can throw `data_dictionary::no_such_keyspace` exception, which is supposed to be caught and a more user-friendly exception should be thrown instead. This commit fixes the above problem and adds a testcase to validate it doesn't appear ever again. Also, I moved the check for the keyspace outside of the `for` loop, as it doesn't need to be checked repeatedly. Fixes: scylladb/scylladb#20097 Closes scylladb/scylladb#20404	2024-09-09 13:30:57 +03:00
Nadav Har'El	ee7d4d8825	test/alternator: more extensive tests for GSI with two new key attributes The case of a GSI with two key attributes (hash and range) which were both not keys in the base table is a special case, not supported by CQL but allowed in Alternator. We have several tests for this case, but they don't cover all the strange possibilities that a GSI row disappears / reappears when one or two of the attributes is updated / inserted / deleted. So this patch includes a more extensive test for this case. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-09 13:14:49 +03:00
Nadav Har'El	ad53d6a230	test/alternator: test invalid key types for GSI This patch adds a test that types which are not allowed for GSI keys - basically any type except S(tring), B(ytes) or N(number), are rejected as expected - an error path that we didn't cover in existing tests. The new test passes - Alternator doesn't have a bug in this area, and as usual, also passes on DynamoDB. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-09 13:14:49 +03:00
Nadav Har'El	c4021d0819	test/alternator: test combination of LSI and GSI To allow adding a GSI to an existing table (refs #11567), we plan to re-implement GSIs to stop forcing their key attribute to become a real column in the schema - and let it remain a member of the map ":attrs" like all non-key attributes. But since LSIs can only be defined on table creation time, we don't have to change the LSI implementation, and these can still force their key to become a real column. What the test in this patch does is to verify that using the same attribute as a key of both GSI and LSI on the same table works. There's a high risk that it won't work: After all, the LSI should force the attribute to become a real column (to which base reads and writes go), but the GSI will use a computed column which reads from ":attrs", no? Well, it turns out that view.cc's value_getter::operator() always had a surprising exception which "rescues" this test and makes it pass: Before using a computed column, this code checks if a base-table column with the same name exists, and if it does, it is used instead of the computed column! It's not clear why this logic was chosen, but it turns out to be really useful for making the test in this patch pass. And it's important that if we ever change that unintuitive behavior, we will have this test as a regression test. The new test unsurprisingly passes on current Scylla because its implementation of GSI and LSI is still the same. But it's an important regression test for when we change the GSI implementation. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-09 13:14:49 +03:00
Nadav Har'El	7563d0a8a1	test/alternator: expand another test to use different write operations Expand another Alternator test (test_gsi.py::test_gsi_missing_attribute) to write items not just using PutItem, but also using UpdateItem and BatchWriteItem. There is a risk that these different operations use slightly different code paths - so better check all of them and not just PutItem. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-09 13:14:49 +03:00
Nadav Har'El	4d02beec53	test/alternator: test GSIs with different key types All of the tests in test/alternator/test_gsi.py use strings as the GSI's keys. This tests a lot of GSI functionality, but we implicitly assumed that our implementation used an already-correct and already-tested implementation of key columns and MV, which if it works for one type, works for other types as well. This assumption will no longer hold if we reimplement GSI on a "computed column" implementation, which might run different code for different types of GSI key attributes (the supported types are "S"tring, "B"ytes, and "N"umber). So in this patch we add tests for writing and reading different types of GSI key attributes. These tests showed their importance as regression tests when the first draft of the GSI reimplementation series failed them. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-09 13:14:49 +03:00
Nadav Har'El	80a0798e77	alternator: better error message in some cases of key type mismatch Alternator uses a common function get_typed_value() to read the values of key attributes and confirm they have the expected type (key attributes have a fixed type in the schema). If the type is wrong, we want to print a "Type mismatch" error message. But the current implementation did the checks in the wrong order, and as a result could print a "Malformed value object" message instead of a "Type mismatch". That could happen if the wrong type is a boolean, map, list, or basically any type whose JSON representation is not a string. The allowed key types - bytes, string and number - all have string representations in JSON, but still we should first report the mismatched type and only report the "Malformed object" if the type matches but the JSON is faulty. In addition to fixing the error message, we fix an existing test which complained in a comment (but ignored) that the error message in some cases (when trying to use a map where a key is expected) is the strange "Malformed value object" instead of the expected "Type mismatch". The next patch will add an additional reproducer for this problem and its fix. That test will do: ``` with pytest.raises(ClientError, match='ValidationException.*mismatch'): test_table_gsi_6.put_item(Item={'p': p, 's': True}) ``` I.e., it tries to set a boolean value for a string key column, and expects to get the "Type mismatch" error and not the ugly "Malformed value object". Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-09 13:14:49 +03:00
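The check ordering described in commit 80a0798e77 can be illustrated with a small sketch. It is written in Python purely for illustration and is not Alternator's actual get_typed_value(); the `ValidationError` class, the function signature, and the dict-based value representation are all assumptions.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for Alternator's validation exception."""


def get_typed_value(value, expected_type):
    """Return the payload of a DynamoDB-JSON value such as {"S": "abc"}.

    Sketch of the corrected order: report a type mismatch first, and only
    complain about a malformed payload when the type tag itself matches.
    """
    if not isinstance(value, dict) or len(value) != 1:
        raise ValidationError("Malformed value object")
    (actual_type, payload), = value.items()
    # 1. Type check first, so {"BOOL": True} for an "S" key is a mismatch.
    if actual_type != expected_type:
        raise ValidationError(
            f"Type mismatch: expected {expected_type}, got {actual_type}")
    # 2. Key types (S, B, N) are all represented as JSON strings.
    if not isinstance(payload, str):
        raise ValidationError("Malformed value object")
    return payload


def error_for(value, expected_type):
    # Small helper for demonstration: return the error text, or None.
    try:
        get_typed_value(value, expected_type)
    except ValidationError as e:
        return str(e)
    return None
```

With the checks in the opposite order, the boolean case would hit the string-payload check first and report "Malformed value object", which is the behavior the commit fixes.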
Nadav Har'El	624ed32278	test/alternator: test for more elaborate GSI updates Most tests in test_gsi.py involve simple updates to a GSI, just creating a GSI row. Although a couple of tests did involve more complex operations (such as an update requiring deleting an old row from the GSI and inserting a new one), we did not have a single organized test designed to check all these cases, so we add one in this patch. This test (test_update_gsi_pk) will be important for verifying the low-level implementation of the new GSI implementation that we plan to base on computed columns. Early versions of that code passed many of the simpler tests, but not this one. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-09 13:14:49 +03:00
Nadav Har'El	65d4ddf093	test/alternator: strengthen tests for empty attribute values We soon plan to refactor Alternator's GSI and change the validation of values set in attributes which are GSI keys. It's important to test that when updating attributes that are not GSI keys - and are either base- table keys or normal non-key attributes - the validation didn't change. For example, empty strings are still not allowed in base-table key attributes, but are allowed (since May 2020 in DynamoDB) in non-key attributes. We did have tests in this area, but this patch strengthens them - adding a test for non-key attribute, and expanding the key-attribute test to cover the UpdateItem and BatchWriteItem operations, not just PutItem. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-09 13:14:41 +03:00
Avi Kivity	9a5061209f	Merge '[test.py] Enable allure for python test' from Andrei Chekun To enhance the test reports UX: 1. switching off/on passed/failed/skipped test for better visibility 2. better searching in test results 3. understanding the trends of execution for each test 4. better configurability of the final report Enable allure adapter for all python tests. Add tags and parameters to the test to be able to distinguish them across modes and runs. Related: https://github.com/scylladb/qa-tasks/issues/1665 Related: https://github.com/scylladb/scylladb/pull/19335 Related: https://github.com/scylladb/scylladb/pull/18169 Closes scylladb/scylladb#19942 * github.com:scylladb/scylladb: [test.py] Clean duplicated arg for test suite [test.py] Enable allure for python test	2024-09-09 12:53:00 +03:00
Nadav Har'El	5859daed68	test/alternator: fix typo in test_batch.py Two tests had a typo 'item' instead of 'Item'. If Scylla had a bug, this could have caused these tests to miss the bug. Scylla passes also the fixed test, because Scylla's behavior is correct. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-09 12:09:25 +03:00
Nadav Har'El	1f8e39f680	test/alternator: more checks for GSI-key attribute validation When an attribute is a GSI key, DynamoDB imposes certain rules when writing values for it - it must be of the declared type for that key, and can't be an empty string. We had tests for this, but all of them did the write using the PutItem operation. In this patch we also test the same things using the UpdateItem and BatchWriteItem operations. Because Scylla has different code paths for these three operations, and each code path needs to remember to call the validation function, all three should all be checked and not just PutItem. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-09 12:09:25 +03:00
Nadav Har'El	cf5d7ce212	Alternator: drop unneeded "IS NOT NULL" clauses in MV of GSI/LSI Scylla's materialized views naturally skip any base rows where the view's key isn't set (is NULL), because we can't create a view row with a null key. To make the user aware that this is happening, the user is required to add "WHERE ... IS NOT NULL" for the view's key columns when defining the view. However, the only place that these extra IS NOT NULL clauses are checked are in the CQL "CREATE MATERIALIZED VIEWS" statement - they are completely ignored in all other places in the code. In particular, when we create a materialized view in Alternator (GSI or LSI), we don't have to add these "IS NOT NULL" clauses, as they are outright ignored. We didn't know they were ignored, and made an effort to add them - but no matter how incorrectly we did it, it didn't matter :-) In commit `2bf2ffd3ed` it turned out we had a typo that caused the wrong column name to be printed. Also, even today we are still missing base key columns that aren't listed as a view key in Alternator but still added as view clustering keys in Scylla - and again the fact these were missing also didn't matter. So I think it's time to stop pretending, and stop calculating these "IS NOT NULL" strings, so this patch outright removes them from the Alternator view-creation code. Beyond being a nice cleanup of unnecessary and inaccurate code, it will also be necessary when we allow in later patches to index for an Alternator attribute "x" not a real column x in the base table but rather an element in the ":attrs" map - so adding a "x IS NOT NULL" isn't only unnecessary, it is outright illegal: The expression evaluation code, even though it doesn't do anything with the "IS NOT NULL" expression, still verifies that "x" is a valid column, which it isn't. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-09 12:09:25 +03:00
Nadav Har'El	8beaa9d10e	test/alternator: add more checks for adding/deleting a GSI We already have tests for the feature of adding or removing a GSI from an existing table, which Alternator doesn't yet support (issue #11567). In this patch we add another check, how after a GSI is added, you can no longer add items with the wrong type for the indexed type, and after removing a GSI, you can. The expanded tests pass on DynamoDB, and obviously still xfail on Alternator because the feature is not yet implemented. Refs #11567. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-09 12:09:25 +03:00
Nadav Har'El	ce19311ab3	test/alternator: ensure table deletions in test_gsi.py Most of the Alternator tests are careful to unconditionally remove the test tables, even if the test fails. This is important when testing on a shared database (e.g., DynamoDB) but also useful to make clean shutdown faster as there should be no user table to flush. We missed a few such cases in test_gsi.py, and fixed some of them in commit `59c1498338` but still missed a few, and this patch fixes some more instances of this problem. We do this by using the context manager new_test_table() - which automatically deletes the table when done - instead of the function create_test_table() which needs an explicit delete at the end. There are no functional changes in this patch - most of the lines changed are just reindents. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-09-09 12:09:25 +03:00
Kefu Chai	ccbd3eb9f7	main: do not register redis and alternator services if not enabled in main.cc, we start redis with `ss.local().register_protocol_server()` only if it is enabled. but `storage_service` always calls `stop_server()` with _all_ registered servers, no matter if they have started or not. in general, it does not hurt. for instance, `redis::controller::stop_server()` is a noop, if the controller is not started. but `storage_service` still prints logging messages like: ``` INFO 2024-09-04 11:20:02,224 [shard 0:main] storage_service - Shutting down redis server INFO 2024-09-04 11:20:02,224 [shard 0:main] storage_service - Shutting down redis server was successful ``` this could be confusing or at least distracting when a field engineer looks at the log. also, please note, `redis_port` and `redis_ssl_port` cannot be changed dynamically once scylla server is up, so we do not need to worry about "what if the redis server is started at runtime, how can it be stopped?". the same applies to alternator service. in this change, to avoid surprises, we conditionally register the protocol servers with the storage service based on their enabled statuses. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20472	2024-09-09 08:44:50 +03:00
Avi Kivity	58713f3080	types: remove some unused free functions These functions are unused, so safe to remove, and reduce the work to convert to managed_bytes{,_view}. Closes scylladb/scylladb#20482	2024-09-09 08:36:33 +03:00
Kefu Chai	720997d1de	cql3/statements: mark format string as `constexpr const` after switching over to the new `seastar::format()` which enables the compile-time format check, the fmt string should be a constexpr, otherwise `fmt::format()` is not able to perform the check at compile time. to prepare for bumping up the seastar module to a version which contains the change of `seastar::format()`, let's mark the format string with `constexpr const`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20484	2024-09-09 08:35:45 +03:00
Piotr Dulikowski	6f3d0af994	test: topology_custom/test_hints: consistency test for decommission Adds the test_hints_consistency_during_decommission test which reproduces the failure observed in scylladb/scylla-dtest#4582. It uses error injections, including the newly added topology_coordinator_pause_after_streaming injection, to reliably orchestrate the scenario observed there. In a nutshell, the test makes sure to replay hints after streaming during decommission has finished, but before the cluster switches to reading from new replicas. Without the fix, hints would be replayed to the decommissioned node and then would be lost forever after the cluster starts reading from new replicas.	2024-09-08 10:51:38 +02:00
Piotr Dulikowski	30d53167c9	test: topology_custom/test_hints: move sync point helpers to top level Move create_sync_point and await_sync_point from the scope of the test_sync_point test to the file scope. They will be used in a test that will be introduced in the commit that follows.	2024-09-08 10:51:38 +02:00
Piotr Dulikowski	a75d0c0bfa	test: topology/util: extract find_server_by_host_id Move it out from test_mv_tablets_replace.py. It will be used by a test introduced in a later commit.	2024-09-08 10:51:38 +02:00
Piotr Dulikowski	61ac0a336d	hints: send hints with CL=ALL if target is leaving Currently, when attempting to send a hint, we might choose its recipients in one of two ways: - If the original destination is a natural endpoint of the hint, we only send the hint to that node and none other, - Otherwise, we send the hint to all current replicas of the mutation. There is a problem when we decommission a node: while data is streamed away from that node, it is still considered to be a natural endpoint of the data that it used to own. Because of that, it might happen that a hint is sent directly to it but streaming will miss it, effectively resulting in the hint being discarded. As sending the hint _only_ to the leaving replica is a rather bad idea, send the hint to all replicas also in the case when the original destination of the hint is leaving. Note that this is a conservative fix written only with the decommission + vnode-based keyspaces combo in mind. In general, such "data loss" can occur in other situations where the replica set is changing and we go through a streaming phase, i.e. other topology operations in case of vnodes and tablet load balancing. However, the consistency guarantees of hinted handoff in the face of topology changes are not defined and it is not clear what they should be, if there should be any at all. The picture is further complicated by the fact that hints are used by materialized views, and sending view updates to more replicas than necessary can introduce inconsistencies in the form of "ghost rows". This fix was developed in response to a failing test which checked the hint replay + decommission scenario, and it makes it work again. Fixes scylladb/scylla-dtest#4582 Refs scylladb/scylladb#19835	2024-09-08 10:50:59 +02:00
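The recipient selection described in commit 61ac0a336d can be sketched as below. This is a simplified Python model, not the actual hints code: the function name, its parameters, and the use of plain strings for nodes are all assumptions made for illustration, and the consistency level used for the send is left out.

```python
def hint_recipients(destination, natural_endpoints, current_replicas,
                    leaving_nodes):
    """Hypothetical model of choosing who receives a replayed hint.

    - destination is still a natural endpoint and is NOT leaving:
      send only to that node;
    - destination is leaving (e.g. being decommissioned), or is no longer
      a natural endpoint: send to all current replicas of the mutation.
    """
    is_natural = destination in natural_endpoints
    is_leaving = destination in leaving_nodes
    if is_natural and not is_leaving:
        return [destination]
    return list(current_replicas)


# Example: node "n3" is being decommissioned and streams its data to "n4".
natural = ["n1", "n2", "n3"]
replicas = ["n1", "n2", "n3", "n4"]

before_fix_like = hint_recipients("n3", natural, replicas, leaving_nodes=set())
after_fix = hint_recipients("n3", natural, replicas, leaving_nodes={"n3"})
```

In the first call the hint goes only to "n3", which is the situation where streaming can miss it. In the second call the leaving state widens the recipients to every replica.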
Piotr Dulikowski	8abb06ab82	hints: inline do_send_one_mutation It's a small method and it is only used once in send_one_mutation. Inlining it lets us get rid of its declaration in the header - now, if one needs to change the variables passed from one function to another, it is no longer necessary to change the header.	2024-09-08 07:19:35 +02:00
Avi Kivity	ab32ce6b45	Merge 'Coroutinize sstable::read_summary() method' from Pavel Emelyanov Shorter and simpler this way. Hopefully it doesn't sit on critical paths Closes scylladb/scylladb#20460 * github.com:scylladb/scylladb: sstables: Fix indentation after previous patch sstables: Coroutinize sstable::read_summary()	2024-09-06 18:45:54 +03:00
Pavel Emelyanov	103c68b419	sstables: Restore indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-06 18:24:56 +03:00
Pavel Emelyanov	c47c0f1cd6	sstables: Coroutinize remove_unshared_sstables() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-06 18:24:40 +03:00
Kefu Chai	aeaeaf345d	compaction: use structured binding when appropriate for better readability Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20473	2024-09-06 18:17:48 +03:00
Kamil Braun	427ad2040f	Merge 'test: randomized failure injection for Raft-based topology' from Evgeniy Naydanov The idea of the test is to have a cluster where one node is stressed with injections and failures and the rest of the cluster is used to make progress of the raft state machine. To achieve this, the following two lists are introduced in the PR: - ERROR_INJECTIONS in error_injections.py - CLUSTER_EVENTS in cluster_events.py Each cluster event is an async generator which has 2 yields and should be used in the following way: 0. Start the generator: ```python >>> cluster_event_steps = cluster_event(manager, random_tables, error_injection) ``` 1. Run the prepare part (before the first yield) ```python >>> await anext(cluster_event_steps) ``` 2. Run the cluster event itself (between the yields) ```python >>> await anext(cluster_event_steps) ``` 3. Run the check part (after the second yield) ```python >>> await anext(cluster_event_steps, None) ``` Closes scylladb/scylladb#16223 * github.com:scylladb/scylladb: test: randomized failure injection for Raft-based topology test: error injections for Raft-based topology [test.py] topology.util: add get_non_coordinator_host() function [test.py] random_tables: add UDT methods [test.py] random_tables: add CDC methods [test.py] api: get scylla process status [test.py] api: add expected_server_up_state argument to server_add()	2024-09-06 14:00:41 +02:00
Pavel Emelyanov	226fd03bae	Merge 'service/qos: remove unused marked_for_deletion field from service_level struct' from Piotr Dulikowski The `service_level::marked_for_deletion` field is always set to `false`. It might have served some purpose in the past, but now it can be just removed, simplifying the code and eliminating confusion about the field. This is just code cleanup, no backport is needed. Closes scylladb/scylladb#20452 * github.com:scylladb/scylladb: service/qos: remove the marked_for_deletion parameter service/qos: add constructors to service_level	2024-09-06 11:44:25 +03:00
Kamil Braun	52fdf5b4c9	test: test_raft_no_quorum: increase raft timeout in debug mode The test cases in this file use an error injection to reduce raft group 0 timeouts (from the default 1 minute), in order to speed up the tests; the scenarios expect these timeouts to happen, so we want them to happen as quick as possible, but we don't want to reduce timeouts so much that it will make other operations fail when we don't expect them to (e.g. when the test wants to add a node to the cluster). Unfortunately the selected 5 seconds in debug mode was not enough and made the tests flaky: scylladb/scylladb#20111. Increase it to 10 seconds. This unfortunately will slow down these tests as they have to sometimes wait for 10 seconds for the timeout to happen. But better to have this than a flaky test. Fixes: scylladb/scylladb#20111 Closes scylladb/scylladb#20320	2024-09-06 11:40:09 +03:00
Avi Kivity	384a09585b	repair: row_level: repair_get_row_diff_with_rpc_stream_process_op: simplify return value During review of `0857b63259` it was noticed that the function repair_get_row_diff_with_rpc_stream_process_op() and its _slow_path callee only ever return stop_iteration::no (or throw an exception). As such, its return value is useless, and in fact the only caller ignores it. Simplify by returning a plain future<>. Closes scylladb/scylladb#20441	2024-09-06 11:39:21 +03:00
Kefu Chai	034c1df29b	auth/authentication_options: move fmt::formatter up so that it is accessible from its caller. if we enforce the compile-time format string check, the formatter would need access to the specialization of `fmt::formatter` of the arguments being formatted. to be prepared for this change, let's move the `fmt::formatter` specialization up, otherwise we'd have the following error after switching to the compile-time format string check introduced by a recent seastar change: ``` In file included from ./auth/authenticator.hh:22: ./auth/authentication_options.hh:50:49: error: call to consteval function 'fmt::basic_format_string<char, auth::authentication_option &>::basic_format_string< char[32], 0>' is not a constant expression 50 \| : std::invalid_argument(fmt::format("The {} option is not supported.", k)) { \| ^ ./auth/authentication_options.hh:57:13: error: explicit specialization of 'fmt::formatter<auth::authentication_option>' after instantiation 57 \| struct fmt::formatter<auth::authentication_option> : fmt::formatter<string_view> { \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /usr/include/fmt/base.h:1228:17: note: implicit instantiation first required here 1228 \| -> decltype(typename Context::template formatter_type<T>().format( \| ^ In file included from replica/distributed_loader.cc:30: ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20447	2024-09-06 09:12:38 +03:00
Pavel Emelyanov	527fc9594a	sstables: Fix indentation after previous patch And move the comment inside if while at it, it looks better in there (and makes less churn in the patch itself) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-06 08:43:08 +03:00
Pavel Emelyanov	f7325586f3	sstables: Coroutinize sstable::read_summary() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-06 08:43:07 +03:00
Evgeniy Naydanov	dd99cf197d	test: randomized failure injection for Raft-based topology The idea of the test is to have a small cluster, where one node is stressed with injections and failures and the rest of the cluster is used to make progress of the Raft state machine. To achieve this, the following two lists are introduced in the commit: - ERROR_INJECTIONS in error_injections.py - CLUSTER_EVENTS in cluster_events.py Each cluster event is an async generator which has 2 yields and should be used in the following way: 0. Start the generator: >>> cluster_event_steps = cluster_event(manager, random_tables, error_injection) 1. Run the prepare part (before the first yield) >>> await anext(cluster_event_steps) 2. Run the cluster event itself (between the yields) >>> await anext(cluster_event_steps) 3. Run the check part (after the second yield) >>> await anext(cluster_event_steps, None)	2024-09-05 22:11:32 +00:00
Evgeniy Naydanov	769424723b	test: error injections for Raft-based topology Add following error injections: - stop_after_init_of_system_ks - stop_after_init_of_schema_commitlog - stop_after_starting_gossiper - stop_after_starting_raft_address_map - stop_after_starting_migration_manager - stop_after_starting_commitlog - stop_after_starting_repair - stop_after_starting_cdc_generation_service - stop_after_starting_group0_service - stop_after_starting_auth_service - stop_during_gossip_shadow_round - stop_after_saving_tokens - stop_after_starting_gossiping - stop_after_sending_join_node_request - stop_after_setting_mode_to_normal_raft_topology - stop_before_becoming_raft_voter - topology_coordinator_pause_after_updating_cdc_generation - stop_before_streaming - stop_after_streaming - stop_after_bootstrapping_initial_raft_configuration	2024-09-05 22:11:31 +00:00
Evgeniy Naydanov	ac4ffbad5c	[test.py] topology.util: add get_non_coordinator_host() function Add get_non_coordinator_host() function which returns ServerInfo for the first host which is not a coordinator or None if there is no such host. Also rework get_coordinator_host() to not fail if some of the hosts don't have a host id.	2024-09-05 22:11:31 +00:00
Evgeniy Naydanov	d95d698601	[test.py] random_tables: add UDT methods Add .add_udt() / .drop_udt() methods.	2024-09-05 22:11:31 +00:00
Evgeniy Naydanov	8cb442ca50	[test.py] random_tables: add CDC methods Add .enabled_cdc() / .disable_cdc() methods.	2024-09-05 22:11:31 +00:00
Evgeniy Naydanov	a7119cf420	[test.py] api: get scylla process status Add `server_get_process_status(server_id)` API call and wait_for_scylla_process_status() helper function.	2024-09-05 22:11:31 +00:00
Evgeniy Naydanov	241bbb4172	[test.py] api: add expected_server_up_state argument to server_add() Allow to return from server_add() when a server reaches specified state. One of: - PROCESS_STARTED - HOST_ID_QUERIED (previously called NOT_CONNECTED) - CQL_CONNECTED (renamed from CONNECTED) - CQL_QUERIED (was just QUERIED) Also, rename CqlUpState to ServerUpState and move to internal_types.	2024-09-05 22:11:31 +00:00
Pavel Emelyanov	f02a686115	schema: Ditch make_shared_schema() helper Now it's unused Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 19:34:00 +03:00
Pavel Emelyanov	d045aa6df7	test: Tune up indentation in uncompressed_schema() After it was switched to use schema builder, the indentation of untouched lines deserves one extra space. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 19:33:29 +03:00
Pavel Emelyanov	a1deba0779	test: Make tests use schema_builder instead of make_shared_schema Everything but the perf test is a straightforward switch. The perf test generated regular columns dynamically via a vector; with the builder the vector goes away. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 19:31:30 +03:00
Avi Kivity	c57b8dd0bf	repair: row_level: restore indentation	2024-09-05 18:38:43 +03:00
Avi Kivity	710977ef88	repair: row_level: coroutinize repair_service::insert_repair_meta() Some of the indentation was broken, and is partially repaired by this change.	2024-09-05 17:59:42 +03:00
Avi Kivity	f23a32ed84	repair: row_level: coroutinize repair_meta::get_full_row_hashes()	2024-09-05 17:56:27 +03:00
Avi Kivity	607747beb1	repair: row_level: coroutinize repair_meta::apply_rows_on_follower()	2024-09-05 17:55:07 +03:00
Avi Kivity	89d4394d12	repair: row_level: coroutinize repair_meta::clear_working_row_buf()	2024-09-05 17:52:32 +03:00
Pavel Emelyanov	69a5ec69c4	test: Use table storage options in sstable_directory_test When creating sstables this test allocates temporary local options. That works, because this test doesn't run on object storage, but it's more correct to pick storage options from the table at hand. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20440	2024-09-05 17:48:25 +03:00
Avi Kivity	4cfc25f8d7	repair: row_level: coroutinize get_common_diff_detect_algorithm() The function is threaded, but the inner lambda can be coroutinized.	2024-09-05 17:47:27 +03:00
Michael Litvak	9545e0a114	view: test view_build_status table with node replace Add a test replacing a node and verifying the contents of the view_build_status table are updated as expected, having rows for the new node and no rows for the old node.	2024-09-05 15:42:35 +03:00
Michael Litvak	3ca5dd537f	test/pylib: use view_build_status_v2 table in wait_for_view Change the util function wait_for_view to read the view build status from the system.view_build_status_v2 table which replaces system_distributed.view_build_status. The old table can still be used but it is less efficient because it's implemented as a virtual table which reads from the v2 table, so it's better to read directly from the v2 table. This can cause slowness in tests. The additional util function wait_for_view_v1 reads from the old table. This may be needed in upgrade tests if the v2 table is not available yet.	2024-09-05 15:42:35 +03:00
Michael Litvak	5c95aaae0d	view_builder: common write view_build_status function When writing to the view_build_status we have common logic related to upgrade and deciding whether to write to sys_dist ks or group0. Move this common logic to a generic function used by all functions writing to the table.	2024-09-05 15:42:35 +03:00
Michael Litvak	c1f3517a75	view_builder: improve migration to v2 with intermediate phase Add an intermediate phase to the view builder migration to v2 where we write to both the old and new table in order to not lose writes during the migration. We add an additional view builder version v1_5 between v1 and v2 where we write to both tables. We perform a barrier before moving to v2 to ensure all the operations to the old table are completed.	2024-09-05 15:42:35 +03:00
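The phased migration from commit c1f3517a75 can be modeled with a short sketch. It is an illustrative Python model rather than the view builder's real code: the enum, the `write_targets` function, and the in-memory "tables" are assumptions. Only the version names (v1, v1_5, v2) and the two table names come from the commit messages in this series.

```python
from enum import Enum

OLD_TABLE = "system_distributed.view_build_status"
NEW_TABLE = "system.view_build_status_v2"


class ViewBuilderVersion(Enum):
    V1 = 1     # write only to the old table
    V1_5 = 2   # intermediate phase: write to both tables
    V2 = 3     # write only to the new table


def write_targets(version):
    """Hypothetical helper: which tables a status write goes to."""
    if version is ViewBuilderVersion.V1:
        return [OLD_TABLE]
    if version is ViewBuilderVersion.V1_5:
        return [OLD_TABLE, NEW_TABLE]
    return [NEW_TABLE]


def write_status(tables, version, key, status):
    # tables is a dict of dicts standing in for the two real tables.
    for name in write_targets(version):
        tables[name][key] = status


# A write arriving in the intermediate phase lands in both tables, so it
# is not lost whichever table a reader consults during the migration.
tables = {OLD_TABLE: {}, NEW_TABLE: {}}
write_status(tables, ViewBuilderVersion.V1_5, ("ks", "view", "host1"), "STARTED")
```

The barrier mentioned in the commit message, which waits for outstanding writes to the old table before switching to v2, is not modeled here.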
Michael Litvak	446ad3c184	view: delete node rows from view_build_status on node removal When a node is removed we want to clean its rows from the view_build_status table. Now when removing a node and generating the topology state update, we generate also the mutations to delete all the possible rows belonging to the node from the table.	2024-09-05 15:42:35 +03:00
Michael Litvak	08462aaff7	view: sanitize view_build_status during migration When migrating the view_build_status to v2, skip adding any leftover rows that don't correspond to an existing node or an existing view. Previously such rows could have been created and not cleaned, for example when a node is removed.	2024-09-05 15:42:35 +03:00
Michael Litvak	78d6ff6598	view: make old view_build_status table a virtual table After migrating the view build status from system_distributed.view_build_status to system.view_build_status_v2, we set system_distributed.view_build_status to be a virtual table, such that reading from it is actually reading from the underlying new table. The reason for this is that we want to keep compatibility with the old table, since it exists also in Cassandra and it is used by various external tools to check the view build status. Making the table virtual makes the transition transparent for external users. The two tables are in different keyspaces and have different shard mapping. The v1 table is a distributed table with a normal shard mapping, and the v2 table is a local table using the null sharder. The virtual reader works by constructing a multishard reader which reads the rows from shard zero, and then filtering it to get only the rows owned by the current shard.	2024-09-05 15:42:35 +03:00
Michael Litvak	09eadcff08	replica: move streaming_reader_lifecycle_policy to header file move the class streaming_reader_lifecycle_policy to a header file in order to make it reusable in other places.	2024-09-05 15:42:35 +03:00
Michael Litvak	22f4f1fa49	view_builder: test view_build_status_v2 Add tests to verify the new view_build_status_v2 is used by the view_builder and can be read from all nodes with the expected values. Also test a migration from the v1 layout to v2.	2024-09-05 15:42:35 +03:00
Michael Litvak	fcf66ad541	storage_service: add view_build_status to raft snapshot Include the table system.view_build_status_v2 in the raft snapshot, and also the view_builder version parameter.	2024-09-05 15:42:30 +03:00
Michael Litvak	8d25a4d678	view_builder: migration to v2 Migrate view_builder to v2, to store the view build status of all nodes in the group0 based table view_build_status_v2. Introduce a feature view_build_status_on_group0 so we know when all nodes are ready to migrate and use the new table. A new cluster is initialized to use v2. Otherwise, the topology coordinator initiates the migration when the feature is enabled, if it was not done already. The migration reads all the rows in the v1 table and writes them via group0 to the v2 table, together with a mutation that updates the view_builder parameter in scylla_local to v2. When this mutation is applied, it updates the view_builder service to start using the v2 table.	2024-09-05 15:41:04 +03:00
Michael Litvak	f3887cd80b	db:system_keyspace: add view_builder_version to scylla_local Add a new scylla_local parameter view_builder_version, and functions to read and mutate the value. The version value defaults to v1 if it doesn't exist in the table.	2024-09-05 15:41:04 +03:00
Michael Litvak	d58a8930c4	view_builder: read view status from v2 table Update the view_status function to read from the new view_build_status_v2 table when enabled. The code to read and extract the values is identical to v1 and v2 except it accesses different keyspace and table, so the common code is extracted to the view_status_common function and used by both v1 and v2 flows with appropriate parameters.	2024-09-05 15:41:04 +03:00
Michael Litvak	05d18b818f	view_builder: introduce writing status mutations via raft Introduce the announce_with_raft function as alternative to writing view build status mutations to the table in system_distributed. Instead, we can apply the mutations via group0 operation to the view_build_status_v2 table. All the view_builder functions that write to the view_build_status table can be configured by a flag to either write the legacy way or via raft.	2024-09-05 15:41:04 +03:00
Michael Litvak	b8c7a10ae6	view_builder: pass group0_client and qp to view_builder Store references of group0_client and query_processor in the view_builder service. They are required for generating mutations and writing them via group0.	2024-09-05 15:41:04 +03:00
Michael Litvak	b2332c5a72	view_builder: extract sys_dist status operations to functions Extract all the update and read operations of a view build status in the table system_distributed.view_build_status to separate functions.	2024-09-05 15:41:04 +03:00
Michael Litvak	bf4a58bf91	db:system_keyspace: add view_build_status_v2 table add the table system.view_build_status_v2 with the same schema as system_distributed.view_build_status.	2024-09-05 15:41:04 +03:00
Gleb Natapov	807e37502a	db/consistency_level: do not use result from heat weighted load balancer if it contains duplicates Because of https://github.com/scylladb/scylladb/issues/9285 heat weighted load balancer may sometimes return the same node twice. It may cause wrong data to be read or unexpected errors to be returned to a client. Since the original bug is not easy to fix and it is rare, let's introduce a workaround. We will check for duplicates and will use the non-HWLB one if one is found. Fixes scylladb/scylladb#20430 Closes scylladb/scylladb#20414	2024-09-05 15:21:35 +03:00
Wojciech Mitros	c1b0434c16	test: finish mv view update explicitly instead of relying on delay duration When testing mv admission control, we perform a large view update and check if the following view update can be admitted due to the high view backlog usage. We rely on a delay which keeps the backlog high for longer to make sure the backlog is still increased during the second write. However, in some test runs the delay is not long enough, causing the second write to miss the large backlog and not hit admission control. In this patch we keep the increased backlog high using another injection instead of relying on a delay to make absolute sure that the backlog is still high during the second write. Fixes scylladb/scylladb#20382 Closes scylladb/scylladb#20445	2024-09-05 15:08:04 +03:00
Lakshmi Narayanan Sreethar	7c5efab7d5	cql-pytest: add test to verify consider_only_existing_data compaction option Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-09-05 17:34:13 +05:30
Lakshmi Narayanan Sreethar	68a902f74a	tools/scylla-nodetool: add consider-only-existing-data option to compact command Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-09-05 17:34:06 +05:30
Lakshmi Narayanan Sreethar	84d06a13c7	api: compaction: add `consider_only_existing_data` option Added a new parameter `consider_only_existing_data` to major compaction API endpoints. When enabled, major compaction will: - Force-flush all tables. - Force a new active segment in the commit log. - Compact all existing SSTables and garbage-collect tombstones by only checking the SSTables being compacted. Memtables, commit logs, and other SSTables not part of the compaction will not be checked, as they will only contain newer data that arrived after the compaction started. The `consider_only_existing_data` is passed down to the compaction descriptor's `gc_check_only_compacting_sstables` option to ensure that only the existing data is considered for garbage collection. The option is also passed to the `maybe_flush_commitlog` method to make sure all the tables are flushed and a new active segment is created in the commit log. Fixes #19728 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar	98bc44f900	compaction: consider gc_check_only_compacting_sstables when deducing max purgeable timestamp When gc_check_only_compacting_sstables is enabled, get_max_purgeable_timestamp should not check memtables and other sstables that are not part of the compaction to deduce the max purgeable timestamp. Refs #19728 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar	7b9ce8e040	compaction: do not check commitlog if gc_check_only_compacting_sstables is enabled When the compaction_descriptor's gc_check_only_compacting_sstables flag is enabled, create and pass a copy of the get_tombstone_gc_state that will skip checking the commitlog. Refs #19728 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar	12fa40154b	tombstone_gc_state: introduce with_commitlog_check_disabled() Added a new method, `with_commitlog_check_disabled`, that returns a new copy of the tombstone_gc_state but with commitlog check disabled. This will be used by a following patch to disable commitlog checks during compaction. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar	5b8c6a8a5e	compaction: introduce new option to check only compacting sstables for gc Added new option, `gc_check_only_compacting_sstables`, to compaction_descriptor to control the garbage collection behavior. The subsequent patches will use this flag to decide if the garbage collection has to check only the SSTables being compacted to collect tombstones. This option is disabled for now and will be enabled based on a new compaction parameter that will be added later in this patch series. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar	5e6bffc146	compaction: rename maybe_flush_all_tables to maybe_flush_commitlog Major compaction flushes all tables as a part of flushing the commitlog. After forcing new active segments in the commitlog, all the tables are flushed to enable reclaim of older commitlog segments. The main goal is to flush the commitlog and flushing all the table is just a dependency. Rename maybe_flush_all_tables to maybe_flush_commitlog so that it reflects the actual intent of the major compaction code. Added a new wrapper method to database::flush_all_tables(), database::flush_commitlog(), that is now called from maybe_flush_commitlog. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-09-05 17:25:45 +05:30
Lakshmi Narayanan Sreethar	fa2488cc83	compaction: maybe_flush_all_tables: add new force_flush param Add a new parameter, `force_flush` to the maybe_flush_all_tables() method. Setting `force_flush` to true will flush all the tables regardless of when they were flushed last. This will be used by the new compaction option in a following patch. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-09-05 17:25:45 +05:30
Laszlo Ersek	53524974db	docs/dev/maintainer.md: clarify "Updating submodule references" Before the introduction of "scripts/refresh-submodules.sh", there was indeed some manual work for the maintainer to do, hence "publish your work" must have sounded correct. Today, the phrase "publish your work" sounds confusing. Commit `71da4e6e79` ("docs: Document sync-submodules.sh script in maintainer.md", 2020-06-18) should have arguably reworded the last step of the submodule refresh procedure; let's do it now. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> Closes scylladb/scylladb#20333	2024-09-05 13:57:32 +03:00
Pavel Emelyanov	1f0db29ef6	test: Remove unused directory semaphore The with_sstable_dir() helper no longer needs one, it used to pass it as argument to sstable_directory constructor, but now the directory doesn't need it (takes semaphore via table object). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20396	2024-09-05 13:11:35 +03:00
Kefu Chai	b4fc24cc1f	github: use needs.read-toolchain.outputs.image for build-scylla so we don't need to hardwire the image on which we build scylla. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20370	2024-09-05 12:58:36 +03:00
Pavel Emelyanov	955391d209	sstable_directory: Fix indentation after previous patches Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 11:19:19 +03:00
Pavel Emelyanov	2febde24f3	sstable_directory: Use yielding lister in .handle_sstables_pending_delete() Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 11:19:19 +03:00
Pavel Emelyanov	02aac3e407	sstable_directory: Use yielding lister in .cleanup_column_family_temp_sst_dirs() Indentation is deliberately left broken Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 11:19:19 +03:00
Pavel Emelyanov	ff77a677a6	sstable_directory: Use yielding lister in .prepare() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 11:19:19 +03:00
Pavel Emelyanov	7b5fe6bee6	sstable_directory: Shorten lister loop Squash call to lister.get() and check for the returned value into while()'s condition. This saves a few more lines of code as well. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 11:19:19 +03:00
Pavel Emelyanov	5dc266cefa	sstable_directory: Use with_closeable() in .process() The method already uses yielding lister, but handles the exceptions explicitly. Use with_closeable() helper, it makes the code shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 11:19:19 +03:00
Pavel Emelyanov	7742b90cb1	directory_lister: Add noexcept default move-constructor It's required to make it possible to push lister into with_closeable(). Its requirement of nothrow-move-constructible doesn't accept default-generated one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 11:10:21 +03:00
Nikos Dragazis	2450afb934	sstables: Replace assert with on_internal_error The `skip()` method of the compressed data source implementation uses an assert statement to check if the given offset is valid. Replace this with `on_internal_error()` to fail gracefully. An invalid offset shouldn't bring the whole server down. Also, enhance the error message for unsynced compressed readers. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2024-09-05 11:03:54 +03:00
Pavel Emelyanov	da598a6210	test: Restore indentation after previous changes Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:38:01 +03:00
Pavel Emelyanov	e16c07c896	test: Threadify tombstone_in_tombstone2() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:36:33 +03:00
Pavel Emelyanov	28d016f312	test: Threadify range_tombstone_reading() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:36:33 +03:00
Pavel Emelyanov	7d567d07ad	test: Threadify tombstone_in_tombstone() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:36:33 +03:00
Pavel Emelyanov	a34e38f070	test: Threadify broken_ranges_collection() Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:36:33 +03:00
Pavel Emelyanov	eac4ec47f8	test: Threadify compact_storage_dense_read() Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:36:33 +03:00
Pavel Emelyanov	322c1ee9c5	test: Threadify compact_storage_simple_dense_read() Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:36:33 +03:00
Pavel Emelyanov	df71b3e446	test: Threadify compact_storage_sparse_read() Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:36:33 +03:00
Pavel Emelyanov	142ccc64fb	test: Simplify test_range_reads() counting It used to keep counter with the help of a smart pointer, now it can just use on-stack variable. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:36:33 +03:00
Pavel Emelyanov	a78ab2e998	test: Simplify test_range_reads() inner loop It used to rely on bool (wrapped with pointer) and future<>-based loop helper, now it can just break from the while loop. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:36:33 +03:00
Pavel Emelyanov	c84ae64562	test: Threadify test_range_reads() itself And update its callers again. Preserve no longer relevant local smart pointers until next patch. Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:36:33 +03:00
Pavel Emelyanov	253d53b6a1	test: Threadify test_range_reads() callers Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:36:00 +03:00
Pavel Emelyanov	fd8bb0c46c	test: Threadify generate_clustered() itself And update its callers again. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:35:59 +03:00
Pavel Emelyanov	f500ee690b	test: Threadify generate_clustered() callers Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:34:54 +03:00
Pavel Emelyanov	08186c048d	test: Threadify test_no_clustered test And update its callers. Indentation is deliberately left broken. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:26:25 +03:00
Pavel Emelyanov	5f0a40f959	test: Threadify nonexistent_key test Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 10:26:13 +03:00
Pavel Emelyanov	a150a63259	test: Squash two open_sstables() helper together One accepts integer generations, another one accepts "generic" ones. The latter is only called by the former, so no sense in keeping it around. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 09:08:40 +03:00
Pavel Emelyanov	4184c688ea	test: Coroutinize open_sstables() helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-05 09:08:12 +03:00
Piotr Dulikowski	ecd53db3b0	service/qos: remove the marked_for_deletion parameter It is always set to false and it doesn't seem to serve any function now.	2024-09-04 21:52:34 +02:00
Piotr Dulikowski	bae6076541	service/qos: add constructors to service_level Add a default constructor and a constructor which explicitly initializes all fields of the service_level structure. This is done in order to make sure that removal of the marked_for_deletion field can be done safely - otherwise, for example, service_level could be aggregate-initialized with an incomplete list of values for the fields, and removing marked_for_deletion which is in the middle of the struct would cause the is_static field to be initialized with the value that was designated for marked_for_deletion. As a bonus, make sure that marked_for_deletion and is_static bool fields are initialized in the default constructor to false in order to avoid potential undefined behavior.	2024-09-04 21:52:13 +02:00
Avi Kivity	ec8590ae6c	Merge 'Always pass `abort_source&` to `raft_group0_client::hold_read_apply_mutex`' from Kamil Braun There are two versions of `raft_group0_client::hold_read_apply_mutex`, one takes `abort_source&`, the other doesn't. Modify all call sites that used the non-abort-source version to pass an `abort_source&`, allowing us to remove the other overload. If there is no explicit reason not to pass an `abort_source&`, then one should be passed by default -- it often prevents hangs during shutdown. --- No backport needed -- no known issues affected by this change. Closes scylladb/scylladb#19996 * github.com:scylladb/scylladb: raft_group0_client: remove `hold_read_apply_mutex` overload without `abort_source&` storage_service: pass `_abort_source` to `hold_read_apply_mutex` group0_state_machine: pass `_abort_source` to `hold_read_apply_mutex` api: move `reload_raft_topology_state` implementation inside `storage_service`	2024-09-04 21:35:27 +03:00
Kefu Chai	fe0e961856	docs: do not install scylla/ppa repo when performing an upgrade for following reasons: 1. the ppa in question does not provide the build for the latest ubuntu's LTS release. it only builds for trusty, xenial, bionic and jammy. according to https://wiki.ubuntu.com/Releases, the latest LTS release is ubuntu noble at the time of writing. 2. the ppa in question does not provide the packages used in production. it does provide the package for building scylla 3. after we introduced the relocatable package, there is no need to provide extra user space dependencies apart from scylla packages. so, in this change, we remove all references to enabling the Scylla/PPA repository. Fixes scylladb/scylladb#20449 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20450	2024-09-04 20:30:40 +03:00
Avi Kivity	20b79816f1	repair: row_level: coroutinize repair_service::remove_repair_meta() (non-selective overload)	2024-09-04 18:43:19 +03:00
Avi Kivity	3b9ac51b6b	repair: row_level: coroutinize repair_service::remove_repair_meta() (by-address overload)	2024-09-04 18:39:21 +03:00
Avi Kivity	704e3f5432	repair: row_level: coroutinize repair_service::remove_repair_meta() (by-id overload)	2024-09-04 18:37:48 +03:00
Avi Kivity	9612c4d790	repair: row_level: row_level_repair::run() The function itself is threaded, but the inner lambdas are coroutinized (except one which is expected to run in a thread, and so is threaded).	2024-09-04 18:34:45 +03:00
Avi Kivity	2b94ee981b	repair: row_level: row_level_repair::send_missing_rows_to_follower_nodes() The function itself is threaded, but the inner lambda is coroutinized.	2024-09-04 18:28:27 +03:00
Avi Kivity	c768448339	repair: row_level: row_level_repair::get_missing_rows_from_follower_nodes() The function itself is threaded, but the inner lambda is coroutinized.	2024-09-04 18:28:12 +03:00
Avi Kivity	d2f1b44487	repair: row_level: row_level_repair::negotiate_sync_boundary() The function itself is threaded, but the inner lambda is coroutinized.	2024-09-04 18:21:39 +03:00
Kefu Chai	0756520f82	sstable: coroutinize sstable::seal_sstable() for better readability. presumably, `sstable::seal_sstable()` is not on the critical path, and we don't need to worry about the overhead of using C++20 coroutine. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20410	2024-09-04 18:14:33 +03:00
Kefu Chai	88c5c3001a	compaction: refactor compaction_manager::can_proceed() instead of chaining the conditions with '&&', break them down. for two reasons: * for better readability: to group the conditions with the same purpose together * so we don't look up the table twice. it's an anti-pattern of using STL, and it could be confusing at first glance. this change is a cleanup, so it does not change the behavior. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20369	2024-09-04 18:12:29 +03:00
Avi Kivity	645e39e746	repair: row_level: coroutinize repair_put_row_diff_with_rpc_stream_process_op() Both the outer function and the inner lambda are coroutinized.	2024-09-04 18:10:43 +03:00
Avi Kivity	4c05d0b965	repair: row_level: coroutinize repair_meta::get_sync_boundary_handler()	2024-09-04 15:33:40 +03:00
Avi Kivity	eea011fad5	repair: row_level: coroutinize repair_meta::get_sync_boundary() Not really helping anything, but a coroutine is a safer platform for future changes in administrative APIs.	2024-09-04 15:31:57 +03:00
Avi Kivity	91b88df956	repair: row_level: coroutinize repair_meta::repair_set_estimated_partitions_handler()	2024-09-04 15:20:53 +03:00
Avi Kivity	b73194c9bf	repair: row_level: coroutinize repair_meta::repair_set_estimated_partitions() Not really helping anything, but a coroutine is a safer platform for future changes in administrative APIs.	2024-09-04 15:18:33 +03:00
Avi Kivity	a69fb626bd	repair: row_level: coroutinize repair_meta::repair_get_estimated_partitions_handler()	2024-09-04 15:17:42 +03:00
Avi Kivity	5cd8207ac7	repair: row_level: coroutinize repair_meta::repair_get_estimated_partitions() Not really helping anything, but a coroutine is a safer platform for future changes in administrative APIs.	2024-09-04 15:16:32 +03:00
Avi Kivity	e108f867a9	repair: row_level: coroutinize repair_meta::repair_row_level_stop_handler()	2024-09-04 15:15:42 +03:00
Avi Kivity	ffbb973063	repair: row_level: coroutinize repair_meta::repair_row_level_stop() Not really helping anything, but a coroutine is a safer platform for future changes in administrative APIs.	2024-09-04 15:14:08 +03:00
Avi Kivity	587b6fe400	repair: row_level: coroutinize repair_meta::repair_row_level_start_handler()	2024-09-04 15:12:49 +03:00
Avi Kivity	db7b1014ff	repair: row_level: coroutinize repair_meta::repair_row_level_start()	2024-09-04 15:10:45 +03:00
Avi Kivity	17b82265ae	repair: row_level: coroutinize repair_meta::get_combined_row_hash_handler()	2024-09-04 15:08:58 +03:00
Avi Kivity	bacbdde791	repair: row_level: coroutinize repair_meta::get_combined_row_hash()	2024-09-04 15:07:27 +03:00
Avi Kivity	8b8dc5092f	repair: row_level: coroutinize repair_meta::get_full_row_hashes_handler()	2024-09-04 15:05:28 +03:00
Avi Kivity	21e01990ff	repair: row_level: coroutinize repair_meta::get_full_row_hashes_with_rpc_stream() The when_all_succeed() call is changed to the safer coroutine::when_all(), which avoids the temporary futures.	2024-09-04 15:03:00 +03:00
Avi Kivity	572fbfde09	repair: row_level: coroutinize repair_meta::request_row_hashes()	2024-09-04 14:07:59 +03:00
Nadav Har'El	15f8046fcb	alternator ttl: fix use-after-free The Alternator TTL scanning code uses an object "scan_ranges_context" to hold the scanning context. One of the members of this object is a service::query_state, and that in turn holds a reference to a service::client_state. The existing constructor created a temporary client_state object and saved a reference to it - which can result in use after free as the temporary object is freed as soon as the constructor ends. The fix is to save a client_state in the scan_ranges_context object, instead of a temporary object. Fixes #19988 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#20418	2024-09-03 22:15:18 +03:00
Pavel Emelyanov	c03b1e2827	test: Remove unused database argument from make_sstable_for_all_shards() helper Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20427	2024-09-03 21:36:28 +03:00
Calle Wilund	2695fefa81	commitlog/database: Make some commitlog options updatable + add feature listener Makes some commitlog options runtime updatable. Most important for this case, the usage of fragmented entries. Also adds a subscription in database on said feature, to possibly enable once cluster enables it.	2024-09-03 16:38:28 +00:00
Calle Wilund	238a0236e5	features/config: Add feature for fragmented commitlog entries Hides the functionality behind a cluster feature, i.e. postpones using it until an upgrade is complete etc. This to allow rolling back even with dirty nodes, at least until a cluster is committed. Feature can also be disabled by scylla option, just in case. This will lock it out of whole cluster, but this is probably good, because depending on off or on, certain schema/raft ops might fail or succeed (due to large mutations), and this should probably be equivalent across nodes.	2024-09-03 16:38:28 +00:00
Calle Wilund	9bf452c7a0	docs: Add entry on commitlog file format v4	2024-09-03 16:38:28 +00:00
Calle Wilund	ad595e4d6a	commitlog_test: Add more oversized cases Also adds some randomization to the tests.	2024-09-03 16:38:28 +00:00
Calle Wilund	1d5e509136	commitlog_replayer: Replay segments in order created Minimizes potential buffer usage for fragmented entries.	2024-09-03 16:38:28 +00:00
Calle Wilund	61ff9486fb	commitlog_replayer: Use replay state to support fragmented entries	2024-09-03 16:38:27 +00:00
Calle Wilund	7c16683184	commitlog_replayer: coroutinize partly	2024-09-03 16:38:27 +00:00
Calle Wilund	05bf2ae5d7	commitlog: Handle oversized entries Refs #18161 Yet another approach to dealing with large commitlog submissions. We handle oversize single mutation by adding yet another entry type: fragmented. In this case we only add a fragment (aha) of the data that needs storing into each entry, along with metadata to correlate and reconstruct the full entry on replay. Because these fragmented entries are spread over N segments, we also need to add references from the first segment in a chain to the subsequent ones. These are released once we clear the relevant cf_id count in the base. * This approach has the downside that due to how serialization etc works w.r.t. mutations, we need to create an intermediate buffer to hold the full serialized target entry. This is then incrementally written into entries of < max_mutation_size, successively requesting more segments. On replay, when encountering a fragment chain, the fragment is added to a "state", i.e. a mapping of currently processing frag chains. Once we've found all fragments and concatenated the buffers into a single fragmented one, we can issue a replay callback as usual. Note that a replay caller will need to create and provide such a state object. Old signature replay function remains for tests and such. This approach bumps the file format (docs to come). To ensure "atomicity" we both force synchronization, and should the whole op fail, we restore segment state (rewinding), thus discarding all data we wrote. v2: * Improve some bookkeeping, ensure we keep track of segments and flush properly, to get counter correct	2024-09-03 16:38:27 +00:00
Anna Stuchlik	35796306a7	doc: comment out redirections for pages under Features This commit temporarily disables redirections for all pages under Features that were moved with this PR: https://github.com/scylladb/scylladb/pull/20401 Redirections work for all versions. This means that pages in 6.1 are redirected to URLs that are not available yet (because 6.2 has not been released yet). The redirections are correct and should be enabled when 6.2 is released: I've created an issue to do it: https://github.com/scylladb/scylladb/issues/20428 Closes scylladb/scylladb#20429	2024-09-03 17:16:51 +02:00
Avi Kivity	6ddcf80d89	Merge 'Reuse sstable::test_env::reusable_sst() helper for pre-existing sstables' from Pavel Emelyanov Tests that try to access sstables from test/resource/ typically sstable::load() it after object creation. There's reusable_sst() helper for that. This PR fixes one more caller that still takes the longer route by creating the sstable and loading it on its own. Closes scylladb/scylladb#20420 * github.com:scylladb/scylladb: test: Call reusable sst from ka_sst() helper test: Move sstable_open_config to reusable_sst()'s argument	2024-09-03 17:40:34 +03:00
Kamil Braun	504bf68ebb	raft_group0_client: remove `hold_read_apply_mutex` overload without `abort_source&` Ensure that every caller passes `abort_source&`.	2024-09-03 15:52:05 +02:00
Kamil Braun	79983723c8	storage_service: pass `_abort_source` to `hold_read_apply_mutex` There's no point waiting for this lock if `storage_service` is being aborted. In theory the lock, if held, should be eventually released by whatever is holding it during shutdown -- but if there is some cyclic reference between the services, and e.g. whatever holds the lock is stuck because of ongoing shutdown and would only be unstuck by `storage_service` getting stopped (which it can't because it's waiting on the lock), that would cause a shutdown deadlock. Better to be safe than sorry.	2024-09-03 15:52:05 +02:00
Kamil Braun	a7097fb985	group0_state_machine: pass `_abort_source` to `hold_read_apply_mutex` `transfer_snapshot` was already passing `_abort_source` when trying to take the lock but other member functions didn't.	2024-09-03 15:52:05 +02:00
Kamil Braun	a4d1065628	api: move `reload_raft_topology_state` implementation inside `storage_service` In a later commit we'll want to access more `storage_service` internals in the API's implementation (namely, `_abort_source`). Also, moving the implementation there allows making `service::topology_transition()` private again (it was made public in `992f1327d3` only for this API implementation)	2024-09-03 15:52:03 +02:00
Andrei Chekun	27e5fa149a	[test.py] Clean duplicated arg for test suite Arguments mode and run_id already set in the _prepare_pytest_params, so there is no need to set them one more time.	2024-09-03 14:41:57 +02:00
Andrei Chekun	8a9146ebda	[test.py] Enable allure for python test Enable allure adapter for all python tests. Add tag and parameters to the test to be able to distinguish them across modes and runs. Related: https://github.com/scylladb/qa-tasks/issues/1665	2024-09-03 14:41:57 +02:00
Łukasz Paszkowski	20a6296309	test: Add reversed query tests on simulated upgrade process Run the reversed queries on a 2-node cluster with CL=ALL with and without NATIVE_REVERSE_QUERIES feature flag. When the flag is enabled, the native reversed format is used, otherwise the legacy format. The NATIVE_REVERSE_QUERIES feature flag is suppressed with an error injection that simulates cluster upgrade process. Backport is not required. The patch adds additional upgrade tests for https://github.com/scylladb/scylladb/pull/18864 Closes scylladb/scylladb#20179	2024-09-03 14:45:08 +03:00
Pavel Emelyanov	0857b63259	Merge 'repair: row_level: coroutinize some slow-path functions' from Avi Kivity This series coroutinizes up some functions in repair/row_level.cc. This enhances readability and reduces bloat: ``` size build/release/repair/row_level.o.{before,after} text data bss dec hex filename 1650619 48 524 1651191 1931f7 build/release/repair/row_level.o.before 1604610 48 524 1605182 187e3e build/release/repair/row_level.o.after ``` 46kB of text were saved. Functions that only touch a single mutation fragment were not coroutinized to avoid adding a allocation in a fast path. In one case a function was split into a fast path and a slow path. Clean-up series, backport not needed. Closes scylladb/scylladb#20283 * github.com:scylladb/scylladb: repair: row_level: restore indentation repair: row_level: coroutinize repair_meta::get_full_row_hashes_sink_op() repair: row_level: coroutinize repair_meta::get_full_row_hashes_source_op() repair: row_level: coroutinize repair_get_full_row_hashes_with_rpc_stream_handler() repair: row_level: coroutinize repair_put_row_diff_with_rpc_stream_handler() repair: row_level: coroutinize repair_get_row_diff_with_rpc_stream_handler() repair: row_level: coroutinize repair_get_full_row_hashes_with_rpc_stream_process() repair: row_level: coroutinize repair_get_row_diff_with_rpc_stream_process_op_slow_path() repair: row_level: split repair_get_row_diff_with_rpc_stream_process_op() into fast and slow paths repair: row_level: coroutinize repair_meta::put_row_diff_handler() repair: row_level: coroutinize repair_meta::put_row_diff_sink_op() repair: row_level: coroutinize repair_meta::put_row_diff_source_op() repair: row_level: coroutinize repair_meta::put_row_diff() repair: row_level: coroutinize repair_meta::get_row_diff_handler() repair: row_level: coroutinize repair_meta::get_row_diff_sink_op() repair: row_level: coroutinize repair_meta::to_repair_rows_on_wire() repair: row_level: coroutinize repair_meta::do_apply_rows() repair: 
row_level: coroutinize repair_meta::copy_rows_from_working_row_buf_within_set_diff() repair: row_level: coroutinize repair_meta::copy_rows_from_working_row_buf() repair: row_level: coroutinize repair_meta::row_buf_csum() repair: row_level: coroutinize repair_meta::get_repairs_row_size() repair: row_level: coroutinize repair_meta::set_estimated_partitions() repair: row_level: coroutinize repair_meta::get_estimated_partitions() repair: row_level: coroutinize repair_meta::do_estimate_partitions_on_local_shard() repair: row_level: coroutinize repair_reader::close() repair: row_level: coroutinize repair_reader::end_of_stream() repair: row_level: coroutinize sink_source_for_repair::close() repair: row_level: coroutinize sink_source_for_repair::get_sink_source()	2024-09-03 14:41:22 +03:00
Nadav Har'El	dd030f8112	alternator: improve RBAC access denied error messages This patch addresses two requests made by reviewers of the original "Add CQL-based RBAC support to Alternator" series. Both requests were about the error messages produced when access is denied: 1. The error message is improved to use more proper English, and also to include the name of the role which was denied access. 2. The permission-check and error-message-formatting code is de-duplicated, using a common function verify_permission(). This de-duplication required moving the access-denied error path to throwing an exception instead of the previous exception-free implementation. However, it can be argued that this change is actually a good thing, because it makes the successful case, when access is allowed, faster. The de-duplicated code is shorter and simpler, and allowed changing the text of the error message in just one place. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#20326	2024-09-03 14:39:30 +03:00
Kefu Chai	d26bb9ae30	sstables: correct the debugging message printed when removing temp dir in `372a4d1b79`, we introduced a change which was for debugging the logging message. but the logging message intended for printing the temp_dir now prints an `optional<int>`. this is both confusing, and more importantly, it hurts the debuggability. in this change, the related change is reverted. Fixes scylladb/scylladb#20408 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20409	2024-09-03 14:36:08 +03:00
Pavel Emelyanov	e4bc5470cf	test: Call reusable sst from ka_sst() helper The sstable_mutation_test wants to load pre-existing sstables from resource/ subdir. For that there's reusable_sst() helper on env. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-03 14:01:28 +03:00
Pavel Emelyanov	e9980bd6dd	test: Move sstable_open_config to reusable_sst()'s argument So that callers are able to provide custom config in the future Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-09-03 14:00:59 +03:00
Laszlo Ersek	cd0819e3ed	docs/dev/docker-hub.md: refresh aio-max-nr calculation What we have today in "docs/dev/docker-hub.md" on "aio-max-nr" dates back to scylla commit `f4412029f4` ("docs/docker-hub.md: add quickstart section with --smp 1", 2020-09-22). Problems with the current language: - The "65K" claim as default value on non-production systems is wrong; "fs/aio.c" in Linux initializes "aio_max_nr" to 0x10000, which is 64K. - The section in question uses equal signs (=) incorrectly. The intent was probably to say "which means the same as", but that's not what equality means. - In the same section, the relational operator "<" is bogus. The available AIO count must be at least as high (>=) as the requested AIO count. - Clearer names should be used; adjust_max_networking_aio_io_control_blocks() in "src/core/reactor.cc" sets a great example: - "reactor::max_aio" should be called "storage_iocbs", - "detect_aio_poll" should be called "preempt_iocbs", - "reactor_backend_aio::max_polls" should be called "network_iocbs". - The specific value 10000 for the last one ("network_iocbs") is not correct in scylla's context. It is correct as the Seastar default, but scylla has used 50000 since commit `2cfc517874` ("main, test: adjust number of networking iocbs", 2021-07-18). Rewrite the section to address these problems. See also: - https://github.com/scylladb/scylladb/issues/5981 - https://github.com/scylladb/seastar/pull/2396 - https://github.com/scylladb/scylladb/pull/19921 Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-09-03 12:10:59 +02:00
Laszlo Ersek	15738d14ce	docs/dev/docker-hub.md: strip trailing whitespace Strip trailing whitespace. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-09-03 12:00:28 +02:00
Botond Dénes	2556e902b1	Update tools/jmx submodule * tools/jmx 89308b77...793452a9 (1): > dist: support building packages in Github Actions	2024-09-03 11:58:37 +03:00
Anna Stuchlik	5193d2d171	doc: remove the seeds-related questions from the FAQ This commit is one of the series to remove the FAQ page by removing irrelevant/outdated entries or moving them to the forum. The question about seeds is irrelevant, not frequently asked, and covered in other sections of the docs. Also, it mentions versions that are no longer supported. Closes scylladb/scylladb#20403	2024-09-03 11:01:49 +03:00
Takuya ASADA	9d7fed40b5	install.sh: fix more incorrect permission on strict umask Even after `13caac7`, we still have more files with incorrect permissions, since we use "cp -r" and create new files with redirects. To fix this, we need to replace "cp -r" with "cp -pr", and "chmod <perm>" on newly created files. Fixes #14383 Related #19775 Closes scylladb/scylladb#19786	2024-09-03 10:37:53 +03:00
Anna Stuchlik	360f7b3d33	doc: move Features to the top-level page This commit moves the Features page from the section for developers to the top level in the page tree. This involves: - Moving the source files to the features folder from the using-scylla folder. - Moving images into features/images folder. - Updating references to the moved resources. - Adding redirections to the moved pages. Closes scylladb/scylladb#20401	2024-09-03 07:24:33 +03:00
Kefu Chai	fb2ed20b42	.github: post a comment if "Fixes" policy is violated it's more visible than an "Error" in the action's detail message. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19271	2024-09-03 07:23:48 +03:00
Botond Dénes	8f31d3f1fc	Merge 'tools/nodetool: improve backup and restore commands' from Kefu Chai this change contains two improvements to "backup" and "restore" commands: - let them print task id - let them return 1 as the exit status code upon operation failure ---- these changes are improvements to the newly introduced commands, which are not in any LTS branches yet, so no need to backport. Closes scylladb/scylladb#20371 * github.com:scylladb/scylladb: tools/scylla-nodetool: return failure with exit code in backup/restore tools/scylla-nodetool: let backup/restore print task id	2024-09-02 16:40:55 +03:00
Takuya ASADA	59aedb38d0	locator: retry HTTP request to GCE/Azure metadata service Like we already do on EC2, implement retrying request to the metadata service on GCE and Azure. Closes #19817 Closes scylladb/scylladb#20189	2024-09-02 13:04:05 +03:00
Kefu Chai	e66e885e5b	tools/scylla-nodetool: return failure with exit code in backup/restore before this change, "backup" and "restore" commands always return 0 as their exit code no matter if the performed operation fails or not. inspired by the "task" commands of nodetool, let's return 1 as the exit code if the operation fails. the tests are updated accordingly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-02 15:12:26 +08:00
Kefu Chai	470c3e8535	tools/scylla-nodetool: let backup/restore print task id in `20fffcdc`, we added the "task wait" subcommand, so user is allowed to interact with a task with its task id. and in existing implementation of "backup" and "restore" command, if user does not pass `--nowait`, the command just exits without any output upon sending the request to scylladb. in this change, we print out the task_id if user does not pass `--nowait` command line option to "backup" or "restore" command. this allows user to follow up on the operation if necessary. the tests are updated accordingly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-02 15:12:26 +08:00
Nadav Har'El	0b3890df46	test/cql-pytest: test RBAC auto-grant (and reproduce CDC bug) This patch adds functional testing for the role-based access control (RBAC) "auto-grant" feature, where a user that is allowed to create a table also receives full permissions over the table it just created. We also test permissions over new materialized views created by a user, and over CDC logs. The test for CDC logs reproduces an already suspected bug, #19798: A user may be allowed to create a table with CDC enabled, but then is not allowed to read the CDC log just created. The tests show that the other cases (base tables and views) do not have this bug, and the creating user does get appropriate permissions over the new table and views. In addition to testing auto-grant, the patch also includes tests for the opposite feature, "auto-revoke" - that permissions are removed when the table/view/cdc is deleted. If we forget to do that while implementing auto-grant, we risk that users may be able to use tables created by other users just because they used the same table _name_ earlier. It's important to have these auto-revoke tests together with the auto-grant tests that reproduce #19798 - so we don't forget this part when finally fixing #19798. Refs #19798. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19845	2024-09-02 09:03:40 +03:00
Botond Dénes	52bed81a1e	Merge 'cql3: add option to not unify bind variables with the same name' from Avi Kivity Bind variables in CQL have two formats: positional (`?`) where a variable is referred to by its relative position in the statement, and named (`:var`), where the user is expected to supply a name->value mapping. In `19a6e69001` we identified the case where a named bind variable appears twice in a query, and collapsed it to a single entry in the statement metadata. Without this, a driver using the named variable syntax cannot disambiguate which variable is referred to. However, it turns out that users can use the positional call form even with the named variable syntax, by using the positional API of the driver. To support this use case, we add a configuration variable to disable the same-variable detection. Because the detection has to happen when the entire statement is visible, we have to supply the configuration to the parser. We call it the `dialect` and pass it from all callers. The alternative would be to add a pre-prepare call similar to fill_prepare_context that rewrites all expressions in a statement to deduplicate variables. A unit test is added. Fixes #15559 This may be useful to users transitioning from Cassandra, so merits a backport. Closes scylladb/scylladb#19493 * github.com:scylladb/scylladb: cql3: add option to not unify bind variables with the same name cql3: introduce dialect infrastructure cql3: prepared_statement_cache: drop cache key default constructor	2024-09-02 08:34:24 +03:00
Kefu Chai	28b5471c01	docs/dev/maintainer.md: fix formatting * in the "Backporting Seastar commits" section, there's a single quote instead of a backtick in this line, so fix it. * add backticks around `refresh-submodules.sh`, which is a filename. * correct the command line setting a git config option, because `git-config` does not support this command line syntax, ```console $ git config --global diff.conflictstyle = diff3 $ git config --global get diff.conflictstyle = $ git config --global diff.conflictstyle diff3 $ git config --global get diff.conflictstyle diff3 ``` quote from git-config(1) > ``` > git config set [<file-option>] [--type=<type>] [--all] [--value=<value>] [--fixed-value] <name> <value> > ``` * stop using the deprecated mode of the `git-config` command, and use subcommand instead. as git-config(1) puts: > git config <name> <value> [<value-pattern>] > Replaced by git config set [--value=<pattern>] <name> <value>. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20328	2024-09-01 22:24:01 +03:00
Yaniv Michael Kaul	2ebba9cd11	tools/toolchain/dbuild: prefer podman over docker Check if podman is available before docker. If it is, use it. Otherwise, check for docker. 1. Podman is better. It runs with fewer resources, and I've had display issues with Docker (output was not shown consistently) 2. 'which docker' works even when the docker service and socket are turned off. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#20342	2024-09-01 22:17:01 +03:00
David Garcia	c4da75e392	docs: run docs test on changing config params Triggers the "Build Docs" PR workflow whenever the `db/config.cc` or `db/config.h` files are edited. These files are used to produce documentation, and this change will help prevent the introduction of breaking changes to the documentation build when they are modified. Closes scylladb/scylladb#20347	2024-09-01 22:15:48 +03:00
Avi Kivity	0f4b05824e	Merge 'perf/perf_sstable: add {crawling,partitioned}_streaming modes' from Kefu Chai for testing the load performance of load_and_stream operation. Refs #19989 --- no need to backport. it adds two new tests to the existing `perf_sstable` tool for evaluating the load performance when performing the "load_and_streaming" operation. hence has no impact on the production. Closes scylladb/scylladb#20186 * github.com:scylladb/scylladb: perf/perf_sstable: add {crawling,partitioned}_streaming modes test/perf/perf_sstable: use switch-case when appropriate	2024-09-01 22:04:22 +03:00
Avi Kivity	7197d280b0	Merge 'scylla-gdb.py: lazy-evaluate the constants ' from Kefu Chai instead of evaluating the constants in-class, accessing them via a cached class property. it would be handy if we could source `scylla-gdb.py` in `.gdbinit`, but this script accesses some symbols which are not available without a file being debugged. that's why gdb fails to load the init script: ``` Traceback (most recent call last): File "/home/kefu/dev/scylladb/scylla-gdb.py", line 167, in <module> class intrusive_slist: File "/home/kefu/dev/scylladb/scylla-gdb.py", line 168, in intrusive_slist size_t = gdb.lookup_type('size_t') ^^^^^^^^^^^^^^^^^^^^^^^^^ gdb.error: No type named size_t. ``` so we have to `file path/to/scylla` and then `source scylla-gdb.py` every time when we debug scylla or a seastar application, instead of loading `scylla-gdb.py` in `.gdbinit`. the reason is that the script accesses the debug symbols like `gdb.lookup_type('size_t')` in-class. so when the python interpreter reads the script, it evaluates this statement, but at that moment, the debug symbols are not loaded, so `source scylla-gdb.py` fails in `.gdbinit`. in this change, we transform all these class variables to cached properties, so that they * are evaluated on-demand * are evaluated only once at most this addresses the pain at the expense of verbosity. --- this change intends to improve the developer's user experience, and has no impacts on product, so no need to backport. Closes scylladb/scylladb#20334 * github.com:scylladb/scylladb: test/scylla_gdb: test the .gdb init use case scylla-gdb.py: lazy-evaluate the constants	2024-09-01 20:00:53 +03:00
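The lazy-evaluation idea in the commit above can be sketched as follows. This is a simplified illustration, not the code from `scylla-gdb.py`: `lookup_type()` is a hypothetical stand-in for `gdb.lookup_type()`, and the sketch caches per instance with `functools.cached_property` rather than per class.

```python
import functools

lookups = []  # records each simulated symbol lookup


def lookup_type(name):
    # Hypothetical stand-in for gdb.lookup_type(); the real call needs
    # debug symbols to be loaded before it can succeed.
    lookups.append(name)
    return f"<type {name}>"


class intrusive_slist:
    # An eager class attribute such as
    #     size_t = lookup_type('size_t')
    # would run while the class body is being read, i.e. at import time.
    @functools.cached_property
    def size_t(self):
        # Runs on first access only; the result is stored on the instance.
        return lookup_type('size_t')


lst = intrusive_slist()
assert lookups == []  # defining and instantiating the class did no lookup
first, second = lst.size_t, lst.size_t  # only the first access looks up
```

Deferring the lookup this way means the module can be imported before any symbols are available, at the cost of a decorator and a method per constant.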
Pavel Emelyanov	7df43312ac	test: Remove sstable making helpers from table_for_tests All users of it have sstable_test_env at hand (in fact -- they call env method to get table_for_test). And since sstable_test_env already has a bunch of methods to create sstable, the table_for_test wrapper doesn't need to duplicate this code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20360	2024-09-01 19:58:15 +03:00
Kefu Chai	bc2b7b47c8	build: cmake: add and use Scylla_CLANG_INLINE_THRESHOLD cmake parameter so that we can set the parameter passed to `-inline-threshold` with `configure.py` when building with CMake. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20364	2024-09-01 19:56:02 +03:00
Kefu Chai	6970c502c9	dist: drop %pretrans section before this change, if user does not have `/bin/sh` around, when installing scylla packages, the script in `%pretrans` is executed, and fails due to missing `/bin/sh`. per https://docs.fedoraproject.org/en-US/packaging-guidelines/Scriptlets/#pretrans > Note that the %pretrans scriptlet will, in the particular case of > system installation, run before anything at all has been installed. > This implies that it cannot have any dependencies at all. For this > reason, %pretrans is best avoided, but if used it MUST (by necessity) > be written in Lua. See > https://rpm-software-management.github.io/rpm/manual/lua.html for more > information. but we were trying to warn users upgrading from scylla < 1.7.3, which was released 7 years ago at the time of writing. in this change, we drop the `%pretrans` section. hopefully they will find their way out if they still exist. Fixes scylladb/scylladb#20321 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20365	2024-09-01 19:46:19 +03:00
Kefu Chai	a06e1c6545	scylla-housekeeping: use raw string to avoid using escape sequence before this change, when running `scylla-housekeeping`: ``` /opt/scylladb/scripts/libexec/scylla-housekeeping:122: SyntaxWarning: invalid escape sequence '\s' match = re.search(".http.?://repositories./scylladb/([^/\s]+)/./([^/\s]+)/scylladb-.", line) ``` we could have the warning above. because `\s` is not a valid escape sequence, but the Python interpreter accepts it as two separated characters of `\s` after complaining. but it's still annoying. so, let's use a raw string here. Refs scylladb/scylladb#20317 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20359	2024-09-01 18:59:23 +03:00
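A minimal sketch of the raw-string fix described in the commit above. The pattern and the sample line are illustrative assumptions, not the ones used in `scylla-housekeeping`.

```python
import re

# Hypothetical repo line, used only for illustration.
line = "baseurl=https://example.invalid/scylladb/branch-6.1/rpm/x86_64/scylladb-6.1"

# The r"" prefix keeps the backslash in `\s` intact for the regex engine,
# so Python never tries to interpret it as a string escape sequence.
pattern = r"/scylladb/([^/\s]+)/"

match = re.search(pattern, line)
branch = match.group(1) if match else None
print(branch)
```

The raw string holds the same characters as the non-raw literal did, so the regex behaves identically; only the `SyntaxWarning` goes away.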
Kefu Chai	e431b90145	test/boost/view_build_test: include used header before this change, when building the test of `view_build_test` with clang-20, we can have following build failure: ``` FAILED: test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o /home/kefu/.local/bin/clang++ -DBOOST_ALL_DYN_LINK -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TESTING_MAIN -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb/build=. 
-march=westmere -Xclang -fexperimental-assignment-tracking=disabled -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o -MF test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o.d -o test/boost/CMakeFiles/view_build_test.dir/Debug/view_build_test.cc.o -c /home/kefu/dev/scylladb/test/boost/view_build_test.cc /home/kefu/dev/scylladb/test/boost/view_build_test.cc:998:5: error: unknown type name 'simple_schema' 998 \| simple_schema ss; \| ^ ``` apparently, `simple_schema`'s declaration is not available in this translation unit. in this change * we include the header where `simple_schema` is defined, so that the build passes with clang-20. * also take this opportunity to reorder the header a little bit, so the testing headers are grouped together. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20367	2024-09-01 18:58:23 +03:00
Kefu Chai	753188c33d	test: include seastar/testing/random.hh when appropriate in a recent seastar change (644bb662), we do not include `seastar/testing/random.hh` in `seastar/testing/test_runner.hh` anymore, as the latter is not a facade of the former, and neither does it use the former. as a consequence, some tests which take the advantage of the included `seastar/testing/random.hh` do not build with the latest seastar: ``` FAILED: test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o /usr/bin/clang++ -DBOOST_REGEX_DYN_LINK -DBOOST_REGEX_NO_LIB -DBOOST_UNIT_TEST_FRAMEWORK_DYN_LINK -DBOOST_UNIT_TEST_FRAMEWORK_NO_LIB -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSCYLLA_ENABLE_PREEMPTION_SOURCE -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/__w/scylladb/scylladb -I/__w/scylladb/scylladb/build/gen -I/__w/scylladb/scylladb/seastar/include -I/__w/scylladb/scylladb/build/seastar/gen/include -I/__w/scylladb/scylladb/build/seastar/gen/src -I/__w/scylladb/scylladb/build -isystem /__w/scylladb/scylladb/abseil -isystem /__w/scylladb/scylladb/build/rust -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/__w/scylladb/scylladb/build=. 
-march=westmere -Xclang -fexperimental-assignment-tracking=disabled -Werror=unused-result -fstack-clash-protection -MD -MT test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o -MF test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o.d -o test/lib/CMakeFiles/test-lib.dir/key_utils.cc.o -c /__w/scylladb/scylladb/test/lib/key_utils.cc In file included from /__w/scylladb/scylladb/test/lib/key_utils.cc:11: /__w/scylladb/scylladb/test/lib/random_utils.hh:25:30: error: no member named 'local_random_engine' in namespace 'seastar::testing' 25 \| return seastar::testing::local_random_engine; \| ~~~~~~~~~~~~~~~~~~^ 1 error generated. ``` in this change, we include `seastar/testing/random.hh` when the random facility is used, so that they can be compiled with the latest seastar library. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20368	2024-09-01 18:57:07 +03:00
Kefu Chai	0104c7d371	tools/scylla-nodetool: s/vm.count()/vm.contains()/ under the hood, std::map::count() and std::map::contains() are nearly identical. both operations search for the given key within the map. however, the former finds an equal range with the given key, and gets the distance between the begin and the end of the range; while the latter just searches with the given key. since scylla-nodetool is not a performance-critical application, the minor difference in efficiency between these two operations is unlikely to have a significant impact on its overall performance. while std::map::count() is generally suitable for our need, it might be beneficial to use a more appropriate API. in this change, we use std::map::contains() in the place of std::map::count() when checking for the existence of a parameter with given name. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20350	2024-09-01 18:39:00 +03:00
Avi Kivity	ddf344e4f1	Merge 'compaction: use structured binding and ranges library when appropriate' from Kefu Chai for better readability --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#20366 * github.com:scylladb/scylladb: compaction: use std::views::reverse when appropriate compaction: use structured binding when appropriate	2024-09-01 18:35:15 +03:00
Avi Kivity	ea8441dfa3	cql3: add option to not unify bind variables with the same name Bind variables in CQL have two formats: positional (`?`) where a variable is referred to by its relative position in the statement, and named (`:var`), where the user is expected to supply a name->value mapping. In `19a6e69001` we identified the case where a named bind variable appears twice in a query, and collapsed it to a single entry in the statement metadata. Without this, a driver using the named variable syntax cannot disambiguate which variable is referred to. However, it turns out that users can use the positional call form even with the named variable syntax, by using the positional API of the driver. To support this use case, we add a configuration variable to disable the same-variable detection. Because the detection has to happen when the entire statement is visible, we have to supply the configuration to the parser. We call it the `dialect` and pass it from all callers. The alternative would be to add a pre-prepare call similar to fill_prepare_context that rewrites all expressions in a statement to deduplicate variables. A unit test is added. Fixes #15559	2024-09-01 17:27:48 +03:00
Avi Kivity	60acfd8c08	docs: cql: document ZstdCompressor for CREATE TABLE Adjust the wording slightly to be less awkward. Closes scylladb/scylladb#20377	2024-09-01 14:28:09 +03:00
Kefu Chai	e53a9a99cd	compaction: use std::views::reverse when appropriate let's use the standard library when appropriate. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-01 08:44:01 +08:00
Kefu Chai	3801c079e2	compaction: use structured binding when appropriate for better readability Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-09-01 08:34:10 +08:00
Avi Kivity	61e6a77a99	repair: row_level: restore indentation	2024-08-30 23:00:59 +03:00
Avi Kivity	a35942e09a	repair: row_level: coroutinize repair_meta::get_full_row_hashes_sink_op() Extra care is needed for exception handling.	2024-08-30 22:55:16 +03:00
Avi Kivity	8e9ebd82fc	repair: row_level: coroutinize repair_meta::get_full_row_hashes_source_op()	2024-08-30 22:55:16 +03:00
Avi Kivity	f7d19e237d	repair: row_level: coroutinize repair_get_full_row_hashes_with_rpc_stream_handler() Both the handle_exception() and finally() blocks need some extra care.	2024-08-30 22:55:16 +03:00
Avi Kivity	bb8751f4b5	repair: row_level: coroutinize repair_put_row_diff_with_rpc_stream_handler() Both the handle_exception() and finally() blocks need some extra care.	2024-08-30 22:55:16 +03:00
Avi Kivity	7ba0642da2	repair: row_level: coroutinize repair_get_row_diff_with_rpc_stream_handler() Both the handle_exception() and finally() blocks need some extra care.	2024-08-30 22:55:16 +03:00
Avi Kivity	61bbf452c6	repair: row_level: coroutinize repair_get_full_row_hashes_with_rpc_stream_process()	2024-08-30 22:55:16 +03:00
Avi Kivity	01a578f608	repair: row_level: coroutinize repair_get_row_diff_with_rpc_stream_process_op_slow_path()	2024-08-30 22:55:16 +03:00
Avi Kivity	3733105f78	repair: row_level: split repair_get_row_diff_with_rpc_stream_process_op() into fast and slow paths This allows coroutinization of the slow path without affecting the fast path.	2024-08-30 22:55:16 +03:00
Avi Kivity	e17c3b71a8	repair: row_level: coroutinize repair_meta::put_row_diff_handler()	2024-08-30 22:55:16 +03:00
Avi Kivity	74ea2b9663	repair: row_level: coroutinize repair_meta::put_row_diff_sink_op() Exception handling is a bit awkward since we can't co_await in a catch block.	2024-08-30 22:55:16 +03:00
Avi Kivity	e4362a5b7b	repair: row_level: coroutinize repair_meta::put_row_diff_source_op()	2024-08-30 22:55:16 +03:00
Avi Kivity	b998d69f09	repair: row_level: coroutinize repair_meta::put_row_diff()	2024-08-30 22:55:16 +03:00
Avi Kivity	3f2b5fe5dc	repair: row_level: coroutinize repair_meta::get_row_diff_handler()	2024-08-30 22:55:16 +03:00
Avi Kivity	cd63971501	repair: row_level: coroutinize repair_meta::get_row_diff_sink_op() Since sink.close() is called from an exception handler, some code movement is needed so it isn't co_awaited from a catch block.	2024-08-30 22:55:16 +03:00
Avi Kivity	3f28dec88c	repair: row_level: coroutinize repair_meta::to_repair_rows_on_wire() coroutine::maybe_yield() introduced to compensate for loss of stall-protected do_for_each()	2024-08-30 22:55:16 +03:00
Avi Kivity	1a84f1a73d	repair: row_level: coroutinize repair_meta::do_apply_rows() coroutine::maybe_yield() introduced to compensate for loss of stall-protected do_for_each()	2024-08-30 22:55:16 +03:00
Avi Kivity	7f15cc446f	repair: row_level: coroutinize repair_meta::copy_rows_from_working_row_buf_within_set_diff() coroutine::maybe_yield() introduced to compensate for loss of stall-protected do_for_each()	2024-08-30 22:55:16 +03:00
Avi Kivity	93ca202bd3	repair: row_level: coroutinize repair_meta::copy_rows_from_working_row_buf() coroutine::maybe_yield() introduced to compensate for loss of stall-protected do_for_each()	2024-08-30 22:55:15 +03:00
Avi Kivity	5f8895d908	repair: row_level: coroutinize repair_meta::row_buf_csum() coroutine::maybe_yield() introduced to compensate for loss of stall-protected do_for_each()	2024-08-30 22:55:15 +03:00
Avi Kivity	d1e45f2982	repair: row_level: coroutinize repair_meta::get_repairs_row_size() coroutine::maybe_yield() introduced to compensate for loss of stall-protected do_for_each()	2024-08-30 22:55:15 +03:00
Avi Kivity	0b1bf57d19	repair: row_level: coroutinize repair_meta::set_estimated_partitions()	2024-08-30 22:55:15 +03:00
Avi Kivity	aee078d8e5	repair: row_level: coroutinize repair_meta::get_estimated_partitions()	2024-08-30 22:55:15 +03:00
Avi Kivity	51534f60eb	repair: row_level: coroutinize repair_meta::do_estimate_partitions_on_local_shard()	2024-08-30 22:55:12 +03:00
Kamil Braun	e01cef01a6	Merge 'Ignore seed name resolution errors during the restart of a cluster member node.' from Sergey Zolotukhin All seeds hostname resolution errors will be ignored during a node restart in case the node had already joined a cluster. This will prevent restart errors if some seed names are not resolvable. Fixes scylladb/scylladb#14945 Closes scylladb/scylladb#20292 * github.com:scylladb/scylladb: Ignore seed name resolution errors on restart. Add a test for starting with a wrong seed.	2024-08-30 11:33:44 +02:00
Kamil Braun	292ef0d1f9	Merge 'Fix node replace with inter-dc encryption enabled.' from Gleb Natapov Currently if a coordinator and a node being replaced are in the same DC while inter-dc encryption is enabled (connections between nodes in the same DC should not be encrypted) the replace operation will fail. It fails because a coordinator uses non encrypted connection to push raft data to the new node, but the new node will not accept such connection until it knows which DC the coordinator belongs to and for that the raft data needs to be transferred. The series adds the test for this scenario and the fix for the chicken&egg problem above. The series (or at least the fix itself) needs to be backported because this is a serious regression. Fixes: scylladb/scylladb#19025 Closes scylladb/scylladb#20290 * github.com:scylladb/scylladb: topology coordinator: fix indentation after the last patch topology coordinator: do not add replacing node without a ring to topology test: add test for replace in clusters with encryption enabled test.py: add server encryption support to cluster manager .gitignore: fix pattern for resources to match only one specific directory	2024-08-30 11:29:05 +02:00
Kefu Chai	82fbe317ec	test/scylla_gdb: test the .gdb init use case before this change, we run all the tests in a single pytest session, with scylladb debug symbols loaded. but we want to test another use case, where the scylladb debug symbols are missing. in this change, * we do not check for the existence of debug symbols until necessary * add a mark named "without_scylla" * run the tests in two pytest sessions - one with "without_scylla" mark - one with "not without_scylla" mark * add a test which is marked with the "without_scylla" mark. the test verifies that the scylla-gdb.py script can be loaded even without scylladb debug symbols. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-30 17:05:29 +08:00
Kefu Chai	7dd63c891f	scylla-gdb.py: lazy-evaluate the constants instead of evaluating the constants in-class, accessing them via a cached class property. it would be handy if we could source `scylla-gdb.py` in `.gdbinit`, but this script accesses some symbols which are not available with a file being debugged. so when gdb fails to load init script: ``` Traceback (most recent call last): File "/home/kefu/dev/scylladb/scylla-gdb.py", line 167, in <module> class intrusive_slist: File "/home/kefu/dev/scylladb/scylla-gdb.py", line 168, in intrusive_slist size_t = gdb.lookup_type('size_t') ^^^^^^^^^^^^^^^^^^^^^^^^^ gdb.error: No type named size_t. ``` so we have to `file path/to/scylla` and then `source scylla-gdb.py` every time when we debug scylla or a seastar application, instead of loading `scylla-gdb.py` in `.gdbinit`. the reason is that the script access the debug symbols like `gdb.lookup_type('size_t')` in-class. so when the python interpreter reads the script, it evaluates this statement, but at that moment, the debug symbols are not loaded, so `source scylla-gdb.py` fails in `.gdbinit`. in this change, we transform all these class variables to cached property, so that they * are evaluated on-demand * are evaluated only once at most this addresses the pain at the expense of verbosity. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-30 17:05:29 +08:00
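The lazy-evaluation idea in the commit above can be sketched in plain Python. This is a minimal illustration, not the code from `scylla-gdb.py`: `lookup_type` and `_symbols` are hypothetical stand-ins for `gdb.lookup_type()` and the loaded debug symbols, and `functools.cached_property` is assumed here as one way to get "evaluated on demand, at most once" behaviour.

```python
from functools import cached_property

# Hypothetical stand-in for the debugger's symbol table; empty until a
# binary with debug symbols has been "loaded".
_symbols = {}
# Records every lookup so the number of evaluations can be observed.
lookup_calls = []


def lookup_type(name):
    """Hypothetical stand-in for gdb.lookup_type(): fails without symbols."""
    lookup_calls.append(name)
    if name not in _symbols:
        raise LookupError(f"No type named {name}.")
    return _symbols[name]


class IntrusiveSlist:
    # An eager class attribute (size_t = lookup_type('size_t')) would run
    # while the class body is executed, i.e. when the script is sourced.
    # A cached property defers the lookup until first access.
    @cached_property
    def size_t(self):
        return lookup_type("size_t")


slist = IntrusiveSlist()              # no lookup has happened yet
_symbols["size_t"] = "unsigned long"  # simulate loading debug symbols
first = slist.size_t                  # first access performs the lookup
second = slist.size_t                 # second access reuses the cached value
```

The trade-off matches the one named in the commit message: each constant needs its own decorated method, so the script becomes more verbose in exchange for being loadable before symbols exist.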
Pavel Emelyanov	cec4d207f6	Merge 'repair: throw if batchlog manager isn't initialized' from Aleksandra Martyniuk repair_service::repair_flush_hints_batchlog_handler may access batchlog manager while it is uninitialized. Throw if batchlog manager isn't initialized. Fixes: #20236. Needs backport to 6.0 and 6.1 as they suffer from the uninitialized bm access. Closes scylladb/scylladb#20251 * github.com:scylladb/scylladb: test: add test to ensure repair won't fail with uninitialized bm repair: throw if batchlog manager isn't initialized	2024-08-30 11:37:24 +03:00
Anna Stuchlik	4471c80bdc	doc: add the 6.1-to-6.2 upgrade guide This commit replaces the 6.0-to-6.1 upgrade guide with the 6.1-to-6.2 upgrade guide. The new guide is a template that covers the basic procedure. If any 6.2-specific updates are required, they will have to be added along with development. Closes scylladb/scylladb#20178	2024-08-30 10:10:45 +03:00
Piotr Dulikowski	c05be27e4a	Merge 'db/hints: Move the code for writing hints to a separate function' from Dawid Mędrek In scylladb/scylladb@7301a96, in the function `hint_endpoint_manager::store_hint()`, we transformed the lambda passed to `seastar::with_gate()` to a coroutine lambda to improve the readability. However, there was a subtle problem related to lifetimes of the captures that needed to be addressed: * Since we started `co_await`ing in the lambda, the captures were at risk of being destructed too soon. The usual solution is to wrap a coroutine lambda within a `seastar::coroutine::lambda` object and rely on the extended lifetime enforced by the semantics of the language. See `docs/dev/lambda-coroutine-fiasco.md` for more context. * However, since we don't immediately `co_await` the future returned by `with_gate()`, we cannot rely on the extended lifetime provided by the wrapper. The document linked in the previous bullet point suggests keeping the passed coroutine lambda as a variable and pass it as a reference to `with_gate()`. However, that's not feasible either because we discard the returned future and the function returns almost instantly -- destructing every local object, which would encompass the lambda too. The solution used in the commit was to move captures of the lambda into the lambda's body. That helped because Seastar's backend is responsible for keeping all of the local variables alive until the lambda finishes its execution. However, we didn't move all of the captures into the lambda -- the missing one was the `this` pointer that was implicitly used in the lambda. Address sanitiser hasn't reported any bugs related to the pointer yet, but the bug is most likely there. In this commit, we transform the lambda's body into a new member function and only call it from the lambda. This way, we don't need to care about the lifetimes of the captures because Seastar ensures that the function's arguments stay alive until the coroutine finishes. 
Choosing this solution instead of assigning `this` to a pointer variable inside the lambda's body and using it to refer to the object's members has actual benefit: it's not possible to accidentally forget to refer to a member of the object via the pointer; it also makes the code less awkward. Fixes scylladb/scylladb#20306 Closes scylladb/scylladb#20258 * github.com:scylladb/scylladb: db/hints: Fix indentation in `do_store_hint()` db/hints: Move code for writing hints to separate function	2024-08-30 09:09:02 +02:00
Avi Kivity	bbcfd47bf5	doc: nodetool: toppartitions: document --samplers and --capacity In particular --capacity is critical for obtaining accurate measurements. Closes scylladb/scylladb#20192	2024-08-30 10:07:54 +03:00
Botond Dénes	9f9346fc59	Merge 'nodetool: tasks: add nodetool commands to track task manager tasks' from Aleksandra Martyniuk Add nodetool commands to manage task manager tasks: - tasks abort - aborts the task - tasks list - lists all tasks in the module - tasks modules - lists all modules - tasks set-ttl - sets task ttl - tasks status - gets status of the task - tasks tree - gets statuses of the task and all its descendants - tasks ttl - gets task ttl - tasks wait - waits for the task and gets its status Fixes: https://github.com/scylladb/scylladb/issues/19201. Closes scylladb/scylladb#19614 * github.com:scylladb/scylladb: test: nodetool: add tests for tasks commands nodetool: tasks: add nodetool commands to track task manager tasks api: task_manager: return status 403 if a task is not abortable api: task_manager: return none instead of empty task id api: task_manager: add timeout to wait_task api: task_manager: add operation to get ttl nodetool: add suboperations support nodetool: change operations_with_func type nodetool: prepare operation related classes for suboperations	2024-08-30 07:37:37 +03:00
Avi Kivity	d69bf4f010	cql3: introduce dialect infrastructure A dialect is a different way to interpret the same CQL statement. Examples: - how duplicate bind variable names are handled (later in this series) - whether `column = NULL` in LWT can return true (as is now) or whether it always returns NULL (as in SQL) Currently, dialect is an empty structure and will be filled in later. It is passed to query_processor methods that also accept a CQL string, and from there to the parser. It is part of the prepared statement cache key, so that if the dialect is changed online, previous parses of the statement are ignored and the statement is prepared again. The patch is careful to pick up the dialect at the entry point (e.g. CQL protocol server) so that the dialect doesn't change while a statement is parsed, prepared, and cached.	2024-08-29 21:19:23 +03:00
Avi Kivity	f9322799af	cql3: prepared_statement_cache: drop cache key default constructor It's unnecessary, and interferes with the following patch where we change the cache key type.	2024-08-29 21:07:00 +03:00
Avi Kivity	67b24859bc	Merge 'generic_server: convert connection tracking to seastar::gate' from Laszlo Ersek ~~~ generic_server: convert connection tracking to seastar::gate If we call server::stop() right after "server" construction, it hangs: With the server never listening (never accepting connections and never serving connections), nothing ever calls server::maybe_stop(). Consequently, co_await _all_connections_stopped.get_future(); at the end of server::stop() deadlocks. Such a server::stop() call does occur in controller::do_start_server() [transport/controller.cc], when - cserver->start() (sharded<cql_server>::start()) constructs a "server"-derived object, - start_listening_on_tcp_sockets() throws an exception before reaching listen_on_all_shards() (for example because it fails to set up client encryption -- certificate file is inaccessible etc.), - the "deferred_action" cserver->stop().get(); is invoked during cleanup. (The cserver->stop() call exposing the connection tracking problem dates back to commit `ae4d5a60ca` ("transport::controller: Shut down distributed object on startup exception", 2020-11-25), and it's been triggerable through the above code path since commit `6b178f9a4a` ("transport/controller: split configuring sockets into separate functions", 2024-02-05).) Tracking live connections and connection acceptances seems like a good fit for "seastar::gate", so rewrite the tracking with that. "seastar::gate" can be closed (and the returned future can be waited for) without anyone ever having entered the gate. NOTE: this change makes it quite clear that neither server::stop() nor server::shutdown() must be called multiple times. The permitted sequences are: - server::shutdown() + server::stop() - or just server::stop(). Fixes #10305 Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> ~~~ Fixes #10305. 
I think we might want to backport this -- it fixes a hang-on-misconfiguration which affects `scylla-6.1.0-0.20240804.abbf0b24a60c.x86_64` minimally. Basically every release that contains commit `ae4d5a60ca` has a theoretical chance for the hang, and every release that contains commit `6b178f9a4a` has a practical chance for the hang. Focusing on the more practical symptom (i.e., releases containing commit `6b178f9a4a`), `git tag --contains 6b178f9a4a90` gives us (ignoring candidates and release candidates): - scylla-6.0.0 - scylla-6.0.1 - scylla-6.0.2 - scylla-6.1.0 Closes scylladb/scylladb#20212 * github.com:scylladb/scylladb: generic_server: make server::stop() idempotent generic_server: coroutinize server::shutdown() generic_server: make server::shutdown() idempotent test/generic_server: add test case configure, cmake: sort the lists of boost unit tests generic_server: convert connection tracking to seastar::gate	2024-08-29 19:45:48 +03:00
Laszlo Ersek	db44000f8d	Update seastar submodule * seastar 83e6cdfd...ec5da7a6 (1): > reactor, linux-aio: advise users in more detail on setting aio-max-nr Fixes #5981 Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> Closes scylladb/scylladb#20307	2024-08-29 19:42:02 +03:00
Raphael S. Carvalho	26facd807e	storage_service: avoid processing same table unnecessarily in split monitor If there's a token metadata for a given table, and it is in split mode, it will be registered such that split monitor can look at it, for example, to start split work, or do nothing if table completed it. During topology change, e.g. drain, split is stalled since it cannot take over the state machine. It was noticed that the log is being spammed with a message saying the table completed split work, since every tablet metadata update means waking up the monitor on behalf of a table. So it makes sense to demote the logging level to debug. That persists until drain completes and split can finally complete. Another thing that was noticed is that during drain, a table can be submitted for processing faster than the monitor can handle, so the candidate queue may end up with multiple duplicated entries for the same table, which means unnecessary work. That is fixed by using a sequenced set, which keeps the current FIFO behavior. Fixes #20339. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#20029	2024-08-29 19:38:43 +03:00
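The "sequenced set" mentioned in the commit above is a queue that keeps FIFO order while rejecting entries that are already pending. The following is a rough Python sketch of that data structure only; the class name `SequencedSet` and the table names are hypothetical, and the actual patch is C++ and may be implemented quite differently.

```python
class SequencedSet:
    """FIFO queue that silently ignores items already waiting in it."""

    def __init__(self):
        # A dict preserves insertion order and gives O(1) membership tests.
        self._items = {}

    def push(self, item):
        """Enqueue item; return False if it was already pending."""
        if item in self._items:
            return False
        self._items[item] = None
        return True

    def pop(self):
        """Remove and return the oldest pending item."""
        item = next(iter(self._items))
        del self._items[item]
        return item

    def __len__(self):
        return len(self._items)


queue = SequencedSet()
# The same table is submitted several times before the monitor catches up.
for table in ["ks.t1", "ks.t2", "ks.t1", "ks.t3", "ks.t2"]:
    queue.push(table)

pending = len(queue)
order = [queue.pop() for _ in range(pending)]
```

Each table is processed once per burst of submissions, in the order it was first submitted, which is the behaviour the commit message describes.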
Aleksandra Martyniuk	1f46cad5de	test: nodetool: add tests for tasks commands	2024-08-29 17:37:13 +02:00
Aleksandra Martyniuk	20fffcdcf5	nodetool: tasks: add nodetool commands to track task manager tasks	2024-08-29 17:37:12 +02:00
Avi Kivity	7da3314deb	Merge 'Integrated restore' from Ernest Zaslavsky Handed over from https://github.com/scylladb/scylladb/pull/20149 This adds minimal implementation of the start-restore API call. The method starts a task that runs load-and-stream functionality against sstables from S3 bucket. Arguments are: ``` endpoint -- the ID in object_store.yaml config file bucket -- the target bucket to get objects from keyspace -- the keyspace to work on table -- the table to work on snapshot -- the name of the snapshot from which the backup was taken ``` The task runs in the background, its task_id is returned from the method once it's spawned and it should be used via /task_manager API to track the task execution and completion. Remote sstables components are scanned as if they were placed in local upload/ directory. Then collected sstables are fed into load-and-stream. This branch has https://github.com/scylladb/scylladb/pull/19890 (Integrated backup), https://github.com/scylladb/scylladb/pull/20120 (S3 lister) and a few more minor PRs merged in. The restore branch itself starts with [utils: Introduce abstract (directory) lister](`29c867b54d`) commit. refs: https://github.com/scylladb/scylladb/issues/18392 Closes scylladb/scylladb#20305 * github.com:scylladb/scylladb: tools/scylla-nodetool: add restore integration test/object_store: Add simple restore test test/object_store: Generalize prepare_snapshot_for_backup() code: Introduce restore API method sstable_loader: Add sstables::storage_manager dependency sstable_loader: Maintain task manager module sstable_loader: Out-line constructor distributed_loader: Split get_sstables_from_upload_dir() sstables/storage: Compose uploaded sstable path simpler sstable_directory: Prepare FS lister to scan files on S3 sstable_directory: Parse sstable component without full path s3-client: Add support for lister::filter utils: Introduce abstract (directory) lister	2024-08-29 18:25:30 +03:00
Kamil Braun	9574c399ce	Merge 'add support for zero-token nodes' from Patryk Jędrzejczak We revive the `join_ring` option. We support it only in the Raft-based topology, as we plan to remove the gossip-based topology when we fix the last blocker - the implementation of the manual recovery tool. In the Raft-based topology, a node can be assigned tokens only once when it joins the cluster. Hence, we disallow joining the ring later, which is possible in Cassandra. The main idea behind the solution is simple. We make the unsupported special case of zero tokens a supported normal case. Nodes with zero tokens assigned are called "zero-token nodes" from now on. From the topology point of view, zero-token nodes are the same as token-owning nodes. They can be in the same states, etc. From the data point of view, they are different. They are not members of the token ring, so they are not present in `token_metadata::_normal_token_owners`. Hence, they are ignored in all non-local replication strategies. The tablet load balancer also ignores them. Zero-token nodes can be used as coordinator-only nodes, just like in Cassandra. They can handle requests just like token-owning nodes. The main motivation behind zero-token nodes is that they can prevent the Raft majority loss efficiently. Zero-token nodes are group 0 voters, but they can run on much weaker and cheaper machines because they do not replicate data and handle client requests by default (drivers ignore them). For example, if there are two DCs, one with 4 nodes and one with 5 nodes, if we add a DC with 2 zero-token nodes, every DC will contain less than half of the nodes, so we won't lose the majority when any DC dies. Another way of preventing the Raft majority loss is changing the voter set, which is tracked by scylladb/scylladb#18793. That approach can be used together with zero-token nodes. In the example above, if we choose equal numbers of voters in both DCs, then a DC with one zero-token node will be sufficient. 
However, in the typical setup of 2 DCs with the same number of nodes it is enough to add a DC with only one zero-token node without changing the voter set. Zero-token nodes could also be used as load balancers in the Alternator. Additionally, this PR fixes scylladb/scylladb#11087, which turned out to be a blocker. This PR introduced a new feature. There is no need to backport it. Fixes scylladb/scylladb#6527 Fixes scylladb/scylladb#11087 Fixes scylladb/scylladb#15360 Closes scylladb/scylladb#19684 * github.com:scylladb/scylladb: docs: raft: document using zero-token nodes to prevent majority loss test: test recovery mode in the presence of zero-token nodes test: topology: util.py: add cqls parameter to check_system_topology_and_cdc_generations_v3_consistency test: topology: util.py: accept zero tokens in check_system_topology_and_cdc_generations_v3_consistency treewide: support zero-token nodes in the recovery mode storage_proxy: make TRUNCATE work locally for local tables test: topology: util.py: document that check_token_ring_and_group0_consistency fails with zero-token nodes test: test zero-token nodes test: test_topology_ops: move helpers to topology/util.py feature_service: introduce the ZERO_TOKEN_NODES feature storage_service: rename join_token_ring to join_topology storage_service: raft_topology_cmd_handler: improve warnings topology_coordinator: fix indentation after the previous patch treewide: introduce support for zero-token nodes in Raft topology system_keyspace: load_topology_state: remove assertion impossible to hit treewide: distinguish all nodes from all token owners gossip topology: make a replacing node remove the replaced node from topology locator: topology: add_or_update_endpoint: use none as the default node state test: boost: tablets tests: ensure all nodes are normal token owners token_metadata: rename get_all_endpoints and get_all_ips network_topology_strategy: reallocate_tablets: remove unused dc_rack_nodes virtual_tables: 
cluster_status_table: execute: set dc regardless of the token ownership	2024-08-29 16:26:21 +02:00
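The majority arithmetic in the merge message above can be checked with a few lines of Python. This is only a sketch under the assumption that every node, including zero-token nodes, is a group 0 voter and that a strict majority of all voters must stay alive; the function name `survives_any_single_dc_loss` is hypothetical.

```python
def survives_any_single_dc_loss(dc_sizes):
    """Return True if a strict majority of voters remains after losing any one DC.

    dc_sizes lists the number of voters per datacenter (assumption: every
    node is a voter, as described for zero-token nodes above).
    """
    total = sum(dc_sizes)
    majority = total // 2 + 1
    return all(total - size >= majority for size in dc_sizes)


# Two DCs with 4 and 5 nodes: losing the 5-node DC leaves 4 of 9 voters.
two_dcs = survives_any_single_dc_loss([4, 5])
# Adding a DC with 2 zero-token nodes: losing any DC leaves at least 6 of 11.
with_zero_token_dc = survives_any_single_dc_loss([4, 5, 2])
# Two equal DCs plus a single zero-token node in a third DC.
symmetric = survives_any_single_dc_loss([3, 3, 1])
```

The second and third cases correspond to the two setups the message describes: a 2-node zero-token DC for the 4+5 layout, and a single zero-token node when both data DCs have the same size.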
Gleb Natapov	32a59ba98f	topology coordinator: fix indentation after the last patch	2024-08-29 17:14:09 +03:00
Gleb Natapov	17f4a151ce	topology coordinator: do not add replacing node without a ring to topology When only inter-dc encryption is enabled, a non-encrypted connection between two nodes is allowed only if both nodes are in the same dc. If a node that initiates the connection knows that dst is in the same dc and hence uses a non-encrypted connection, but the dst does not yet know the topology of the src, such a connection will not be allowed, since dst cannot guarantee that src is in the same dc. Currently, when topology coordinator is used, a replacing node will appear in the coordinator's topology immediately after it is added to the group0. The coordinator will try to send a raft message to the new node and (assuming only inter-dc encryption is enabled and the replacing node and the coordinator are in the same dc) it will try to open a regular, non-encrypted connection to it. But the replacing node will not have the coordinator in its topology yet (it needs to sync the raft state for that), so it will reject such a connection. To solve the problem the patch does not add a replacing node that was just added to group0 to the topology. It will be added later, when tokens are assigned to it. At this point a replacing node will already make sure that its topology state is up-to-date (since it will execute a raft barrier in join_node_response_params handler) and it knows coordinator's topology. This aligns replace behaviour with bootstrap since bootstrap also does not add a node without a ring to the topology. The patch effectively reverts `b8ee8911ca` Fixes: scylladb/scylladb#19025	2024-08-29 17:14:09 +03:00
Gleb Natapov	2f1b1fd45e	test: add test for replace in clusters with encryption enabled	2024-08-29 17:14:09 +03:00
Gleb Natapov	b98282a976	test.py: add server encryption support to cluster manager	2024-08-29 17:14:09 +03:00
Gleb Natapov	84757a4ed3	.gitignore: fix pattern for resources to match only one specific directory	2024-08-29 17:13:58 +03:00
Dawid Medrek	d459cf91eb	db/hints: Fix indentation in `do_store_hint()`	2024-08-29 14:47:08 +02:00
Dawid Medrek	75ce6943d0	db/hints: Move code for writing hints to separate function In scylladb/scylladb@7301a96, in the function `hint_endpoint_manager::store_hint()`, we transformed the lambda passed to `seastar::with_gate()` to a coroutine lambda to improve the readability. However, there was a subtle problem related to lifetimes of the captures that needed to be addressed: * Since we started `co_await`ing in the lambda, the captures were at risk of being destructed too soon. The usual solution is to wrap a coroutine lambda within a `seastar::coroutine::lambda` object and rely on the extended lifetime enforced by the semantics of the language. See `docs/dev/lambda-coroutine-fiasco.md` for more context. * However, since we don't immediately `co_await` the future returned by `with_gate()`, we cannot rely on the extended lifetime provided by the wrapper. The document linked in the previous bullet point suggests keeping the passed coroutine lambda as a variable and pass it as a reference to `with_gate()`. However, that's not feasible either because we discard the returned future and the function returns almost instantly -- destructing every local object, which would encompass the lambda too. The solution used in the commit was to move captures of the lambda into the lambda's body. That helped because Seastar's backend is responsible for keeping all of the local variables alive until the lambda finishes its execution. However, we didn't move all of the captures into the lambda -- the missing one was the `this` pointer that was implicitly used in the lambda. Address sanitiser hasn't reported any bugs related to the pointer yet, but the bug is most likely there. In this commit, we transform the lambda's body into a new member function and only call it from the lambda. This way, we don't need to care about the lifetimes of the captures because Seastar ensures that the function's arguments stay alive until the coroutine finishes. 
Choosing this solution instead of assigning `this` to a pointer variable inside the lambda's body and using it to refer to the object's members has actual benefit: it's not possible to accidentally forget to refer to a member of the object via the pointer; it also makes the code less awkward.	2024-08-29 14:47:02 +02:00
Aleksandra Martyniuk	627fc46ca7	api: task_manager: return status 403 if a task is not abortable	2024-08-29 13:53:40 +02:00
Aleksandra Martyniuk	10ab60f32b	api: task_manager: return none instead of empty task id If a user requests a status of a task that does not have a parent, show "none" instead of an empty parent_id.	2024-08-29 13:53:40 +02:00
Aleksandra Martyniuk	5bcff4d544	api: task_manager: add timeout to wait_task	2024-08-29 13:53:40 +02:00
Aleksandra Martyniuk	3d78172328	api: task_manager: add operation to get ttl	2024-08-29 13:53:39 +02:00
Aleksandra Martyniuk	fb160afaf6	nodetool: add suboperations support Modify nodetool methods so that they support suboperations.	2024-08-29 13:53:39 +02:00
Aleksandra Martyniuk	4b96f9abb9	nodetool: change operations_with_func type Change the type of operations_with_func so that they can contain suboperations.	2024-08-29 13:53:39 +02:00
Aleksandra Martyniuk	c6f8a0116a	nodetool: prepare operation related classes for suboperations Modify operation and add operation_action class so that information about suboperations is stored. It's a preparation for adding suboperations support to nodetool.	2024-08-29 13:53:39 +02:00
Kefu Chai	dbb056f4f7	build: cmake: point -ffile-prefix-map to build directory before this change, we included `-ffile-prefix-map=${CMAKE_SOURCE_DIR}=.` in cflags when building the tree with CMake, but this was wrong. as the "." directory is the build directory used by CMake. and this directory is specified by the `-B` option when generating the building system. if `configure.py --use-cmake` is used to build the tree, the build directory would be "build". so this option instructs the compiler to replace the directory of source file in the debug symbols and in `__FILE__` at compile time. but, in a typical workspace, for instance, `build/main.cc` does not exist. the reason why this does not apply to CMake but applies to the rules generated by `configure.py` is that, `configure.py` puts the generated `build.ninja` right under the top source directory, so `.` is correct and it helps to create reproducible builds. because this practically erases the path prefixes in the build output. while CMake puts it under the specified build directory, replacing the source directory with the build directory with the file prefix map is just wrong. there are two options to address this problem: * stop passing this option. but this would lead to non-reproducible builds. as we would encode the build directory in the "scylla" executable. if a developer needs to rebuild an executable for debugging a coredump generated in production, he/she would have to either build the tree in the same directory as our CI does. or, he/she has to pass `-ffile-prefix-map=...` to map the local build directory to the one used by CI. this is not convenient. * instead of using `${CMAKE_SOURCE_DIR}=.`, add `${CMAKE_BINARY_DIR}=.`. this erases the build directory in the outputs, but preserves the debuggability. so we pick the second solution. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20329	2024-08-29 12:28:11 +03:00
Patryk Jędrzejczak	c192a9ee3b	docs: raft: document using zero-token nodes to prevent majority loss	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	e027ffdffc	test: test recovery mode in the presence of zero-token nodes We modify existing tests to verify that the recovery mode works correctly in the presence of zero-token nodes. In `test_topology_recovery_basic`, we test the case when a zero-token node is live. In particular, we test that the gossip-based restart of such a node works. In `test_topology_recovery_after_majority_loss`, we test the case when zero-token nodes are unrecoverable. In particular, we test that the gossip-based removenode of such nodes works. Since zero-token nodes are ignored by the Python driver if it also connects to other nodes, we use different CQL sessions for a zero-token node in `test_topology_recovery_basic`.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	fb1e060c4c	test: topology: util.py: add cqls parameter to check_system_topology_and_cdc_generations_v3_consistency In the following commit, we modify `test_topology_recovery_basic` to test the recovery mode in the presence of live zero-token nodes. Unfortunately, it requires a slightly ugly workaround. Zero-token nodes are ignored by the Python driver if it also connects to other nodes because of empty tokens in the `system.peers` table. In that test, we must connect to a zero-token node to enter the recovery mode and purge the Raft data. Hence, we use different CQL sessions for different nodes. In the future, we may change the Python driver behavior and revert this workaround. Moreover, the recovery tests will be removed or significantly changed when we implement the manual recovery tool. Therefore, we shouldn't worry about this workaround too much.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	54905fc179	test: topology: util.py: accept zero tokens in check_system_topology_and_cdc_generations_v3_consistency Before we use `check_system_topology_and_cdc_generations_v3_consistency` in a test with a zero-token node, we must ensure it doesn't fail because of zero tokens in a row of the `system.topology` table.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	02bb70da19	treewide: support zero-token nodes in the recovery mode Before we implement the manual recovery tool, we must support zero-token nodes in the recovery mode. This means that two topology operations involving zero-token nodes must work in the gossip-based topology: - removing a dead zero-token node, - restarting a live zero-token node. We make changes necessary to make them work in this patch.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	87b415efdc	storage_proxy: make TRUNCATE work locally for local tables In one of the following patches, we implement support for zero-token nodes in the recovery mode. To achieve this, we need to be able to purge all Raft data on live zero-token nodes by using TRUNCATE. Currently, TRUNCATE works the same for all replication strategies - it is performed on all token owners. However, zero-token nodes are not token owners, so TRUNCATE would ignore them. Since zero-token nodes store only local tables, fixing scylladb/scylladb#11087 is the perfect solution for the issue with zero-token nodes. We do it in this patch. Fixes scylladb/scylladb#11087	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	21c8409fa4	test: topology: util.py: document that check_token_ring_and_group0_consistency fails with zero-token nodes	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	95e14ae44b	test: test zero-token nodes We add tests to verify the basic properties of zero-token nodes. `test_zero_token_nodes_no_replication` and `test_not_enough_token_owners` are more or less deterministic tests. Running them only in the dev mode is sufficient. `test_zero_token_nodes_topology_ops` is quite slow, as expected, considering parameterization and the number of topology operations. In the future we can think of making it faster or skipping in the debug mode. For now, our priority is to test zero-token nodes thoroughly.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	d43d67c525	test: test_topology_ops: move helpers to topology/util.py In one of the following patches, we reuse the helper functions from `test_topology_ops` in a new test, so we move them to `util.py`. Also, we add the `cl` parameter to `start_writes`, as the new test will use `cl=2`.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	574c252391	feature_service: introduce the ZERO_TOKEN_NODES feature Zero-token nodes must be supported by all nodes in the cluster. Otherwise, the non-supporting nodes would crash on some assertion that assumes only token-owning normal nodes make sense. Hence, we introduce the ZERO_TOKEN_NODES cluster feature. Zero-token nodes refuse to boot if it is not supported. I tested this patch manually. First, I booted a node built in the previous patch. Then, I tried to add a zero-token node built in this patch. It refused to boot as expected.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	c25eefe217	storage_service: rename join_token_ring to join_topology After introducing zero-token nodes that call join_token_ring but do not join the ring, the join_token_ring name does not make much sense.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	9937cf3a24	storage_service: raft_topology_cmd_handler: improve warnings	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	3ce936da7b	topology_coordinator: fix indentation after the previous patch	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	22d907e721	treewide: introduce support for zero-token nodes in Raft topology We revive the `join_ring` option. We support it only in the Raft-based topology, as we plan to remove the gossip-based topology when we fix the last blocker - the implementation of the manual recovery tool. In the Raft-based topology, a node can be assigned tokens only once when it joins the cluster. Hence, we disallow joining the ring later, which is possible in Cassandra. The main idea behind the solution is simple. We make the unsupported special case of zero tokens a supported normal case. Nodes with zero tokens assigned are called "zero-token nodes" from now on. From the topology point of view, zero-token nodes are the same as token-owning nodes. They can be in the same states, etc. From the data point of view, they are different. They are not members of the token ring, so they are not present in `token_metadata::_normal_token_owners`. Hence, they are ignored in all non-local replication strategies. The tablet load balancer also ignores them. Topology operations involving zero-token nodes are simplified: - `add` and `replace` finish in the `join_group0` state, so creating a new CDC generation and streaming are skipped, - `removenode` and `decommission` skip streaming, - `rebuild` does not even contact the topology coordinator as there is nothing to rebuild. Also, if the topology operation involves a token-owning node, zero-token nodes are ignored in streaming. Zero-token nodes can be used as coordinator-only nodes, just like in Cassandra. They can handle requests just like token-owning nodes. The main motivation behind zero-token nodes is that they can prevent the Raft majority loss efficiently. Zero-token nodes are group 0 voters, but they can run on much weaker and cheaper machines because they do not replicate data and, by default, do not handle client requests (drivers ignore them).
For example, if there are two DCs, one with 4 nodes and one with 5 nodes, if we add a DC with 2 zero-token nodes, every DC will contain less than half of the nodes, so we won't lose the majority when any DC dies. Another way of preventing the Raft majority loss is changing the voter set, which is tracked by scylladb/scylladb#18793. That approach can be used together with zero-token nodes. In the example above, if we choose equal numbers of voters in both DCs, then a DC with one zero-token node will be sufficient. However, in the typical setup of 2 DCs with the same number of nodes it is enough to add a DC with only one zero-token node without changing the voter set. Zero-token nodes could also be used as load balancers in the Alternator.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	ba016c9af7	system_keyspace: load_topology_state: remove assertion impossible to hit We store tokens in a non-frozen set, which doesn't distinguish an empty set from no value. Hence, hitting this assertion is impossible.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	ed55261650	treewide: distinguish all nodes from all token owners In one of the following patches, we introduce support for zero-token nodes. From that point, getting all nodes and getting all token owners isn't equivalent. In this patch, we ensure that we consider only token owners when we want to consider only token owners (for example, in the replication logic), and we consider all nodes when we want to consider all nodes (for example, in the topology logic). The main purpose of this patch is to make the PR introducing zero-token nodes easier to review. The patch that introduces zero-token nodes is already complicated. We don't want trivial changes from this patch to make noise there. This patch introduces changes needed for zero-token nodes only in the Raft-based topology and in the recovery mode. Zero-token nodes are unsupported in the gossip-based topology outside recovery. Some functions added to `token_metadata` and `topology` are inefficient because they compute a new data structure in every call. They are never called in the hot path, so it's not a serious problem. Nevertheless, we should improve it somehow. Note that it's not obvious how to do it because we don't want to make `token_metadata` store topology-related data. Similarly, we don't want to make `topology` store token-related data. We can think of an improvement in a follow-up. We don't remove unused `topology::get_datacenter_rack_nodes` and `topology::get_datacenter_nodes`. These functions can be useful in the future. Also, `topology::_dc_nodes` is used internally in `topology`.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	2d9575d6a9	gossip topology: make a replacing node remove the replaced node from topology In the following patch, we change the gossiper to work the same for zero-token nodes and token-owning nodes. We replace occurrences of `is_normal_token_owner` with topology-based conditions. We want to rely on the invariant that token-owning nodes own tokens if and only if they are in the normal or leaving state. However, this invariant is broken by a replacing node because it does not remove the replaced node from topology. Hence, after joining, the replacing node has topology with a node that is not a token owner anymore but is in a leaving state (`being_replaced`). We fix it to prevent the following patch from introducing a regression.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	c7016dedb3	locator: topology: add_or_update_endpoint: use none as the default node state In one of the following patches, we change the gossiper to work the same for zero-token nodes and token-owning nodes. We replace occurrences of `is_normal_token_owner` with topology-based conditions. We want to rely on the invariant that token-owning nodes own tokens if and only if they are in the normal or leaving state. However, this invariant can be broken in the gossip-based topology when a new node joins the cluster. When a bootstrapping node starts gossiping, other nodes add it to their topology in `storage_service::on_alive`. Surprisingly, the state of the new node is set to `normal`, as it's the default value used by `add_or_update_endpoint`. Later, the state will be set to `bootstrapping` or `replacing`, and finally it will be set again to `normal` when the join operation finishes. We fix this strange behavior by setting the node state to `none` in `storage_service::on_alive` for nodes not present in the topology. Note that we must add such nodes to the topology. Other code needs their Host ID, IP, and location. We change the default node state from `normal` to `none` in `add_or_update_endpoint` to prevent bugs like the one in `storage_service::on_alive`. Also, we ensure that nodes in the `none` state are ignored in the getters of `locator::topology`.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	6adaf85634	test: boost: tablets tests: ensure all nodes are normal token owners In one of the following patches, we make NetworkTopologyStrategy and the tablet load balancer consider only normal token owners to ensure they ignore zero-token nodes. Some unit tests would start failing after this change because they do not ensure that all nodes are normal token owners. This patch prevents it. Judging by the logic in the test cases in `network_topology_strategy_test`, `point++` was probably intended anyway.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	366605224c	token_metadata: rename get_all_endpoints and get_all_ips In one of the following patches, we introduce support for zero-token nodes. A zero-token node that has successfully joined the cluster is in the normal state but is not a normal token owner. Hence, the names of `get_all_endpoints` and `get_all_ips` become misleading. They should specify that the functions return only IDs/IPs of token owners.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	293a66fe41	network_topology_strategy: reallocate_tablets: remove unused dc_rack_nodes	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	4ff08decb8	virtual_tables: cluster_status_table: execute: set dc regardless of the token ownership If a node is in `locator::topology`, then it has a location. We remove the token ownership condition to make the table more descriptive.	2024-08-29 10:37:06 +02:00
Kefu Chai	ecfe0aace6	perf: perf_mutation_readers: break memtable class down before this change, memtable serves as the fixture for 6 test cases, actually these 6 test cases can be categorized into a matrix of 3 x 2: { single_row, multi_row, large_partition } x { single_partition, multi_partition }. in this change, we break memtable into 3 different fixtures, to reflect this fact. more readable this way. and a benefit is that each test does not have to pay for the overhead of setup it does not use at all. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20177	2024-08-29 08:54:17 +03:00
Botond Dénes	e538e3593c	Merge 'build: add --no-use-cmake option to configure.py' from Kefu Chai as part of the efforts to address scylladb/scylladb#2717, we are switching over to the CMake-based building system, and fade out the machinery to create the rules manually in `configure.py`. in this change, we add `--no-use-cmake` to `configure.py`, it serves two purposes: * prepare for the change which enables cmake by default, by then, we would set the default value of `use_cmake` to True, and allow user to keep using the existing machinery in the transition period using `--no-use-cmake`. * allows the CI to tell if a tree is able to build with CMake. the command line option of `--use-cmake` is also used by the CI workflows, and is passed to `configure.py` if `BUILD_WITH_CMAKE` jenkins pipeline parameter is set. but not all branches with `--use-cmake` are ready to build with CMake -- only the latest master HEAD is ready. so the CI needs to check the capability of building with CMake by looking at the output of `configure.py --help`, to see if it includes `--no-use-cmake`. after this change lands, we will remove the `BUILD_WITH_CMAKE` parameter, and use cmake as long as `configure.py` supports `--no-use-cmake` option. the existing machinery will stay with us for a short transition period so that developers can take time to get used to the usage of the naming of targets and the new directory arrangement. as a side effect, #20079 will be fixed after switching to CMake. --- this is a cmake-related change, hence no need to backport. Closes scylladb/scylladb#20261 * github.com:scylladb/scylladb: build: add --no-use-cmake option to configure.py build: let configure.py fail if unknown option is passed to it	2024-08-29 08:51:41 +03:00
Kefu Chai	a182bfd96a	tools/read_mutation: reuse parse_table_directory_name() less repeatings this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20315	2024-08-29 08:49:20 +03:00
Nadav Har'El	6391550bbc	test/alternator: add another check to test_stream_list_tables The test test_streams.py::test_stream_list_tables reproduces a bug where enabling streams added a spurious result to ListTables. A reviewer of that patch asked to also add a check that name of the table itself doesn't disappear from ListTables when a stream is enabled, so this is what this patch adds. This theoretical scenario (a table's name disappearing from ListTables) never happened, so the new check doesn't reproduce any known bug, but I guess it never hurts to make the test stronger for regression testing. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19934	2024-08-29 08:45:22 +03:00
Nadav Har'El	61e5927e8e	repair: fix build on older compilers The code tries to build as "neighbors" an unordered_map from an iterator of std::tuple, instead of the correct std::pair. Apparently, the tuples are transparently converted to pairs on the newest compilers and the whole thing works, but on slightly older compilers (like the one on Fedora 39) Scylla no longer compiles - the compiler complains it can't convert a tuple to a pair in this context. So fix the code to use pairs, not tuples, and it fixes the build on Fedora 39. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#20319	2024-08-28 19:56:03 +03:00
Laszlo Ersek	49bff3b1ab	generic_server: make server::stop() idempotent After server::shutdown(), make server::stop() more robust too, by allowing callers (internal or external) to call it several times (not concurrently though, just yet; see <https://github.com/scylladb/scylladb/issues/20309>). Suggested-by: Benny Halevy <bhalevy@scylladb.com> Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-28 15:54:31 +02:00
Kefu Chai	03ab80501f	tools/scylla-nodetool: add restore integration as we have an API for restoring a keyspace / table, let's expose this feature with nodetool. so we can exercise it without the help of scylla-manager or 3rd-party tools with a user-friendly interface. in this change: * add a new subcommand named "restore" to nodetool * add test to verify its interaction with the API server * update the document accordingly. * the bash completion script is updated accordingly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-28 15:42:49 +03:00
Pavel Emelyanov	41b9eda398	test/object_store: Add simple restore test The test shows how to restore previously backed up table: - backup - truncate to get rid of existing sstables - start restore with the new API method - wait for the task to finish Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-28 15:42:49 +03:00
Pavel Emelyanov	f5a22a94c6	test/object_store: Generalize prepare_snapshot_for_backup() Give it snapshot-name argument. Next test will want custom snapshot name. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-28 15:42:49 +03:00
Pavel Emelyanov	11a04bfb66	code: Introduce restore API method The method starts a task that uses sstables_loader load-and-stream functionality to bring new sstables into the cluster. The existing load-and-stream picks up sstables from upload/ directory, the newly introduced task collects them from S3 bucket and given prefix (that corresponds to the path where the backup API method put them). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-28 15:42:49 +03:00
Sergey Zolotukhin	65f37f3ba6	Ignore seed name resolution errors on restart. Gossiper seeds host name resolution failures are ignored during restart if a node is already bootstrapped (i.e. it has successfully joined the cluster). Fixes scylladb/scylladb#14945	2024-08-28 14:01:04 +02:00
Patryk Jędrzejczak	08cb3a5e2c	test: test_raft_recovery_basic: add raft=trace logs It could help when we hit scylladb/scylladb#17918 again. This PR only changes log levels in a test, no need to backport it. Refs scylladb/scylladb#17918 Closes scylladb/scylladb#20318	2024-08-28 13:50:09 +02:00
Sergey Zolotukhin	fc5e683d02	Add a test for starting with a wrong seed. The test checks a bootstrapped node start with a wrong host name in the seeds config. Test for scylladb/scylladb#14945	2024-08-28 11:34:37 +02:00
Laszlo Ersek	1138347e7e	generic_server: coroutinize server::shutdown() By turning server::shutdown() into a coroutine, we need not dynamically allocate "nr_conn". Verified as follows: (1) In terminal #1: build/Dev/scylla --overprovisioned --developer-mode=yes \ --memory=2G --smp=1 --default-log-level error \ --logger-log-level cql_server=debug:cql_server_controller=debug > INFO [...] cql_server_controller - Starting listening for CQL clients > on 127.0.0.1:9042 (unencrypted, > non-shard-aware) > INFO [...] cql_server_controller - Starting listening for CQL clients > on 127.0.0.1:19042 (unencrypted, > shard-aware) (2) In terminals #2 and #3: tools/cqlsh/bin/cqlsh.py (3) Press ^C in terminal #1: > DEBUG [...] cql_server - abort accept nr_total=2 > DEBUG [...] cql_server - abort accept 1 out of 2 done > DEBUG [...] cql_server - abort accept 2 out of 2 done > DEBUG [...] cql_server - shutdown connection nr_total=4 > DEBUG [...] cql_server - shutdown connection 1 out of 4 done > DEBUG [...] cql_server - shutdown connection 2 out of 4 done > DEBUG [...] cql_server - shutdown connection 3 out of 4 done > DEBUG [...] cql_server - shutdown connection 4 out of 4 done > INFO [...] cql_server_controller - CQL server stopped This patch is best viewed with "git show --word-diff=color". Suggested-by: Benny Halevy <bhalevy@scylladb.com> Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-28 10:59:44 +02:00
Laszlo Ersek	2216275ebd	generic_server: make server::shutdown() idempotent Make server::shutdown() more robust by allowing callers (internal or external) to call it several times (not concurrently though, just yet; see <https://github.com/scylladb/scylladb/issues/20309>). Suggested-by: Benny Halevy <bhalevy@scylladb.com> Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-28 10:59:44 +02:00
Laszlo Ersek	dbc0ca6354	test/generic_server: add test case Check whether we can stop a generic server without first asking it to listen. The test fails currently; the failure mode is a hang, which triggers the 5 minute timeout set in the test: > unknown location(0): fatal error: in "stop_without_listening": > seastar::timed_out_error: timedout > seastar/src/testing/seastar_test.cc(43): last checkpoint > test/boost/generic_server_test.cc(34): Leaving test case > "stop_without_listening"; testing time: 300097447us Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-28 10:59:44 +02:00
Laszlo Ersek	931f2f8d73	configure, cmake: sort the lists of boost unit tests Both lists were obviously meant to be sorted originally, but by today we've introduced many instances of disorder -- thus, inserting a new test in the proper place leaves the developer scratching their head. Sort both lists. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-28 10:59:44 +02:00
Laszlo Ersek	5a04743663	generic_server: convert connection tracking to seastar::gate If we call server::stop() right after "server" construction, it hangs: With the server never listening (never accepting connections and never serving connections), nothing ever calls server::maybe_stop(). Consequently, co_await _all_connections_stopped.get_future(); at the end of server::stop() deadlocks. Such a server::stop() call does occur in controller::do_start_server() [transport/controller.cc], when - cserver->start() (sharded<cql_server>::start()) constructs a "server"-derived object, - start_listening_on_tcp_sockets() throws an exception before reaching listen_on_all_shards() (for example because it fails to set up client encryption -- certificate file is inaccessible etc.), - the "deferred_action" cserver->stop().get(); is invoked during cleanup. (The cserver->stop() call exposing the connection tracking problem dates back to commit `ae4d5a60ca` ("transport::controller: Shut down distributed object on startup exception", 2020-11-25), and it's been triggerable through the above code path since commit `6b178f9a4a` ("transport/controller: split configuring sockets into separate functions", 2024-02-05).) Tracking live connections and connection acceptances seems like a good fit for "seastar::gate", so rewrite the tracking with that. "seastar::gate" can be closed (and the returned future can be waited for) without anyone ever having entered the gate. NOTE: this change makes it quite clear that neither server::stop() nor server::shutdown() must be called multiple times. The permitted sequences are: - server::shutdown() + server::stop() - or just server::stop(). Fixes #10305 Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-28 10:59:44 +02:00
Kefu Chai	6d8dca1e20	build: add --no-use-cmake option to configure.py as part of the efforts to address scylladb/scylladb#2717, we are switching over to the CMake-based building system, and fade out the machinery to create the rules manually in `configure.py`. in this change, we add `--no-use-cmake` to `configure.py`, it serves two purposes: * prepare for the change which enables cmake by default, by then, we would set the default value of `use_cmake` to True, and allow user to keep using the existing machinery in the transition period using `--no-use-cmake`. * allows the CI to tell if a tree is able to build with CMake. the command line option of `--use-cmake` is also used by the CI workflows, and is passed to `configure.py` if `BUILD_WITH_CMAKE` jenkins pipeline parameter is set. but not all branches with `--use-cmake` are ready to build with CMake -- only the latest master HEAD is ready. so the CI needs to check the capability of building with CMake by looking at the output of `configure.py --help`, to see if it includes `--no-use-cmake`. after this change lands, we will remove the `BUILD_WITH_CMAKE` parameter, and use cmake as long as `configure.py` supports `--no-use-cmake` option. the existing machinery will stay with us for a short transition period so that developers can take time to get used to the usage of the naming of targets and the new directory arrangement. as a side effect, #20079 will be fixed after switching to CMake. Refs scylladb/scylladb#2717 Refs scylladb/scylladb#20079 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-28 11:37:56 +08:00
Kefu Chai	a2de14be7f	build: let configure.py fail if unknown option is passed to it this allows us to use `configure.py` to tell if a certain argument is supported without parsing its output. in the next commit, we will add `--no-use-cmake` option, which will be used to tell if the tree is ready for using CMake for its building system. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-28 11:37:55 +08:00
Kefu Chai	e4b213f041	build: cmake: use the same options to configure seastar in `configure.py`, a set of options are specified when configuring seastar, but not all of them were ported to scylla's CMake building system. for instance, `configure.py` explicitly disables io_uring reactor backend at build time, but the CMake-based system does not. so, in this change, in order to preserve the existing behavior, let's port the two previously missing option to CMake-based building system as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20288	2024-08-28 06:15:59 +03:00
Avi Kivity	94d5507237	Merge 'select from mutation_fragments() + tablets: handle reads for non-owned partitions' from Botond Dénes Attempting to read a partition via `SELECT * FROM MUTATION_FRAGMENTS()`, which the node doesn't own, from a table using tablets causes a crash. This is because when using tablets, the replica side simply doesn't handle requests for un-owned tokens and this triggers a crash. We should probably improve how this is handled (an exception is better than a crash), but this is outside the scope of this PR. This PR fixes this and also adds a reproducer test. Fixes: https://github.com/scylladb/scylladb/issues/18786 Fixes a regression introduced in 6.0, so needs backport to 6.0 and 6.1 Closes scylladb/scylladb#20109 * github.com:scylladb/scylladb: test/tablets: Test that reading tablets' mutations from MUTATION_FRAGMENTS works replica/mutation_dump: enfore pinning of effective replication map replica/mutation_dump: handle un-owned tokens (with tablets)	2024-08-27 20:46:10 +03:00
Avi Kivity	b13ab90448	Merge 'alternator/executor: Use native reversed format' from Łukasz Paszkowski When executing reversed queries, a native reversed format shall be used. Therefore, the table schema and the clustering key bounds are reversed before a partition slice and a read command are constructed. It is, however, possible to run a reversed query passing a table schema but only when there are no restrictions on the clustering keys. In this particular situation, the query returns correct results. Since the current alternator tests in test.py do not imply any restrictions, this situation was not caught during development of https://github.com/scylladb/scylladb/pull/18864. Hence, additional tests are provided that add clustering keys restrictions when executing reversed queries to capture such errors earlier than in dtests. Additional manual tests were performed to test a mixed-node cluster (with alternator API enabled in Scylla on each node): 1. 2-node cluster with one node upgraded: reverse read queries performed on an old node 2. 2-node cluster with one node upgraded: reverse read queries performed on a new node 3. 2-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on an old node 4. 2-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on a new node All reverse read queries above consist of: - single-partition reverse reads with no clustering key restrictions, with single column restrictions and multi column restrictions both with and without paging turned on The exact same tests were also performed on a fully upgraded cluster. Fixes https://github.com/scylladb/scylladb/issues/20191 No backport is required as this is a complementary patch for the series https://github.com/scylladb/scylladb/pull/18864 that did not require backporting.
Closes scylladb/scylladb#20205 * github.com:scylladb/scylladb: test_query.py: Test reverse queries with clustering key bounds alternator::do_query Add additional trace log alternator::do_query: Use native reversed format alternator::do_query Rename schema with table_schema	2024-08-27 20:40:49 +03:00
Benny Halevy	18c45f7502	raft_rebuild: propagate source_dc force option to rebuild_option Currently, the `force` property of the `source_dc` rebuild option is lost and `raft_topology_cmd_handler` has no way to know if it was given or not. This in turn can cause rebuild to fail, even when `--force` is set by the user, where it would succeed with gossip topology changes, based on the source_dc --force semantics. Fixes scylladb/scylladb#20242 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#20249	2024-08-27 17:05:48 +02:00
Kefu Chai	d27fdf9f57	Update seastar submodule * seastar a7d81328...83e6cdfd (29): > fair_queue: Export the number of times class was activated > tests/unit: drop support of C++17 > remove vestigial OSv support > cmake: undefine _FORTIFY_SOURCE on thread.cc > container_perf: a benchmark for container perf > io_sink: use chunked_fifo as _pending_io container > chunked_fifo: implement clear in terms of pop_n > chunked_fifo: pop_front_n > io_sink: use iteration instead of indexing > json2code_test: choose less popular port number > ioinfo: add '--max-reqsize' parameter > treewide: drop the support of fmtlib < 8.0.0 > build: bump up the required fmtlib version to 8.1.1 > conditional-variable: align when() and wait() behaviour in case of a predicate throwing an exception > stall-analyser: add output support for flamegraph > reactor: Add --io-completion-notify-ms option > io_queue: Stall detector > io_queue: Keep local variable with request execution delay > io_queue: Rename flow ratio timer to be more generic > reactor: Export _polls counter (internally) > dns: de-inline dns_resolver::impl methods > dns: enter seastar::net namespace > dnf: drop compatibility for c-ares <= 1.16 > reactor: add missing includes of noncopyable_function.hh > reactor: Reset one-shot signal to DFL before handling > future: correctly document nested exception type emitted by finally() > modules: fix FATAL_ERROR on compiler check > seastar.cc: include fmt/ranges.h > pack io_request Closes scylladb/scylladb#20300	2024-08-27 17:51:21 +03:00
Avi Kivity	2f4ef31254	Merge 'tools/testing: update dist-check to use rockylinux and adapt to cmake' from Kefu Chai `dist-check` tests the generated rpm packages by installing them in a centos 7 container. but this script is terribly outdated - centos 7 is deprecated. we should use a new distro's latest stable release. - cqlsh was added to the family of rpms a while ago. we should test it as well. - the directory hierarchy has been changed. we should read the artifacts from the new directories. - cmake uses a different directory hierarchy. we should check the directory used by cmake as well. to address these breaking changes, the scripts are updated accordingly. --- this change gives an overhaul to a test, which is not used in production. so no need to backport. Closes scylladb/scylladb#20267 * github.com:scylladb/scylladb: tools/testing: add cqlsh rpm tools/testing: adapt to cmake build directory tools/testing: test with rockylinux:9 not centos:7 tools/testing: correct the paths to rpm packages and SCYLLA-*-FILE dist-check: add :z option when mapping volume	2024-08-27 16:16:34 +03:00
Pavel Emelyanov	1f3f0b1926	sstable_loader: Add sstables::storage_manager dependency The storage_manager maintains set of clients to configured object storage(s). The sstables loader is going to spawn tasks that will talk to those storages, thus it needs the storage manager to get the clients from. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:41 +03:00
Pavel Emelyanov	06c3c53deb	sstable_loader: Maintain task manager module This service is going to start tasks managed by task manager. For that, it should have its module set up and registered. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:41 +03:00
Pavel Emelyanov	9cf95e8a07	sstable_loader: Out-line constructor It will grow and become more complicated. Better to have it outside the header. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:41 +03:00
Pavel Emelyanov	6a006d2255	distributed_loader: Split get_sstables_from_upload_dir() Next patches will need this method to initialize sstable_directory differently and then do its regular processing. For that, split the method into two, next patch will re-use the common part it needs. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:41 +03:00
Pavel Emelyanov	630ab1dbea	sstables/storage: Compose uploaded sstable path simpler Current S3 storage driver keeps sstables in bucket in a form of /bucket/generation/component-name To get sstables that are backed up on S3 this format doesn't apply, because components are uploaded with their names unmodified. This patch makes S3 storage driver account for that and not re-format component paths for upload sstable state. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:41 +03:00
Pavel Emelyanov	2eda917375	sstable_directory: Prepare FS lister to scan files on S3 When component lister is created it checks the target storage options for what kind of lister to create. For local options it creates FS lister that collects sstables from their component files. For S3 options, it relies on sstables registry. When collecting sstables from backup, it's not possible to use registry, because those entries are not there. Instead, lister should pick up individual components as if they were on local FS. This patch prepares the lister for that -- in case S3 options are provided and the sstables' state is "upload", don't try to read those from registry, but instantiate the FS lister that will later use s3::bucket_lister. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:41 +03:00
Pavel Emelyanov	60d43911a9	sstable_directory: Parse sstable component without full path When sstable directory collects a entry from storage, it tries to parse its full path with the help of sstables::parse_path(). There are two overloads of that function -- one with ks:cf arguments and one without. The latter tries to "guess" keyspace and table names from the directory name. However, ks and table names are already known by the directory, it doesn't even use the returned ks and cf values, so this parsing is excessive. Also, future patches will put here backup paths, that might not match the ks_name/table_name-table_uuid/ pattern that the parser expects. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:41 +03:00
Pavel Emelyanov	86bc5b11fe	s3-client: Add support for lister::filter Directory lister comes with a filter function that tells lister which entries to skip by its .get() method. For uniformity, add the same to S3 bucket_lister. After this change the lister reports shorter name in the returned directory entry (with the prefix cut), so also need to tune up the unit test respectively. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:40 +03:00
Pavel Emelyanov	113d2449f8	utils: Introduce abstract (directory) lister This patch hides directory_lister and bucket_lister behind a common facade. The intention is to provide a uniform API for sstable_directory that it could use to list sstables' components wherever they are. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:40 +03:00
Piotr Dulikowski	da5f4faac1	Merge 'mv: reject user requests by coordinator when a replica is overloaded by MVs' from Wojciech Mitros Currently, when a view update backlog of one replica is full, the write is still sent by the coordinator to all replicas. Because of the backlog, the write fails on the replica, causing inconsistency that needs to be fixed by repair. To avoid these inconsistencies, this patch adds a check on the coordinator for overloaded replicas. As a result, a write may be rejected before being sent to any replicas and later retried by the user, when the replica is no longer overloaded. This patch does not remove the replica write failures, because we still may reach a full backlog when more view updates are generated after the coordinator check is performed and before the write reaches the replica. Fixes scylladb/scylladb#17426 Closes scylladb/scylladb#18334 * github.com:scylladb/scylladb: mv: test the view update behavior mv: add test for admission control storage_proxy: return overloaded_exception instead of throwing mv: reject user requests by coordinator when a replica is overloaded by MVs	2024-08-27 12:50:34 +02:00
Aleksandra Martyniuk	f38bb6483a	test: add test to ensure repair won't fail with uninitialized bm	2024-08-27 11:37:50 +02:00
Aleksandra Martyniuk	d8e4393418	repair: throw if batchlog manager isn't initialized repair_service::repair_flush_hints_batchlog_handler may access batchlog manager while it is uninitialized. Batchlog manager cannot be initialized before repair as we have the dependencies chain: repair_service -> storage_service::join_cluster -> batchlog_manager. Throw if batchlog manager isn't initialized. That won't cause repair to fail.	2024-08-27 11:22:28 +02:00
Botond Dénes	5c0f6d4613	Merge 'Make Summary support histogram with infinite bucket values' from Amnon Heiman This series fixes an issue where histogram Summaries return an infinite value. It updates the quantile calculation logic to address cases where values fall into the infinite bucket of a histogram. Now, instead of returning infinite (max int), the calculation will return the last bucket limit, ensuring finite outputs in all cases. The series adds a test for summaries with a specific test case for this scenario. Fixes #20255 Need backport to 6.0, 6.1 and 2023.1 and above Closes scylladb/scylladb#20257 * github.com:scylladb/scylladb: test/estimated_histogram_test Add summary tests utils/histogram.hh: Make summary support infinite bucket.	2024-08-27 10:33:54 +03:00
Kefu Chai	ae7ce38721	build: print out the default value of options instead of using the default `argparse.HelpFormatter`, let's use `ArgumentDefaultsHelpFormatter`, so that the default values of options are displayed in the help messages. this should help developer understand the behavior of the script better. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20262	2024-08-27 10:04:31 +03:00
Kefu Chai	e2747e4bb5	build: cmake: add dist-check target to achieve feature parity with our existing building system, we need to implement a new build target "dist-check" in the CMake-based building system. in this change, "dist-check" is added to CMake-based building system. unlike the rules generated by `configure.py`, the `dist-check` target in CMake depends on the dist-*-rpm targets. the goal is to enable user to test `dist-check` without explicitly building the artifacts being tested. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20266	2024-08-27 10:03:41 +03:00
Kefu Chai	ea612e7065	docs: install poetry>=1.8.0 in `57def6f1`, we specified "package-mode" for poetry, but this option was introduced in poetry 1.8.0, as the "non-package" mode support. see https://github.com/python-poetry/poetry/releases/tag/1.8.0 this change practically bumps up the minimum required poetry version to 1.8.0, we did update `pyproject.toml` to reflect this change. but we failed to update the `Makefile`. in this change, we update `Makefile` to ensure that users who happen to have an older version of poetry can install the version which supports this option when running `make setupenv`. Refs scylladb/scylladb#20284 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20286	2024-08-27 09:20:09 +03:00
Yaniv Michael Kaul	022eb25d98	tools/toolchain/README.md: fix wording Forgot to add that 'reg' tool is also needed. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#20287	2024-08-27 09:18:23 +03:00
Kefu Chai	5cffb23aa3	scylla-gdb.py: use chunked_fifo to represent _sink._pending_io we switched from `circular_buffer` to `chunked_fifo` to present `io_sink::_pending_io` in the latest seastar now. to be prepared for this change, let's * add `chunked_fifo` class in `scylla-gdb.py`. * use `circular_buffer` as a fallback of `chunked_fifo`. instead of doing this the other way around, we try to send the message that the latest seastar uses `chunked_fifo`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20280	2024-08-27 08:44:56 +03:00
Andrei Chekun	fd51332978	test.py: Add parameter to control the pool size from the command line Add parameter --cluster-pool-size that can control pool size for all PythonTestSuite tests. By default, the pool size is set to 10 for most of the suites, but this is too much for laptops. So this parameter can be used to lower the pool size and not to freeze the system. Additionally, the environment variable CLUSTER_POOL_SIZE was added for a convenient way to limit pool size in the system without the need to provide each time an additional parameter. Related: https://github.com/scylladb/scylladb/pull/20276 Closes scylladb/scylladb#20289	2024-08-26 19:55:41 +03:00
Avi Kivity	0acfa4a00d	Merge 'abstract_replication_strategy: make get_ranges async' from Benny Halevy To prevent stalls due to large number of tokens. For example, large cluster with say 70 nodes can have more than 16K tokens. Fixes #19757 Closes scylladb/scylladb#19758 * github.com:scylladb/scylladb: abstract_replication_strategy: make get_ranges async database: get_keyspace_local_ranges: get vnode_effective_replication_map_ptr param compaction: task_manager_module: open code maybe_get_keyspace_local_ranges alternator: ttl: token_ranges_owned_by_this_shard: let caller make the ranges_holder alternator: ttl: can pass const gms::gossiper& to ranges_holder alternator: ttl: ranges_holder_primary: unconstify _token_ranges member alternator: ttl: refactor token_ranges_owned_by_this_shard	2024-08-26 16:56:18 +03:00
Botond Dénes	6d633e89ef	Merge 'update CODEOWNERS' from Piotr Smaron Removed people that no longer contribute to the scylladb.git and added/substituted reviewers responsible for maintaining the frontend components. No need to backport, this is just an information for the github tool. Closes scylladb/scylladb#20136 * github.com:scylladb/scylladb: codeowners: add appropriate reviewers to the cluster components codeowners: add appropriate reviewers to the frontend components codeowners: fix codeowner names codeowners: remove non contributors	2024-08-26 16:44:39 +03:00
Botond Dénes	4505b14fd6	Merge 'table_helper: complete coroutinization' from Avi Kivity table_helper has some quite awkward code, improve it a little. Code cleanup, so no reason to backport. Closes scylladb/scylladb#20194 * github.com:scylladb/scylladb: table_helper: insert(): improve indentation table_helper: coroutinize insert() table_helper: coroutinize cache_table_info() table_helper: extract try_prepare()	2024-08-26 13:43:17 +03:00
Botond Dénes	b2c07c9b6f	Merge 'compaction: change compaction stop reason ' from Aleksandra Martyniuk Currently "table removal" is logged as a reason of compaction stop for table drop, tablet cleanup and tablet split. Modify log to reflect the reason. Closes scylladb/scylladb#20042 * github.com:scylladb/scylladb: test: add test to check compaction stop log compaction: fix compaction group stop reason	2024-08-26 13:40:07 +03:00
Kefu Chai	4d516a8363	tools/testing: add cqlsh rpm we need to test the installation of cqlsh rpm. also, we should use the correct paths of the generated rpm packages. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-26 11:33:57 +08:00
Kefu Chai	baee15390e	tools/testing: adapt to cmake build directory cmake uses a different arrangement, so let's check for the existence of the build directory and fallback to cmake's build directory. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-26 11:33:57 +08:00
Kefu Chai	b802c000e1	tools/testing: test with rockylinux:9 not centos:7 the centos image repos on docker has been deprecated, and the repo for centos7 has been removed from the main CentOS servers. so we are either not able to install packages from its default repo, without using the vault mirror, or no longer able to pull its image from dockerhub. so, in this change * we switch over to rockylinux:9, which is the latest stable release of rockylinux, and rockylinux is a popular clone of RHEL, so it matches our expectation of a typical use case of scylla. * use dnf to manage the packages. as dnf is the standard way to manage rpm packages in modern RPM-based distributions. * do not install deltarpm. delta rpms have not been supported since RHEL8, and the `deltarpm` package is no longer available ever since. see https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/8/html-single/considerations_in_adopting_rhel_8/index#ref_the-deltarpm-functionality-is-no-longer-supported_notable-changes-to-the-yum-stack as a consequence, this package does not exist in Rockylinux-9. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-26 11:33:53 +08:00
Kefu Chai	00dad27f67	tools/testing: correct the paths to rpm packages and SCYLLA-*-FILE when building with the rules generated from `configure.py`, these files are located under tools' own build directory. so correct them. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-26 11:19:24 +08:00
Kefu Chai	86ef63df92	dist-check: add :z option when mapping volume if SELinux is enabled on the host, we'd have following failure when running `dist-check.sh`: ``` + podman run -i --rm -v /home/kefu/dev/scylladb:/home/kefu/dev/scylladb docker.io/centos:7 /bin/bash -c 'cd /home/kefu/dev/scylladb && /home/kefu/dev/scylladb/tools/testing/dist-check/docker.io/centos-7.sh --mode debug' /bin/bash: line 0: cd: /home/kefu/dev/scylladb: Permission denied ``` to address the permission issue, we need to instruct podman to relabel the shared volume, so that the container can access the shared volume. see also https://docs.podman.io/en/stable/markdown/podman-pod-create.1.html#volume-v-source-volume-host-dir-container-dir-options Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-26 11:15:40 +08:00
Kefu Chai	8ef26a9c8c	build: cmake: add "test" target before this change, none of the targets generated by the CMake-based building system runs `test.py`. but `build.ninja` generated directly by `configure.py` provides a target named `test`, which runs the `test.py` with the options passed to `configure.py`. to be more compatible with the rules generated by `configure.py`, in this change * do not include "CTest" module, as we are not using CTest for driving tests. we use the homebrew `test.py` for this purpose. more importantly, the target named "test" is provided by "CTest". so in order to add our own "test" target, we cannot use "CTest" module. * add a target named "test" to run "test.py". * add two CMake options so we can customize the behavior of "test.py", this is to be compatible with the existing behavior of `configure.py`. Refs scylladb/scylladb#2717 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20263	2024-08-25 21:45:13 +03:00
Avi Kivity	72a85e3812	Merge 'Integrated backup' from Pavel Emelyanov This adds minimal implementation of the start-backup API call. The method starts a task that uploads all files from the given keyspace's snapshot to the requested endpoint/bucket. Arguments are: - endpoint -- the ID in object_store.yaml config file - bucket -- the target bucket to put objects into - keyspace -- the keyspace to work on - snapshot -- the method assumes that the snapshot had been already taken and only copies sstables from it The task runs in the background, its task_id is returned from the method once it's spawned and it should be used via /task_manager API to track the task execution and completion (hint: it's good to have non-zero TTL value to make sure fast backups don't finish before the caller manages to call wait_task API). Sstables components are scanned for all tables in the keyspace and are uploaded into the /bucket/${cf_name}/${snapshot_name}/ path. refs: #18391 Closes scylladb/scylladb#19890 * github.com:scylladb/scylladb: tools/scylla-nodetool: add backup integration docs: Document the new backup method test/object_store: Test that backup task is abortable test/object_store: Add simple backup test test/object_store: Move format_tuples() test/pylib: Add more methods to rest client backup-task: Make it abortable (almost) code: Introduce backup API method database: Export parse_table_directory_name() helper database: Introduce format_table_directory_name() helper snapshot-ctl: Add config to snapshot_ctl snapshot-ctl: Add sstables::storage_manager dependency snapshot-ctl: Maintain task manager module snapshot-ctl: Add "snapshots" logger snapshot-ctl: Outline stop() method and constructor snapshot-ctl: Inline run_snapshot_list<> test/cql_test_env: Export task manager from cql test env task_manager: Print task ttl on start (for debugging) docs: Update object_storage.md with AWS_ environment docs: Restructure object_storage.md	2024-08-25 20:19:10 +03:00
Kefu Chai	f8931a4578	build: cmake: add "dist" target since the rules generated by `configure.py` have this target, we need to have an equivalent target as well in the CMake-based building system. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20265	2024-08-25 20:18:12 +03:00
Andrei Chekun	f54b7f5427	test.py: Increase pool size Increase pool size changes were recently reverted because of the flakiness for the test_gossip_boot test. Test started to fail on adding the node to the cluster without any issues in the Scylla log file. In test logs it looked like the installation process for the new node just hung. After investigating the problem, I've found out that the issue is that test.py was draining the io_executor pool for cleaning the directory during install that was set to eight workers. So to fix the issue, io_executor pool should be increased to more or less the same ratio as it was: doubled cluster pool size. Closes scylladb/scylladb#20276	2024-08-25 19:59:18 +03:00
Kefu Chai	a0688b29ea	replication_strategy: add fmt::formatter<replication_strategy_type> so that we can use {fmt} with it without the help of fmt::streamed. also since we have a proper formatter for replication_strategy_type, let's implement `formatter<vnode_effective_replication_map::factory_key>` as well. since there are no callers of these two operator<<, let's drop them in this change. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20248	2024-08-25 19:34:52 +03:00
Kefu Chai	c88b63ce13	github: use clang-20 in clang-nightly workflow since clang 19 has been branched. let's track the development branch, which is clang 20. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20279	2024-08-25 19:31:43 +03:00
Benny Halevy	686a8f2939	abstract_replication_strategy: make get_ranges async To prevent stalls due to large number of tokens. For example, large cluster with say 70 nodes can have more than 16K tokens. Fixes #19757 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-25 10:57:34 +03:00
Benny Halevy	2bbbe2a8bc	database: get_keyspace_local_ranges: get vnode_effective_replication_map_ptr param Prepare for making the function async. Then, it will need to hold on to the erm while getting the token_ranges asynchronously. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-25 10:55:33 +03:00
Benny Halevy	ea5a0cca10	compaction: task_manager_module: open code maybe_get_keyspace_local_ranges It is used only here and can be simplified by checking if the keyspace replication strategy is per table by the caller. Prepare for making get_keyspace_local_ranges async. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-25 10:25:32 +03:00
Benny Halevy	824bdf99d2	alternator: ttl: token_ranges_owned_by_this_shard: let caller make the ranges_holder Add static `make` methods to ranges_holder_{primary,secondary} and use them to make the ranges objects and pass them to `token_ranges_owned_by_this_shard`, rather than letting token_ranges_owned_by_this_shard invoke the right constructor of the ranges_holder class. Prepare for making `make` async. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-25 10:25:32 +03:00
Benny Halevy	b2abbae24b	alternator: ttl: can pass const gms::gossiper& to ranges_holder There's no need to pass a mutable reference to the gossiper. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-25 10:25:32 +03:00
Benny Halevy	333c0d7c88	alternator: ttl: ranges_holder_primary: unconstify _token_ranges member To allow the class to be nothrow_move_constructable. Prepare for returning it as a future value. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-25 10:25:32 +03:00
Benny Halevy	d385219a12	alternator: ttl: refactor token_ranges_owned_by_this_shard Rather than holding a variant member (and defining both ranges_holder_{primary,secondary} in both specializations of the class), just make the internal ranges_holder class first-class citizens and parameterize the `token_ranges_owned_by_this_shard` template by this class type. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-25 10:25:32 +03:00
Avi Kivity	c4dd21de38	repair: row_level: coroutinize repair_reader::close()	2024-08-24 00:36:48 +03:00
Avi Kivity	b1dd470533	repair: row_level: coroutinize repair_reader::end_of_stream()	2024-08-24 00:35:59 +03:00
Avi Kivity	7ce76fd0ea	repair: row_level: coroutinize sink_source_for_repair::close() The repeat() loop translates to almost nothing.	2024-08-24 00:30:02 +03:00
Avi Kivity	168a018e45	repair: row_level: coroutinize sink_source_for_repair::get_sink_source()	2024-08-24 00:19:12 +03:00
Avi Kivity	6b370d8154	table_helper: insert(): improve indentation Restore after coroutinization.	2024-08-24 00:08:05 +03:00
Avi Kivity	ecd7702007	table_helper: coroutinize insert() Improves readability. The do_with() ensures it's at least as performant (though it's not in any fast path).	2024-08-24 00:08:05 +03:00
Avi Kivity	980ec2f925	table_helper: coroutinize cache_table_info() After we extracted try_prepare(), this is fairly simple, and improves readability.	2024-08-24 00:08:05 +03:00
Avi Kivity	4e44a15d4d	table_helper: extract try_prepare() table_helper::cache_table_info() is fairly convoluted. It cannot be easily coroutinized since it invokes asynchronous functions in a catch block, which isn't supported in coroutines. To start to break it down, extract a block try_prepare() from code that is called twice. It's both a simplification and a first step towards coroutinization. The new try_prepare() can return three values: `true` if it succeeded, `false` if it failed and there's the possibility of attempting a fallback, and an exception on error.	2024-08-24 00:08:05 +03:00
Lakshmi Narayanan Sreethar	4823a1e203	test/pylib: fix keyspace_compaction method The `keyspace_compaction` method incorrectly appends the column family parameter to the URL using a regular string, `"?cf={table}"`, instead of an f-string, `f"?cf={table}"`. As a result, the column family name is sent as `{table}` to the server, causing the compaction request to fail. Fix this issue by passing the parameter to the POST request using a dictionary instead of appending it to the URL. Fixes #20264 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#20243	2024-08-23 15:20:10 +03:00
Kefu Chai	4a405b0af9	perf/perf_sstable: enumerate sstables when loading them before this change, we use the default options when creating `test_env`, and the default options enable `use_uuid`. but the modes of `perf-sstables` involving reads assume that the identifiers are deterministic. so that the previously written sstables using the "write" mode can be read with the modes like "index_read", which just uses `test_env::make_sstable()` in `load_sstables()`, and under the hood, `test_env::make_sstable()` uses `test_env::new_generation()` for retrieving the next identifier of sstable. when using integer-based identifiers, this works. as the sstable identifiers are generated from a monotonically increasing integer sequence, where the identifiers are deterministic. but this does not apply anymore when the UUID-based identifiers are used, as the identifiers are generated with a pseudorandom generator of UUID v1. in this change, to avoid relying on the determinism of the integer-based sstable identifier generation, we enumerate sstables by listing the given directory, and parse the path for their identifier. after this change, we are able to support the UUID-based sstable identifier. another option is to disable the UUID-based sstable identifier when loading sstables. the upside is that this approach is minimal and straightforward. but the downside is that it encodes the assumption in the algorithm implicitly, and could be confusing -- we create a new generation for loading an existing sstable with this generation. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20183	2024-08-23 10:39:24 +03:00
Pavel Emelyanov	d1ac58f088	api: Get compaction throughput via compaction manager Now the endpoint handler gets the value from db::config which is not nice from several perspectives. First, it gets config (ab)using database. Second, it's compaction manager that "knows" its throughput, global config is the initial source of that information. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20173	2024-08-23 10:33:03 +03:00
Pavel Emelyanov	38edbebb10	compaction_manager: Keep flush-all-before-major option on own config Currently the major compaction task impl grabs this (non-updateable) value from db::config. That's not good, all services including compaction manager have their own configs from which they take options. That said, this patch puts the said option onto compaction_manager::config, makes use of it and configures one from db::config on start (and tests). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20174	2024-08-23 10:31:55 +03:00
Botond Dénes	15fdc3f6cc	Merge 'Add ability to list S3 bucket contents' from Pavel Emelyanov This is prerequisite for "restore from object storage" feature. In order to collect the sstables in bucket one would need to list the bucket contents with the given prefix. The ListObjectsV2 provides a way for it and here's the respective s3::client extension. Closes scylladb/scylladb#20120 * github.com:scylladb/scylladb: test: Add test for s3::client::bucket_lister s3_client: Add bucket lister s3_client: Encode query parameter value for query-string	2024-08-23 10:16:07 +03:00
Kefu Chai	7f65ee3270	dbuild: pass --tty only if --interactive in `947e2814`, we pass `--tty` as long as we are using podman _or_ we are in interactive mode. but if we build the tree using podman using jenkins, we are seeing that ninja is displaying the output as if it's in an interactive mode. and the output includes ASCII escape codes. this is distracting. the reason is that we * are using podman, and * ninja decides whether it should display with a "smart" terminal by checking isatty() and the "TERM" environment variable. so, in this change, we add --tty only if * we are in the interactive mode. * or stdin is associated with a terminal. this is the use case where user uses dbuild to interactively build scylla Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20196	2024-08-23 09:30:20 +03:00
Kefu Chai	ee19bbed05	test: do not define boost_test_print_type() for types with operator<< in `30e82a81`, we add a constraint to the template parameter of boost_test_print_type() to prevent it from being matched with types which can be formatted with operator<<. but it failed to work. we still have test failure reports like: ``` [Exception] - critical check ['s', 's', 't', '_', 'm', 'r', '.', 'i', 's', '_', 'e', 'n', 'd', '_', 'o', 'f', '_', 's', 't', 'r', 'e', 'a', 'm', '(', ')'] has failed ``` this is not what we expect. the reason is that we passed the template parameters to the `has_left_shift` trait in the wrong order, see https://live.boost.org/doc/libs/1_83_0/libs/type_traits/doc/html/boost_typetraits/reference/has_left_shift.html. we should have passed the lhs of operator<< expression as first parameter, and rhs the second. so, in this change, we correct the type constraint by passing the template parameter in the right order, now the error message looks better, like: ``` test/boost/mutation_query_test.cc(110): error: in "test_partition_query_is_full": check !partition_slice_builder(*s) .with_range({}) .build() .is_full() has failed ``` it turns out boost::transformed_range<> is formattable with operator<<, as it fulfills the constraints of `boost::has_left_shift<ostream, R>`, but when printing it, the compiler fails when it tries to insert the elements in the range to the output stream. so, in order to workaround this issue, we add a specialization for `boost::transformed_range<F, R>`. also, to improve the readability, we reimplement the `has_left_shift<>` as a concept, so that it's obvious that we need to put the output stream as the first parameter. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20233	2024-08-23 09:26:22 +03:00
Amnon Heiman	644e6f0121	test/estimated_histogram_test Add summary tests This patch adds tests for summary calculation. It adds two tests, the first is a basic calculation for P50, P95, P99 by adding 100 elements into 20 buckets. The second test checks that if elements are found in the infinite bucket, the result would be the lower limit (33s) and not infinite. Relates to #20255 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-08-22 23:34:24 +03:00
Amnon Heiman	011aa91a8c	utils/histogram.hh: Make summary support infinite bucket. This patch handles an edge case related to the infinite bucket limit. Summaries are the P50, P95, and P99 quantiles. The quantiles are calculated from a histogram; we find the bucket and return its upper limit. In classic histograms, there is a notion of the infinite bucket; anything that does not fall into the last bucket is considered to be infinite; with quantile, it does not make sense. So instead of reporting infinite we'll report the bucket lower limit. Fixes #20255 Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-08-22 23:34:24 +03:00
Kefu Chai	39dd088374	test: include used headers before this change, clang 20 fails to build the tree, like: ``` /home/kefu/.local/bin/clang++ -DBOOST_ALL_DYN_LINK -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TESTING_MAIN -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. 
-march=westmere -Xclang -fexperimental-assignment-tracking=disabled -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT test/boost/CMakeFiles/database_test.dir/Debug/database_test.cc.o -MF test/boost/CMakeFiles/database_test.dir/Debug/database_test.cc.o.d -o test/boost/CMakeFiles/database_test.dir/Debug/database_test.cc.o -c /home/kefu/dev/scylladb/test/boost/database_test.cc /home/kefu/dev/scylladb/test/boost/database_test.cc:539:29: error: invalid use of incomplete type 'schema_builder' 539 \| return *schema_builder(ks_name, cf_name) \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/kefu/dev/scylladb/schema/schema.hh:115:7: note: forward declaration of 'schema_builder' 115 \| class schema_builder; \| ^ ``` and ``` /home/kefu/.local/bin/clang++ -DBOOST_ALL_DYN_LINK -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TESTING_MAIN -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. 
-march=westmere -Xclang -fexperimental-assignment-tracking=disabled -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT test/boost/CMakeFiles/group0_cmd_merge_test.dir/Debug/group0_cmd_merge_test.cc.o -MF test/boost/CMakeFiles/group0_cmd_merge_test.dir/Debug/group0_cmd_merge_test.cc.o.d -o test/boost/CMakeFiles/group0_cmd_merge_test.dir/Debug/group0_cmd_merge_test.cc.o -c /home/kefu/dev/scylladb/test/boost/group0_cmd_merge_test.cc /home/kefu/dev/scylladb/test/boost/group0_cmd_merge_test.cc:78:18: error: member access into incomplete type 'db::config' 78 \| cfg.db_config->commitlog_segment_size_in_mb(1); \| ^ /home/kefu/dev/scylladb/data_dictionary/data_dictionary.hh:28:7: note: forward declaration of 'db::config' 28 \| class config; \| ^ 1 error generated. ``` and ``` `FAILED: test/boost/CMakeFiles/repair_test.dir/Debug/repair_test.cc.o /home/kefu/.local/bin/clang++ -DBOOST_ALL_DYN_LINK -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TESTING_MAIN -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/build/rust -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual 
-Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT test/boost/CMakeFiles/repair_test.dir/Debug/repair_test.cc.o -MF test/boost/CMakeFiles/repair_test.dir/Debug/repair_test.cc.o.d -o test/boost/CMakeFiles/repair_test.dir/Debug/repair_test.cc.o -c /home/kefu/dev/scylladb/test/boost/repair_test.cc /home/kefu/dev/scylladb/test/boost/repair_test.cc:149:45: error: use of undeclared identifier 'global_schema_ptr' 149 \| co_await e.db().invoke_on_all([gs = global_schema_ptr(gen.schema())](replica::database& db) -> future<> { \| ^ /home/kefu/dev/scylladb/test/boost/repair_test.cc:150:62: error: use of undeclared identifier 'gs' 150 \| co_await db.add_column_family_and_make_directory(gs.get(), replica::database::is_new_cf::yes); \| ^ 2 errors generated. ``` because we are using incomplete types when their complete definitions are required. so, in this change, we include the headers for their complete definition. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20239	2024-08-22 20:51:38 +03:00
Kefu Chai	969cbb75ce	tools/scylla-nodetool: add backup integration as we have an API for backup a keyspace, let's expose this feature with nodetool. so we can exercise it without the help of scylla-manager or 3rd-party tools with a user-friendly interface. in this change: * add a new subcommand named "backup" to nodetool * add test to verify its interaction with the API server * add two more route to the REST API mock server, as the test is using /task_manager/wait_task/{task_id} API. for the sake of completeness, the route for /task_manager/{part1} is added as well. * update the document accordingly. * the bash completion script is updated accordingly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-22 19:48:06 +03:00
Pavel Emelyanov	245cc852dd	docs: Document the new backup method Add the new /storage_service/backup endpoint to object_storage.md as yet another way to use S3 from Scylla.	2024-08-22 19:47:06 +03:00
Pavel Emelyanov	de87450453	test/object_store: Test that backup task is abortable It starts similarly to the simple backup test, but injects a pause into the task once a single file is scheduled for upload, then aborts the task, waits for it to fail, and checks that _not_ all files are uploaded. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 19:47:06 +03:00
Pavel Emelyanov	f8d894bc23	test/object_store: Add simple backup test The test shows how to backup a keyspace: - flush - take snapshot - start backup with the new API method - wait for the task to finish Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 19:47:06 +03:00
Pavel Emelyanov	47e49e6dec	test/object_store: Move format_tuples() There will soon appear a new .py file in the suite that will want to use this helper too Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 19:47:06 +03:00
Pavel Emelyanov	d83d585709	test/pylib: Add more methods to rest client Namely: - POST /storage_service/snapshots to take snapshot on a ks - GET /task_manager/get_task_status/{id} to get status of a running task - GET /task_manager/wait_task/{id} to wait for a task to finish - POST /task_manager/abort_task/{id} to abort a running task Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 19:47:06 +03:00
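The routes listed in the commit above can be summarised as a small lookup table. The following is only a sketch, not the code from test/pylib: the names `TASK_ROUTES`, `SNAPSHOT_ROUTE` and `task_request` are hypothetical, and nothing here contacts a server. It only shows which HTTP method and path each operation maps to.

```python
from urllib.parse import quote

# Methods and paths copied from the commit message.
# The dictionary keys ("status", "wait", "abort") are hypothetical labels.
TASK_ROUTES = {
    "status": ("GET", "/task_manager/get_task_status/{id}"),
    "wait": ("GET", "/task_manager/wait_task/{id}"),
    "abort": ("POST", "/task_manager/abort_task/{id}"),
}

# Taking a snapshot of a keyspace; query parameters are intentionally omitted
# because the commit message does not name them.
SNAPSHOT_ROUTE = ("POST", "/storage_service/snapshots")


def task_request(action: str, task_id: str) -> tuple[str, str]:
    """Return (method, path) for a task manager call on the given task id."""
    method, template = TASK_ROUTES[action]
    # Percent-encode the id so it cannot introduce extra path segments.
    return method, template.format(id=quote(task_id, safe=""))
```

A caller would pass the returned pair to whatever HTTP client the test suite uses, for example `method, path = task_request("wait", task_id)`.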
Pavel Emelyanov	ed6e6700ab	backup-task: Make it abortable (almost) Make the impl::is_abortable() return 'yes' and check the impl::_as in the files listing loop. It's not real abort, since files listing loop is expected to be fast and most of the time will be spent in s3::client code reading data from disk and sending them to S3, but client doesn't support aborting its requests. That's some work yet to be done. Also add injection for future testing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 19:47:06 +03:00
Pavel Emelyanov	a812f13ddd	code: Introduce backup API method The method starts a task that uploads all files from the given keyspace's snapshot to the requested endpoint/bucket. The task runs in the background, its task_id is returned from the method once it's spawned and it should be used via /task_manager API to track the task execution and completion (hint: it's good to have non-zero TTL value to make sure fast backups don't finish before the caller manages to call wait_task API). If snapshot doesn't exist, nothing happens (FIXME, need to return back an error in that case). If endpoint is not configured locally, the API call resolves with bad-request instantly. Sstables components are scanned for all tables in the keyspace and are uploaded into the /bucket/${cf_name}/${snapshot_name}/ path. Task is not abortable (FIXME -- to be added) and doesn't really report its progress other than running/done state (FIXME -- to be added too). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 19:47:06 +03:00
Pavel Emelyanov	f7b380d53b	database: Export parse_table_directory_name() helper There's parse_table_directory_name() static helper in database.cc code that is used by methods that parse table tree layout for snapshot. Export this helper for external usage and rename to fit the format_... one introduced by previous patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:57:48 +03:00
Pavel Emelyanov	33962946fc	database: Introduce format_table_directory_name() helper This helper builds the table directory name (not the full path) out of the table name and uuid. This is to be symmetrical with yet another helper that converts a directory name back to a table name and uuid (next patch) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:57:48 +03:00
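The symmetry between the two helpers described in this patch and the next one can be illustrated with a round trip. This is a Python sketch, not the C++ code in database.cc. It assumes the directory name is the table name, a dash, and the uuid written as 32 hex digits without dashes; that layout is an assumption of this example and is not stated in the commit messages. The function names are hypothetical.

```python
import uuid


def format_table_directory_name(name: str, table_id: uuid.UUID) -> str:
    """Build '<name>-<uuid hex>' (assumed layout, directory name only)."""
    return f"{name}-{table_id.hex}"


def parse_table_directory_name(dirname: str) -> tuple[str, uuid.UUID]:
    """Inverse of format_table_directory_name(); raises ValueError on bad input."""
    # Split on the last dash so that the uuid is always the trailing component.
    name, sep, hex_id = dirname.rpartition("-")
    if not sep or not name:
        raise ValueError(f"not a table directory name: {dirname!r}")
    return name, uuid.UUID(hex=hex_id)
```

Keeping both directions next to each other makes it easy to check that `parse(format(name, id)) == (name, id)` for any input.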
Pavel Emelyanov	dff51fd58c	snapshot-ctl: Add config to snapshot_ctl Pretty much all services in Scylla have their own config. Add one to snapshot-ctl too, it will be populated later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:57:20 +03:00
Pavel Emelyanov	f37857e20a	snapshot-ctl: Add sstables::storage_manager dependency The storage_manager maintains set of clients to configured object storage(s). The snapshot ctl is going to spawn tasks that will talk to those storages, thus it needs the storage manager to get the clients from. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:08:21 +03:00
Pavel Emelyanov	362331c89b	snapshot-ctl: Maintain task manager module This service is going to start tasks managed by task manager. For that, it should have its module set up and registered. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:08:21 +03:00
Pavel Emelyanov	4ae89a9c81	snapshot-ctl: Add "snapshots" logger Will be used later Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:08:21 +03:00
Pavel Emelyanov	90c794172b	snapshot-ctl: Outline stop() method and constructor These two are going to grow, keep them out not to pollute the header Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:08:21 +03:00
Pavel Emelyanov	96946a4b11	snapshot-ctl: Inline run_snapshot_list<> This helper will be used by a code from another .cc file, so the template needs to be in header for smooth instantiation Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:08:21 +03:00
Pavel Emelyanov	4e73b4d8ad	test/cql_test_env: Export task manager from cql test env To be used by one of the next patches Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:08:21 +03:00
Pavel Emelyanov	4b86eede1f	task_manager: Print task ttl on start (for debugging) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 14:08:21 +03:00
Pavel Emelyanov	8949d73cd9	docs: Update object_storage.md with AWS_ environment Commit `51c53d8db6` made it possible to configure object storage endpoint creds via environment. Mention this in the docs.	2024-08-22 14:08:21 +03:00
Pavel Emelyanov	d3f9865d2f	docs: Restructure object_storage.md Currently the doc assumes that object storage can only be used to keep sstables on it. It's going to change, restructure the doc to allow for more usage scenarios.	2024-08-22 14:08:21 +03:00
Pavel Emelyanov	4e2d7aa2a2	test/tablets: Test that reading tablets' mutations from MUTATION_FRAGMENTS works Currently it doesn't, one of the node crashes with std::out_of_range exception and meaningless calltrace [Botond]: this test checks the case of reading a partition via MUTATION_FRAGMENTS from a node which doesn't own said partition. refs: #18786 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-22 06:24:06 -04:00
Botond Dénes	46563d719f	replica/mutation_dump: enforce pinning of effective replication map Make it a required argument, which ensures the topology version is pinned for the duration of the query. This is needed because mutation dump queries bypass the storage proxy, where this pinning usually takes place. So it has to be enforced here.	2024-08-22 06:24:06 -04:00
Botond Dénes	de5329157c	replica/mutation_dump: handle un-owned tokens (with tablets) When using tablets, the replica-side doesn't handle un-owned tokens. table::shard_for_reads() will just return 0 for un-owned tokens, and a later attempt at calling table::storage_group_for_token() with said un-owned token will cause a crash (std::terminate due to std::out_of_range thrown in noexcept context). The replicas rely on the coordinator to not send stray requests, but for select from mutation_fragments(table) queries, there is no coordinator side who could do the correct dispatching. So do this in mutation_dump(), just creating empty readers for un-owned tokens.	2024-08-22 03:06:55 -04:00
Łukasz Paszkowski	a11d19f321	test_query.py: Test reverse queries with clustering key bounds Since a native reversed format is used for reversed queries, additional tests with restrictions on clustering keys are required to capture possible errors like https://github.com/scylladb/scylladb/issues/20191 earlier than in dtests. Add parametrization to the following tests: + test_query_reverse + test_query_reverse_paging to accept a comparison operator used in selection criteria for a Query operation.	2024-08-21 14:21:34 +02:00
Aleksandra Martyniuk	9b7c837106	test: add test to check compaction stop log	2024-08-21 12:42:37 +02:00
Aleksandra Martyniuk	5005e19de7	compaction: fix compaction group stop reason compaction_manager::remove passes "table removal" as a reason of stopping ongoing compactions, but currently remove method is also called when a tablet is migrated or split. Pass the actual reason of compaction stop, so that logs aren't misleading.	2024-08-21 12:42:09 +02:00
Avi Kivity	2ef5b5e4fe	Revert "[test.py] Increase pool size for CI" This reverts commit `cc428e8a36`. It causes many spurious CI failures while nodes are being torn down. Revert it until the root cause is fixed, after which it can be reinstated. Fixes #20116.	2024-08-21 13:21:08 +03:00
Benny Halevy	f40d06b766	table: calculate_tablet_count: use sg_manager storage_groups size Now, when each shard storage_group_manager keeps only the storage_groups for the tablet replica it owns, we can simply return the storage_group map size instead of counting the number of tablet replicas mapped to this shard. Add a unit test that sums the tablet count on all shards and tests that the sum is equal to the configured default `initial_tablets`. Fixes #18909 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#20223	2024-08-21 11:01:58 +02:00
Tomasz Grabiec	a3a97e8aad	Merge 'schema_tables: calculate_schema_digest: prevent stalls due to large m…' from Benny Halevy …utations vector With a large number of tables the schema mutations vector might get big enough to cause reactor stalls when freed. For example, the following stall was hit on 2023.1.0~rc1-20230208.fe3cc281ec73 with 5000 tables: ``` (inlined by) ~vector at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_vector.h:730 (inlined by) db::schema_tables::calculate_schema_digest(seastar::sharded<service::storage_proxy>&, enum_set<super_enum<db::schema_feature, (db::schema_feature)0, (db::schema_feature)1, (db::schema_feature)2, (db::schema_feature)3, (db::schema_feature)4, (db::schema_feature)5, (db::schema_feature)6, (db::schema_feature)7> >, seastar::noncopyable_function<bool (std::basic_string_view<char, std::char_traits<char> >)>) at ./db/schema_tables.cc:799 ``` This change returns a mutations generator from the `map` lambda coroutine so we can process them one at a time and destroy the mutations one at a time, thereby reducing memory footprint and preventing reactor stalls. Fixes #18173 Closes scylladb/scylladb#18174 * github.com:scylladb/scylladb: schema_tables: calculate_schema_digest: filter the key earlier schema_tables: calculate_schema_digest: prevent stalls due to large mutations vector	2024-08-20 21:24:38 +02:00
Łukasz Paszkowski	f29d7ffa81	alternator::do_query Add additional trace log Additional log prints information on the read query being executed. It lists information like whether the query is a reversed one or not, and table_schema and query_schema versions.	2024-08-20 20:56:15 +02:00
Łukasz Paszkowski	727cbd8151	alternator::do_query: Use native reversed format When executing reversed queries, a native reversed format shall be used. Therefore the table schema and the clustering key bounds are reversed before a partition slice and a read command are constructed, similarly to cql3::statements::select_statement.	2024-08-20 20:56:15 +02:00
Łukasz Paszkowski	3720e8aabe	alternator::do_query Rename schema with table_schema In order to increase readability, a schema variable is renamed to table_schema to emphasize that a table schema is passed to the function and used across it. This allows us to introduce a query_schema variable in the next patch.	2024-08-20 20:56:06 +02:00
Aleksandra Martyniuk	9d9414a75d	replica: add/remove table atomically Currently, database::tables_metadata::add_table needs to hold a write lock before adding a table. So, if we update other classes keeping track of tables before calling add_table, and the method yields, table's metadata will be inconsistent. Set all table-related info in tables_metadata::add_table_helper (called by add_table) so that the operation is atomic. Analogically for remove_table. Fixes: #19833. Closes scylladb/scylladb#20064	2024-08-20 20:53:32 +03:00
Kamil Braun	5c9efdff50	Merge 'raft: store_snapshot_descriptor to use actually preserved items number when truncating the local log table' from Sergey Zolotukhin io_fiber/store_snapshot_descriptor now gets the actual number of items preserved when the log is truncated, fixing extra entries remained after log snapshot creation. Also removes incorrect check for the number of truncated items in the raft_sys_table_storage::store_snapshot_descriptor. Minor change: Added error_injection test API for changing snapshot thresholds settings. Fixes scylladb/scylladb#16817 Fixes scylladb/scylladb#20080 Closes scylladb/scylladb#20095 * github.com:scylladb/scylladb: raft: Ensure const correctness in applier_fiber. raft: Invoke store_snapshot_descriptor with actually preserved items. raft: Use raft_server_set_snapshot_thresholds in tests. raft: Fix indentation in server.cc raft: Add a test to check log size after truncation. raft: Add raft_server_set_snapshot_thresholds injection. utils: Ensure const correctness of injection_handler::get().	2024-08-20 18:15:30 +02:00
Tomasz Grabiec	ff52527c54	Merge 'repair: do_rebuild_replace_with_repair: use source_dc only when safe' from Benny Halevy It is unsafe to restrict the sync nodes for repair to the source data center if it has too low replication factor in network_topology_replication_strategy, or if other nodes in that DC are ignored. Also, this change restricts the usage of source_dc to `network_topology` and `everywhere_topology` strategies, as with simple replication strategy there is no guarantee that there would be any more replicas in that data center. Fixes #16826 Reproducer submitted as https://github.com/scylladb/scylla-dtest/pull/3865 It fails without this fix and passes with it. * Requires backport to live versions. Issue hit in the field with 2022.2.14 Closes scylladb/scylladb#16827 * github.com:scylladb/scylladb: repair: do_rebuild_replace_with_repair: use source_dc only when safe repair: replace_with_repair: pass the replace_node downstream repair: replace_with_repair: pass ignore_nodes as a set of host_id:s repair: replace_rebuild_with_repair: pass ks_erms from caller nodetool: rebuild: add force option Add and use utils::optional_param to pass source_dc	2024-08-20 16:13:23 +02:00
Sergey Zolotukhin	13b3d3a795	raft: Ensure const correctness in applier_fiber. Add 'const' to non-mutable variables in server_impl::applier_fiber() function.	2024-08-20 15:24:00 +02:00
Sergey Zolotukhin	c3e52ab942	raft: Invoke store_snapshot_descriptor with actually preserved items. - raft_sys_table_storage::store_snapshot_descriptor now receives a number of preserved items in the log, rather than _config.snapshot_trailing value; - Incorrect check for truncated number of items in store_snapshot_descriptor was removed. Fixes scylladb/scylladb#16817 Fixes scylladb/scylladb#20080	2024-08-20 15:22:49 +02:00
Sergey Zolotukhin	922e035629	raft: Use raft_server_set_snapshot_thresholds in tests. Replace raft_server_snapshot_reduce_threshold with raft_server_set_snapshot_thresholds in tests as raft_server_set_snapshot_thresholds fully covers the functionality of raft_server_snapshot_reduce_threshold.	2024-08-20 15:08:49 +02:00
Sergey Zolotukhin	00a1d3e305	raft: Fix indentation in server.cc	2024-08-20 15:08:45 +02:00
Sergey Zolotukhin	b6de8230a9	raft: Add a test to check log size after truncation. The test checks that snapshot_trailing_size parameter is taken into consideration when the log system table is truncated. Test for scylladb#16817	2024-08-20 14:15:50 +02:00
Sergey Zolotukhin	9dfa041fe1	raft: Add raft_server_set_snapshot_thresholds injection. Use error injection to allow overriding following snapshot threshold settings: - snapshot_threshold - snapshot_threshold_log_size - snapshot_trailing - snapshot_trailing_size	2024-08-20 14:15:50 +02:00
Sergey Zolotukhin	c5da0775f2	utils: Ensure const correctness of injection_handler::get(). Make utils::error_injection::injection_handler::get() method 'const' as it does not mutate object's state.	2024-08-20 14:15:50 +02:00
Botond Dénes	3ee0d7f2d1	Merge 'tools: Enhance scylla sstable shard-of to support tablets' from Kefu Chai before this change, `scylla sstable shard-of` didn't support tablets, because: - with tablets enabled, data distribution uses the scheduler - this replaces the previous method of mapping based on vnodes and shard numbers - as a result, we can no longer deduce sstable mapping from token ranges in this change, we: - read `system.tablets` table to retrieve tablet information - print the tablet's replica set (list of <host, shard> pairs) - this helps users determine where a given sstable is hosted This approach provides the closest equivalent functionality of `shard-of` in the tablet era. Fixes scylladb/scylladb#16488 --- no need to backport, it's an improvement, not a critical fix. Closes scylladb/scylladb#20002 * github.com:scylladb/scylladb: tools: enhance `scylla sstable shard-of` to support tablets replica/tablets: extract tablet_replica_set_from_cell() tools: extract get_table_directory() out tools: extract read_mutation out build: split the list of source file across multiple line tools/scylla-sstable: print warning when running shard-of with tablets	2024-08-20 13:51:12 +03:00
Avi Kivity	e2b179a3d0	Merge 'Coroutinize sstable_directory registry garbage collecting method' from Pavel Emelyanov null Closes scylladb/scylladb#20172 * github.com:scylladb/scylladb: sstable_directory: Coroutinize inner lambdas sstable_directory: Fix indentation after previous patch sstable_directory: Coroutinize outer continuation chain	2024-08-20 12:50:09 +03:00
David Garcia	fea707033f	docs: improve include flag directive The include flag directive now treats missing content as info logs instead of warnings. This prevents build failures when the enterprise-specific content isn't yet available. If the enterprise content is undefined, the directive automatically loads the open-source content. This ensures the end user has access to some content. address comments Closes scylladb/scylladb#19804	2024-08-20 12:21:39 +03:00
Kefu Chai	9a10c33734	build: cmake: do not build storage_proxy.o by default in `5ce07e5d84`, the target named "storage_proxy.o" was added for training the build of clang. but the rule for building this target has two flaws: * it was added a dependency of the "all" target, but we don't need to build `storage_proxy.cc` twice when building the tree in the regular build job. we only need to build it when creating the profile for training the build of clang. * it misses the include directory of abseil library. that's why we have following build failure when building the default target: ``` [2024-08-18T14:58:37.494Z] /usr/local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DFMT_SHARED -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_DEBUG -DSEASTAR_DEBUG_PROMISE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/jenkins/workspace/scylla-master/scylla-ci/scylla -I/jenkins/workspace/scylla-master/scylla-ci/scylla/seastar/include -I/jenkins/workspace/scylla-master/scylla-ci/scylla/build/seastar/gen/include -I/jenkins/workspace/scylla-master/scylla-ci/scylla/build/seastar/gen/src -I/jenkins/workspace/scylla-master/scylla-ci/scylla/build/gen -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/jenkins/workspace/scylla-master/scylla-ci/scylla=. 
-march=westmere -Xclang -fexperimental-assignment-tracking=disabled -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -fno-sanitize=vptr -MD -MT service/CMakeFiles/storage_proxy.o.dir/Debug/storage_proxy.cc.o -MF service/CMakeFiles/storage_proxy.o.dir/Debug/storage_proxy.cc.o.d -o service/CMakeFiles/storage_proxy.o.dir/Debug/storage_proxy.cc.o -c /jenkins/workspace/scylla-master/scylla-ci/scylla/service/storage_proxy.cc [2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/service/storage_proxy.cc:17: [2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/db/commitlog/commitlog.hh:19: [2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/db/commitlog/commitlog_entry.hh:15: [2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/mutation/frozen_mutation.hh:15: [2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/mutation/mutation_partition_view.hh:16: [2024-08-18T14:58:37.495Z] In file included from /jenkins/workspace/scylla-master/scylla-ci/scylla/build/gen/idl/mutation.dist.impl.hh:14: [2024-08-18T14:58:37.495Z] /jenkins/workspace/scylla-master/scylla-ci/scylla/serializer_impl.hh:20:10: fatal error: 'absl/container/btree_set.h' file not found [2024-08-18T14:58:37.495Z] 20 \| #include <absl/container/btree_set.h> [2024-08-18T14:58:37.495Z] \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ [2024-08-18T14:58:37.495Z] 1 error generated. ``` * if user only enables "dev" mode, we'd have: ``` CMake Error at service/CMakeLists.txt:54 (add_library): No SOURCES given to target: storage_proxy.o ``` so, in this change, we * exclude this target from "all" * link this target against abseil header library, so it has access to the abseil library. 
please note, we don't need to build an executable in this case, so the header would suffice. * add a proxy target to conditionally enable/disable this target. as CMake does not support generator expression in `add_dependencies()` yet at the time of writing. see https://gitlab.kitware.com/cmake/cmake/-/issues/19467 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20195	2024-08-19 21:30:34 +03:00
Avi Kivity	7eb3b15fff	Merge 'utils/tagged_integer: remove conversion to underlying integer' from Laszlo Ersek ~~~ utils/tagged_integer: remove conversion to underlying integer Silently converting a tagged (i.e., "dimension-ful") integer to a naked ("dimensionless") integer defeats the purpose of having tagged integers, and is a source of practical bugs, such as <https://github.com/scylladb/scylladb/issues/20080>. We could make the conversion operator explicit, for enforcing static_cast<TAGGED_INTEGER_TYPE::value_type>(TAGGED_INTEGER_VALUE) in every conversion location -- but that's a mouthful to write. Instead, remove the conversion operator, and let clients call the (identically behaving) value() member function. ~~~ No backport needed (refactoring). The series is supposed to solve #20081. Two patches in the series touch up code that is known to be (orthogonally) buggy; see - `service/raft_sys_table_storage: tweak dead code` (#20080) - `test/raft/replication: untag index_t in test_case::get_first_val()` (#20151) Fixes for those (independent) issues will have to be rebased on this series, or this series will have to be rebased on those (due to context conflicts). The series builds at every stage. The debug and release unit test suites pass at the end. 
Closes scylladb/scylladb#20159 * github.com:scylladb/scylladb: utils/tagged_integer: remove conversion to underlying integer test/raft/randomized_nemesis_test: clean up remaining index_t usage test/raft/randomized_nemesis_test: clean up index_t usage in store_snapshot() test/raft/replication: clean up remaining index_t usage test/raft/replication: take an "index_t start_idx" in create_log() test/raft/replication: untag index_t in test_case::get_first_val() test/raft/etcd_test: tag index_t and term_t for comparisons and subtractions test/raft/fsm_test: tag index_t and term_t for comparisons and subtractions test/raft/helpers: tighten compare_log_entries() param types service/raft_sys_table_storage: tweak dead code service/raft_sys_table_storage: simplify (snap.idx - preserve_log_entries) service/raft_sys_table_storage: untag index_t and term_t for queries raft/server: clean up index_t usage raft/tracker: don't drop out of index_t space for subtraction raft/fsm: clean up index_t and term_t usage raft/log: clean up index_t usage db/system_keyspace: promise a tagged integer from increment_and_get_generation() gms/gossiper: return "strong_ordering" from compare_endpoint_startup() gms/gossiper: get "int32_t" value of "gms::version_type" explicitly	2024-08-19 19:52:54 +03:00
Benny Halevy	5f655e41e3	repair: do_rebuild_replace_with_repair: use source_dc only when safe It is unsafe to restrict the sync nodes for repair to the source data center if we cannot guarantee a quorum in the data center with network-topology replication strategy. This change restricts the usage of source_dc in the following cases: 1. For SimpleStrategy - source_dc is ignored since there is no guarantee that it contains remaining replicas for all tokens. 2. For EverywhereStrategy - use source_dc if there are remaining live nodes in the datacenter. 3. For NetworkTopologyStrategy: a. It is considered unsafe to use source_dc if number of nodes lost in that DC (replaced/rebuilt node + additional ignored nodes) is greater than 1, or it has 1 lost node and rf <= 1 in the DC. b. If the source_dc arg is forced, as with the new `nodetool rebuild --force <source_dc>` option, we use it anyway, even if it's considered to be unsafe. A warning is printed in this case. c. If the source_dc arg is user-provided, (using nodetool rebuild), an error exception is thrown, advising to use an alternative dc, if available, omit source_dc to sync with all nodes, or use the --force option to use the given source_dc anyhow. d. Otherwise, we look for an alternative source datacenter, that has not lost any node. If such datacenter is found we use it as source_dc for the keyspace, and log a warning. e. If no alternative dc is found (and source_dc is implicit), then: log a warning and fall back to using replicas from all nodes in the cluster. Fixes #16826 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-19 17:23:51 +03:00
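The NetworkTopologyStrategy rules (3a to 3e) in the commit above can be read as a small decision function. The following Python sketch is an illustration of those rules only, not the C++ logic in the repair code. All names (`is_unsafe`, `choose_source_dc`, `UnsafeSourceDC`, the dictionary arguments) are hypothetical, logging of warnings is left out, and the search for an alternative datacenter is simplified to "first DC, in sorted order, that lost no node".

```python
class UnsafeSourceDC(Exception):
    """Raised when a user-provided source_dc is unsafe and not forced."""


def is_unsafe(lost_nodes: int, rf: int) -> bool:
    # Rule 3a: more than one lost node, or one lost node with rf <= 1.
    return lost_nodes > 1 or (lost_nodes == 1 and rf <= 1)


def choose_source_dc(source_dc, lost_by_dc, rf_by_dc, *,
                     user_provided=False, force=False):
    """Return the DC to sync from, or None to sync with all nodes."""
    lost = lost_by_dc.get(source_dc, 0)
    if not is_unsafe(lost, rf_by_dc.get(source_dc, 0)):
        return source_dc
    if force:
        # Rule 3b: forced, use it anyway (the real code prints a warning).
        return source_dc
    if user_provided:
        # Rule 3c: explicit but unsafe, reject the request.
        raise UnsafeSourceDC(f"{source_dc} lost {lost} node(s)")
    # Rule 3d: look for an alternative DC that has not lost any node.
    for dc in sorted(rf_by_dc):
        if lost_by_dc.get(dc, 0) == 0:
            return dc
    # Rule 3e: no alternative, fall back to all nodes in the cluster.
    return None
```

For example, with `lost_by_dc={"dc1": 2}` and replication configured in both `dc1` and `dc2`, an implicit `dc1` is replaced by `dc2`, while an explicit `dc1` raises unless `force=True` is given.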
Benny Halevy	8665eef98c	repair: replace_with_repair: pass the replace_node downstream To be used by the next patch to count how many nodes are lost in each datacenter. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-19 17:23:33 +03:00
Benny Halevy	9729dd21c3	repair: replace_with_repair: pass ignore_nodes as a set of host_id:s The callers already pass ignore_nodes as host_id:s and we translate them into inet_address only for repair, so delay the translation as much as possible. Refs scylladb/scylladb#6403 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-19 17:22:01 +03:00
Benny Halevy	b5d0ab092c	repair: replace_rebuild_with_repair: pass ks_erms from caller The keyspaces replication maps must be in sync with the token_metadata_ptr passed already to the functions, so instead of getting it in the callee, let the caller get the ks_erms along with retrieving the tmptr. Note that it's already done on the rebuild path for streaming based rebuild. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-19 17:20:27 +03:00
Benny Halevy	0419b1d522	nodetool: rebuild: add force option To be used to force usage of source_dc, even when it is unsafe for rebuild. Update docs and add test/nodetool/test_rebuild.py Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-19 17:20:12 +03:00
Benny Halevy	8b1877f3ca	Add and use utils::optional_param to pass source_dc Clearly indicate if a source_dc is provided, and if so, was it explicitly given by the user, or was implicitly selected by scylla. This will become useful in the next patches that will use that to either reject the operation if it's unsafe to use the source_dc and the dc was explicitly given by the user, or whether to fallback to using all nodes otherwise. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-19 17:13:54 +03:00
Anna Stuchlik	83d5cb04c2	doc: extract the info about tablets default to a separate file This commit extracts the information about the default for tables in keyspace creation to a separate file in the _common folder. The file is then included using the scylladb_include_flag directive. The purpose of this commit is to make it possible to include a different file in the scylla-enterprise repo - with a different default. Refs https://github.com/scylladb/scylla-enterprise/issues/4585 Closes scylladb/scylladb#20181	2024-08-19 16:16:18 +03:00
Kefu Chai	25b3c50f71	test/nodetool: print default value of options in help message would be more helpful, if the output of "--help" command line can include the default value of options. so, in this change, we include the default values in it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20170	2024-08-19 16:15:24 +03:00
Botond Dénes	40d2a6f0b2	Merge 'test.py: use XPath for iterating in "TestSuite/TestSuite"' from Kefu Chai before this change, we check for the existence of "TestSuite" node under the root of XML tree, and then enumerating all "TestSuite" nodes under this "TestSuite", this approach works. but it * introduces unnecessary indent * is not very readable in this change, we just use "./TestSuite/TestSuite" for enumerating all "TestSuite" nodes under "TestSuite". simpler this way. --- it's a cleanup in the test driver script, hence no need to backport. Closes scylladb/scylladb#20169 * github.com:scylladb/scylladb: test.py: fix the indent test.py: use XPath for iterating in "TestSuite/TestSuite"	2024-08-19 16:13:42 +03:00
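The cleanup described in the merge above relies on the limited XPath support in Python's `xml.etree.ElementTree`. The sketch below is not the test.py code: the sample XML, the root element name `TestLog` and the function `nested_suite_names` are made up for illustration. It shows that a single `findall("./TestSuite/TestSuite")` replaces an existence check followed by a nested loop, and that it returns an empty list when the outer node is missing.

```python
import xml.etree.ElementTree as ET

# Hypothetical report: an outer TestSuite containing two inner TestSuite nodes.
SAMPLE = """
<TestLog>
  <TestSuite name="outer">
    <TestSuite name="a"/>
    <TestSuite name="b"/>
  </TestSuite>
</TestLog>
"""


def nested_suite_names(xml_text: str) -> list[str]:
    """Return the names of all TestSuite nodes nested under a top-level TestSuite."""
    root = ET.fromstring(xml_text)
    # One path expression: children named TestSuite of children named TestSuite.
    return [node.get("name") for node in root.findall("./TestSuite/TestSuite")]
```

Because `findall()` yields an empty list when nothing matches, no separate check for the outer "TestSuite" node is needed, which is what removes the extra level of indentation.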
Botond Dénes	6835f7e993	Merge 'Add CQL-based RBAC support to Alternator' from Piotr Smaron Alternator already supports authentication - the ability to sign each request as a particular user. The users that can be used are the different "roles" that are created by CQL "CREATE ROLE" commands. This series adds support for authorization, i.e., the ability to determine that only some of these roles are allowed to read or write particular tables, to create new tables, and so on. The way we chose to do this in this series is to support CQL's existing role-based access control (RBAC) commands - GRANT and REVOKE - on Alternator tables. For example, an Alternator table "xyz" is visible to CQL as "alternator_xyz.xyz", so a `GRANT SELECT ON alternator_xyz.xyz TO myrole` will allow read commands (e.g., GetItem) on that table, and without this GRANT, a GetItem will fail with `AccessDeniedException`. This series adds the necessary checks to all relevant Alternator operations, and also adds extensive functional testing for this feature - i.e., that certain DynamoDB API operations are not allowed without the appropriate GRANTs. The following permissions are needed for the following Alternator API operations: * SELECT: `GetItem`, `Query`, `Scan`, `BatchGetItem`, `GetRecords` * MODIFY: `PutItem`, `DeleteItem`, `UpdateItem`, `BatchWriteItem` * CREATE: `CreateTable` * DROP: `DeleteTable` * ALTER: `UpdateTable`, `TagResource`, `UntagResource`, `UpdateTimeToLive` * _none needed_: `ListTables`, `DescribeTable`, `DescribeEndpoints`, `ListTagsOfResource`, `DescribeTimeToLive`, `DescribeContinuousBackups`, `ListStreams`, `DescribeStream`, `GetShardIterator` Currently, I decided that for consistency each operation requires one permission only. For example, PutItem only requires MODIFY permission. This is despite the fact that in some cases (namely, `ReturnValues=ALL_OLD`) it can also _read_ the item. 
We should perhaps discuss this decision - and compare how it was done in CQL - e.g., what happens in LWT writes that may return old values? Different permissions can be granted for a base table, each of its views, and the CDC table (Alternator streams). This adds power - e.g., we can allow a role to read only a view but not the base table, or read the table but not its history. GRANTing permissions on views or CDC logs requires knowing their names, which are somewhat ugly (e.g., the name of GSI "abc" in table "xyz" is `alternator_xyz.xyz:abc`). But usefully, the error message when permissions are denied contains the full name of the table that was lacking permissions and which permissions were lacking, so users can easily add them. In addition to permissions checking, this series also correctly supports _auto-grant_ (except #19798): When a role has permissions to `CreateTable`, any table it creates will automatically be granted all permissions for this role, so this role will be able to use the new table and eventually delete it. `DeleteTable` does the opposite - it removes permissions from tables being deleted, so that if later a second user re-creates a table with the same name, the first user will not have permissions over the new table. The already-existing configuration parameter `alternator_enforce_authorization` (off by default), which previously only enabled authentication, now also enables authorization. Users that upgrade to the new version and already had `alternator_enforce_authorization=true` should verify that the users they use to authenticate either have the appropriate permissions or the "superuser" flag. Roles used to authenticate must also have the "login" flag. Please note that although the new RBAC support implements the access control feature we asked for in #5047, this implementation is _not compatible_ with DynamoDB. 
In DynamoDB, the access control is configured through IAM operations or through the new `PutResourcePolicy` - operation, not through CQL (obviously!). DynamoDB also offers finer access-control granularity than we support (Scylla's RBAC works on entire tables, DynamoDB allows setting permissions on key prefixes, on individual attributes, and more). Despite this non-compatibility, I believe this feature, as is, will already be useful to Alternator users. Fixes #5047 (after closing that issue, a new clean issue should be opened about the DynamoDB-compatible APIs that we didn't do - just so we remember this wasn't done yet). New feature, should not be backported. Closes scylladb/scylladb#20135 * github.com:scylladb/scylladb: tests: disable test_alternator_enforce_authorization_true test, alternator: test for alternator_enforce_authorization config test/pylib: allow setting driver_connect() options in servers_add() test: fix test_localnodes_joining_nodes alternator, RBAC: reproducer for missing CDC auto-grant alternator: document the new RBAC support alternator: add RBAC enforcement to GetRecords test/alternator: additional tests for RBAC test/alternator: reduce permissions-validity-in-ms test/alternator: add test for BatchGetItem from multiple tables alternator: test for operations that do not need any permissions alternator: add RBAC enforcement to UpdateTimeToLive alternator: add RBAC enforcement to TagResource and UntagResource alternator: add RBAC enforcement to BatchGetItem alternator: add RBAC enforcement to BatchWriteItem alternator: add RBAC enforcement to UpdateTable alternator: add RBAC enforcement to Query and Scan alternator: add RBAC enforcement to CreateTable alternator: add RBAC enforcement to DeleteTable alternator: add RBAC enforcement to UpdateItem alternator: add RBAC enforcement to DeleteItem alternator: add RBAC enforcement to PutItem alternator: add RBAC enforcement to GetItem alternator: stop using an "internal" client_state	2024-08-19 16:09:53 +03:00
Tomasz Grabiec	c1de4859d8	Merge 'tablets: Fix race between repair and split' from Raphael "Raph" Carvalho Consider the following: ``` T 0 split prepare starts 1 repair starts 2 split prepare finishes 3 repair adds unsplit sstables 4 repair ends 5 split executes ``` If repair produces sstable after split prepare phase, the replica will not split that sstable later, as prepare phase is considered completed already. That causes split execution to fail as replicas weren't really prepared. This also can be triggered with load-and-stream which shares the same write (consumer) path. The approach to fix this is the same employed to prevent a race between split and migration. If migration happens during prepare phase, it can happen source misses the split request, but the tablet will still be split on the destination (if needed). Similarly, the repair writer becomes responsible for splitting the data if underlying table is in split mode. That's implemented in replica::table for correctness, so if node crashes, the new sstable missing split is still split before added to the set. Fixes #19378. Fixes #19416. *Please replace this line with justification for the backport/\ labels added to this PR** Closes scylladb/scylladb#19427 * github.com:scylladb/scylladb: tablets: Fix race between repair and split compaction: Allow "offline" sstable to be split	2024-08-19 14:44:28 +02:00
Kefu Chai	151074240c	utils: cached_file: use structured binding when appropriate for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20184	2024-08-19 14:01:42 +03:00
Piotr Smaron	f773c76bfb	codeowners: add appropriate reviewers to the cluster components	2024-08-19 12:39:47 +02:00
Anna Stuchlik	8fb746a5d2	doc: fix a link on the RBAC page This commit fixes an external link on the Role Based Access Control page. Fixes https://github.com/scylladb/scylladb/issues/20166 Closes scylladb/scylladb#20171	2024-08-19 12:56:38 +03:00
Piotr Smaron	cdc88cd06c	tests: disable test_alternator_enforce_authorization_true The test is flaky and needs to be fixed in order to not randomly break our CI, OTOH can be commented out for the time being, so that we can merge the feature.	2024-08-19 09:57:53 +02:00
Nadav Har'El	989dbef315	test, alternator: test for alternator_enforce_authorization config This patch adds tests that demonstrate the current way that Alternator's authentication and authorization are both enabled or disabled by the option "alternator_enforce_authorization". If in the future we decide to change this option or eliminate it (e.g., remain just with the "authenticator" and "authorizer" options), we can easily update these tests to fit the new configuration parameters and check they work as expected. Because the new tests want to start Scylla instances with different configuration parameters, they are written in the "topology" framework and not in the test/alternator framework. The test/alternator framework still contains (test/alternator/test_cql_rbac.py) the vast majority of the functional testing of the RBAC feature where all those tests just assume that RBAC is enabled and needs to be tested. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:57:53 +02:00
Nadav Har'El	41418603e1	test/pylib: allow setting driver_connect() options in servers_add() The manager.driver_connect() function allows passing parameters when creating the connection (e.g., a special auth_provider), but unfortunately right now the servers_add() function always calls driver_connect() without parameters. So in this patch we just add a new optional parameter to servers_add(), driver_connect_opts, that will be passed to driver_connect(). In theory instead of the new option to driver_connect() a caller can pass start=False to servers_add() and later call driver_connect() manually with the right arguments. The problem is that start=False avoids more than just calling driver_connect(), so it doesn't solve the problem. An example of using the new option is to run Scylla with authentication enabled, and then connect to it using the correct default account ("cassandra"/"cassandra"): config = { 'authenticator': 'PasswordAuthenticator', 'authorizer': 'CassandraAuthorizer' } servers = await manager.servers_add(1, config=config, driver_connect_opts={'auth_provider': PlainTextAuthProvider(username='cassandra', password='cassandra')}) Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:57:53 +02:00
Nadav Har'El	de20ac1a6d	test: fix test_localnodes_joining_nodes The existing test topology_experimental_raft/test_alternator::test_localnodes_joining_nodes Tried to create a second server but not wait for it to complete, but the trick it used (cancelling the task) doesn't work since commit `2ee063c` makes a list of unwaited tasks and waits for them anyway. The test appears to work because it is the last test in the file, but if we ever add another test in the same file (like I plan to do in the next patch), that other test will find a "BROKEN" ScyllaClusterManager and report that it failed :-( Other tricks I tried to use (like killing the servers) also didn't work because of various limitations and complications of the test framework and all its layers. So not wanting to fight the fragile testing framework any more at this point, I just gave up and the test will wait for the second server to come up. This adds 120 seconds (!) to the test, but since this whole test file already takes more than 500 seconds to complete, let's bite this bullet. Maybe in the future when the test framework improves, we can avoid this 120 second wait. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:57:53 +02:00
Nadav Har'El	79f9b3007e	alternator, RBAC: reproducer for missing CDC auto-grant This patch adds a reproducing (xfailing) test for issue #19798, which shows that if a role is able to create an Alternator table, the role is able to read the new table (this is known as "auto-grant"), but is NOT able to read the CDC log (i.e., use Alternator Streams' "GetRecords"). Once we do fix this auto-grant bug, it's also important to also implement auto-revoke - the permissions on a deleted table must be deleted as well (otherwise the old owner of a deleted table will be able to read a new table with the same name). This patch also adds a test verifying that auto-revoke works. This test currently passes (because there is no auto- grant, so nothing needs to be revoked...) but if we'll implement auto-grant and forget auto-revoke, the second test will start to fail - so I added this test as a precaution against a bad fix. Refs #19798 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:57:53 +02:00
Nadav Har'El	7de6aedd47	alternator: document the new RBAC support In docs/alternator/compatibility.md we said that although Alternator supports authentication, it doesn't support authorization (access control). Now it does, so the relevant text needs to be corrected to fit what we have today. It's still in the compatibility.md document because it's not the same API as DynamoDB's, so users with existing applications may need to be aware of this difference. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:57:53 +02:00
Nadav Har'El	f9ff475dfb	alternator: add RBAC enforcement to GetRecords This patch adds a requirement for the "SELECT" permission on a table to run a GetRecords on it (the DynamoDB Streams API, i.e., CDC). The grant is checked on the CDC log table - not on the base table, which allows giving a role the ability to read the base but not its change stream, or vice versa. The operations ListStreams, DescribeStreams, GetShardIterators do not require any permissions to run - they do not read any data, and are (in my opinion) similar in spirit to DescribeTable, so I think it's fine not to require any permissions for them. A test is also added. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:57:53 +02:00
Nadav Har'El	0789841cf8	test/alternator: additional tests for RBAC Additional tests for support for CQL Role-Based Access Control (RBAC) in Alternator: 1. Check that even in an Alternator table whose name isn't valid as CQL table names (e.g., uses the dot character) the GRANT/REVOKE commands work as expected. 2. Check that superuser roles have full permissions, as expected. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:57:53 +02:00
Nadav Har'El	409fea5541	test/alternator: reduce permissions-validity-in-ms We set in test/cql-pytest/run.py, affecting test/alternator/run, the configuration permissions_validity_in_ms by default to 100ms. This means that tests that need to check how GRANT or REVOKE work always need to sleep for more than 100ms, which can make a test with a lot of these operations very slow. So let's just set this configuration value to 5ms. I checked that it doesn't adversely affect the total running speed of test/alternator/run. This change only affects running tests through test/alternator/run, which is expected to be fast. I left the default for test.py as it was, 100ms, the latency of individual tests is less important there. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:57:53 +02:00
Nadav Har'El	1b20a11dec	test/alternator: add test for BatchGetItem from multiple tables While working on the RBAC on BatchGetItem, I noticed that although BatchGetItem may ask to read items from several tables, we don't have a test covering this case! This patch fixes that testing oversight. Note that for the write-side version of this operation, BatchWriteItem, we do have tests that write to several tables in the same batch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:57:53 +02:00
Nadav Har'El	f827bd51d2	alternator: test for operations that do not need any permissions Some operations, namely ListTables, DescribeTable, DescribeEndpoints, ListTagsOfResource, DescribeTimeToLive and DescribeContinuousBackups do not need any permissions to be GRANTed to a role. Our rationale for this decision is that in CQL, "describe table" and friends also do not require any permissions. This patch includes a test that verifies that they really don't need permissions. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:57:53 +02:00
Nadav Har'El	9417cf8bcf	alternator: add RBAC enforcement to UpdateTimeToLive This patch adds a requirement for the "ALTER" permission on a table to run an UpdateTimeToLive on it. UpdateTimeToLive is similar in purpose to UpdateTable, so it makes sense to use the same permission "ALTER" as we do for UpdateTable. A test is also added. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:57:53 +02:00
Nadav Har'El	e76316495c	alternator: add RBAC enforcement to TagResource and UntagResource This patch adds a requirement for the "ALTER" permission on a table to run the TagResource or UntagResource operations on it. CQL does not have an exact parallel of DynamoDB's tagging feature, but we usually use tags as an extension of UpdateTable to change non-standard options (e.g., write isolation policy or tablets setup), so it makes sense to require the same permissions we require for UpdateTable - namely "ALTER". A test for both operations is also added. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:57:53 +02:00
Nadav Har'El	fda4a9fad8	alternator: add RBAC enforcement to BatchGetItem This patch adds a requirement for the "SELECT" permission on a table to run a BatchGetItem on it. A single batch may ask to read from several different tables, so we fail the entire batch with AccessDeniedException if any of the tables mentioned in the batch do not have SELECT permissions for this role. A test is also added. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:57:51 +02:00
Nadav Har'El	b02288785f	alternator: add RBAC enforcement to BatchWriteItem This patch adds a requirement for the "MODIFY" permission on a table to run a BatchWriteItem on it. A single batch may ask to write to several different tables, so we fail the entire batch with AccessDeniedException if any of the tables mentioned in the batch do not have MODIFY permissions for this role. A test is also added. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:56:28 +02:00
Nadav Har'El	445a5d57cd	alternator: add RBAC enforcement to UpdateTable This patch adds a requirement for the "ALTER" permission on a table to run an UpdateTable on it. A test is also added. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:45:22 +02:00
Nadav Har'El	b4484158e7	alternator: add RBAC enforcement to Query and Scan This patch adds a requirement for the "SELECT" permission on a table to run a Query or Scan on it. Both Query and Scan operations call the same do_query() function, so the permission checks are put there. Note that Query can read from either the base table or one of its views, and the permissions on the base and each of the views can be separate (so we can allow a role to only read one view, for example). Tests for all of the above are also added. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:45:22 +02:00
Nadav Har'El	82f7e55943	alternator: add RBAC enforcement to CreateTable This patch adds a requirement for the "CREATE" permission on ALL KEYSPACES to run a CreateTable operation. The CreateTable operation also performs so-called "auto-grant": When a role creates a table, it is automatically granted full permissions to read, write, change or delete that new table. A test for all these things is also added. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:45:22 +02:00
Nadav Har'El	79dfb7b7d5	alternator: add RBAC enforcement to DeleteTable This patch adds a requirement for the "DROP" permission on a table to run a DeleteTable on it. Moreover, when a table and its views are deleted, any special permissions previously GRANTed on this table are removed. This is necessary because if a role creates a table it is automatically granted permissions on this table (this is known as "auto-grant" - see the CreateTable patch for details). If this role deletes this table and later a second role creates a table with the same name, we don't want the first role to have permissions on this new table. Tests for permission enforcements and revocation on delete are also added. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:45:22 +02:00
Nadav Har'El	2ebc0501b8	alternator: add RBAC enforcement to UpdateItem This patch adds a requirement for the "MODIFY" permission on a table to run an UpdateItem on it. Only the MODIFY permission is required, even if the operation may also read the old value of the item, such as a read-modify-write operation or even using ReturnValues='ALL_OLD'. A test is also added. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:45:22 +02:00
Nadav Har'El	36d8aea654	alternator: add RBAC enforcement to DeleteItem This patch adds a requirement for the "MODIFY" permission on a table to run a DeleteItem on it. Only the MODIFY permission is required, even if the operation may also read the old value of the item (using ReturnValues='ALL_OLD'). A test is also added. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:45:22 +02:00
Nadav Har'El	34c975854a	alternator: add RBAC enforcement to PutItem This patch adds a requirement for the "MODIFY" permission on a table to run a PutItem on it. Only the MODIFY permission is required, even if the operation may also read the old value of the item (using ReturnValues='ALL_OLD'). A test is also added. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:45:22 +02:00
Nadav Har'El	3008b8416c	alternator: add RBAC enforcement to GetItem In this patch, we begin to add role-based access control (RBAC) enforcement to Alternator - in this patch only to GetItem. After the preparation of client_state correctly in the previous patch, the permission check itself in the get_item() function is very simple. The bigger part of this patch is a full functional test in test/alternator/test_cql_rbac.py. The test is quite self-explanatory and heavily commented. Basically we check that a new role cannot read with GetItem a pre-existing table, and we can add that ability by GRANTing (in CQL) the new role the ability to SELECT the table, the keyspace, all keyspaces, or add that ability to some other role that this role inherits. In the following patches, we will add role-based access control to the Alternator operations, but the functional tests will be shorter - we don't need to check the role inheritance, "all keyspaces" feature, and so on, for every operation separately since they all use the same underlying checking functions which handle these role inheritance issues in exactly the same way. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:45:22 +02:00
Nadav Har'El	583f060bd8	alternator: stop using an "internal" client_state Scylla uses a "client_state" object to encapsulate the information of who the client is - its IP address, which user was authenticated, and so on. For an unknown reason, Alternator created for each request an "internal" client_state, meaning that supposedly the client for each request was some sort of internal process (e.g., repair) rather than a real client. This was wrong, and we even had a FIXME about not putting the client's IP address in client_state. So in this patch, we start using a normal "external" client_state instead of an "internal" one. The client_state constructors are very different in the two cases, so a few lines of code had to change. I hope that this change will cause no functional changes. For example, Alternator was already setting its own timeouts explicitly and not relying on the default ones for external clients. However, we need to fix this for the following patches which introduce permissions checks (Role-Based Access Control - RBAC) - the client_state methods for checking permissions become no-ops for internal clients (even if the client_state contains an authenticated user). We need these functions to do their job - so we need an external variant of client_state. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-08-19 09:45:22 +02:00
Tomasz Grabiec	ab7656a7be	Merge 'replica: fix copy constructor of tablet_sstable_set' from Lakshmi Narayanan Sreethar Commit `9f93dd9fa3` changed `tablet_sstable_set::_sstable_sets` to be a `absl::flat_hash_map` and in addition, `std::set<size_t> _sstable_set_ids` was added. `_sstable_set_ids` is set up in the `tablet_sstable_set(schema_ptr s, const storage_group_manager& sgm, const locator::tablet_map& tmap)` constructor, but it is not copied in `tablet_sstable_set(const tablet_sstable_set& o)`. This affects the `tablet_sstable_set::tablet_sstable_set` method as it depends on the copy constructor. Since sstable set can be cloned when a new sstable set is added, the issue will cause ids not being copied into the new sstable set. It's healed only after compaction, since the sstable set is rebuilt from scratch there. This PR fixes this issue by removing the existing copy constructor of `tablet_sstable_set` to enable the implicit default copy constructor. Fixes #19519 Closes scylladb/scylladb#20115 * github.com:scylladb/scylladb: boost/sstable_set_test: add testcase to test tablet_sstable_set copy constructor replica: fix copy constructor of tablet_sstable_set	2024-08-19 00:53:29 +02:00
Avi Kivity	390e01673b	Merge 'Adding batch latency and batch size metrics to Alternator' from Amnon Heiman This patch adds metrics for batch get_item and batch write_item. The new metrics record summary and histogram for latencies and batch size. Batch sizes are implemented as ever-growing counters. To get the average batch size divide the rate of the batch size counter by the rate of the number of batch counter: ```rate(batch_get_item_batch_size)/rate(batch_get_item)``` Relates to #17615 New code, No need to backport Closes scylladb/scylladb#20190 * github.com:scylladb/scylladb: Add tests for Alternator batch operation metrics alternator/executor: support batch latency and size metrics Add metrics for Alternator get and write batch operations	2024-08-18 21:22:39 +03:00
Amnon Heiman	63fdfb89cd	Add tests for Alternator batch operation metrics This patch adds unit tests to verify the correctness of the newly introduced histogram metrics for get and write batch operation latencies. The test uses the existing latency test with the added metrics. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-08-18 12:19:43 +03:00
Amnon Heiman	d20a333f51	alternator/executor: support batch latency and size metrics This patch updates the get and write batch operations in Alternator to record latency using the newly added histogram metrics. It adds logic to increment the counters with the number of items processed in each batch. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-08-18 12:14:23 +03:00
Amnon Heiman	8bad4b44f8	Add metrics for Alternator get and write batch operations Introduced histogram metrics to track latency for Alternator's get and write batch operations. Added counters to record the number of items processed in each batch operation. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2024-08-18 12:09:46 +03:00
Lakshmi Narayanan Sreethar	ec47b50859	boost/sstable_set_test: add testcase to test tablet_sstable_set copy constructor Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-08-17 23:38:05 +05:30
Lakshmi Narayanan Sreethar	44583eed9e	replica: fix copy constructor of tablet_sstable_set Remove the existing copy constructor to enable the use of the implicit copy constructor. This fixes the issue of `_sstable_set_ids` not being copied in the current copy constructor. Fixes #19519 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-08-17 23:37:58 +05:30
Kefu Chai	3d593ceeb1	perf/perf_sstable: add {crawling,partitioned}_streaming modes for testing the load performance of load_and_stream operation. Refs scylladb/scylladb#19989 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-17 14:43:54 +08:00
Kefu Chai	7806c72e49	test/perf/perf_sstable: use switch-case when appropriate this change is a follow up of `06c60f6ab`, which updated the 2nd step of the test to use switch-case, but missed the 1st step. so this change updates the first step of the test as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-17 14:38:37 +08:00
Pavel Emelyanov	6a9b8ea135	sstable_directory: Coroutinize inner lambdas Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-16 10:45:27 +03:00
Pavel Emelyanov	7401c0ace2	sstable_directory: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-16 10:45:27 +03:00
Pavel Emelyanov	7422504d35	sstable_directory: Coroutinize outer continuation chain Indentation is deliberately left broken Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-16 10:45:27 +03:00
Kefu Chai	e8f9f71ef3	test.py: fix the indent and take this opportunity to fix a typo in comment. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-16 13:32:57 +08:00
Kefu Chai	e88166f7a4	test.py: use XPath for iterating in "TestSuite/TestSuite" before this change, we check for the existence of "TestSuite" node under the root of XML tree, and then enumerating all "TestSuite" nodes under this "TestSuite", this approach works. but it * introduces unnecessary indent * is not very readable in this change, we just use "./TestSuite/TestSuite" for enumerating all "TestSuite" nodes under "TestSuite". simpler this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-16 13:32:33 +08:00
Kefu Chai	afee3924b3	s3/client: check for "Key" and "Value" tag in "Tag" XML tag despite that the API document at https://docs.aws.amazon.com/AmazonS3/latest/API/API_Tag.htm claims that both these tags are "Required" in the "Tag" object returned by S3 APIs, we still have to check them before dereferencing the pointer of the child node, as we should not trust the output of an external API. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20160	2024-08-15 20:16:35 +03:00
Andrei Chekun	f24f5b7db2	test.py: Fix boost XML conversion to allure when XML file is empty The method cannot find the TestSuite in the XML file and fails the whole job, however tests are passed. The issue was in incorrect understanding of boost summarization method. It creates one file for all modes, so there is no need to go through all modes to convert the XML file for allure. Closes: https://github.com/scylladb/scylladb/issues/20161 Closes scylladb/scylladb#20165	2024-08-15 20:15:31 +03:00
Benny Halevy	52234214e5	schema_tables: calculate_schema_digest: filter the key earlier Currently, each frozen mutation we get from system_keyspace::query_mutations is unfrozen in whole to a mutation and only then we check its key with the provided `accept_keyspace` function. This is wasteful, since the key can be processed directly from the frozen mutation, before taking the toll of unfreezing it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-15 12:33:34 +03:00
Benny Halevy	95a5fba0ea	schema_tables: calculate_schema_digest: prevent stalls due to large mutations vector With a large number of tables the schema mutations vector might get big enough to cause reactor stalls when freed. For example, the following stall was hit on 2023.1.0~rc1-20230208.fe3cc281ec73 with 5000 tables: ``` (inlined by) ~vector at /usr/bin/../lib/gcc/x86_64-redhat-linux/12/../../../../include/c++/12/bits/stl_vector.h:730 (inlined by) db::schema_tables::calculate_schema_digest(seastar::sharded<service::storage_proxy>&, enum_set<super_enum<db::schema_feature, (db::schema_feature)0, (db::schema_feature)1, (db::schema_feature)2, (db::schema_feature)3, (db::schema_feature)4, (db::schema_feature)5, (db::schema_feature)6, (db::schema_feature)7> >, seastar::noncopyable_function<bool (std::basic_string_view<char, std::char_traits<char> >)>) at ./db/schema_tables.cc:799 ``` This change returns a mutations generator from the `map` lambda coroutine so we can process them one at a time, destroy the mutations one at a time, and by that, reducing memory footprint and preventing reactor stalls. Fixes #18173 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-08-15 12:33:34 +03:00
Kefu Chai	c628fa4e9e	tools: enhance `scylla sstable shard-of` to support tablets before this change, `scylla sstable shard-of` didn't support tablets, because: - with tablets enabled, data distribution uses the scheduler - this replaces the previous method of mapping based on vnodes and shard numbers - as a result, we can no longer deduce sstable mapping from token ranges in this change, we: - read `system.tablets` table to retrieve tablet information - print the tablet's replica set (list of <host, shard> pairs) - this helps users determine where a given sstable is hosted This approach provides the closest equivalent functionality of `shard-of` in the tablet era. Fixes scylladb/scylladb#16488 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-15 15:49:55 +08:00
Kefu Chai	4291033b14	replica/tablets: extract tablet_replica_set_from_cell() so it can be reused to implement a low-level tool which reads tablets data from sstables Refs scylladb/scylladb#16488 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-15 15:49:55 +08:00
Kefu Chai	e1162e0dae	tools: extract get_table_directory() out the `get_table_directory()` function will have applications beyond its current use in `schema_loader.cc`. its ability to locate the directory storing the sstables of given table could be valuable in other subcommand(s) implementation. so, in this change we extract it out into a dedicated source file, so that it accepts the primary_key and an optional clustering_key. Refs scylladb/scylladb#16488 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-15 15:49:55 +08:00
Kefu Chai	a04e0b6c7d	tools: extract read_mutation out the `read_mutation_from_table_offline()` function will have applications beyond its current use in `schema_loader.cc`. its ability to parse mutation data from sstables could be valuable in other subcommand(s) implementation. so, in this change we extract it out into a dedicated source file, so that it accepts the primary_key and an optional clustering_key. Refs scylladb/scylladb#16488 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-15 15:49:55 +08:00
Kefu Chai	74a670dd19	build: split the list of source file across multiple line Split the extended list of source files across multiple lines. This improves readability and makes future additions easier to review in diffs. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-15 15:49:55 +08:00
Kefu Chai	3f8f1d7274	tools/scylla-sstable: print warning when running shard-of with tablets the subcommand of "shard-of" does not support tablets yet. so let's print out an error message, instead of printing the mapping assuming that the sstables are distributed based on token only. this commit also adds two more command line options to this subcommand, so that user is required to specify either "--vnodes" or "--tablets" to instruct the tool how the cluster distributes the tokens across nodes and their shards. this helps to minimize the surprise of user. this change prepares for the succeeding changes to implement the tablets support. the corresponding test is updated accordingly so that it only exercises the "shard-of" subcommand without tablets. we will test it with tablets enabled in a succeeding change. Refs scylladb/scylladb#16488 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-15 15:49:55 +08:00
Laszlo Ersek	baf6ec49ff	utils/tagged_integer: remove conversion to underlying integer Silently converting a tagged (i.e., "dimension-ful") integer to a naked ("dimensionless") integer defeats the purpose of having tagged integers, and is a source of practical bugs, such as <https://github.com/scylladb/scylladb/issues/20080>. We could make the conversion operator explicit, for enforcing static_cast<TAGGED_INTEGER_TYPE::value_type>(TAGGED_INTEGER_VALUE) in every conversion location -- but that's a mouthful to write. Instead, remove the conversion operator, and let clients call the (identically behaving) value() member function. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-15 02:12:58 +02:00
Laszlo Ersek	9aa7d232d6	test/raft/randomized_nemesis_test: clean up remaining index_t usage With implicit conversion of tagged integers to untagged ones going away, explicitly tag (or untag, as necessary) the operands of the following operations, in "test/raft/randomized_nemesis_test.cc": - addition of tagged and untagged (both should be tagged) - taking the minimum of an index difference and a container size (both should be untagged) Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 22:54:42 +02:00
Laszlo Ersek	1af3460a81	test/raft/randomized_nemesis_test: clean up index_t usage in store_snapshot() With implicit conversion of tagged integers to untagged ones going away, unpack and clean up the relatively complex first_to_remain = max(snap.idx + 1 - preserve_log_entries, 0) calculation in persistence::store_snapshot(). Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 22:54:42 +02:00
Laszlo Ersek	4dc2faa49a	test/raft/replication: clean up remaining index_t usage With implicit conversion of tagged integers to untagged ones going away, explicitly untag the operands / arguments of the following operations, in "test/raft/replication.hh": - assignment to raft_cluster::_seen - call to hasher_int::hash_range() Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 22:54:42 +02:00
Laszlo Ersek	3a32f3de81	test/raft/replication: take an "index_t start_idx" in create_log() raft_cluster::get_states() passes a "start_idx" to create_log(), and create_log() uses it as an "index_t" object. Match the type of "start_idx" to its name. This patch is best viewed with "git show -W". Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 22:54:42 +02:00
Laszlo Ersek	08e117aeb5	test/raft/replication: untag index_t in test_case::get_first_val() In test_case::get_first_val(), the assignment first_val = initial_snapshots[initial_leader].snap.idx; both relies on implicit conversion of the tagged integer type "index_t" to the underlying "uint64_t", and is a logic bug, as reported at <https://github.com/scylladb/scylladb/issues/20151>. For now, wean the buggy assignment off the disappearing tagged-to-untagged conversion. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 22:54:42 +02:00
Laszlo Ersek	6254fca7f5	test/raft/etcd_test: tag index_t and term_t for comparisons and subtractions Properly annotate index_t and term_t constants for use in BOOST_CHECK_EQUAL() and BOOST_CHECK(). Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 22:54:42 +02:00
Laszlo Ersek	bd4fc85bf0	test/raft/fsm_test: tag index_t and term_t for comparisons and subtractions Properly annotate index_t and term_t constants for use in BOOST_CHECK_EQUAL(), BOOST_CHECK(). Clean up the first args of read_quorum() calls -- stay in term_t space. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 22:54:42 +02:00
Laszlo Ersek	265655473e	test/raft/helpers: tighten compare_log_entries() param types The "from" and "to" parameters of compare_log_entries() are raft log indices; change them to raft::index_t, and update the callers. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 22:54:42 +02:00
Piotr Smaron	3e3858521d	codeowners: add appropriate reviewers to the frontend components	2024-08-14 22:26:35 +02:00
Piotr Smaron	1b2e88b96a	codeowners: fix codeowner names	2024-08-14 22:26:26 +02:00
Laszlo Ersek	5dcc627465	service/raft_sys_table_storage: tweak dead code In raft_sys_table_storage::store_snapshot_descriptor(), the condition preserve_log_entries > snap.idx both relies on implicit conversion of the tagged integer type "index_t" to the underlying "uint64_t", and is a logic bug, as reported at <https://github.com/scylladb/scylladb/issues/20080>. Ticket#20080 explains that this condition always evaluates to false in practice, and that the "else" branch handles all cases correctly anyway. For now, wean the buggy expression off the disappearing tagged-to-untagged conversion. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 21:35:34 +02:00
Andrei Chekun	3407ae5d8f	[test.py] Add Junit logger for boost test Currently, boost tests aren't using Junit. Enable Junit report output and clean them from skipped test, since boost tests are executed by function name rather than filename. This allows including boost tests result to the Allure report. Related: https://github.com/scylladb/qa-tasks/issues/1665 Closes scylladb/scylladb#19925	2024-08-14 22:18:31 +03:00
Avi Kivity	6d6f93e4b5	Merge 'test/nodetool: enable running nodetool tests under test/nodetool' from Kefu Chai before this change, we assume user runs nodetool tests right under the root source directory. if user runs them under `test/nodetool`, the suppression rules are not applied. as the path is incorrect in that case. after this change, the suppression rules' path is deduced from the top src directory. so we can now run the nodetool test under `test/nodetool` . --- no need to backport, this change improves developer's experience. Closes scylladb/scylladb#20119 * github.com:scylladb/scylladb: test/nodetool: deduce subpression path from top srcdir test/nodetool: deduce path from top srcdir	2024-08-14 22:10:38 +03:00
Michał Jadwiszczak	f7eb74e31f	cql3/statements/create_service_level: forbid creating SL starting with `$` Tenant names starting with `$` are reserved for internal ones. Forbid creating new service level which name starts with `$` and log a warning for existing service levels with `$` prefix. Closes scylladb/scylladb#20122	2024-08-14 21:25:31 +03:00
Kefu Chai	5ce07e5d84	build: cmake: add compiler-training target `tools/toolchain/optimized_clang.sh` builds this target for creating the profile in order to build clang optimized with this profile data. so let's be compatible with `configure.py`, and add this target to CMake building system as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20105	2024-08-14 21:21:33 +03:00
Ernest Zaslavsky	f5f65ead1e	Add `.clang-format`, also add CLion build folder to the `.gitignore` file Closes scylladb/scylladb#20123	2024-08-14 21:20:29 +03:00
Pavel Emelyanov	66d72e010c	distributed_loader: Lock table via global table ptr The lock_table() method needs database, ks and cf to find the table on all shards. The same can be achieved with the help of global_table_ptr thing that all the core callers already have at hand. There's a test that doesn't have global table, but it can get one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20139	2024-08-14 20:53:21 +03:00
Pavel Emelyanov	7e3e5cfcad	sstable_directory: Simplify special-purpose local-only constructor Typically the sstable_directory is constructed out of a table object. Some code, namely tests and schema-loader, don't have table at hand and construct directory out of schema, sharder, path-to-sstables, etc. This code doesn't work with any storage options other than local ones, so there's no need (yet) to carry this argument over. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#20138	2024-08-14 20:22:50 +03:00
Avi Kivity	28d3b91cce	Merge 'test/perf/perf_sstables: use test_modes as the type of its option' from Kefu Chai before this change, we look up for the mode using the command line option as the key, but that's incorrect if the command line option does not match with any of the known names. in that case, `test_mode` just create another pair of <sstring, test_modes>, and return the second component of this pair. and the second component is not what we expect. we should have thrown an exception. in this change * the test_mode map is marked const. * the overloads for parsing / formatting the `test_modes` type are added, so that boost::program_options can parse and format it. after this change, we print more user friendly error, like ``` /scylla perf-sstable --mode index-foo error: the argument ('index-foo') for option '--mode' is invalid Try --help. ``` instead of a bunch of output which is printed as if we passed the correct option as the argument of the `--mode` option. --- it's an improvement of developer experience, hence no need to backport. Closes scylladb/scylladb#20140 * github.com:scylladb/scylladb: test/perf/perf_sstable: use switch-case when appropriate test/perf/perf_sstables: use test_modes as the type of its option	2024-08-14 20:18:22 +03:00
Piotr Smaron	31cb5b132b	codeowners: remove non contributors	2024-08-14 18:52:25 +02:00
Avi Kivity	3de4e8f91b	Merge 'cql: process LIMIT for GROUP BY select queries' from Paweł Zakrzewski This change fixes #17237, fixes #5361 and fixes #5362 by passing the limit value down the call chain in cql3. A test is also added. fixes #17237 fixes #5361 fixes #5362 The regression happened in 5.4 as we changed the way GROUP BY is processed in `432cb02` - to force aggregation when it is used. The LIMIT value was not passed to aggregations and thus we failed to adhere to it. We want to backport this fix to 5.4 and 6.0 to have continuous correct results for the test case from #17237 This patch consists of 4 commits: - fa4225ea0fac2057b7a9976f57dc06bcbd900cd4 - cql3: respect the user-defined page size in aggregate queries - a precondition for this patch to be implementable - 8fbe69e74dca16ed8832d9a90489ca47ba271d0b - cql3/select_statement: simplify the get_limit function - the `do_get_limit()` function did a lot of legwork that should not be associated with it. This change makes it trivial and makes its callers do additional checks (for unset guards, or for an aggregate query) - 162828194a2b88c22fbee335894ff045dcc943c9 - cql3: process LIMIT for GROUP BY queries - pass the limit value down the chain and make use of it. This is the actual fix to #17237 - b3dc6de6d6cda8f5c09b01463bb52f827a6a00b4 - test/cql-pytest: Add test for GROUP BY queries with LIMIT - tests Closes scylladb/scylladb#18842 * github.com:scylladb/scylladb: test/cql-pytest: Add test for GROUP BY queries with LIMIT cql3: process LIMIT for GROUP BY queries cql3/select_statement: simplify the get_limit function cql3: respect the user-defined page size in aggregate queries	2024-08-14 17:54:59 +03:00
Avi Kivity	8c257db283	Merge 'Native reverse pages over RPC' from Łukasz Paszkowski Drop half-reversed (legacy) format of query::partition_slice. The select query builds a fully reversed (native) slice for reversed queries and use it together with a reversed schema to construct query::read_command that is further propagated to the database. A cluster feature is added to support nodes that still operate on half-reversed slices. When the feature is turned off: - query::read_command is transformed (to have table schema and half-reversed slices) before sending to other nodes - query::read_command is transformed (to have query schema (reversed) and reversed slices) after receiving it from other nodes - Similarly, mutations are transformed. They are reversed before being sent to other nodes or after receiving them from other nodes. Additional manual tests were performed to test a mixed-node cluster: 1. 3-node cluster with one node upgraded: reverse read queries performed on an old node 2. 3-node cluster with one node upgraded: reverse read queries performed on a new node 3. 3-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on an old node 4. 3-node cluster with one node upgraded and all its sstable files deleted to trigger repair: reverse read queries performed on a new node All reverse read queries above consists of: - single-partition reverse reads with no clustering key restrictions, with single column restrictions and multi column restrictions both with and without paging turned on - multi-partition reverse reads with range restrictions with optional partition limit and partial ordering The exact same tests were also performed on a fully upgraded cluster. 
Fixes https://github.com/scylladb/scylladb/issues/12557 Closes scylladb/scylladb#18864 * github.com:scylladb/scylladb: mutation_partition: drop reverse parameter in compact_for_query clustering_key_filter: unify get_ranges and get_native_ranges streamed_mutation_freezer: drop the reverse parameter reverse-reads.md: Drop legacy reverse format information Fix comments refering to half-reversed (legacy) slices select_statement::do_execute: Add tracing informaction query::trim_clustering_row_ranges_to: require reversed schema for native reversed ranges query-request: Drop half_reverse_slice as it is no longer used anywhere readers: Use reversed schema and native reversed slices database: accept reversed schema for reversed queries storage_proxy: Support reverse queries in native format query_pagers: Replace _schema with _query_schema query_pagers: Support reverse queries in native format select_statement: Execute reversed query in native format storage_proxy::remote: Add support for mixed-node clusters mutation_query: Add reversed function to reverse reconcilable_result query-request: Add reversed function to reverse read_command features: add native_reverse_queries kl::reader::make_reader: Unify interface with mx::reader::make_reader config: drop reversed_reads_auto_bypass_cache config: drop enable_optimized_reversed_reads	2024-08-14 17:51:56 +03:00
Anna Stuchlik	99be8de71e	doc: set 6.1 as the latest stable version This commit updates the configuration for ScyllaDB documentation so that: - 6.1 is the latest version. - 6.1 is removed from the list of unstable versions. It must be merged when ScyllaDB 6.1 is released. No backport is required. Closes scylladb/scylladb#20041	2024-08-14 13:43:17 +02:00
Laszlo Ersek	d87d1ae29d	service/raft_sys_table_storage: simplify (snap.idx - preserve_log_entries) With conversion of tagged integers to untagged ones going away, replace static_cast<uint64_t>(snap.idx) with snap.idx.value() Furthermore, casting "preserve_log_entries" (of type "size_t") to "uint64_t" is redundant (both "snap.idx" and "preserve_log_entries" carry nonnegative values, and the mathematical difference is expected to be nonnegative); remove the cast. Finally, simplify the initialization syntax. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 13:35:08 +02:00
Laszlo Ersek	e781046739	service/raft_sys_table_storage: untag index_t and term_t for queries With implicit conversion of tagged integers to untagged ones going away, explicitly untag index_t and term_t values in the following two contexts: - when they are passed to CQL queries as int64_t, - when they are default-constructed as fallbacks for int64_t fields missing from CQL result sets. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 13:35:08 +02:00
Laszlo Ersek	4f1f207be1	raft/server: clean up index_t usage With implicit conversion of tagged integers to untagged ones going away, explicitly tag (or untag, as necessary) the operands of the following operations, in "raft/server.cc": - addition of tagged and untagged (both should be tagged) - subscripting an array by tagged (should be untagged) - comparing a size-like threshold against tagged (should be untagged) - exposing tagged via gauges (should be untagged) Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 13:35:08 +02:00
Laszlo Ersek	1b134d52ac	raft/tracker: don't drop out of index_t space for subtraction Tagged integers support subtraction; use it. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 13:35:08 +02:00
Laszlo Ersek	b6233209d9	raft/fsm: clean up index_t and term_t usage With implicit conversion of tagged integers to untagged ones going away, explicitly tag (or untag, as necessary) the operands of the following operations, in "raft/fsm.cc": - addition of tagged and untagged (both should be tagged) - comparison (relop) between tagged and untagged (both should be tagged) - subscripting or sizing an array by tagged (should be untagged) Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 13:35:08 +02:00
Laszlo Ersek	5b9a4428c6	raft/log: clean up index_t usage With implicit conversion of tagged integers to untagged ones going away, explicitly tag (or untag, as necessary) the operands of the following operations, in raft/log.{cc,h}: - addition of tagged and untagged (both should be tagged) - comparison (relop) between tagged and untagged (both should be tagged) - subscripting an array, or offsetting an iterator, by tagged (should be untagged) - comparing an array bound against tagged (should be untagged) - subtracting tagged from an array bound (should be untagged) Note: these files mix uniform initialization syntax (index_t{...}) with constructor call syntax (index_t()), with the former being more frequent. Stick with the former here too, for consistency. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 13:35:08 +02:00
Laszlo Ersek	9e95f3a198	db/system_keyspace: promise a tagged integer from increment_and_get_generation() Internally, increment_and_get_generation() produces a "gms::generation_type" value. In turn, all callers of increment_and_get_generation() -- namely scylla_main() [main.cc] and single_node_cql_env::run_in_thread() [test/lib/cql_test_env.cc] -- pass the resolved value to storage_service::init_address_map() and storage_service::join_cluster(), both of which take a "gms::generation_type". Therefore it is pointless to "untag" the generation value temporarily between the producer and the consumers. Correct the return type of increment_and_get_generation(). Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 13:35:08 +02:00
Laszlo Ersek	baccbc09c5	gms/gossiper: return "strong_ordering" from compare_endpoint_startup() The callers of gossiper::compare_endpoint_startup() need not (should not) learn of any particular (tagged or untagged) difference of generations; they only care about the ordering of generations. Change the return type of compare_endpoint_startup() to "std::strong_ordering", and delegate the comparison to tagged_tagged_integer::operator<=>. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 13:35:08 +02:00
Laszlo Ersek	3bb608056c	gms/gossiper: get "int32_t" value of "gms::version_type" explicitly In do_sort(), we need to drop to "int32_t" temporarily, so that we can call ::abs() on the version difference. Do that explicitly. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-14 13:35:08 +02:00
Michał Chojnowski	4d77faa61e	cql_test_env: ensure shutdown() before stop() for system_keyspace If system_keyspace::stop() is called before system_keyspace::shutdown(), it will never finish, because the uncleared shared pointers will keep it alive indefinitely. Currently this can happen if an exception is thrown before the construction of the shutdown() defer. This patch moves the shutdown() call to immediately before stop(). I see no reason why it should be elsewhere. Fixes scylladb/scylla-enterprise#4380 Closes scylladb/scylladb#20089	2024-08-14 12:16:44 +03:00
Kefu Chai	06c60f6abe	test/perf/perf_sstable: use switch-case when appropriate instead of using a chain of `if-else`, use switch-case instead, it's visually easier to follow than `if`-`else` blocks. and since we never need to handle the `else` case, the `throw` statement is removed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-14 17:14:42 +08:00
Kefu Chai	5141c6efe0	test/perf/perf_sstables: use test_modes as the type of its option before this change, we look up for the mode using the command line option as the key, but that's incorrect if the command line option does not match with any of the known names. in that case, `test_mode` just create another pair of <sstring, test_modes>, and return the second component of this pair. and the second component is not what we expect. we should have thrown an exception. in this change * the test_mode map is marked const. * the overloads for parsing / formatting the `test_modes` type are added, so that boost::program_options can parse and format it. after this change, * we can print more user friendly error, like ``` /scylla perf-sstable --mode index-foo error: the argument ('index-foo') for option '--mode' is invalid Try --help. ``` instead of a bunch of output which is printed as if we passed the correct option as the argument of the `--mode` option. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-14 17:14:42 +08:00
Dawid Medrek	4ba9cb0036	README: Update the version of C++ to C++23 Scylla has started being built with C++23. We update the information in the relevant documents accordingly. Closes scylladb/scylladb#20134	2024-08-14 12:06:23 +03:00
Kamil Braun	a3d53bd224	Merge 'Prevent ALTERing non-existing KS with tablets' from Piotr Smaron ALTER tablets KS executes in 2 steps: 1. ALTER KS's cql handler forms a global topo req, and saves data required to execute this req, 2. global topo req is executed by topo coordinator, which reads data attached to the req. The KS name is among the data attached to the req. There's a time window between these steps where a to-be-altered KS could have been DROPped, which results in topo coordinator forever trying to ALTER a non-existing KS. In order to avoid it, the code has been changed to first check if a to-be-altered KS exists, and if it's not the case, it doesn't perform any schema/tablets mutations, but just removes the global topo req from the coordinator's queue. BTW. just adding this extra check resulted in broader than expected changes, which is due to the fact that the code is written badly and needs to be refactored - an effort that's already planned under #19126 (I suggest to disable displaying whitespace differences when reviewing this PR). Fixes: scylladb/scylladb#19576 Closes scylladb/scylladb#19666 * github.com:scylladb/scylladb: tests: ensure ALTER tablets KS doesn't crash if KS doesn't exist cql: refactor rf_change indentation Prevent ALTERing non-existing KS with tablets	2024-08-14 10:27:41 +02:00
Piotr Smaron	ddb5204929	tests: ensure ALTER tablets KS doesn't crash if KS doesn't exist Using the error injection framework, we inject a sleep into the processing path of ALTER tablets KS, so that the topology coordinator of the leader node sleeps after the rf_change event has been scheduled, but before it is started to be executed. During that time the second node executes a DROP KS statement, which is propagated to the leader node. Once leader node wakes up and resumes processing of ALTER tablets KS, the KS won't exist and the node cannot crash, which was the case before.	2024-08-13 21:51:51 +02:00
Pavel Emelyanov	05adee4c82	test: Add test for s3::client::bucket_lister Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-13 21:15:43 +03:00
Pavel Emelyanov	a02e65c649	s3_client: Add bucket lister The lister resembles the directory_lister from util -- it returns entries upon its .get() invocation, and should be .close()d at the end. Internally the lister issues ListObjectsV2 request with provided prefix and limits the server with the amount of entries returned not to consume too much local memory (we don't have streaming XML parser for response). If the result is indeed truncated, the subsequent calls include the continuation token as per [1] [1] https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-13 21:15:43 +03:00
Avi Kivity	d82fd8b5f0	Merge 'Relax sstable_directory::process_descriptor() call graph' from Pavel Emelyanov The method logic is clean and simple -- load sstable from the descriptor and sort it into one of collections (local, shared, remote, unsorted). To achieve that there's a bunch of helper methods, but they duplicate functionality of each other. Squashing most of this code into process_descriptor() makes it easier to read and keeps sstable_directory private API much shorter. Closes scylladb/scylladb#20126 * github.com:scylladb/scylladb: sstable_directory: Open-code load_sstable() into process_descriptor() sstable_directory: Squash sort_sstable() with process_descriptor() sstable_directory: Remove unused sstable_filename(desc) helper sstable_directory: Log sst->get_filename(), not sstable_filename(desc) sstable_directory: Keep loaded sst in local var sstable_directory: Remove unused helpers sstable_directory: Load sstable once when sorting	2024-08-13 16:42:52 +03:00
Pavel Emelyanov	d3870304a9	sstable_directory: Open-code load_sstable() into process_descriptor() There are two load_sstable() overloads, and one of them is only used inside process_descriptor(). What this loading helper does is, in fact, processes given descriptor, so it's worth having it open-coded into its caller. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-13 13:27:00 +03:00
Pavel Emelyanov	da4a5df339	sstable_directory: Squash sort_sstable() with process_descriptor() The latter (caller) loads sstable, so does the former, so load it once and then put it in either list/set, depending on flags and shard info. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-13 13:26:10 +03:00
Pavel Emelyanov	d8cb175fb7	sstable_directory: Remove unused sstable_filename(desc) helper It's unused after previous patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-13 12:55:40 +03:00
Pavel Emelyanov	aa40aeb72f	sstable_directory: Log sst->get_filename(), not sstable_filename(desc) There are some places that log sstable Data file name via sstable descriptor. After previous patching all those loggers have sstable at hand and can use sstable::get_filename() instead. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-13 12:55:40 +03:00
Pavel Emelyanov	369f9111b8	sstable_directory: Keep loaded sst in local var This will make next patch shorter. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-13 12:55:40 +03:00
Pavel Emelyanov	ad3725fbbd	sstable_directory: Remove unused helpers After previous patch some wrappers around load_sstable() became unused. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-13 12:55:40 +03:00
Pavel Emelyanov	63f1969e08	sstable_directory: Load sstable once when sorting In order to decide which list to put sstable into, the sort_sstable() first calls get_shards_for_this_sstable() which loads the sstable anyway. If loaded shards contain only the current one (which is the common case) sstable is loaded again. In fact, if the sstable happens to be remote it's loaded anyway to get its open info. Fix that by loading sstable, then getting shards directly from it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-13 12:55:16 +03:00
Łukasz Paszkowski	ba2f037af5	mutation_partition: drop reverse parameter in compact_for_query The reverse parameter is no longer used with native reverse reads. The row ranges are provided in native reverse order together with a reversed schema, thus the reverse parameter remains false all the time and can be dropped.	2024-08-13 10:07:12 +02:00
Łukasz Paszkowski	43221bbeed	clustering_key_filter: unify get_ranges and get_native_ranges When a reverse slice is provided, it is given in the native reverse format. Thus the ranges will be returned in the same order as stored in the slice. Therefore there is no need to distinguish between get_ranges and get_native_ranges. The latter one gets dropped and get_ranges returns ranges in the same order as stored in the slice.	2024-08-13 10:07:12 +02:00
Łukasz Paszkowski	8b5ec0e963	streamed_mutation_freezer: drop the reverse parameter The reverse parameter is no longer used with native reverse reads. A reversed schema is provided and thus the reverse parameter shall remain false all the time.	2024-08-13 10:07:12 +02:00
Łukasz Paszkowski	f4ca734ccb	reverse-reads.md: Drop legacy reverse format information	2024-08-13 10:07:12 +02:00
Łukasz Paszkowski	b3bf555036	Fix comments refering to half-reversed (legacy) slices	2024-08-13 10:07:12 +02:00
Łukasz Paszkowski	15a01c7111	select_statement::do_execute: Add tracing informaction Add information on table and query schema versions to tracing.	2024-08-13 10:07:12 +02:00
Łukasz Paszkowski	158b994676	query::trim_clustering_row_ranges_to: require reversed schema for native reversed ranges Simplify the implementation: for clustering key ranges in native reversed format, require a reversed table schema. Trimming native reversed clustering key ranges requires a reversed schema to be passed in. Thus, the reverse flag is no longer required as it would always be set to false.	2024-08-13 10:07:10 +02:00
Łukasz Paszkowski	8d95d44027	query-request: Drop half_reverse_slice as it is no longer used anywhere	2024-08-13 10:03:46 +02:00
Łukasz Paszkowski	da95f44adc	readers: Use reversed schema and native reversed slices The reconcilable_result is built as it would be constructed for forward read queries for tables with reversed order. Mutations constructed for reversed queries are consumed forward. Drop overloaded reversed functions that reverse read_command and reconcilable_result directly and keep only those requiring smart pointers. They are not used any more.	2024-08-13 10:03:46 +02:00
Łukasz Paszkowski	faa62310d9	database: accept reversed schema for reversed queries Remove schema reversing in query() and query_mutations() methods. Instead, a reversed schema shall be passed for reversed queries. Rename a schema variable from s into query_schema for readability.	2024-08-13 10:03:46 +02:00
Łukasz Paszkowski	df734e35a1	storage_proxy: Support reverse queries in native format For reversed queries, query_result() method accepts a reversed table schema and read_command with a query schema version and a slice in native reversed format. Support mixed-node clusters. In such a case, the feature flag native_reverse_queries is disabled and the read_command is sent to replicas in the old legacy format (stores table schema version and a slice in the legacy reverse format). After the reconciliation, for the read+repair case, un-reversed mutations are sent to replicas, i.e. forward ones.	2024-08-13 10:03:46 +02:00
Łukasz Paszkowski	d9e76a5295	query_pagers: Replace _schema with _query_schema For readability purposes. As the constructor accepts a query schema, let the variable holding a schema be called _query_schema.	2024-08-13 10:03:46 +02:00
Łukasz Paszkowski	0b2e5ff28f	query_pagers: Support reverse queries in native format For reversed queries, accept a reversed table schema and read_command with a query schema version and a slice in native reversed format.	2024-08-13 10:03:46 +02:00
Łukasz Paszkowski	309ba68692	select_statement: Execute reversed query in native format Use a reversed schema and a native reversed slice when constructing a read_command and executing a reversed select statement. Such a created read_command is passed further down to query_pagers::pager and storage::proxy::query_result that transform it to the format they accept/know, i.e. legacy.	2024-08-13 10:03:46 +02:00
Łukasz Paszkowski	8c391a8ebe	storage_proxy::remote: Add support for mixed-node clusters In handle_read, detect whether an incoming read_command is in the legacy reversed format or native reversed format. The result will be used to transform the read_command between formats as well as to transform the results before they are sent back to the coordinator.	2024-08-13 10:03:46 +02:00
Łukasz Paszkowski	fbd324b5cd	mutation_query: Add reversed function to reverse reconcilable_result The reconcilable_result is reversed by reversing mutations for all partitions it holds. Reversing is asynchronous to avoid potential stalls. Used for transitions between legacy and native formats and in order to support mixed-node clusters.	2024-08-13 10:03:46 +02:00
Łukasz Paszkowski	b91edbacf1	query-request: Add reversed function to reverse read_command The read_command is reversed by reversing the schema version it holds and transforming a slice from the legacy reversed format to the native reversed format. Used for transitions between formats and to support mixed-node clusters.	2024-08-13 10:03:46 +02:00
Łukasz Paszkowski	9690785112	features: add native_reverse_queries Enabled when all replicas support the native_reversed command slice and return the result in reverse order in this case.	2024-08-13 10:03:42 +02:00
Łukasz Paszkowski	7b201e9165	kl::reader::make_reader: Unify interface with mx::reader::make_reader Ensure both readers have the same interfaces to avoid mistakes as both readers are used in sstable::make_reader. Less error prone.	2024-08-13 10:02:43 +02:00
Łukasz Paszkowski	b270097f1f	config: drop reversed_reads_auto_bypass_cache Reverse reads have already been with us for a while, thus this back door option to bypass in-memory data cache for reversed queries can be retired.	2024-08-13 10:02:42 +02:00
Łukasz Paszkowski	80df313f49	config: drop enable_optimized_reversed_reads Reverse reads have already been with us for a while, thus this back door option to read entire partitions forward and reverse them afterwards can be retired.	2024-08-13 10:02:42 +02:00
Pavel Emelyanov	6675bd8a5c	s3_client: Encode query parameter value for query-string When signing an AWS query one needs to prepare a "query string", which is a line looking like `encode(query_param)=encode(query_value)&...`. Encoded are only the query parameter names and values. It was missing in current code and so far worked because no encodable characters were used. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-13 10:59:31 +03:00
Raphael S. Carvalho	74612ad358	tablets: Fix race between repair and split Consider the following: T 0 split prepare starts 1 repair starts 2 split prepare finishes 3 repair adds unsplit sstables 4 repair ends 5 split executes If repair produces sstable after split prepare phase, the replica will not split that sstable later, as prepare phase is considered completed already. That causes split execution to fail as replicas weren't really prepared. This also can be triggered with load-and-stream which shares the same write (consumer) path. The approach to fix this is the same employed to prevent a race between split and migration. If migration happens during prepare phase, it can happen source misses the split request, but the tablet will still be split on the destination (if needed). Similarly, the repair writer becomes responsible for splitting the data if underlying table is in split mode. That's implemented in replica::table for correctness, so if node crashes, the new sstable missing split is still split before added to the set. Fixes #19378. Fixes #19416. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-08-12 17:28:51 -03:00
Raphael S. Carvalho	239344ab55	compaction: Allow "offline" sstable to be split In order to fix the race between split and repair, we must introduce the ability to split an "offline" sstable, one that wasn't added to any of the table's sstable set yet. It's not safe to split a sstable after adding it to the set, because a failure to split can result in unsplit data left in the set, causing split to fail down the road, since the coordinator thinks this replica has only split data in the set. Refs #19378. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-08-12 17:27:16 -03:00
Laszlo Ersek	607abe96e8	test/sstable: merge test_using_reusable_sst*() All lambdas passed to test_using_reusable_sst() conform to the prototype void (test_env&, sstable_ptr) All lambdas passed to test_using_reusable_sst_returning() conform to the prototype NON_VOID (test_env&, sstable_ptr) The common parameter list of both prototypes can be expressed with the concept std::invocable<test_env&, sstable_ptr> Once a "Func" template parameter (i.e., function type) satisfying this concept is taken, then "Func"'s void or non-void return type can be commonly expressed with std::invoke_result_t<Func, test_env&, sstable_ptr> In turn, test_env::do_with_async_returning<...> can be instantiated with this return type, even if it happens to be "void". ([stmt.return] specifies, "[a] return statement with an operand of type void shall be used only in a function that has a cv void return type", meaning that return func(env) will do the right thing in the body of test_env::do_with_async_returning<void>().) Merge test_using_reusable_sst() and test_using_reusable_sst_returning() into one. Preserve the function name from the former, and the test_env::do_with_async_returning<...>() call from the latter. Suggested-by: Avi Kivity <avi@scylladb.com> Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> Closes scylladb/scylladb#20090	2024-08-12 17:52:01 +03:00
Kefu Chai	db4654ca49	test/nodetool: deduce suppression path from top srcdir there are chances that a developer launches `pytest` right under `test/nodetool`, in that case current working directory is not the root directory of the project, so the path to suppression rules does not point to a file. to cater to the need to run the test under `test/nodetool`, let's use the path deduced from the top_srcdir. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-12 22:50:18 +08:00
Kefu Chai	c817e13d63	test/nodetool: deduce path from top srcdir add a helper to get path from top src dir, more readable this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-08-12 22:50:18 +08:00
Nikos Dragazis	90363ce802	test: Test the SSTable validation API against malformed SSTables Unit testing for the SSTable validation API happens in `sstable_validate_test`. Currently, this test checks the API against some invalid SSTables with out-of-order clustering rows and out-of-order partitions. However, both are types of content-level corruption that do not trigger `malformed_sstable_exception` errors. Extend the test to cover cases of file-level corruption as well, i.e., cases that would raise a `malformed_sstable_exception`. Construct an SSTable with an invalid checksum to trigger this. This is part of the effort to improve scrub to handle all kinds of corruption. Fixes scylladb/scylladb#19057 Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#20096	2024-08-12 15:09:58 +03:00
Botond Dénes	fec57c83e6	Merge 'cell_locker: maybe_rehash: ignore allocation failures' from Benny Halevy `maybe_rehash` is complimentary and is not strictly required to succeed. If it fails, it will retry on the next call, but there's no reason to throw an exception that will fail its caller, since `maybe_rehash` is called as the final step after the caller has already succeeded with its action. Minor enhancement for the error path, no backport required. Closes scylladb/scylladb#19910 * github.com:scylladb/scylladb: cell_locker: maybe_rehash: reindent cell_locker: maybe_rehash: ignore allocation failures	2024-08-12 10:54:56 +03:00
Kefu Chai	0ae04ee819	build: cmake: use $<CONFIG:cfgs> when appropriate per https://cmake.org/cmake/help/latest/manual/cmake-generator-expressions.7.html#genex:CONFIG, `cfgs` can be a comma-separated list. this is supported by CMake 3.19 and up, and our minimum required CMake version is 3.27. so let's switch over from the composition of `IN_LIST` and `CONFIG` generator expressions to a single one. simpler this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20110	2024-08-11 21:28:38 +03:00
Avi Kivity	318278ff92	Merge 'tablets: reload only changed metadata' from Botond Dénes Currently, each change to tablet metadata triggers a full metadata reload from disk. This is very wasteful, especially if the metadata change affects only a single row in the `system.tablets` table. This is the case when the tablet load balancer triggers a migration, this will affect a single row in the table, but today will trigger a full reload. We expect tablet count to potentially grow to thousands and beyond and the overhead of this full reload can become significant. This PR makes tablet metadata reload partial, instead of reloading all metadata on topology or schema changes, reload only the partitions that are affected by the change. Copy the rest from the in-memory state. This is done with two passes: first the change mutations are scanned and a hint is produced. This hint is then passed down to the reload code, which will use it to only reload parts (rows/partitions) of the metadata that have actually changed. The performance difference between full reload and partial reload is quite drastic: ``` INFO 2024-07-25 05:06:27,347 [shard 0:stat] testlog - Tablet metadata reload: full 616.39ms partial 0.18ms ``` This was measured with the modified (by this PR) `perf_tablets`, which creates 100 tables, each with 2K tablets. The test was modified to change a single tablet, then do a full and partial reload respectively, measuring the time it takes for each. Fixes: #15294 New feature, no backport needed. 
Closes scylladb/scylladb#15541 * github.com:scylladb/scylladb: test/perf/perf_tablets: add tablet metadata reload perf measurement test/boost/tablets_test: add test for partial tablet metadata updates db/schema_tables: pass tablet hint to update_tablet_metadata() service/storage_service: load_tablet_metadata(): add hint parameter service/migration_listener: update_tablet_metadata(): add hint parameter service/raft/group0_state_machine: provide tablet change hint on topology change service/storage_service: topology_state_load(): allow providing change hint replica/tablets: add update_tablet_metadata() replica/tablets: fix indentation replica/tablets: extract tablet_metadata builder logic replica/tablets: add get_tablet_metadata_change_hint() and update_tablet_metadata_change_hint() locator/tablets: add tablet_map::clear_tablet_transition_info() locator/tablets: make tablet_metadata cheap to copy mutation/canonical_mutation: add key()	2024-08-11 21:27:18 +03:00
Botond Dénes	2b2db510b7	test/perf/perf_tablets: add tablet metadata reload perf measurement Measure reload perf of full reload vs. partial reload, after changing a single tablet. While at it, modify the `--tablets-per-table` parameter, so that it has a default parameter which works OOTB. The previous default was both too large (causing oversized commitlog entry errors) and not a power of two.	2024-08-11 09:53:19 -04:00
Botond Dénes	65eee200b2	test/boost/tablets_test: add test for partial tablet metadata updates	2024-08-11 09:53:19 -04:00
Botond Dénes	b886ed44a7	db/schema_tables: pass tablet hint to update_tablet_metadata() Replace the has_tablet_mutations in `merge_tables_and_views()` with a hint parameter, which is calculated in the caller, from the original schema change mutations. This hint is then forwarded to the notifier's `update_tablet_metadata()` so that subscribers can refresh only the tablet partitions that changed.	2024-08-11 09:53:19 -04:00
Botond Dénes	5bff422b54	service/storage_service: load_tablet_metadata(): add hint parameter Allowing for reloading only those parts of the tablet metadata that were actually changed.	2024-08-11 09:53:19 -04:00
Botond Dénes	2cec0d8dd1	service/migration_listener: update_tablet_metadata(): add hint parameter The hint contains information related to what exactly changed, allowing listeners to do partial updates, instead of reloading all metadata on each notification.	2024-08-11 09:53:19 -04:00
Botond Dénes	ca302d9e28	service/raft/group0_state_machine: provide tablet change hint on topology change So that when reloading tablet state metadata from the disk, only the changed parts are reloaded.	2024-08-11 09:53:19 -04:00
Botond Dénes	806ec3244a	service/storage_service: topology_state_load(): allow providing change hint So that when reloading state from disk, only changed parts are reloaded instead of all. For now, only tablets have hints implemented.	2024-08-11 09:53:18 -04:00
Botond Dénes	bb1e733fe0	replica/tablets: add update_tablet_metadata() Allows updating tablet metadata in-place, according to the provided hint, reading and updating only the parts that actually changed.	2024-08-11 09:52:37 -04:00
Botond Dénes	66292b4baa	replica/tablets: fix indentation Left broken from the previous patch.	2024-08-11 09:52:37 -04:00
Botond Dénes	aa378c458e	replica/tablets: extract tablet_metadata builder logic So it can be reused in a new method. Indentation is left broken deliberately, to make the patch easier to read.	2024-08-11 09:52:37 -04:00
Botond Dénes	f5976aa87b	replica/tablets: add get_tablet_metadata_change_hint() and update_tablet_metadata_change_hint() Extract a hint of what a tablet mutation changed. The hint can be later used to selectively reload only the changed parts from disk. Two variants are added: * get_tablet_metadata_change_hint() - extracts a hint from a list of tablet mutations * update_tablet_metadata_change_hint() - updates an existing hint based on a single mutation, allowing for incremental hint extraction	2024-08-11 09:52:37 -04:00
Botond Dénes	54ea71f8a6	locator/tablets: add tablet_map::clear_tablet_transition_info()	2024-08-11 09:52:37 -04:00
Botond Dénes	0254cfc7d3	locator/tablets: make tablet_metadata cheap to copy Keep lw_shared_ptr<tablet_map> in the tablet map and use COW semantics. To prevent accidental changes to shared tablet_map instances, all modifications to a tablet_map have to go through a new `mutate_tablet_map()` method, which implements the copy-modify-swap idiom.	2024-08-11 09:52:37 -04:00
Botond Dénes	fb0ab3c1fb	mutation/canonical_mutation: add key() Extracts the partition key without deserializing the entire mutation.	2024-08-11 09:52:37 -04:00
Calle Wilund	e18a855abe	extensions: Add exception types for IO extensions and handle in memtable write path Fixes #19960 Write path for sstables/commitlog need to handle the fact that IO extensions can generate errors, some of which should be considered retry-able, and some that should, similar to system IO errors, cause the node to go into isolate mode. One option would of course be for extensions to simply generate std::system_errors, with system_category and appropriate codes. But this is probably a bad idea, since it makes it more muddy at which level an error happened, as well as limits the expressibility of the error. This adds three distinct types (sharing base) distinguishing permission, availability and configuration errors. These are treated akin to EACCES, ENOENT and EINVAL in disk error handler and memtable write loop. Tests updated to use and verify behaviour. Closes scylladb/scylladb#19961	2024-08-11 13:52:35 +03:00
Raphael S. Carvalho	75829d75ec	replica: Fix race between split compaction and migration After removal of rwlock (`53a6ec05ed`), the race was introduced because the order in which compaction groups of a tablet are closed is no longer deterministic. Some background first: Split compaction runs in main (unsplit) group, and adds sstable to left and right groups on completion. The race works as follows: 1) split compaction starts on main group of tablet X 2) tablet X reaches cleanup stage, so its compaction groups are closed in parallel 3) left or right group is closed before main (more likely when only main has flush work to do) 4) split compaction completes, and adds sstable to left and right 5) if e.g left is closed, adjusting backlog tracker will trigger an exception, and since that happens in row cache update's execute(), node crashes. The problem manifested as follows: [shard 0: gms] raft_topology - Initiating tablet cleanup of 5739b9b0-49d4-11ef-828f-770894013415:15 on 102a904a-0b15-4661-ba3f-f9085a5ad03c:0 ... [shard 0:strm] compaction - [Split keyspace1.standard1 009e2f80-49e5-11ef-85e3-7161200fb137] Splitting [/var/lib/scylla/data/keyspace1/...] ... [shard 0:strm] cache - Fatal error during cache update: std::out_of_range (Compaction state for table [0x600007772740] not found), at: ... -------- seastar::continuation<seastar::internal::promise_base_with_type<void>, row_cache::do_update(... -------- seastar::internal::do_with_state<std::tuple<row_cache::external_updater, std::function<seastar::future<void> ()> >, seastar::future<void> > -------- seastar::internal::coroutine_traits_base<void>::promise_type -------- seastar::internal::coroutine_traits_base<void>::promise_type -------- seastar::(anonymous namespace)::thread_wake_task -------- seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_result>, seastar::async<sstables::compaction::run(... 
seastar::continuation<seastar::internal::promise_base_with_type<sstables::compaction_result>, seastar::future<sstables::compaction_resu... From the log above, it can be seen cache update failure happens under streaming sched group and during compaction completion, which was good evidence to the cause. Problem was reproduced locally with the help of tablet shuffling. Fixes: #19873. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#19987	2024-08-11 11:00:19 +03:00
Botond Dénes	1f4b9a5300	Merge 'compaction: drop compaction executors' possibility to bypass task manager' from Aleksandra Martyniuk If parent_info argument of compaction_manager::perform_compaction is std::nullopt, then created compaction executor isn't tracked by task manager. Currently, all compaction operations should be visible in task manager. Modify split methods to keep split executor in task manager. Get rid of the option to bypass task manager. Closes scylladb/scylladb#19995 * github.com:scylladb/scylladb: compaction: replace optional<task_info> with task_info param compaction: keep split executor in task manager	2024-08-11 10:26:43 +03:00
Botond Dénes	0bb1075a19	Merge 'tasks: fix task handler' from Aleksandra Martyniuk There are some bugs missed in task handler: - wait_for_task does not wait until virtual tasks are done, but returns the status immediately; - wait_for_task suffers from use after return; - get_status_recursively does not set the kind of task essentials. Fix the aforementioned. Closes scylladb/scylladb#19930 * github.com:scylladb/scylladb: test: add test to check that task handler is fixed tasks: fix task handler	2024-08-11 10:23:17 +03:00
Paweł Zakrzewski	9db272c949	test/cql-pytest: Add test for GROUP BY queries with LIMIT Remove xfail from all tests for #5361, as the issue is fixed. Remove xfail from test_group_by_clustering_prefix_with_limit It references #5362, but is fixed by #17237. Refs #17237	2024-08-11 09:08:44 +02:00
Paweł Zakrzewski	e7ae7f3662	cql3: process LIMIT for GROUP BY queries Currently LIMIT is not passed to the query executor at all and it was just an accident that it worked for the case referenced in #17237. This change passes the limit value down the chain.	2024-08-11 09:08:43 +02:00
Paweł Zakrzewski	3838ad64b3	cql3/select_statement: simplify the get_limit function The get_limit() function performed tasks outside of its scope - for example checked if the statement was an aggregate. This change moves the onus of the check to the caller.	2024-08-11 09:08:43 +02:00
Paweł Zakrzewski	08f3219cb8	cql3: respect the user-defined page size in aggregate queries The comment in the code already states that we should use the user-defined page size if it's provided. To avoid OOM conditions we'll use the internally defined limit as the upper bound or if no page size is provided. This change lays ground work for fixing #5362 and is necessary to pass the test introduced in #19392 once it is implemented.	2024-08-11 09:08:43 +02:00
Michał Jadwiszczak	3745d0a534	gms/feature_service: allow to suppress features This patch adds `suppress_features` error injection. It allows to revoke support for some features and it can be used to simulate upgrade process in test.py. Features to suppress are passed as injection's value, separated by `;`. Example: `PARALLELIZED_AGGREGATION;UDA_NATIVE_PARALLELIZED_AGGREGATION` Fixes scylladb/scylladb#20034 Closes scylladb/scylladb#20055	2024-08-09 19:15:19 +02:00
Kefu Chai	a78f46aad7	s3/client: customize options for input_stream before this change, we use the default options for performing read on the input. and the default options are like ```c++ struct file_input_stream_options { size_t buffer_size = 8192; ///< I/O buffer size unsigned read_ahead = 0; ///< Maximum number of extra read-ahead operations }; ``` which are not able to offer good throughput when reading from disk, when we stream to S3. so, in this change, we use options which allow better throughput. Refs `061def001d` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20074	2024-08-09 11:52:30 +03:00
Dawid Medrek	e5d01d4000	db/hints: Make commitlog use commitlog IO scheduling group Before these changes, we didn't specify which I/O scheduling group commitlog instances in hinted handoff should use. In this commit, we set it explicitly to the commitlog scheduling group. The rationale for this choice is the fact we don't want to cause a bottleneck on the write path -- if hints are written too slowly, new incoming mutations (NOT hints) might be rejected due to a too high number of hints currently being written to disk; see `storage_proxy::create_write_response_handler_helper()` for more context. Fixes scylladb/scylladb#18654 Closes scylladb/scylladb#19170	2024-08-08 16:14:07 +02:00
Piotr Dulikowski	b72906518f	Merge 'service levels: update connections parameters automatically' from Michał Jadwiszczak This patch makes all cql connections update their service level parameters automatically when: - any service level is created or changed - one role is granted to another - any service level is attached to/detached from a role First of all, the patch defines what a service level and an effective service level are `938aa10509`. No new type of service levels are introduced, the commit only clarifies definitions and names what an effective service level is. (Effective service level is created by merging all service levels which are attached to all roles granted to the user. It represents exact values of connection's parameters.) Previously, to find an effective service level of a user, it required O(n) internal queries: O(n) queries to recursively find all granted roles (`standard_role_manager::query_granted()`) and a query for each role to get its service level (`standard_role_manager::get_attribute()`, which sums to O(n) queries). Because we want to reload SL parameters for all opened cql connections, we don't want to do O(n) queries for every connection, every time we create or change any service level/grant one role to another/attach or detach a service level to/from a role. To speed it up, the patch adds another layer of service level controller cache, which stores a `role_name -> effective_service_level` mapping. This way finding an effective service level for a role is only a lookup to a map. Building the new cache requires only 2 queries: one to obtain the whole role hierarchy and one to get all roles' service levels. 
Fixes scylladb/scylladb#12923 Closes scylladb/scylladb#19085 * github.com:scylladb/scylladb: test/auth_cluster/test_raft_service_levels: add test for automatic connection update api/cql_server_test: add CQL server testing API transport/cql_server: subscribe to sl effective cache reloaded transport/controller: coroutinize `subscribe_server` and `unsubscribe_server` transport/cql_server: add method to update service level params on all connections generic_server: use async function in `for_each_gently()` service/qos/sl_controller: use effective service levels cache service/qos/service_level_controller: notify subscribers on effective cache reloaded service/raft/group0_state_machine: update effective service levels cache service/topology_coordinator: migrate service levels before auth service/qos/service_level_controller: effective service levels cache utils/sorting: allow to pass any container as verticies service/qos/service_level_controller: replace shard check to assert service/qos: define effective service level service/qos/qos_common: use const reference in `init_effective_names()` service/qos/service_level_controller: remove unused field auth: return map of directly granted roles test/auth/test_auth_v2_migration: create sl1 in the test	2024-08-08 15:31:04 +02:00
Anna Stuchlik	a1b4357765	doc: update Raft info in 6.1 This commit updates the Raft information regarding the Raft verification procedure. In 6.1, the procedure is no longer related to the upgrade. Fixes https://github.com/scylladb/scylladb/issues/19932 Closes scylladb/scylladb#20040	2024-08-08 11:25:50 +02:00
PeterFlockhart	0f9c6d24cf	Update SELECT grammar to define group_by_clause explicitly Closes scylladb/scylladb#20046	2024-08-08 12:23:20 +03:00
Avi Kivity	12c68bcf75	Merge 'querier: include cell stats in page stats' from Botond Dénes We have two mechanisms to give visibility into reads having to process many tombstones: * a warning in the logs, triggered if a read processed more than `tombstone_warn_threshold` dead rows/tombstones * a trace message, which includes stats of the amount of rows in the page, including the amount of live and dead rows as well as tombstones This series extends this to also include information on cells, so we have visibility into the case where a read has to process an excessive amount of cell tombstones (mainly because of collections). A log line is now also logged if the amount of dead cells/tombstones in the page exceeds `tombstone_warn_threshold`. The trace message is also extended to contain cell stats. The `tombstone_warn_threshold` log lines now receive a 10s rate-limit to avoid excessive log spamming. The rate-limit is separate for the row and cell logs. Example of the new log line (`tombstone_warn_threshold=10` ): ``` WARN 2024-05-30 07:56:44,979 [shard 0:stmt] querier - Read 98 live cells and 126 dead cells/tombstones for system_schema.scylla_tables <partition-range-scan> (-inf, +inf) (see tombstone_warn_threshold) ``` Example of the new tracing message: ``` Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead), 0 range tombstone(s) and 13 cell(s) (1 live, 12 dead) [shard 0] \| 2024-05-30 08:13:19.690803 \| 127.0.0.1 \| 6114 \| 127.0.0.1 ``` Fixes: https://github.com/scylladb/scylladb/issues/18996 Improvement, not a backport candidate. 
Closes scylladb/scylladb#18997 * github.com:scylladb/scylladb: test/boost: mutation_test: add test for cell compaction stats mutation/compact_and_expire_result: drop operator bool() querier: consume_page(): add rate-limiting to tombstone warnings querier: consume_page(): add cell stats to page stats trace message querier: consume_page(): add tombstone warning for cell tombstones querier: consume_page(): extract code which logs tombstone warning mutation/mutation_compactor: collect and aggregate cell compaction stats mutation: row::compact_and_expire(): use compact_and_expire_result collection_mutation: compact_and_expire(): use compact_and_expire_result mutation: introduce compact_and_expire_result	2024-08-08 12:16:13 +03:00
Calle Wilund	d6742e9bce	distributed_loader: Remove load_prio_keyspaces Fixes #13334 All required code paths (see enterprise) now uses extensions::is_extension_internal_keyspace. The old mechanism can be removed. One less global var. Closes scylladb/scylladb#20047	2024-08-08 12:10:27 +03:00
Avi Kivity	db77b5bd03	Merge 'convert the rest of `test/boost/sstable_test.cc` to co-routines and seastar::thread' from Laszlo Ersek This is a followup to #19937, for #19803. See in particular [this comment](https://github.com/scylladb/scylladb/issues/19803#issuecomment-2258371923). The primary conversion target is coroutines. However, while coroutines are the most convenient style, they are only infrequently usable in this case, for the following reasons: - Wherever we have a `future::finally()` that calls a cleanup function that returns a future (which must be awaited), we cannot use `co_await`. We can only use `seastar::async()` with `deferred_close` or `defer()`. - The code passes lots of lambdas, and `co_await` cannot be used in lambdas. First, I tried, and the compiler rejects it; second, a capturing lambda that is a coroutine is a trap [[1]](https://devblogs.microsoft.com/oldnewthing/20211103-00/?p=105870) [[2]](https://isocpp.github.io/CppCoreGuidelines/CppCoreGuidelines#Rcoro-capture). In most cases, I didn't have to use naked `seastar::async()`; there were specialized wrappers in place already. Thus, most of the changes target `seastar::thread` context under existent `seastar::async()` wrappers, and only a few functions end up as coroutines. The last patch in the series (`test/sstable: remove useless variable from promoted_index_read()`) is an independent micro-cleanup, the opportunity for which I thought to have noticed while reading the code. The tail of `test/boost/sstable_test.cc` (the stuff following `promoted_index_read()`) is already written as `seastar::thread`. That's already better (for readability) than future chaining; but could I have perhaps further converted those functions to coroutines? My answer was "no": - Some of the candidate functions relied on deferred cleanups that might need to yield (all three variants of `count_rows()`). 
- Some had been implemented by passing lambdas to wrappers of `seastar::async()` (`sub_partition_read()`, `sub_partitions_read()`). - The test case `test_skipping_in_compressed_stream()` initially looked promising for co-routinization (from its starting point `seastar::async()`), because it seemed to employ no deferred cleanup (that might need to yield). However, the function uses three lambdas that must be able to yield internally, and one of those (`make_is()`) is even capturing. - The rest (`test_empty_key_view_comparison()`, `test_parse_path_good()`, `test_parse_path_bad()`) was synchronous code to begin with. ``` test/boost/sstable_test.cc \| 188 +++++++++----------- 1 file changed, 83 insertions(+), 105 deletions(-) ``` Refactoring; no backport needed. Closes scylladb/scylladb#20011 * github.com:scylladb/scylladb: test/sstable: remove useless variable from promoted_index_read() test/sstable: rewrite promoted_index_read() with async() test/sstable: unfuturize lambda invocation in test_using_reusable_sst() test/sstable: rewrite wrong_range() with async() test/sstable: simplify not_find_key_composite_bucket0() under test_using_reusable_sst() test/sstable: rewrite full_index_search() with async() test/sstable: simplify find_key(), all_in_place() under test_using_reusable_sst() test/sstable: rewrite (un)compressed_random_access_read() with async() test/sstable: simplify write_and_validate_sst() test/sstable: simplify check_toc_func() under async() test/sstable: simplify check_statistics_func() under async() test/sstable: simplify check_summary_func() under async() test/sstable: coroutinize check_component_integrity() test/sstable: rewrite write_sst_info() with async() test/sstable: simplify missing_summary_first_last_sane() test/sstable: coroutinize summary_query_fail() test/sstable: rewrite summary_query() with async() test/sstable: coroutinize (simple/composite)_index_read() test/sstable: rewrite index_read() with async() test/sstable: rewrite 
test_using_reusable_sst() with async() test/sstable: rewrite test_using_working_sst() with async()	2024-08-08 11:55:37 +03:00
Michał Jadwiszczak	b62a8b747a	test/auth_cluster/test_raft_service_levels: add test for automatic connection update	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	870bdaa6b1	api/cql_server_test: add CQL server testing API Add a CQL server testing API with an endpoint to dump service level parameters of all CQL connections. This endpoint will later be used to test automatic updating of CQL connection parameters.	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	c3e8778ad4	transport/cql_server: subscribe to sl effective cache reloaded Subscribe the cql server (but not the maintenance server) to qos configuration changes. Trigger an update of connections' service level params on the effective cache reloaded event. This is not done on the maintenance server because it supports neither role hierarchy nor attaching service levels.	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	b2f2288292	transport/controller: coroutinize `subscribe_server` and `unsubscribe_server`	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	4af90726b6	transport/cql_server: add method to update service level params on all connections Trigger update of service level param on every cql connection. In enterprise, the method needs also to update connections' scheduling group.	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	324b3c43c0	generic_server: use async function in `for_each_gently()` In the following patch, we will add a method to update service level parameters for each cql connection. To support this, this patch allows passing an async function as a parameter to the `for_each_gently()` method.	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	93e6de0d04	service/qos/sl_controller: use effective service levels cache Use cache to quickly access effective service level of a role.	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	664a1913c6	service/qos/service_level_controller: notify subscribers on effective cache reloaded Add event representing reload of effective service level cache and notify subscribers when the cache is reloaded.	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	5f8132c13c	service/raft/group0_state_machine: update effective service levels cache Updates to `system.role_members` and `system.role_attributes` affect effective service levels cache, so applying mutations to those tables should reload the effective SL cache.	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	7b28df9b4d	service/topology_coordinator: migrate service levels before auth Effective service level cache will be updated when mutations are applied to some of the auth tables. But the effective cache depends on first-level service levels cache, so service levels data should be migrated before auth data.	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	842573d0af	service/qos/service_level_controller: effective service levels cache Add a second layer of service_level_controller cache which contains role name -> effective service level mapping. To build the mapping, controller uses first cache layer (service level name -> service level) and 2 queries to auth tables (one to `roles` and one to `role_members`).	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	4922f87fed	utils/sorting: allow to pass any container as vertices The container holding all vertices doesn't have to be a vector. Accepting any container that meets the requirements makes the function more flexible.	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	619937c466	service/qos/service_level_controller: replace shard check to assert The cache is only updated on shard 0, so doing assert is a better sanity check.	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	be4c83ad3c	service/qos: define effective service level Write down definitions of `service level` and `effective service level` in service/qos/service_level_controller.hh. Until now, effective service level was only used as result of `LIST EFFECTIVE SERVICE LEVEL OF <role>`. Now we want to have quick access to effective service level of each role and introduce cache of effective sl to do it. New definitions clarify things. The commit also renames: - `update_service_levels_from_distributed_data` -> `update_service_levels_cache` Later we will introduce effective_service_level_cache, so this change standardizes the names. - `find_service_level` -> `find_effective_service_level` The function actually returns effective service level.	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	0da979e013	service/qos/qos_common: use const reference in `init_effective_names()` `service_level_options::init_effective_names()` method's argument has no reason to be mutable reference. This commit converts it to const ref.	2024-08-08 10:42:09 +02:00
Michał Jadwiszczak	37cd998993	service/qos/service_level_controller: remove unused field	2024-08-08 10:42:08 +02:00
Michał Jadwiszczak	f9048de0ce	auth: return map of directly granted roles Returns multimap of directly granted roles for each role. Uses only one query to create the map, instead of doing recursive queries for each individual role.	2024-08-08 10:42:08 +02:00
Michał Jadwiszczak	d643d5637c	test/auth/test_auth_v2_migration: create sl1 in the test Test `test_auth_v2_migration` creates auth data where role `users` has assigned service level `sl:fefe` but the service level isn't actually created. In following patches, we are going to introduce effective service levels cache which depends on auth and is refreshed when mutations are applied to v2 auth tables. Without this change, this test will fail because the service level doesn't exist. Also the name `sl:fefe` is changed to `sl1`.	2024-08-08 10:42:08 +02:00
Avi Kivity	3fe60560d2	Merge 'Coroutinize view_builder::start()' from Pavel Emelyanov It runs in the background and consists of two parts -- async() lambda and following .then()-s. This PR moves the background running code into its own method and coroutinizes it in parts. With #19954 merged it finally looks really nice. Closes scylladb/scylladb#20058 * github.com:scylladb/scylladb: view_builder: Restore indentation after previous patches view_builder: Coroutinize inner start_in_background() calls view_builder: Coroutinize outer start_in_background() calls view_builder: Add helper method for background start	2024-08-07 19:47:32 +03:00
Kamil Braun	4181a1c53e	storage_service: raft topology: warn when `raft_topology_cmd_handler` fails due to abort Currently we print an ERROR on all exceptions in `raft_topology_cmd_handler`. This log level is too high, in some cases exceptions are expected -- like during shutdown. And it causes dtest failures. Turn exceptions from aborts into WARN level. Also improve logging by printing the command that failed. Fixes scylladb/scylladb#19754 Closes scylladb/scylladb#19935	2024-08-07 17:57:23 +02:00
Tomasz Grabiec	1a4baa5f9e	tablets: Do not allocate tablets on nodes being decommissioned If tablet-based table is created concurrently with node being decommissioned after tablets are already drained, the new table may be permanently left with replicas on the node which is no longer in the topology. That creates an immediate availability risk because we are running with one replica down. This also violates invariants about replica placement and this state cannot be fixed by topology operations. One effect is that this will lead to load balancer failure which will inhibit progress of any topology operations: load_balancer - Replica 154b0380-1dd2-11b2-9fdd-7156aa720e1a:0 of tablet 7e03dd40-537b-11ef-9fdd-7156aa720e1a:1 not found in topology, at: ... Fixes #20032 Closes scylladb/scylladb#20053	2024-08-07 18:52:58 +03:00
Dawid Medrek	96509c4cf7	db/hints: Make sync points be created for all hosts when not specified Sync points are created, via POST HTTP requests, for a subset of nodes in the cluster. Those nodes are specified in a request's parameter `target_hosts`. When the parameter is empty, Scylla should assume the user wants to create a sync point for ALL nodes. Before these changes, sync points were created only for LIVE nodes. If a node was dead but still part of the cluster and the user requested creating a sync point leaving the parameter `target_hosts` empty, the dead node was skipped during the creation of the sync point. That was inconsistent with the guarantees the sync point API provides. In this commit, we fix that issue and add a test verifying that the changes have made the implementation compliant with the design of the sync point API -- the test only passes after this commit. Fixes scylladb/scylladb#9413 Closes scylladb/scylladb#19750	2024-08-07 13:15:20 +02:00
Pavel Emelyanov	63afbc0fcb	view_builder: Restore indentation after previous patches Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-07 14:00:01 +03:00
Pavel Emelyanov	aa1a5d3201	view_builder: Coroutinize inner start_in_background() calls One of the co_await-ed parts of this method is async() lambda. It can be coroutinized too. One thing to take care of is the semaphore units -- their scope should (?) terminate earlier than the whole start_in_background(), so release them explicitly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-07 14:00:01 +03:00
Pavel Emelyanov	167c6a9c5e	view_builder: Coroutinize outer start_in_background() calls The method consists of two parts -- one running in async() thread and continuations to it. This patch turns the latter chain into co_await-s. The mentioned chain is "guarded" by then_wrapped() catch of any exception, which is turned into a plain try-catch block. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-07 14:00:01 +03:00
Pavel Emelyanov	10a87f5c5b	view_builder: Add helper method for background start The view_builder::start() happens in the background. It's good to have explicit start_in_background() method and coroutinize it next. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-07 13:59:57 +03:00
Dawid Medrek	ec691a84a5	docs/hinted_handoff: Describe sync point HTTP API In this commit, we describe the mechanism of sync point in Hinted Handoff in the user documentation. We explain the motivation for it and how to use it, as well as list and describe all of the parameters involved in the process. Errors that the user may encounter are addressed in the article. Fixes scylladb/scylladb#18500 Closes scylladb/scylladb#19686	2024-08-07 11:12:23 +02:00
Pavel Emelyanov	2fd60b0adc	api: Move config-related endpoints from storage_service.cc The get_all_data_file_locations and get_saved_caches_location get the returned data from db::config and should be next to other endpoints working with config data. refs: #2737 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19958	2024-08-07 10:18:29 +03:00
Piotr Dulikowski	1963619803	Merge 'Use cross shard barrier to start view builder' from Pavel Emelyanov When starting, view builder wants all shards to synchronize with each other in the middle of initialization. For that they all synchronize via shard-0's instance counter and a shared future. There's cross-shard barrier in utils/ that provides the same facility. Closes scylladb/scylladb#19954 * github.com:scylladb/scylladb: view_builder: Drop unused members view_builder: Use cross-shard barrier on start view_builder: Add cross-shard barrier to its .start() method	2024-08-07 08:54:15 +02:00
Botond Dénes	78206a3fad	test/boost: mutation_test: add test for cell compaction stats	2024-08-06 08:56:28 -04:00
Botond Dénes	259a59bd64	mutation/compact_and_expire_result: drop operator bool() Having an operator bool() on this struct is counter-intuitive, so this commit drops it and migrates any remaining users to bool is_live(). The purpose of this operator bool() was to help incrementally replace the previous bool return type with compact_and_expire_result in the compact_and_expire() call stack. Now that this is done, it has served its purpose.	2024-08-06 08:56:28 -04:00
Botond Dénes	f638c37c4b	querier: consume_page(): add rate-limiting to tombstone warnings These warnings can be logged once per query, which could result in filling the logs with thousands of log lines. Rate-limit to once per 10sec.	2024-08-06 08:56:11 -04:00
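The rate-limiting idea in the commit above (at most one warning per 10 seconds) can be illustrated with a minimal, self-contained sketch. This is not the querier's actual code: `warning_rate_limiter`, `should_log()` and `demo()` are hypothetical names invented for illustration, and the real change may rely on an existing logger facility instead.

```cpp
#include <cassert>
#include <chrono>
#include <optional>

// Hypothetical helper (not ScyllaDB code): lets a warning through
// at most once per `interval`.
class warning_rate_limiter {
public:
    using clock = std::chrono::steady_clock;

    explicit warning_rate_limiter(clock::duration interval) : _interval(interval) {}

    // Returns true if the caller should log now. `now` is a parameter so the
    // behaviour can be exercised without sleeping.
    bool should_log(clock::time_point now = clock::now()) {
        if (_last && now - *_last < _interval) {
            return false; // still inside the quiet period: drop the warning
        }
        _last = now;
        return true;
    }

private:
    clock::duration _interval;
    std::optional<clock::time_point> _last; // time of the last emitted warning
};

// Small demonstration with synthetic timestamps.
inline void demo() {
    using namespace std::chrono_literals;
    warning_rate_limiter limiter(10s);
    const warning_rate_limiter::clock::time_point t0{};
    assert(limiter.should_log(t0));        // first warning is always emitted
    assert(!limiter.should_log(t0 + 5s));  // suppressed inside the 10s window
    assert(limiter.should_log(t0 + 10s));  // emitted again once the window passed
}
```

Passing the timestamp explicitly keeps the sketch deterministic; a production version would normally read the clock itself.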
Botond Dénes	d69b16a51e	querier: consume_page(): add cell stats to page stats trace message	2024-08-06 08:56:11 -04:00
Botond Dénes	98c599f73a	querier: consume_page(): add tombstone warning for cell tombstones Since it is really difficult to meaningfully aggregate cell tombstones with row tombstones, there are two separate warnings for them.	2024-08-06 08:56:11 -04:00
Botond Dénes	fa2ee6d545	querier: consume_page(): extract code which logs tombstone warning Soon, we want to log a warning on too many cell tombstones as well. Extract the logging code to allow reuse between the row and cell tombstone warnings.	2024-08-06 08:56:11 -04:00
Botond Dénes	e403644c8b	mutation/mutation_compactor: collect and aggregate cell compaction stats row::compact_and_expire() now returns detailed cell stats. Collect and aggregate these, using the existing compaction_stats::row_stats structure.	2024-08-06 08:56:11 -04:00
Botond Dénes	0396db497c	mutation: row::compact_and_expire(): use compact_and_expire_result Collect, store and return stats about cells, via compact_and_expire_result.	2024-08-06 08:56:11 -04:00
Botond Dénes	2c6d4e21e6	collection_mutation: compact_and_expire(): use compact_and_expire_result Collect, store and return stats about cells, via compact_and_expire_result.	2024-08-06 08:56:11 -04:00
Botond Dénes	e773a8eee6	mutation: introduce compact_and_expire_result To hold cell stats, to be collected during row::compact_and_expire(). Users will come in the next patches.	2024-08-06 08:56:11 -04:00
Aleksandra Martyniuk	9ec8000499	test: add test to check that task handler is fixed	2024-08-06 13:15:33 +02:00
Aleksandra Martyniuk	811ca00cec	tasks: fix task handler There are some bugs missed in task handler: - wait_for_task does not wait until virtual tasks are done, but returns the status immediately; - wait_for_task suffers from use after return; - get_status_recursively does not set the kind of task essentials. Fix the aforementioned.	2024-08-06 13:15:13 +02:00
Anna Stuchlik	849856b964	doc: add post-installation configuration to the Web Installer page This commit extracts the information about the configuration the user should do right after installation (especially running scylla_setup) to a separate file. The file is included in the relevant pages, i.e., installing with packages and installing with Web Installer. In addition, the examples on the Web Installer page are updated with supported versions of ScyllaDB. Fixes https://github.com/scylladb/scylladb/issues/19908 Closes scylladb/scylladb#20035	2024-08-06 13:49:09 +03:00
Kamil Braun	f348f33667	raft topology: improve logging Add more logging for raft-based topology operations in INFO and DEBUG levels. Improve the existing logging, adding more details. Fix a FIXME in test_coordinator_queue_management (by readding a log message that was removed in the past -- probably by accident -- and properly awaiting for it to appear in test). Enable group0_state_machine logging at TRACE level in tests. These logs are relatively rare (group 0 commands are used for metadata operations) and relatively small, mostly consist of printing `system.group0_history` mutation in the applied command, for example: ``` TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - apply() is called with 1 commands TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - cmd: prev_state_id: optional(dd9d47c6-50ee-11ef-d77f-500b8e1edde3), new_state_id: dd9ea5c6-50ee-11ef-ae64-dfbcd08d72c3, creator_addr: 127.219.233.1, creator_id: 02679305-b9d1-41ef-866d-d69be156c981 TRACE 2024-08-02 18:47:12,238 [shard 0: gms] group0_raft_sm - cmd.history_append: {canonical_mutation: table_id 027e42f5-683a-3ed7-b404-a0100762063c schema_version c9c345e1-428f-36e0-b7d5-9af5f985021e partition_key pk{0007686973746f7279} partition_tombstone {tombstone: none}, row tombstone {range_tombstone: start={position: clustered, ckp{0010b4ba65c64b6e11ef8080808080808080}, 1}, end={position: clustered, ckp{}, 1}, {tombstone: timestamp=1722617232237511, deletion_time=1722617232}}{row {position: clustered, ckp{0010dd9ea5c650ee11efae64dfbcd08d72c3}, 0} tombstone {row_tombstone: none} marker {row_marker: 1722617232237511 0 0}, column description atomic_cell{ create system_distributed keyspace; create system_distributed_everywhere keyspace; create and update system_distributed(_everywhere) tables,ts=1722617232237511,expiry=-1,ttl=0}}} ``` note that the mutation contains a human-readable description of the command -- like "create system_distributed keyspace" above. 
These logs might help debugging various issues (e.g. when `apply` hangs waiting for read_apply mutex, or takes too long to apply a command). Ref: scylladb/scylladb#19105 Ref: scylladb/scylladb#19945 Closes scylladb/scylladb#19998	2024-08-06 11:50:16 +03:00
Kamil Braun	aa9d5fe3f5	Merge 'doc: add the 6.0-to-6.1 upgrade guide' from Anna Stuchlik This PR adds the 6.0-to-6.1 upgrade guide (including metrics) and removes the 5.4-to-6.0 upgrade guide. Compared to the 5.4-to-6.0 guide, the 6.0-to-6.1 guide: - Added the "Ensure Consistent Topology Changes Are Enabled" prerequisite. - Removed the "After Upgrading Every Node" section. Both Raft-based schema changes and topology updates are mandatory in 6.1 and don't require any user action after upgrading to 6.1. - Removed the "Validate Raft Setup" section. Raft was enabled in all 6.0 clusters (for schema management), so now there's no scenario that would require the user to follow the validation procedure. - Removed the references to the Enable Consistent Topology Updates page (which was in version 6.0 and is removed with this PR) across the docs. See the individual commits for more details. Fixes https://github.com/scylladb/scylladb/issues/19853 Fixes https://github.com/scylladb/scylladb/issues/19933 This PR must be backported to branch-6.1 as it is critical in version 6.1. Closes scylladb/scylladb#19983 * github.com:scylladb/scylladb: doc: remove the 5.4-to-6.0 upgrade guide doc: add the 6.0-to-6.1 upgrade guide	2024-08-06 10:23:18 +02:00
Andrei Chekun	cc428e8a36	[test.py] Increase pool size for CI Currently, the resource utilization in CI is low. Increasing the number of clusters will increase how many tests are executed simultaneously. This will decrease the time it takes to execute and improve resource utilization. Related: https://github.com/scylladb/qa-tasks/issues/1667 Closes scylladb/scylladb#19832	2024-08-06 11:20:36 +03:00
Botond Dénes	822d3b11d0	tool/scylla-nodetool: refresh: improve error-message on missing ks/tbl args The command has a single check for the missing keyspace and/or table parameters and if the check fails, there is a combined error message. Apparently this is confusing, so split the check so that missing keyspace and missing table args each have their own check and error message. Fixes: scylladb/scylladb#19984 Closes scylladb/scylladb#20005	2024-08-05 22:36:05 +03:00
Anna Stuchlik	32fa5aa938	doc: remove the 5.4-to-6.0 upgrade guide This commit removes the 5.4-to-6.0 upgrade guide and all references to it. It mainly removes references to the Enable Consistent Topology Updates page, which was added as enabling the feature was optional. In rare cases, when a reference to that page is necessary, the internal link is replaced with an external link to version 6.0. Especially the Handling Cluster Membership Change Failures page was modified for troubleshooting purposes rather than removed.	2024-08-05 20:13:48 +02:00
Kefu Chai	b1405da6ac	s3/client: use div_ceil() defined by utils/div_ceil.hh instead of reinventing the wheel, let's use the existing one. in this change, we trade the `div_ceil()` implemented in s3/client.cc for the existing one in utils/div_ceil.hh . because we are not using `std::lldiv()` anymore, the corresponding `#include <cstdlib>` is dropped. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20000	2024-08-05 15:35:18 +03:00
Kefu Chai	12a066ccdf	sstable_directory: use return_exception_ptr() when appropriate instead of using `std::rethrow_exception()`, use `coroutine::return_exception_ptr()` which is a little bit more efficient. See also `6cafd83e1c` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20001	2024-08-05 12:54:27 +03:00
Kefu Chai	0bc886d005	service: mark fmt::formatter<T>::format() as const fmt 11 enforces the constness of `format()` member function, if it is not marked with `const`, the tree fails to build with fmt 11, like: ``` /usr/include/fmt/base.h:1393:23: error: no matching member function for call to 'format' 1393 \| ctx.advance_to(cf.format(static_cast<qualified_type>(arg), ctx)); \| ~~~^~~~~~ /usr/include/fmt/base.h:1374:21: note: in instantiation of function template specialization 'fmt::detail::value<fmt::context>::format_custom_arg<service::migration_badness, fmt::formatter<service::migration_badness>>' requested here 1374 \| custom.format = format_custom_arg< \| ^ /home/kefu/dev/scylladb/service/tablet_allocator.cc:170:14: note: in instantiation of function template specialization 'fmt::format_to<fmt::basic_appender<char>, const locator::global_tablet_id &, const locator::tablet_replica &, const locator::tablet_replica &, const service::migration_badness &, 0>' requested here 170 \| fmt::format_to(ctx.out(), "{{tablet: {}, {} -> {}, badness: {}", candidate.tablet, candidate.src, \| ^ /home/kefu/dev/scylladb/service/tablet_allocator.cc:161:10: note: candidate function template not viable: 'this' argument has type 'const fmt::formatter<service::migration_badness>', but method is not marked const 161 \| auto format(const service::migration_badness& badness, FormatContext& ctx) { \| ^ ``` so, in this change, we mark these two `format()` member functions const. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20013	2024-08-05 12:53:42 +03:00
Piotr Dulikowski	a038a1fdef	Merge 'db: coroutinize do_apply_counter_update' from Michael Litvak rewrite the function as coroutine to make it easier to read and maintain, following lifetime issues we had and fixed in this function. The second commit adds a test that drops a table while there is a counter update operation ongoing in the table. The test reproduces issue https://github.com/scylladb/scylla-enterprise/issues/4475 and verifies it is fixed. Follow-up to https://github.com/scylladb/scylladb/pull/19948 Doesn't require backport because the fix to the issue was already done and backported. This is just cleanup and a test. Closes scylladb/scylladb#19982 * github.com:scylladb/scylladb: db: test counter update while table is dropped db: coroutinize do_apply_counter_update	2024-08-05 10:08:18 +02:00
Nadav Har'El	247b84715a	test/cql-pytest: reproducers for key length bugs Recently, some users have seen "Key size too large" errors in various places. Cassandra and Scylla impose a 64KB length limit on keys, and we have known about bugs in this area for a long time - and even had some translated Cassandra unit tests that cover some of them. But these tests did not cover all the corner cases and left us with partial and fragmented knowledge of this problem, spread over many test files and many issues. In this patch, we add a single test file, test/cql-pytest/test_key_length.py which attempts to rigorously explore the various bugs we have with CQL key length limits. These tests aim to reproduce all known bugs in this area: * Refs #3017 - CQL layer accepts set values too large to be written to an sstable * Refs #10366 - Enforce Key-length limits during SELECT * Refs #12247 - Better error reporting for oversized keys during INSERT * Refs #16772 - Key length should be limited to exactly 65535, not less The following less interesting bug is already covered by many tests so I decided not to test it again: * Refs #7745 - Length of map keys and set items are incorrectly limited to 64K in unprepared CQL There's also a situation in materialized views and secondary indexes, where a column that was _not_ a key, now becomes a key, and a length limit needs to be enforced on it. We already have good test coverage for this (in test/cql-pytest/test_secondary_index.py and in test/cql-pytest/test_materialized_view.py), and we have an issue: * Refs #8627 - Cleanly reject updates with indexed values where value > 64k All 16 tests added here pass on Cassandra 5 except one that fails on https://issues.apache.org/jira/browse/CASSANDRA-19270, but 11 of the tests currently fail on Scylla (6 on #12247, 2 on #10366, 3 on #16772). 
It is possible that our decision in #16772 will not be to fix Scylla to match Cassandra but rather to declare that strict compatibility isn't needed in this case or even that Cassandra is wrong. But even then, having these tests which demonstrate the behavior of both Cassandra and Scylla will be important. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#16779	2024-08-05 10:13:49 +03:00
Tzach Livyatan	861a1cedea	Improve tombstone_compaction_interval description Closes scylladb/scylladb#19072	2024-08-05 10:10:55 +03:00
Pavel Emelyanov	f0f28cf685	docs: Extend debugging with info about exploring ELF notes When debugging coredumps some (small, but useful) information is hidden in the notes of the core ELF file. Add some words about its existence, what it includes, and the thing that is always forgotten -- the way to get one Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19962	2024-08-05 09:49:52 +03:00
Tzach Livyatan	858fd4d183	Update tracing.rst - fix table node_slow_log_time name Closes scylladb/scylladb#19893	2024-08-05 09:47:27 +03:00
Botond Dénes	76b6e8c5aa	Merge 'Drop datadir from keyspace::config' from Pavel Emelyanov Commit `ad0e6b79` (replica: Remove all_datadir from keyspace config) removed all_datadirs from keyspace config, now it's datadir turn. After this change keyspace no longer references any on-disk directories, only the sstables's storage driver attached to keyspace's tables does. refs #12707 Closes scylladb/scylladb#19866 * github.com:scylladb/scylladb: replica: Remove keyspace::config::datadir sstables/storage: Evaluate path for keyspace directory in storage sstables/storage: Add sstables_manager arg to init_keyspace_storage()	2024-08-05 09:46:29 +03:00
Avi Kivity	2eff4b41ad	repair: row_level: coroutinize working_row_hashes() It uses do_with, so it allocates unconditionally. Might as well use the allocation for a nice coroutine. Closes scylladb/scylladb#19915	2024-08-05 08:55:34 +03:00
Anna Stuchlik	eca2dfd8c3	doc: add OS support for version 6.1 This commit adds OS support for version 6.1 and removes OS support for 5.4 (according to our support policy for versions). Closes scylladb/scylladb#19992	2024-08-05 08:25:16 +03:00
Avi Kivity	aa1270a00c	treewide: change assert() to SCYLLA_ASSERT() assert() is traditionally disabled in release builds, but not in scylladb. This hasn't caused problems so far, but the latest abseil release includes a commit [1] that causes a 1000 insn/op regression when NDEBUG is not defined. Clearly, we must move towards a build system where NDEBUG is defined in release builds. But we can't just define it blindly without vetting all the assert() calls, as some were written with the expectation that they are enabled in release mode. To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT() macro in utils/assert.hh. This macro is always defined and is not conditional on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release mode. [1] `66ef711d68` Closes scylladb/scylladb#20006	2024-08-05 08:23:35 +03:00
Avi Kivity	cdee667170	alternator: destroy streamed json values gently Large json return values are streamed to avoid memory pressure and stalls, but are destroyed all at once. This in itself can cause stalls [1]. Destroy them gently to avoid the stalls. [1] ++[0#1/1 100%] addr=0x46880df total=514498 count=7004 avg=73: \| seastar::backtrace<seastar::backtrace_buffer::append_backtrace_oneline()::{lambda(seastar::frame)#1}> at ./build/release/seastar.lto/./seastar/include/seastar/util/backtrace.hh:64 ++ - addr=0x4680b35: \| seastar::backtrace_buffer::append_backtrace_oneline at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:839 \| (inlined by) seastar::print_with_backtrace at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:858 ++ - addr=0x46800f7: \| seastar::internal::cpu_stall_detector::generate_trace at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:1469 ++ - addr=0x4680178: \| seastar::internal::cpu_stall_detector::maybe_report at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:1206 \| (inlined by) seastar::internal::cpu_stall_detector::on_signal at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:1226 ++ - addr=0x3dbaf: ?? 
??:0 ++[1#1/812 13%] addr=0x217b774 total=69336 count=990 avg=70: \| rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericValue at /usr/include/rapidjson/document.h:721 \| ++[2#1/3 85%] addr=0x217b7db total=58974 count=842 avg=70: \| \| rapidjson::GenericMember<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericMember at /usr/include/rapidjson/document.h:71 \| \| (inlined by) rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericValue at /usr/include/rapidjson/document.h:733 \| \| ++[3#1/4 45%] addr=0x217b7db total=902102 count=12903 avg=70: \| \| \| rapidjson::GenericMember<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericMember at /usr/include/rapidjson/document.h:71 \| \| \| (inlined by) rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericValue at /usr/include/rapidjson/document.h:733 \| \| -> continued at addr=0x217b7db above \| \| \|+[3#2/4 40%] addr=0x217b8b3 total=794219 count=11363 avg=70: \| \| \| rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericValue at /usr/include/rapidjson/document.h:726 \| \| \| ++[4#1/1 100%] addr=0x217b7db total=909571 count=13012 avg=70: \| \| \| \| rapidjson::GenericMember<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericMember at /usr/include/rapidjson/document.h:71 \| \| \| \| (inlined by) rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>::~GenericValue at /usr/include/rapidjson/document.h:733 \| \| \| -> continued at addr=0x217b7db above \| \| \|+[3#3/4 15%] addr=0x43d35a3 total=296768 count=4246 avg=70: \| \| \| seastar::shared_ptr_count_for<rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator> >::~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:492 \| \| \| (inlined by) 
seastar::shared_ptr_count_for<rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator> >::~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:492 \| \| \| ++[4#1/2 98%] addr=0x43e7d06 total=289680 count=4144 avg=70: \| \| \| \| seastar::shared_ptr<rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator> >::~shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:570 \| \| \| \| (inlined by) alternator::make_streamed(rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>&&)::$_0::operator() at ./alternator/executor.cc:127 \| \| \| ++ - addr=0x184e0a6: \| \| \| \| std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/coroutine:240 \| \| \| \| (inlined by) seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose at ./build/release/seastar.lto/./seastar/include/seastar/core/coroutine.hh:125 \| \| \| \| (inlined by) seastar::reactor::run_tasks at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:2651 \| \| \| \| (inlined by) seastar::reactor::run_some_tasks at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:3114 \| \| \| \| ++[5#1/1 100%] addr=0x2503b87 total=310677 count=4417 avg=70: \| \| \| \| \| seastar::reactor::do_run at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:3283 \| \| \| \| ++[6#1/2 78%] addr=0x46a2898 total=400571 count=5450 avg=73: \| \| \| \| \| seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0::operator() at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:4501 \| \| \| \| \| (inlined by) std::__invoke_impl<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:61 \| \| \| \| \| (inlined by) 
std::__invoke_r<void, seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0&> at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:111 \| \| \| \| \| (inlined by) std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0>::_M_invoke at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:290 \| \| \| \| ++ - addr=0x4673fda: \| \| \| \| \| std::function<void ()>::operator() at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591 \| \| \| \| \| (inlined by) seastar::posix_thread::start_routine at ./build/release/seastar.lto/./seastar/src/core/posix.cc:90 \| \| \| \| ++ - addr=0x8c946: ?? ??:0 \| \| \| \| ++ - addr=0x11296f: ?? ??:0 \| \| \| \| ++[6#2/2 22%] addr=0x2502c1e total=113613 count=1549 avg=73: \| \| \| \| \| seastar::reactor::run at ./build/release/seastar.lto/./seastar/src/core/reactor.cc:3166 \| \| \| \| ++ - addr=0x22068e0: \| \| \| \| \| seastar::app_template::run_deprecated at ./build/release/seastar.lto/./seastar/src/core/app-template.cc:276 \| \| \| \| ++ - addr=0x220630b: \| \| \| \| \| seastar::app_template::run at ./build/release/seastar.lto/./seastar/src/core/app-template.cc:167 \| \| \| \| ++ - addr=0x22334bc: \| \| \| \| \| scylla_main at ./main.cc:672 \| \| \| \| ++ - addr=0x20411cc: \| \| \| \| \| std::function<int (int, char**)>::operator() at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:591 \| \| \| \| \| (inlined by) main at ./main.cc:2072 \| \| \| \| ++ - addr=0x27b89: ?? ??:0 \| \| \| \| ++ - addr=0x27c4a: ?? ??:0 \| \| \| \| ++ - addr=0x28c8fb4: \| \| \| \| \| _start at ??:? Closes scylladb/scylladb#19968	2024-08-05 00:35:52 +03:00
Botond Dénes	c34127092d	reader_concurrency_semaphore: test constructor: don't ignore metrics param The for_tests constructor has a metrics parameter defaulted to register_metrics::no, but when delegating to the other constructor, a hard-coded register_metrics::no is passed. This makes no difference currently, because all callers use the default and the hard-coded value corresponds to it. Let's fix it nevertheless to avoid any future surprises. Closes scylladb/scylladb#20007	2024-08-04 21:14:42 +03:00
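The delegation issue described in this commit can be illustrated with a minimal sketch. The class `semaphore_sketch` and its members are hypothetical stand-ins, not the real `reader_concurrency_semaphore`; only the names `register_metrics` and `for_tests` come from the commit message, and their exact types here are assumptions.

```cpp
#include <cassert>

// Hypothetical stand-in for the flag type named in the commit message.
enum class register_metrics { no, yes };

// Hypothetical class; it only models the constructor delegation.
class semaphore_sketch {
    int _count;
    bool _metrics_registered;
public:
    struct for_tests {};

    // The "other" constructor that the test constructor delegates to.
    semaphore_sketch(int initial_count, register_metrics metrics)
        : _count(initial_count)
        , _metrics_registered(metrics == register_metrics::yes) {}

    // Forward the caller's choice instead of a hard-coded register_metrics::no,
    // so a non-default argument is no longer silently ignored.
    semaphore_sketch(for_tests, register_metrics metrics = register_metrics::no)
        : semaphore_sketch(1, metrics) {}

    bool metrics_registered() const { return _metrics_registered; }
};
```

With the default argument the behavior is unchanged, which matches the commit's remark that the fix makes no difference to current callers.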
Laszlo Ersek	0933a52c0b	test/sstable: remove useless variable from promoted_index_read() The large_partition_schema() call returns a copy of the "schema_ptr" object that points to an effectively statically initialized thread_local "schema" object. The large_partition_schema() call has no bearing on whether, or when, the "schema" object is constructed, and has no side effects (other than copying an "lw_shared_ptr" object). Furthermore, the return value of large_partition_schema() is not used for anything in promoted_index_read(). This redundant call seems to date back to original commit `3dd079fb7a` ("tests: add test for reading parts of a large partition", 2016-08-07). Remove the call and the variable. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	bb58446258	test/sstable: rewrite promoted_index_read() with async() For better readability, replace future::then() chaining with future::get(). Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	1f565626d4	test/sstable: unfuturize lambda invocation in test_using_reusable_sst() All lambdas passed to test_using_reusable_sst() and test_using_reusable_sst_returning() have been converted to future::get() calls (according to the seastar::thread context that they are now executed in). None of the lambdas return futures anymore; they all directly return void or non-void. Therefore, drop futurize_invoke(...).get() around the lambda invocations in test_using_reusable_sst(). Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	8ea881ae04	test/sstable: rewrite wrong_range() with async() For better readability, replace the future::then() chaining (and the associated manual fiddling with object lifecycles) with future::get() (and rely on seastar::thread's stack). We're already in seastar::thread context. Similarly, replace the future::finally() underlying with_closeable() with deferred_close(); with the assumption that mutation_reader::close() never fails (and is therefore safe to call in the "deferred_close" destructor). This is actually guaranteed, as mutation_reader::close() is marked "noexcept". Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	e7e9a0a696	test/sstable: simplify not_find_key_composite_bucket0() under test_using_reusable_sst() According to early patch "test/sstable: rewrite test_using_reusable_sst() with async" in this series, lambdas passed to test_using_reusable_sst() are invoked: (a) less importantly here, in seastar::thread context, (b) more importantly here, futurized (temporarily so). The test case not_find_key_composite_bucket0() doesn't chain futures; therefore it needs no conversion to future::get() for purpose (a); however, we can eliminate its empty future return. Fact (b) will cover for that, until all such lambdas are converted to direct "void" returns (at which point we can remove the futurization from test_using_reusable_sst()). Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	95cf16708d	test/sstable: rewrite full_index_search() with async() For better readability, replace future::then() chaining with future::get(). (We're already in seastar::thread context.) This patch is best viewed with "git show -b". Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	2a27d5b344	test/sstable: simplify find_key*(), all_in_place() under test_using_reusable_sst() According to early patch "test/sstable: rewrite test_using_reusable_sst() with async" in this series, lambdas passed to test_using_reusable_sst() are invoked: (a) less importantly here, in seastar::thread context, (b) more importantly here, futurized (temporarily so). The test cases find_key_map(), find_key_set(), find_key_list(), find_key_composite(), all_in_place() don't chain futures; therefore they need no conversion to future::get() for purpose (a); however, we can eliminate their empty future returns. Fact (b) will cover for that, until all such lambdas are converted to direct "void" returns (at which point we can remove the futurization from test_using_reusable_sst()). Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	d22bd93abb	test/sstable: rewrite (un)compressed_random_access_read() with async() For better readability, replace future::then() chaining with future::get(). (We're already in seastar::thread context.) Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	6e35e584c8	test/sstable: simplify write_and_validate_sst() All three lambdas passed to write_and_validate_sst() now use future::get() rather than future::then() chaining; in other words, the future::get() calls inside all these seastar::thread contexts have been pushed down to the lambdas. Change all these lambdas' return types from future<> to void. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	8819b3f134	test/sstable: simplify check_toc_func() under async() The lambda passed to write_and_validate_sst() already runs in seastar::thread context; replace future::then() chaining with future::get() calls. We're going to eliminate the trailing "return make_ready_future<>()" later. This patch is best viewed with "git show -W -b". Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	de56883a17	test/sstable: simplify check_statistics_func() under async() The lambda passed to write_and_validate_sst() already runs in seastar::thread context; replace future::then() chaining with future::get() calls. We're going to eliminate the trailing "return make_ready_future<>()" later. This patch is best viewed with "git show -W -b". Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	1a85412f96	test/sstable: simplify check_summary_func() under async() The lambda passed to write_and_validate_sst() already runs in seastar::thread context; replace future::then() chaining with future::get() calls. We're going to eliminate the trailing "return make_ready_future<>()" later. This patch is best viewed with "git show -W -b". Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	7b21bce1ca	test/sstable: coroutinize check_component_integrity() check_component_integrity() does not rely on any deferred close or stop operations; turn it into a coroutine therefore, for best readability. This conversion demonstrates particularly well how much the stack eases coding. We no longer need to artificially extend the lifetime of "tmp" with a final .then([tmp] {}) future. Consequently, "tmp" no longer needs to be a shared pointer to an on-heap "tmpdir" object; "tmp" can just be a "tmpdir" object on the stack. While at it, eliminate the single-use local objects "s" and "gen", for movability's sake. (We could use std::move() on these variables, but it seems easier to just flatten the function calls that produce the corresponding rvalues into the write_sst_info() argument list.) Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	caca13fe28	test/sstable: rewrite write_sst_info() with async() For better readability, replace future::then() chaining with future::get(). Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	cfe92ee203	test/sstable: simplify missing_summary_first_last_sane() The lambda passed to test_using_reusable_sst() is now invoked -- futurized, transitorily -- in seastar::thread context; stop returning an explicit make_ready_future<>() from the lambda. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	10ebc0a2d2	test/sstable: coroutinize summary_query_fail() summary_query_fail() does not rely on any deferred close or stop operations; turn it into a coroutine therefore, for best readability. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	a403ad0703	test/sstable: rewrite summary_query() with async() For better readability, replace future::then() chaining with future::get(). (We're already in seastar::thread context.) Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	3a57a7cfea	test/sstable: coroutinize (simple/composite)_index_read() simple_index_read() and composite_index_read() do not rely on any deferred close or stop operations; turn them into coroutines therefore, for best readability. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	eeeab1110a	test/sstable: rewrite index_read() with async() For better readability, replace future::then() chaining with future::get(). (We're already in seastar::thread context.) Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	17d4fac669	test/sstable: rewrite test_using_reusable_sst() with async() Improve the readability of test_using_reusable_sst() by replacing future::then() chaining with test_env::do_with_async() and future::get(). Unlike seastar::async(), test_env::do_with_async() restricts its input lambda to returning "void". Because of this, introduce the variant test_using_reusable_sst_returning(), based on test_env::do_with_async_returning(), for lambdas returning non-void. Put the latter to use in index_read() at once. Subsequently, we'll gradually convert the lambdas passed to test_using_reusable_sst() and test_using_reusable_sst_returning() from returning futures to returning direct values. In order for test_using_reusable_sst() and test_using_reusable_sst_returning() to cope with both types of lambdas, wrap the lambdas into futurize_invoke().get(). In the seastar::thread context, future::get() will gracefully block on genuine futures, and return immediately on direct values that were futurized on the spot. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Laszlo Ersek	79a8a6c638	test/sstable: rewrite test_using_working_sst() with async() Make test_using_working_sst() easier to read by: (1) replacing test_env::do_with() with seastar::async(), seastar::defer(), and future::get(); (2) replacing seastar::async() and seastar::defer() with test_env::do_with_async(). Technically speaking, this change does not perfectly preserve exceptional behavior. Namely, test_env::do_with() uses future::finally() to link test_env::stop() to the chain of futures, and future::finally() permits test_env::stop() itself to throw an exception -- potentially leading to a seastar::nested_exception being thrown, which would carry both the original exception and the one thrown by test_env::stop(). Contrarily, the test_env::stop() deferred with seastar::defer() runs in a destructor, and therefore test_env::stop() had better not throw there. However, we will assume that test_env::stop() does not throw, albeit not marked "noexcept". Prior commits `8d704f2532` ("sstable_test_env: Coroutinize and move to .cc test_env::stop()", 2023-10-31) and `2c78b46c78` ("sstables::test_env: Carry compaction manager on board", 2023-10-31) show that we've considered individual actions in test_env::stop() not to throw before. The 128KB stack of seastar::thread (which underlies seastar::async()) should be a tolerable cost in a test case, in exchange for the improved readability. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-08-04 15:35:51 +02:00
Kefu Chai	0660675387	utils/div_ceil: add constraints to template arguments to better reflect what we expect from the arguments. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20003	2024-08-04 15:32:01 +03:00
Aleksandra Martyniuk	2ab56b7f56	repair: use find_column_family in insert_repair_meta repair_service::insert_repair_meta gets the reference to a table and passes it to continuations. If the table is dropped in the meantime, the reference becomes invalid. Use find_column_family at each table occurrence in insert_repair_meta instead. Closes scylladb/scylladb#19953	2024-08-04 13:56:38 +03:00
Kefu Chai	571ae0ac96	docs: link to current document instead of the github wiki before this change, the hyperlink brings us to a GitHub wiki page, which just points the reader to https://docs.scylladb.com/operating-scylla/snitch/. this is not a great user experience. so, in this change, we just reference the document in the current build. more efficient this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19952	2024-08-04 11:47:21 +03:00
Kefu Chai	f7556edc65	build: cmake: define SCYLLA_ENABLE_PREEMPTION_SOURCE for dev build in `fabab2f4`, we introduced preemption_source, and added the `SCYLLA_ENABLE_PREEMPTION_SOURCE` preprocessor macro to opt in to the pluggable preemption check. but the CMake build system was not updated accordingly. so, in this change, let's sync the CMake build system with `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19951	2024-08-04 11:46:28 +03:00
Yaron Kaikov	8221a178d8	Revert "dist: support nonroot and offline mode for scylla-housekeeping" This reverts commit `c3bea539b6`, since it is breaking offline-installer artifact-tests. Also, it seems that we should not have merged it in the first place, since we don't need scylla-housekeeping checks for offline-installer. Closes scylladb/scylladb#19976	2024-08-04 10:55:26 +03:00
Aleksandra Martyniuk	c456a43173	compaction: replace optional<task_info> with task_info param compaction_manager::perform_compaction does not create task manager task for compaction if parent_info is set to std::nullopt. Currently, we always want to create task manager task for compaction. Remove optional from task info parameters which start compaction. Track all compactions with task manager.	2024-08-02 14:38:46 +02:00
Aleksandra Martyniuk	108d0344b8	compaction: keep split executor in task manager If perform_compaction gets std::nullopt as a parent info then the executor won't be tracked by task manager. Modify storage_group::split call so that it passes empty task_info instead of nullopt to track split.	2024-08-02 12:45:32 +02:00
Wojciech Mitros	543dab9e88	mv: test the view update behavior With the recently added mv admission control, we can now test how the view update backlogs are updated and propagated without relying only on the response delays that they caused until now. This patch adds a test for it, replicating issues scylladb/scylladb#18461 and scylladb/scylladb#18783. In the test, we start with an empty view update backlog, then perform a write that increases the backlog and saves the updated backlog on the coordinator. The backlog then drops back to 0, we wait 1s for the backlog to be gossiped, and we perform another write, which should succeed. Due to scylladb/scylladb#18461, the test would fail because in both gossip rounds before and after the write, the backlog was empty, causing the write to be blocked by admission control indefinitely. Due to scylladb/scylladb#18783, the test would fail because when the backlog drops back to 0 after the write, the change is never registered, causing all writes to be blocked as well.	2024-08-02 12:12:24 +02:00
Wojciech Mitros	795ac177c2	mv: add test for admission control In this patch we add 2 tests for checking that the mv admission control works. The first one simply checks whether, after increasing the backlog on one node over the admission control threshold, the following request is rejected with the error message corresponding to the admission control. The second one checks whether, after triggering admission control, the entire user request fails instead of just failing a replica write. This is done by performing a number of writes, some of which trigger the admission control and cause retries, then checking if the node that had a large view update backlog received all the writes. Before, the writes would succeed on enough replicas, reaching QUORUM, and allowing the user write to succeed and cause no retries, even though on the replica with a high backlog the write got rejected due to the backlog size.	2024-08-02 12:12:24 +02:00
Wojciech Mitros	a55b7688b6	storage_proxy: return overloaded_exception instead of throwing To avoid an expensive stack unwind, instead of throwing an error, we can just return it thanks to the boost::result type that the affected methods use. The result carrying an exception must be constructed explicitly with boost::outcome_v2::failure rather than implicitly, because the exception, once converted into coordinator_exception_container, can then be converted both into a successful response_id_type and into a failure.	2024-08-02 12:12:24 +02:00
Wojciech Mitros	5eaae05aaf	mv: reject user requests by coordinator when a replica is overloaded by MVs Currently, when a replica's view update backlog is full, the write is still sent by the coordinator to all replicas. Because of the backlog, the write fails on the replica, causing inconsistency that needs to be fixed by repair. To avoid these inconsistencies, this patch adds a check on the coordinator for overloaded replicas. As a result, a write may be rejected before being sent to any replicas and later retried by the user, when the replica is no longer overloaded. Fixes scylladb/scylladb#17426	2024-08-02 12:12:19 +02:00
Piotr Dulikowski	39b49a41cc	Merge 'mv: delete a partition in a single operation when applicable' from Michael Litvak Currently when a partition is deleted from the base table, we generate a row tombstone update for each one of the view rows in the partition. When the partition key in the view is the same as the base, maybe in a different order, this can be done more efficiently - The whole corresponding view partition can be deleted with one partition tombstone update. With this commit, when generating view updates, if the update mutation has a partition tombstone then for the views which have the same partition key we will generate a partition tombstone update, and skip the individual row tombstone updates. Fixes scylladb/scylladb#8199 Closes scylladb/scylladb#19338 * github.com:scylladb/scylladb: mv: skip reading rows when generating partition tombstone update mv: delete a partition in a single operation when applicable cql-pytest: move ScyllaMetrics to util file to allow reuse	2024-08-02 11:00:18 +02:00
Michael Litvak	0f5e8c52ad	db: test counter update while table is dropped Add a test that drops a table while there is a counter update operation ongoing in the table. The test reproduces issue scylladb/scylla-enterprise#4475 and verifies it is fixed.	2024-08-01 22:23:17 +03:00
Avi Kivity	99d0aaa7d2	Merge 'tablets: load_balancer: Improve per-table balance' from Tomasz Grabiec Tablet load balancer tries to equalize tablet load between shards by moving tablets. Currently, the tablet load balancer assumes that each tablet has the same hotness. This may not be true, and some tables may be hotter than others. If some nodes end up getting more tablets of the hot table, we can end up with request load imbalance and reduced performance. In `79d0711c7e` we implemented a mitigation for the problem by randomly choosing the table whose tablet replica should be moved. This should improve fairness of movement. However, this proved to not be enough to get a good distribution of tablets. This change improves candidate selection to not rely on randomness but rather evaluate candidates with respect to the impact on load imbalance. Also, if there is no good candidate, we consider picking other source shards, not the most-loaded one. This is helpful because when finishing node drain we get just a few candidates per shard, all of which may belong to a single table, and the destination may already be overloaded with that table. Another shard may contain tablets of another table which is not yet overloaded on the destination. And shards may be of similar load, so it doesn't matter much which shard we choose to unload. We also consider other destinations, not the least-loaded one. This helps when draining nodes and the source node has few shard candidates. Shards on the destination may have similar load so there is more than one good destination candidate. By limiting ourselves to a single shard, we increase the chance that we overload the table on that shard. The algorithm was evaluated using "scylla perf-load-balancing", which simulates a sequence of 8 node bootstraps and decommissions for different node and shard counts, RF, and tablet counts. 
For example, for the following parameters: params: {iterations=8, nodes=5, tablets1=128 (2.4/sh), tablets2=512 (9.6/sh), rf1=3, rf2=3, shards=32} The results are: Before: Overcommit (old) : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}} Overcommit (old) : worst: {table1={shard=4.00 (best=1.25), node=1.81}, table2={shard=1.25 (best=1.04), node=1.11}} Overcommit (old) : last : {table1={shard=2.50 (best=1.25), node=1.41}, table2={shard=1.25 (best=1.04), node=1.05}} After: Overcommit : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}} Overcommit : worst: {table1={shard=1.50 (best=1.25), node=1.02}, table2={shard=1.12 (best=1.04), node=1.01}} Overcommit : last : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}} So worst shard overcommit for table1 was reduced from 4 to 1.5. Overcommit of 4 means that the most-loaded shard has 4 times more tablets than the average per-shard load in the cluster. Also, node overcommit for table1 was reduced from 1.81 to 1.02. The magnitude of improvement depends greatly on test configuration, so on topology and tablet distribution. The algorithm is not perfect, it finds a local optimum. In the above test, overcommit of 1.5 is not the best possible (1.25). One of the reasons why the current algorithm doesn't achieve best distribution is that it works with a single movement at a time and replication constraints limit the choice of destinations. Viable destinations for remaining candidates may be only on nodes which are not least-loaded, and we won't be able to fill the least loaded node. Doing so would require more complex movement involving moving a tablet from one of the destination nodes which doesn't have a replica on the least loaded node and then replacing it with the candidate from the source node. 
Another limitation is that the algorithm can only fix balance by moving tablets away from most loaded nodes, and it does so due to imbalance between nodes. So it cannot fix the imbalance which is already present on the nodes if there is not much to move due to similar load between nodes. It is designed to not make the imbalance worse, so it works good if we started in a good shape. Fixes https://github.com/scylladb/scylladb/issues/16824 Closes scylladb/scylladb#19779 * github.com:scylladb/scylladb: test: perf: tablet_load_balancing: Test with higher shard and tablet counts tablets: load_balancer: Avoid quadratic complexity when finding best candidate tablets: load_balancer: Maintain load sketch properly during intra-node migration tablets: load_balancer: Use "drained" flag test: perf: tablet_load_balancing: Report load balancer stats tablets: load_balancer: Move load_balancer_stats_manager to header file tablets: load_balancer: Split evaluate_candidate() into src and dst part tablets: load_balancer: Optimize evaluate_candidate() tablets: load_balancer: Add more statistics tablets: load_balancer: Track load per table on cluster level tablets: load_balancer: Track load per table on node level tablets: load_balancer: Use a single load sketch for tracking all nodes locator: load_sketch: Introduce populate_dc() tablets: load_balancer: Modify target load sketch only when emitting migration locator: load_sketch: Introduce get_most_loaded_shard() locator: load_sketch: Introduce get_least_loaded_shard() locator: load_sketch: Optimize pick()/unload() locator: load_sketch: Introduce load_type test: perf: tablet_load_balancing: Report total tablet counts test: perf: tablet_load_balancing: Print run parameters in the single simulation case too test: perf: tablet_load_balancing: Report time it took to schedule migrations tablets: load_balancer: Log table load stats after each migration tablets: load_balancer: Log per-shard load distribution in debug level tablets: load_balancer: 
Improve per-table balance tablets: load_balancer: Extract check_convergence() tablets: load_balancer: Extract nodes_by_load_cmp tablets: load_balancer: Maintain tablet count per table tablets: load_balancer: Reuse src_node_info test: perf: tablet_load_balancing: Print warnings about bad overcommit test: perf: tablet_load_balancing: Allow running a single simulation test: perf: tablet_load_balancing: Report best possible shard overcommit test: perf: tablet_load_balancing: Report global shard overcommit	2024-08-01 21:12:14 +03:00
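The overcommit figure quoted in the merge description above (the load of the most-loaded shard relative to the average per-shard load) can be sketched as follows. The function name `shard_overcommit` and the flat vector of per-shard tablet counts are assumptions made for illustration; this is not the code used by "scylla perf-load-balancing".

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: overcommit = max per-shard tablet count / average per-shard count.
// Returns 0.0 for an empty cluster or when no tablets exist (a convention chosen here).
double shard_overcommit(const std::vector<unsigned>& tablets_per_shard) {
    if (tablets_per_shard.empty()) {
        return 0.0;
    }
    const double total = std::accumulate(tablets_per_shard.begin(), tablets_per_shard.end(), 0.0);
    if (total == 0.0) {
        return 0.0;
    }
    const double average = total / static_cast<double>(tablets_per_shard.size());
    const double most_loaded = *std::max_element(tablets_per_shard.begin(), tablets_per_shard.end());
    return most_loaded / average;
}
```

A perfectly even distribution yields an overcommit of 1, and larger values indicate that one shard carries proportionally more tablets than the average.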
Michael Litvak	22b282f5c5	db: coroutinize do_apply_counter_update rewrite the function as coroutine to make it easier to read and maintain, following lifetime issues we had and fixed in this function.	2024-08-01 19:09:04 +03:00
Anna Stuchlik	9972e50134	doc: add the 6.0-to-6.1 upgrade guide This commit adds the 6.0-to-6.1 upgrade guide. Compared to the previous upgrade guide: - Added the "Ensure Consistent Topology Changes Are Enabled" prerequisite. - Removed the "After Upgrading Every Node" section. Both Raft-based schema changes and topology updates are mandatory in 6.1 and don't require any user action after upgrading to 6.1. - Removed the "Validate Raft Setup" section. Raft was enabled in all 6.0 clusters (for schema management), so now there's no scenario that would require the user to follow the validation procedure.	2024-08-01 14:58:14 +02:00
Piotr Smaron	0ea2128140	cql: refactor rf_change indentation	2024-08-01 14:37:53 +02:00
Piotr Smaron	5b089d8e10	Prevent ALTERing non-existing KS with tablets ALTER tablets KS executes in 2 steps: 1. ALTER KS's cql handler forms a global topo req, and saves data required to execute this req, 2. global topo req is executed by topo coordinator, which reads data attached to the req. The KS name is among the data attached to the req. There's a time window between these steps where a to-be-altered KS could have been DROPped, which results in topo coordinator forever trying to ALTER a non-existing KS. In order to avoid it, the code has been changed to first check if a to-be-altered KS exists, and if it's not the case, it doesn't perform any schema/tablets mutations, but just removes the global topo req from the coordinator's queue. BTW. just adding this extra check resulted in broader than expected changes, which is due to the fact that the code is written badly and needs to be refactored - an effort that's already planned under #19126 Fixes: #19576	2024-08-01 14:37:53 +02:00
Piotr Dulikowski	44f327675d	Merge 'Remove gossiper argument from storage_service::join_cluster()' from Pavel Emelyanov It's only needed to start hints via proxy, but proxy can do it without gossiper argument Closes scylladb/scylladb#19894 * github.com:scylladb/scylladb: storage_service: Remote gossiper argument from join_cluster() proxy: Use remote gossiper to start hints resource manager hints: Const-ify gossiper references and anchor pointers	2024-08-01 10:18:14 +02:00
Michael Litvak	c944e28e43	db: fix waiting for counter update operations on table stop When a table is dropped it should wait for all pending operations in the table before the table is destroyed, because the operations may use the table's resources. With counter update operations, currently this is not the case. The table may be destroyed while there is a counter update operation in progress, causing an assert to be triggered due to a resource being destroyed while it's in use. The reason the operation is not waited for is a mistake in the lifetime management of the object representing the write in progress. The commit fixes it so the object lives for the duration of the entire counter update operation, by moving it to the `do_with` list. Fixes scylladb/scylla-enterprise#4475 Closes scylladb/scylladb#19948	2024-08-01 09:39:49 +02:00
Nadav Har'El	5411559a94	test/cql-pytest: test ALLOW FILTERING in intersection of two indexes A user complained that ScyllaDB is incompatible with Cassandra when it requires ALLOW FILTERING on a restriction like WHERE x=1 AND y=1 where x and y are two columns with secondary indexes. In the tests added in this patch we show that: 1. Scylla is compatible with Cassandra when the traditional "CREATE INDEX" is used - ALLOW FILTERING is required in this case in both Cassandra and Scylla. 2. If SAI is used in Cassandra (CREATE CUSTOM INDEX USING 'SAI'), indeed ALLOW FILTERING becomes optional. I believe this is incorrect so I opened CASSANDRA-19795. These two tests combined show that we're not incompatible with Cassandra, rather Cassandra's two index implementations are incompatible between themselves, and Scylla is in fact compatible in this case with Cassandra's traditional index and not with SAI. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19909	2024-07-31 14:01:29 +03:00
Laszlo Ersek	e67eb0ccc1	test/sstable: coroutinize do_write_sst() Make do_write_sst() easier to read by coroutinizing it. Closes #19803. Suggested-by: Benny Halevy <bhalevy@scylladb.com> Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> Closes scylladb/scylladb#19937	2024-07-31 13:59:26 +03:00
Kefu Chai	020333fcf1	sstables: fix a typo in comment s/guranteed/guaranteed/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19946	2024-07-31 13:58:09 +03:00
Tomasz Grabiec	28de5231f4	test: perf: tablet_load_balancing: Test with higher shard and tablet counts We have up to 200 shards in production, so test this to catch performance issues.	2024-07-31 12:57:15 +02:00
Tomasz Grabiec	19b7fb3a4d	tablets: load_balancer: Avoid quadratic complexity when finding best candidate If the source and destination shards picked for migration based on global tablet balance do not have a good candidate in terms of effect on per-table balance, the algorithm explores other source shards and destinations. This has quadratic complexity in terms of shard count in the worst case, when there are no good candidates. Since we can have up to ~200 shards, this can slow down scheduling significantly. I saw total scheduling time of 5 min in the following run: scylla perf-load-balancing -c1 -m1G --iterations=8 \ --nodes=4 --tablets1=1024 --tablets2=8096 \ --rf1=2 --rf2=3 --shards=256 To improve, change the approach to first find the best source shard and then the best target shard, sequentially. So it's now linear in terms of shard count. After the change, the total scheduling time in that run is down to 4s. Minimizing source and destination metrics piece-wise minimizes the combined metric, so badness of the best candidate doesn't suffer after this change.	2024-07-31 12:57:15 +02:00
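The piece-wise argument in commit 19b7fb3a4d can be illustrated with a minimal sketch. It assumes that the combined badness of a candidate is the sum of a source-only part and a destination-only part; that additive form, the function names, and the dictionaries of per-shard badness are illustrative assumptions, not the load balancer's actual code.

```python
from itertools import product

def best_pair_quadratic(src_badness, dst_badness):
    """Evaluate every (source, destination) pair: O(S * D) evaluations."""
    return min(
        product(src_badness, dst_badness),
        key=lambda pair: src_badness[pair[0]] + dst_badness[pair[1]],
    )

def best_pair_linear(src_badness, dst_badness):
    """Pick the best source, then the best destination: O(S + D) evaluations.

    When the combined metric is a sum of independent source and destination
    parts (an assumption of this sketch), both searches reach the same minimum.
    """
    src = min(src_badness, key=src_badness.get)
    dst = min(dst_badness, key=dst_badness.get)
    return src, dst

# Hypothetical per-shard badness values, keyed by shard id.
sources = {0: 3.0, 1: 1.5, 2: 2.0}
destinations = {3: 0.7, 4: 0.2}
```

With these inputs both functions select source shard 1 and destination shard 4, but the linear version inspects 5 values instead of 6 pairs, and the gap grows with shard count.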
Tomasz Grabiec	93df82032f	tablets: load_balancer: Maintain load sketch properly during intra-node migration Affects only intra-node migration. The code was recording destination shard as taken and did not un-take it in case we skipped the migration due to lack of candidates. Noticed during code review. Impact is minor, since even if this leads to suboptimal balance, the next scheduling round should fix it. Also, the source shard was not unloaded, but that should have no impact on decisions. But to be future-proof, better to maintain the load accurately in case the algorithm is extended with more steps.	2024-07-31 12:57:15 +02:00
Tomasz Grabiec	88988ce0db	tablets: load_balancer: Use "drained" flag Cleanup / optimization.	2024-07-31 12:57:15 +02:00
Tomasz Grabiec	56801b7cb7	test: perf: tablet_load_balancing: Report load balancer stats	2024-07-31 12:57:15 +02:00
Tomasz Grabiec	90c9934099	tablets: load_balancer: Move load_balancer_stats_manager to header file So that stats can be accessed outside tablet allocator.	2024-07-31 12:57:15 +02:00
Anna Stuchlik	ae28880fc8	doc: enable publishing docs for branch-6.1 This commit enables publishing documentation from branch-6.1. The docs will be published as UNSTABLE (the warning about version 6.1 being unstable will be displayed). Fixes https://github.com/scylladb/scylladb/issues/19926 No backport is required. Closes scylladb/scylladb#19931	2024-07-31 12:48:51 +02:00
Kamil Braun	c05e077a13	Merge 'raft: fix the shutdown phase being stuck' from Emil Maskovsky Some of the calls inside the `raft_group0_client::start_operation()` method were missing the abort source parameter. This caused the repair test to be stuck in the shutdown phase - the abort source has been triggered, but the operations were not checking it. This was in particular the case of operations that try to take the ownership of the raft group semaphore (`get_units(semaphore)`) - these waits should be cancelled when the abort source is triggered. This should fix the following tests that were failing in some percentage of dtest runs (about 1-3 of 100): * TestRepairAdditional::test_repair_kill_1 * TestRepairAdditional::test_repair_kill_3 Fixes scylladb/scylladb#19223 Closes scylladb/scylladb#19860 * github.com:scylladb/scylladb: raft: fix the shutdown phase being stuck raft: use the abort source reference in raft group0 client interface	2024-07-31 12:10:30 +02:00
Pavel Emelyanov	93ed978729	view_builder: Drop unused members There's a counter and a shared future on board, that used to facilitate start-time barrier synchronization. Now they are not needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-31 12:59:40 +03:00
Pavel Emelyanov	613161c7b9	view_builder: Use cross-shard barrier on start When starting, view builder spawns an async background fiber, and upon its completion each shard needs to wait for other shards to do the same. This is exactly what cross-shard barrier is about, so instead of synchronizing via v.b.'s shard-0 instance, use the barrier. This makes the view_builder::start() shorter and easier to read. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-31 12:56:25 +03:00
Pavel Emelyanov	fb1b749445	view_builder: Add cross-shard barrier to its .start() method The barrier will be used by next patch to synchronize shards with each other. When passed to invoke_on_all() lambda like this, each lambda gets its own copy of the barrier "handler" that maintains shared state across shards. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-31 12:54:28 +03:00
Tomasz Grabiec	94cce4b7d3	tablets: load_balancer: Split evaluate_candidate() into src and dst part Those parts will be used separately later.	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	4df2abe47a	tablets: load_balancer: Optimize evaluate_candidate() Moves load computation out of the hot path by relying on data structures maintained globally during plan making.	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	5e7facd543	tablets: load_balancer: Add more statistics	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	be055977c9	tablets: load_balancer: Track load per table on cluster level	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	81fcee2040	tablets: load_balancer: Track load per table on node level	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	e7ef7419dc	tablets: load_balancer: Use a single load sketch for tracking all nodes This is code simplification and optimization. Avoids multiple passes of tablet metadata to construct load sketch for each target node.	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	352b8e0ddd	locator: load_sketch: Introduce populate_dc()	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	9a7afd334b	tablets: load_balancer: Modify target load sketch only when emitting migration This avoids the need to unpick() a replica when the candidate is not selected. Optimization.	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	b78657ce7d	locator: load_sketch: Introduce get_most_loaded_shard()	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	de404471b7	locator: load_sketch: Introduce get_least_loaded_shard()	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	8fbfd595bb	locator: load_sketch: Optimize pick()/unload() They are executed frequently during tablet scheduling. Currently, they have time complexity of O(N*log(N)) in terms of shard count. With large shard counts, that has significant overhead. This patch optimizes them down to O(log(N)).	2024-07-31 11:38:17 +02:00
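One way to get logarithmic pick()/unload() is to keep shards in a priority structure ordered by load. The sketch below is only an illustration of that idea, assuming a binary heap with lazy invalidation of stale entries; the class and method names mirror the commit message but the implementation is hypothetical and is not the code from commit 8fbfd595bb.

```python
import heapq

class LoadSketch:
    """Tracks tablet count per shard; pick() and unload() cost O(log N) amortized."""

    def __init__(self, shard_count):
        self._load = [0] * shard_count
        # Heap of (load, shard). Entries may go stale; they are skipped lazily.
        self._heap = [(0, shard) for shard in range(shard_count)]
        heapq.heapify(self._heap)

    def load(self, shard):
        return self._load[shard]

    def pick(self):
        """Return the least-loaded shard and account one more tablet on it."""
        while True:
            load, shard = heapq.heappop(self._heap)
            if load == self._load[shard]:
                break  # entry reflects the current load, so it is valid
        self._load[shard] += 1
        heapq.heappush(self._heap, (self._load[shard], shard))
        return shard

    def unload(self, shard):
        """Account one tablet fewer on the given shard."""
        if self._load[shard] == 0:
            raise ValueError("shard has no load to remove")
        self._load[shard] -= 1
        heapq.heappush(self._heap, (self._load[shard], shard))

sketch = LoadSketch(3)
picks = [sketch.pick() for _ in range(4)]
```

Each shard always has at least one heap entry matching its current load, so pick() never exhausts the heap. The heap can accumulate stale entries, which a production structure would avoid by using an ordered container with in-place updates.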
Tomasz Grabiec	d0b0f95849	locator: load_sketch: Introduce load_type	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	8f3b623144	test: perf: tablet_load_balancing: Report total tablet counts	2024-07-31 11:38:17 +02:00
Tomasz Grabiec	662a0ff038	test: perf: tablet_load_balancing: Print run parameters in the single simulation case too	2024-07-31 11:38:16 +02:00
Tomasz Grabiec	a040404875	test: perf: tablet_load_balancing: Report time it took to schedule migrations	2024-07-31 11:38:16 +02:00
Tomasz Grabiec	ae7fd80554	tablets: load_balancer: Log table load stats after each migration	2024-07-31 11:38:16 +02:00
Tomasz Grabiec	b8996a0f59	tablets: load_balancer: Log per-shard load distribution in debug level	2024-07-31 11:38:16 +02:00
Tomasz Grabiec	469e2f3f90	tablets: load_balancer: Improve per-table balance Tablet load balancer tries to equalize tablet load between shards by moving tablets. Currently, the tablet load balancer assumes that each tablet has the same hotness. This may not be true, and some tables may be hotter than others. If some nodes end up getting more tablets of the hot table, we can end up with request load imbalance and reduced performance. In `79d0711c7e` we implemented a mitigation for the problem by randomly choosing the table whose tablet replica should be moved. This should improve fairness of movement. However, this proved to not be enough to get a good distribution of tablets. This change improves candidate selection to not rely on randomness but rather evaluate candidates with respect to the impact on load imbalance. Also, if there is no good candidate, we consider picking other source shards, not just the most-loaded one. This is helpful because when finishing node drain we get just a few candidates per shard, all of which may belong to a single table, and the destination may already be overloaded with that table. Another shard may contain tablets of another table which is not yet overloaded on the destination. And shards may be of similar load, so it doesn't matter much which shard we choose to unload. We also consider other destinations, not just the least-loaded one. This helps when draining nodes and the source node has few shard candidates. Shards on the destination may have similar load so there is more than one good destination candidate. By limiting ourselves to a single shard, we increase the chance that we overload the table on that shard. The algorithm was evaluated using "scylla perf-load-balancing", which simulates a sequence of 8 node bootstraps and decommissions for different node and shard counts, RF, and tablet counts. 
For example, for the following parameters: params: {iterations=8, nodes=5, tablets1=128 (2.4/sh), tablets2=512 (9.6/sh), rf1=3, rf2=3, shards=32} The results are: After: Overcommit : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}} Overcommit : worst: {table1={shard=1.50 (best=1.25), node=1.02}, table2={shard=1.12 (best=1.04), node=1.01}} Overcommit : last : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}} Before: Overcommit (old) : init : {table1={shard=1.25 (best=1.25), node=1.00}, table2={shard=1.04 (best=1.04), node=1.00}} Overcommit (old) : worst: {table1={shard=4.00 (best=1.25), node=1.81}, table2={shard=1.25 (best=1.04), node=1.11}} Overcommit (old) : last : {table1={shard=2.50 (best=1.25), node=1.41}, table2={shard=1.25 (best=1.04), node=1.05}} So shard overcommit for table1 was reduced from 4 to 1.5. Overcommit of 4 means that the most-loaded shard has 4 times more tablets than the average per-shard load in the cluster. Also, node overcommit for table1 was reduced from 1.81 to 1.02. The magnitude of improvement depends greatly on test configuration, so on topology and tablet distribution. The algorithm is not perfect, it finds a local optimum. In the above test, overcommit of 1.5 is not the best possible (1.25). One of the reasons why the current algorithm doesn't achieve the best distribution is that it works with a single movement at a time and replication constraints limit the choice of destinations. Viable destinations for remaining candidates may be only on nodes which are not least-loaded, and we won't be able to fill the least loaded node. Doing so would require more complex movement involving moving a tablet from one of the destination nodes which doesn't have a replica on the least loaded node and then replacing it with the candidate from the source node. 
Another limitation is that the algorithm can only fix balance by moving tablets away from most loaded nodes, and it does so due to imbalance between nodes. So it cannot fix the imbalance which is already present on the nodes if there is not much to move due to similar load between nodes. It is designed to not make the imbalance worse, so it works well if we started in a good shape. Fixes #16824	2024-07-31 11:38:16 +02:00
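The overcommit figures quoted in commit 469e2f3f90 can be reproduced with a short calculation. The first function follows the definition given in the message (most-loaded shard divided by average per-shard load). The second is an inference, not something the message states: it assumes the best achievable overcommit is the average rounded up to a whole tablet, divided by the average, which happens to match the reported "best" values. Function names are illustrative.

```python
import math

def shard_overcommit(tablets_per_shard):
    """Most-loaded shard's tablet count divided by the average per-shard count."""
    average = sum(tablets_per_shard) / len(tablets_per_shard)
    return max(tablets_per_shard) / average

def best_shard_overcommit(tablets, rf, nodes, shards_per_node):
    """Lower bound on overcommit, assuming whole tablets spread as evenly as possible."""
    average = tablets * rf / (nodes * shards_per_node)
    return math.ceil(average) / average

# Parameters from the commit message: nodes=5, shards=32, rf=3.
best_table1 = round(best_shard_overcommit(128, 3, 5, 32), 2)  # average 2.4 tablets/shard
best_table2 = round(best_shard_overcommit(512, 3, 5, 32), 2)  # average 9.6 tablets/shard
```

For table1, 128 tablets with RF 3 over 160 shards average 2.4 replicas per shard, so some shard must hold 3, giving 3 / 2.4 = 1.25.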
Tomasz Grabiec	b7661aa6c9	tablets: load_balancer: Extract check_convergence() Will be reused when evaluating different targets for migration in later stages. The refactoring drops updating of _stats.for_dc(dc).stop_no_candidates and we update _stats.for_dc(dc).stop_load_inversion in both cases where convergence check may fail. The reason is that stat updates must be outside check_convergence(), since the new use case should not update those stats (it doesn't stop balancing, just drops candidates). Propagating the information for distinguishing the two cases would be a burden. But it's not necessary, since both cases are actually load inversion cases, one pre-migration the other post-migration, so we don't need the distinction. It's actually wrong to increment stop_no_candidates, since there may still be candidates, it's the load which is inverted.	2024-07-31 11:26:11 +02:00
Tomasz Grabiec	41e643ddb9	tablets: load_balancer: Extract nodes_by_load_cmp Will be reused in a different place.	2024-07-31 11:26:11 +02:00
Tomasz Grabiec	8a7257971d	tablets: load_balancer: Maintain tablet count per table	2024-07-31 11:26:11 +02:00
Tomasz Grabiec	4e4f13ac9d	tablets: load_balancer: Reuse src_node_info	2024-07-31 11:26:11 +02:00
Tomasz Grabiec	71b8d6b7aa	test: perf: tablet_load_balancing: Print warnings about bad overcommit	2024-07-31 11:26:11 +02:00
Tomasz Grabiec	0d50a028a5	test: perf: tablet_load_balancing: Allow running a single simulation	2024-07-31 11:26:11 +02:00
Tomasz Grabiec	3f3660c3fe	test: perf: tablet_load_balancing: Report best possible shard overcommit	2024-07-31 11:26:11 +02:00
Tomasz Grabiec	c89a320925	test: perf: tablet_load_balancing: Report global shard overcommit Rather than maximum per-node shard overcommit. Global shard overcommit is a better metric since we want to equalize global load not just per-node load.	2024-07-31 11:26:11 +02:00
Emil Maskovsky	5dfc50d354	raft: fix the shutdown phase being stuck Some of the calls inside the `raft_group0_client::start_operation()` method were missing the abort source parameter. This caused the repair test to be stuck in the shutdown phase - the abort source has been triggered, but the operations were not checking it. This was in particular the case of operations that try to take the ownership of the raft group semaphore (`get_units(semaphore)`) - these waits should be cancelled when the abort source is triggered. This should fix the following tests that were failing in some percentage of dtest runs (about 1-3 of 100): * TestRepairAdditional::test_repair_kill_1 * TestRepairAdditional::test_repair_kill_3 Fixes scylladb/scylladb#19223	2024-07-31 09:18:54 +02:00
Emil Maskovsky	2dbe9ef2f2	raft: use the abort source reference in raft group0 client interface Most callers of the raft group0 client interface are passing a real source instance, so we can use the abort source reference in the client interface. This change makes the code simpler and more consistent.	2024-07-31 09:18:54 +02:00
Benny Halevy	82333036f3	cell_locker: maybe_rehash: reindent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-07-31 10:06:07 +03:00
Benny Halevy	8853adea96	cell_locker: maybe_rehash: ignore allocation failures `maybe_rehash` is complimentary and is not strictly required to succeed. If it fails, it will retry on the next call, but there's no reason to throw a bad_alloc exception that will fail its caller, since `maybe_rehash` is called as the final step after the caller has already succeeded with its action. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-07-31 10:06:06 +03:00
Pavel Emelyanov	9214aecbe7	storage_service: Remove orphan forward declaration of a method The start_sys_dist_ks() itself was removed by `bc051387c5` Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19928	2024-07-30 16:17:49 +03:00
Benny Halevy	e58ca8c44b	service_level_controller: stop: always call subscription on_abort We want to call `service_level_controller::do_abort()` in all cases. The current code (introduced in `535e5f4ae7`) calls do_abort if abort was not requested, however, since it does so by checking the subscription bool operator, it would miss the case where abort was already requested before the subscription took place (in service_level_controller ctor). With scylladb/seastar@470b539b1c and scylladb/seastar@8ecce18c51 we can just unconditionally call the subscription `on_abort` method, that ensures only-once semantics, even if abort was already requested at subscription time. Fixes scylladb/scylladb#19075 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#19929	2024-07-30 13:23:17 +03:00
Kefu Chai	35394c3f9a	docs/dev: fix a typo remove the extraneous "is". Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19902	2024-07-30 10:46:25 +03:00
Pavel Emelyanov	97154b0671	Merge 'mapreduce_service: complete coroutinization' from Avi Kivity mapreduce_server was previously coroutinized, but only partially. This series completes coroutinization and eliminates remaining continuation chains. None of this code is performance sensitive as it runs at the super-coordinator level and is amortized over a full scan of the entire table. No backport needed as this is a cleanup. Closes scylladb/scylladb#19913 * github.com:scylladb/scylladb: mapreduce_service: reindent mapreduce_service: coroutinize retrying_dispatcher::dispatch_to_node() mapreduce_service: coroutinize dispatch() inner lambda	2024-07-30 10:44:34 +03:00
Nadav Har'El	d293a5787f	alternator: exclude CDC log table from ListTables The Alternator command ListTables is supposed to list actual tables created with CreateTable, and should not list things like materialized views (created for GSI or LSI) or CDC log tables. We already properly excluded materialized views from the list - and had the tests to prove it - but forgot both the exclusion and the testing for CDC log tables - so creating a table xyz with streams enabled would cause ListTables to also list "xyz_scylla_cdc_log". This patch fixes both oversights: It adds the code to exclude CDC logs from the output of ListTables, and adds a test which reproduces the bug before this fix, and verifies the fix works. Fixes #19911. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19914	2024-07-30 10:43:29 +03:00
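The filtering rule described in commit d293a5787f can be sketched as follows. The `Table` record and its flags are hypothetical stand-ins for schema metadata; the real change lives in Alternator's C++ code and is not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Table:
    name: str
    is_view: bool = False      # materialized view backing a GSI or LSI
    is_cdc_log: bool = False   # CDC log table created when streams are enabled

def list_tables(tables):
    """Return only the names of tables a user created with CreateTable."""
    return [t.name for t in tables if not t.is_view and not t.is_cdc_log]

# Hypothetical keyspace contents after creating table "xyz" with streams enabled.
existing = [
    Table("xyz"),
    Table("xyz_scylla_cdc_log", is_cdc_log=True),
    Table("xyz:gsi", is_view=True),
]
visible = list_tables(existing)
```

The sketch filters on an explicit flag rather than on the "_scylla_cdc_log" suffix, since a name check alone would also hide a user table that happened to use that suffix.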
Nadav Har'El	ca8b91f641	test: increase timeouts for /localnodes test In commit `bac7c33313` we introduced a new test for the Alternator "/localnodes" request, checking that a node that is still joining does not get returned. The tests used what I thought were "very high" timeouts - we had a timeout of 10 seconds for starting a single node, and injected a 20 second sleep to leave us 10 seconds after the first sleep. But the test failed in one extremely slow run (a debug build on aarch64), where starting just a single node took more than 15 seconds! So in this patch I increase the timeouts significantly: We increase the wait for the node to 60 seconds, and the sleeping injection to 120 seconds. These should definitely be enough for anyone (famous last words...). The test doesn't actually wait for these timeouts, so the ridiculously high timeouts shouldn't affect the normal runtime of this test. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19916	2024-07-30 10:41:48 +03:00
Avi Kivity	52ee6127dd	Merge 'Use boto3 in object_store test to list bucket' from Pavel Emelyanov There's a test in object_store suite that verifies the contents of a bucket. It does so with a plain http request, but unfortunately this doesn't work -- even local minio uses a restricted bucket, and using a plain http request results in a 403 (Forbidden) error code. The test doesn't check it and continues working with an empty list of objects which, in turn, is what it expects to see. The fix is to use boto3. With it, the acc/secret pair is picked up and listing the bucket finally works. Closes scylladb/scylladb#19889 * github.com:scylladb/scylladb: test/object_store: Use boto3.resource to list bucket test/object_store: Add get_s3_resource() helper	2024-07-29 13:49:50 +03:00
Pavel Emelyanov	8b1a106b62	test/object_store: Use boto3.resource to list bucket Instead of plain http request, use the power of boto3 package. The recently added get_s3_resource() facilitates creating one Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-29 12:29:16 +03:00
Pavel Emelyanov	172e1cb0da	test/object_store: Add get_s3_resource() helper It creates boto3.resource object that points to endpoint maintained by s3_server argument (that tests obtain via fixture). This allows using boto3 to access S3 bucket from local minio server. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-29 12:25:57 +03:00
Kefu Chai	1094c71282	cql3/statement: use compile-time format string instead of using fmt::runtime, use compile-time format string in order to detect the bad format string, or missing format arguments, or arguments which are not formattable at compile time. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19901	2024-07-28 21:54:43 +03:00
Benny Halevy	be880ab22c	Update seastar submodule * seastar 67065040...a7d81328 (30): > reactor: Initialize _aio_pollfd later > abortable_fifo: fix a typo in comment > net: Expose DNS error category > pollable_fd_state: use default-generated dtor > perftune: tune tcp_mem > scripts/perftune.py: clock source tweaking: special case Amazon and Google KVM virtualizations > abort_source: subscription: keep callback function alive after abort > github: disable ccache when building with C++ modules > github: add enable-ccache input to test.yaml > pollable_fd_state: Mark destructor protected and make non-virtual > reactor: Mark .configure() private > reactor: Set aio_nowait_supported once > reactor: Add .no_poll_aio to reactor_config > reactor: Move .max_poll_time on reactor_config > reactor: Move .task_quota on reactor_config > reactor: Move .strict_o_direct on reactor_config > reactor: Move .bypass_fsync on reactor_config > reactor: Move .max_task_backlog on reactor_config > reactor: Move .force_io_getevents_syscall on reactor_config > reactor: Move .have_aio_fsync on reactor_config > reactor: Move .kernel_page_cache on reactor_config > reactor: Move .handle_sigint on reactor_config > reactor_backend: Construct _polling_io from reactor config > reactor: Move config when constructing > reactor: Use designated initializers to set up reactor_config > native-stack: use queue::pop_eventually() in listener::accept() > abort_source: subscription: allow calling on_abort explicitly > file: document that close() returns the file object to uninitialized state > code-cleanup: do not include 'smp.hh' in 'reactor.hh' > code-cleanup: remove redundant includes of smp.hh Closes scylladb/scylladb#19912	2024-07-28 21:04:45 +03:00
Kefu Chai	36f5032b2d	db: correct the doxygen comment the parameter names do not match with the ones we are using. these comments were inherited from Origin, but we failed to update them accordingly. in this change, the comments are updated to reflect the function signatures. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19900	2024-07-28 18:24:57 +03:00
Kefu Chai	67e07bee25	build: cmake: use per-mode build dir The build_unified.sh script accepts a --build-dir option, which specifies the directory used for storing temporary files extracted from tarballs defined by the --pkgs option. When performing parallel builds of multiple modes, it's crucial that each build uses a unique build directory. Reusing the same build directory for different modes can lead to conflicts, resulting in build failures or, more seriously, the creation of tarballs containing corrupted files. so, in this change, we specify a different directory for each mode, so that they don't share the same one. Refs scylladb/scylladb#2717 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19905	2024-07-28 18:11:37 +03:00
Avi Kivity	149a47088e	mapreduce_service: reindent	2024-07-28 17:55:51 +03:00
Avi Kivity	0dd03789f3	mapreduce_service: coroutinize retrying_dispatcher::dispatch_to_node() Simplify the function by converting it to a coroutine. Note that while the final co_return co_await looks like a loop (and therefore an await would introduce an O(n) allocation), it really isn't - we retry at most once.	2024-07-28 17:54:01 +03:00
Avi Kivity	b019927a0e	mapreduce_service: coroutinize dispatch() inner lambda dispatch() is a coroutine, but the inner lambda that is executed per node is still a continuation chain. Make it uniform by converting to a coroutine.	2024-07-28 17:36:08 +03:00
Kefu Chai	ee80742c39	cql3: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19906	2024-07-28 17:29:07 +03:00
Benny Halevy	26abad23d9	sstable_directory: delete_atomically: allow sstables from multiple prefixes Currently, delete_atomically can be called with a list of sstables from mixed prefixes in two cases: 1. truncate: where we delete all the sstables in the table directory 2. tablet cleanup: similar to truncate but restricted to sstables in a single tablet replica In both cases, it is possible that sstables in staging (or quarantine) are mixed with sstables in the base directory. Until a more comprehensive fix is in place, (see https://github.com/scylladb/scylladb/pull/19555) this change just lifts the ban on atomic deletion of sstables from different prefixes, and acknowledging that the implementation is not atomic across prefixes. This is better than crashing for now, and can be backported more easily to branches that support tablets so tablet migration can be done safely in the presence of repair of tables with views. Refs scylladb/scylladb#18862 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#19816	2024-07-28 17:26:31 +03:00
Pavel Emelyanov	aaad2bbeaf	storage_service: Remote gossiper argument from join_cluster() This pointer was only needed to pull all the way down the hints resource manager start() method. It's no longer needed for that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-26 16:29:58 +03:00
Pavel Emelyanov	a1dbaba9e1	proxy: Use remote gossiper to start hints resource manager By the time hints resource manager is started, proxy already has its remote part initialized. Remote returns const gossiper pointer, but after previous change hints code can live with it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-26 16:29:03 +03:00
Pavel Emelyanov	dd7c7c301d	hints: Const-ify gossiper references and anchor pointers There are two places in hints code that need gossiper: hist_sender calling gossiper::is_alive() and endpoint_downtime_not_bigger_than() helper in manager. Both can live with const gossiper, so the dependency references and anchor pointers can be restricted to const too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-26 16:28:54 +03:00
Lakshmi Narayanan Sreethar	27b305b9d1	boost/bloom_filter_test: wait for total memory reclaimed update The testcase `test_bloom_filter_reclaim_during_reload` checks the SSTable manager's `_total_memory_reclaimed` against an expected value to verify that a Bloom filter was reloaded. However, it does not wait for the manager to update the variable, causing the check to fail if the update has not occurred yet. Fix it by making the testcase wait until the variable is updated to the expected value. Fixes #19879 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#19883	2024-07-26 08:15:11 +03:00
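The fix pattern in commit 27b305b9d1 (poll until an asynchronously updated value reaches the expected state, instead of checking it once) can be sketched generically. The actual test is a C++ Boost test; the helper name `wait_until`, the `stats` dictionary, and the value 4096 below are illustrative assumptions.

```python
import threading
import time

def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll predicate() until it returns True, or raise once the timeout expires."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within timeout")
        time.sleep(interval)

# Stand-in for a manager that updates its counter some time after the reload.
stats = {"total_memory_reclaimed": 0}
threading.Timer(0.05, lambda: stats.update(total_memory_reclaimed=4096)).start()

# Checking immediately would be racy; waiting for the expected value is not.
wait_until(lambda: stats["total_memory_reclaimed"] == 4096, timeout=2.0)
```

A bounded timeout keeps a genuinely broken update from hanging the test forever while still tolerating slow runs.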
Tomasz Grabiec	851da230c8	Merge 'db/view: drop view updates to replaced node marked as left' from Piotr Dulikowski When a node that is permanently down is replaced, it is marked as "left" but it still can be a replica of some tablets. We also don't keep IPs of nodes that have left and the `node` structure for such node returns an empty IP (all zeros) as the address. This interacts badly with the view update logic. The base replica paired with the left node might decide to generate a view update. Because storage proxy still uses IPs and not host IDs, it needs to obtain the view replica's IP and tell the storage proxy to write a view update to that node - so, it chooses 0.0.0.0. Apparently, storage proxy decides to write a hint towards this address - hinted handoff on the other hand operates on host IDs and not IPs, so it attempts to translate the IP back, which triggers an assertion as there is no replica with IP 0.0.0.0. As a quick workaround for this issue just drop view updates towards nodes which seem to have IPs that are all zeros. It would be more proper to keep the view updates as hints and replay them later to the new paired replica, but achieving this right now would require much more significant changes. For now, fixing a crash is more important than keeping views consistent with base replicas. In addition to the fix, this PR also includes a regression test heavily based on the test that @kbr-scylla prepared during his investigation of the issue. Fixes: scylladb/scylladb#19439 This issue can cause multiple nodes to crash at once and the fix is quite small, so I think this justifies backporting it to all affected versions. 6.0 and 6.1 are affected. No need to backport to 5.4 as this issue only happens with tablets, and tablets are experimental there. 
Closes scylladb/scylladb#19765 * github.com:scylladb/scylladb: test: regression test for MV crash with tablets during decommission db/view: drop view updates to replaced node marked as left	2024-07-25 11:47:14 +02:00
Michael Litvak	6f25f4b387	mv: skip reading rows when generating partition tombstone update when deleting a base partition, in some cases we can update the view by generating a single partition deletion update, instead of generating a row deletion update for each of the partition rows. If this is the case for all the affected views, and there are no other updates besides deleting the partition, then we can skip reading and iterating over all the rows, since this won't generate any additional updates that are not covered already.	2024-07-25 11:12:58 +03:00
Michael Litvak	d0b02dc0d0	mv: delete a partition in a single operation when applicable Currently when a partition is deleted from the base table, we generate a row tombstone update for each one of the view rows in the partition. When the partition key in the view is the same as the base, maybe in a different order, this can be done more efficiently - The whole corresponding view partition can be deleted with one partition tombstone update. With this commit, when generating view updates, if the update mutation has a partition tombstone then for the views which have the same partition key we will generate a partition tombstone update, and skip the individual row tombstone updates. Fixes scylladb/scylladb#8199	2024-07-25 11:12:58 +03:00
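The applicability condition from commit d0b02dc0d0 (the view's partition key has the same columns as the base partition key, possibly in a different order) can be expressed as a small check. The function name and the column lists are hypothetical, and the sketch ignores any further conditions the real view-update code may apply.

```python
def can_delete_view_partition(base_partition_key, view_partition_key):
    """True when both partition keys consist of the same columns, in any order."""
    return sorted(base_partition_key) == sorted(view_partition_key)

# Base table partitioned by (a, b); three hypothetical views over it.
same_order = can_delete_view_partition(["a", "b"], ["a", "b"])
reordered = can_delete_view_partition(["a", "b"], ["b", "a"])
different = can_delete_view_partition(["a", "b"], ["a", "c"])
```

Only in the first two cases does a base partition map onto exactly one view partition, so a single partition tombstone can replace the per-row tombstones.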
Michael Litvak	98cc707c76	cql-pytest: move ScyllaMetrics to util file to allow reuse ScyllaMetrics is a useful generic component for retrieving metrics in a pytest. The commit moves the implementation from test_shedding.py to util.py to make it reusable in other tests in cql-pytest.	2024-07-25 11:12:58 +03:00
Botond Dénes	1bfe73c2ea	Merge 'Order API endpoints registration in main' from Pavel Emelyanov There are a few api::set_foo()-s left in main that are placed in ~~random~~ legacy order. This PR fixes it and makes a few more associated cleanups. refs: #2737 Closes scylladb/scylladb#19682 * github.com:scylladb/scylladb: api: Unset cache_service endpoints on stop main: Don't ignore set_cache_service() future api: Move storage API few steps above api: Register token-metadata API next to token-metadata itself api: Do not return zero local host-id api: Move snitch API registration next to snitch itself	2024-07-25 09:59:38 +03:00
Pavel Emelyanov	456dbc122b	api: Unset cache_service endpoints on stop They currently stay registered long after the dependent services get stopped. There's a need for batch unsetting (scylladb/seastar#1620), so currently only this explicit listing :( Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-24 18:51:32 +03:00
Pavel Emelyanov	61fb0ad996	main: Don't ignore set_cache_service() future The call itself seems to be in the wrong place -- there's no "cache service", and the API uses database and snapshot_ctl to work on. So it deserves more cleanup, but at least don't throw the returned future<> away. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-24 18:51:32 +03:00
Pavel Emelyanov	e1eb48f9c2	api: Move storage API few steps above The sequence currently is sharded<storage_service>.start() sharded<query_processor>.invoke_on_all(start_remote) api::set_server_storage_service() The last two steps can be safely swapped to keep storage service API next to its service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-24 18:51:32 +03:00
Pavel Emelyanov	6ae09cc6bf	api: Register token-metadata API next to token-metadata itself Right now API registration happens quite late because it waits for storage service to register its "function" first. This can be done beforehand and the t.m. API can be moved to where it should be. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-24 18:51:32 +03:00
Pavel Emelyanov	10566256fd	api: Do not return zero local host-id The local host id is read from local token metadata and returned to the caller as string. The t.m. itself starts with default-constructed host id value which is updated later. However, even such "unset" host id value can be rendered as string without errors. This makes the correct work of the API endpoint depend on the initialization sequence which may (spoiler: it will) change in the future. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-24 18:51:32 +03:00
Pavel Emelyanov	29738f0cb6	api: Move snitch API registration next to snitch itself Once sharded<snitch> is started, it can register its handlers Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-24 18:51:07 +03:00
Pavel Emelyanov	6357755624	replica: Remove keyspace::config::datadir It's finally no longer used. Now only sstables storage code "knows" that keyspace may have its on-disk directory. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-24 17:45:51 +03:00
Pavel Emelyanov	f767e25c8b	sstables/storage: Evaluate path for keyspace directory in storage Currently the init_keyspace_storage() expects that the caller would tell it where the ks directory is, but it's not nice as keyspace may not necessarily keep its sstables in any directory. This patch moves the directory path evaluation into storage code, specifically to the lambda that is called for on-disk sstables. The way directory is evaluated mirrors the one from make_keyspace_config() that will be removed by next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-24 17:45:50 +03:00
Pavel Emelyanov	3ae41bd6f6	sstables/storage: Add sstables_manager arg to init_keyspace_storage() Will be needed by next patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-07-24 17:41:45 +03:00
Botond Dénes	6337372b9d	test/boost/reader_concurrency_semaphore_test: un-flake test admission The admission test has a section which tests admission when the semaphore has inactive reads. This section (and therefore the entire test) became flaky lately, after a seemingly unrelated seastar upgrade, which improved timers. The cause of the flakiness is the permit which is made inactive later: this permit is created with 0 timeout (times out immediately). For some time now, when the timeout timer of a permit fires, if the permit is inactive, it is evicted. This is what makes the test fail: the inactive read times out and ends up evicting this permit, which is not expected for the test. The reason this was not a problem before is that the test usually finishes very quickly, before the timer could even be polled by the reactor. The recent seastar changes changed this and now the timer sometimes gets polled and fires, failing the test. Fixes: #19801 Closes scylladb/scylladb#19859	2024-07-24 13:04:50 +03:00
Takuya ASADA	02b20089cb	scylla_raid_setup: install update-initramfs when it's not available scylla_raid_setup may fail on Ubuntu minimal image since it calls update-initramfs without installing. Closes scylladb/scylladb#19651	2024-07-24 11:55:16 +03:00
Pavel Emelyanov	b02d20d12d	Merge 'Minor improvements around compaction groups' from Raphael "Raph" Carvalho Minor changes, no backporting needed. Closes scylladb/scylladb#19723 * github.com:scylladb/scylladb: replica: rename for_each_const_compaction_group() replica: Fix comment about compaction group replica: remove unused compaction_group_vector	2024-07-24 11:22:24 +03:00
Nadav Har'El	edc5bca6b1	alternator: do not allow authentication with a non-"login" role Alternator allows authentication into the existing CQL roles, but roles which have the flag "login=false" should be refused in authentication, and this patch adds the missing check. The patch also adds a regression test for this feature in the test/alternator test framework, in a new test file test/alternator/cql_rbac.py. This test file will later include more tests of how the CQL RBAC commands (CREATE ROLE, GRANT, REVOKE) affect authentication and authorization in Alternator. In particular, these tests need to use not just the DynamoDB API but also CQL, so this new test file includes the "cql" fixture that allows us to run CQL commands, to create roles, to retrieve their secret keys, and so on. Fixes scylladb/scylladb#19735 Closes scylladb/scylladb#19740	2024-07-24 08:20:23 +02:00
Botond Dénes	84db147c58	Merge 'tasks: introduce virtual tasks' from Aleksandra Martyniuk Introduce virtual tasks - task manager tasks which cover cluster-wide operations. Virtual tasks aren't kept in memory, instead their statuses are retrieved from associated service when user requests them with task manager API. From API users' perspective, virtual tasks behave similarly to regular tasks, but they can be queried from any node in a cluster. Virtual tasks cannot have a parent task. They can have children on each node in a cluster, but do not keep references to them. So, if a direct child of a virtual task is unregistered from task manager, it will no longer be shown in parent's children vector. virtual_task class corresponds to all virtual tasks in one group. If users want to list all tasks in a module, a virtual_task returns all recent supported operations; if they request virtual task's status - info about the one specified operation is presented. Time to live, number of tracked operations etc. depend on the implementation of individual virtual_task. All virtual_tasks are kept only on shard 0. Refs: https://github.com/scylladb/scylladb/issues/15852 New feature, no backport needed. 
Closes scylladb/scylladb#16374 * github.com:scylladb/scylladb: docs: describe virtual tasks db: node_ops: filter topology request entries test: add a topology suite for testing tasks node_ops: service: create streaming tasks node_ops: register node_ops_virtual_task in task manager service: node_ops: keep node ops module in storage service node_ops: implement node_ops_virtual_task methods db: service: modify methods to get topology_requests data db: service: add request type column to topology_requests node_ops: add task manager module and node_ops_virtual_task tasks: api: add virtual task support to get_task_status_recursively tasks: api: add virtual task support tasks: api: add virtual tasks support to get_tasks tasks: add task_handler to hide task and virtual_task differences from user tasks: modify invoke_on_task tasks: implement task_manager::virtual_task::impl::get_children tasks: keep virtual tasks in task manager tasks: introduce task_manager::virtual_task	2024-07-24 08:34:28 +03:00
Botond Dénes	0bb6413ea5	Merge 'github: disable scheduled workflow on forks' from Kefu Chai as these workflows are scheduled periodically, and if they fail, notifications are sent to the repo's owner. to minimize the surprises to the contributors using github, let's disable these workflows on fork repos. Closes scylladb/scylladb#19736 * github.com:scylladb/scylladb: github: do not run clang-tidy as a cron job github: disable scheduled workflow on forks	2024-07-24 07:50:39 +03:00
Avi Kivity	3c930a61c9	Merge 'test: scylla_cluster: support more test scenarios' from Patryk Jędrzejczak We modify `ScyllaCluster.server_start` so that it changes seeds of the starting node to all currently running nodes. This allows writing tests like ```python s1 = await manager.server_add(start=False) await manager.server_add() await manager.server_start(s1.server_id) ``` However, it disallows writing tests that start multiple clusters. To fix this, we add the `seeds` parameter to `server_start`. We also improve the logic in `ScyllaCluster.add_server` to allow writing tests like ```python await manager.server_add(expected_error="...") await manager.server_add() ``` This PR only adds improvements to the `test.py` framework, no need to backport it. Closes scylladb/scylladb#19847 * github.com:scylladb/scylladb: test: scylla_cluster: improve expected_error in add_server test: scylla_cluster: support more test scenarios test: scylla_cluster: correctly change seeds in server_start	2024-07-23 22:05:31 +03:00
Patryk Jędrzejczak	02ccd2e3af	test: scylla_cluster: improve expected_error in add_server We make two changes: - we lease the IP address of a node that failed to boot because of an expected error, - we don't log "Cluster ... added ..." when a node fails to boot because of an expected error.	2024-07-23 14:35:09 +02:00
Patryk Jędrzejczak	4079cd1a7b	test: scylla_cluster: support more test scenarios Here are some examples of tests that don't work with no initial nodes, but they should work: 1. ``` await manager.server_add(expected_error="...") await manager.server_add() ``` 2. ``` await manager.servers_add(2, expected_error="...") await manager.servers_add(2) ``` 3. ``` s1 = await manager.server_add(start=False) await manager.server_start(s1.server_id, expected_error="...") await manager.server_add() ``` 4. ``` [s1, s2] = await manager.servers_add(2, start=False) await manager.server_start(s1.server_id, expected_error="...") await manager.server_start(s2.server_id, expected_error="...") await manager.servers_add(2) ``` 5. ``` s1 = await manager.server_add(start=False) await manager.server_add() await manager.server_start(s1.server_id) ``` 6. ``` [s1, s2] = await manager.servers_add(2, start=False) await manager.servers_add(2) await manager.server_start(s1.server_id) await manager.server_start(s2.server_id) ``` In this patch, we make a few improvements to make tests like the ones presented above work. I tested all the examples above manually. From now on, servers receive correct seeds if the first servers added in the test didn't start or failed to boot. Also, we remove the assertion preventing the creation of a second cluster. This assertion failed the tests presented above. We could weaken it to make these tests pass, but it would require some work. Moreover, we have tests that intentionally create two clusters. Therefore, we go for the easiest solution and accept that a single `ScyllaCluster` may not correspond to a single Scylla cluster.	2024-07-23 14:35:09 +02:00
Patryk Jędrzejczak	e196c1727e	test: scylla_cluster: correctly change seeds in server_start We change seeds in `ScyllaCluster.server_start` to all currently running nodes. The previous code only pretended that it did it. After doing this change, writing tests that create multiple clusters is impossible. To allow it, we add the `seeds` parameter to `ManagerClient.server_start`. We use it to fix and simplify the only test that creates two clusters - `test_different_group0_ids`.	2024-07-23 14:35:08 +02:00
Aleksandra Martyniuk	d04159e7de	docs: describe virtual tasks	2024-07-23 13:35:02 +02:00
Aleksandra Martyniuk	c64cb98bcf	db: node_ops: filter topology request entries system_keyspace::get_topology_request_entries returns entries for requests which are running or have finished after specified time. In task manager node ops task set the time so that they are shown for task_ttl seconds after they have finished.	2024-07-23 13:35:02 +02:00
Aleksandra Martyniuk	36b77c0592	test: add a topology suite for testing tasks Add topology_tasks test suite for testing task manager's node ops tasks. Add TaskManagerClient to topology_tasks for an easy usage of task manager rest api. Write a test for bootstrap, replace, rebuild, decommission and remove top level tasks using the above.	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	a903971a74	node_ops: service: create streaming tasks Create tasks which cover streaming part of topology changes. These tasks are children of respective node_ops_virtual_task.	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	63e82764e1	node_ops: register node_ops_virtual_task in task manager	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	8e56913fdf	service: node_ops: keep node ops module in storage service Keep task manager node ops module in storage service. It will be used to create and manage tasks related to topology changes. The module is created and registered in storage service constructor. In storage_service::stop() the module is stopped and so all the remaining tasks would be unregistered immediately after they are finished.	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	b97a348361	node_ops: implement node_ops_virtual_task methods	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	94282b5214	db: service: modify methods to get topology_requests data Modify get_topology_request_state (and wait_for_topology_request_completion), so that it doesn't call on_internal_error when request_id isn't in the topology_requests table if require_entry == false. Add other methods to get topology request entry.	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	880058073b	db: service: add request type column to topology_requests topology_requests table will be used by task manager node ops tasks, but it loses info about request type, which is required by tasks. Add request_type column to topology_requests.	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	91fbfbf98a	node_ops: add task manager module and node_ops_virtual_task Add task manager node ops module and node_ops_virtual_task. Some methods will be implemented in later patches.	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	d2e6010670	tasks: api: add virtual task support to get_task_status_recursively Virtual tasks are supported by get_task_status_recursively. Currently only local descendants' statuses are shown.	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	5f7f403a15	tasks: api: add virtual task support Virtual tasks are supported by get_task_status, abort_task and wait_task. Task status returned by get_task_status and wait_task: - contains task_kind to indicate whether it's virtual (cluster) or regular (node) task; - children list apart from task_id contains node address of the task.	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	20ba7ceff9	tasks: api: add virtual tasks support to get_tasks task_manager/list_module_tasks/{module} starts supporting virtual tasks, which means that their stats will also be shown for users. Additional task_kind param is added to indicate whether the task is virtual (cluster-wide) or regular (node-wide). Support in other paths will be added in following patches.	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	1d85b319e0	tasks: add task_handler to hide task and virtual_task differences from user Contrary to regular tasks, which are per-operation, virtual tasks are associated with the whole group of operations. There may be many operations of each group performed at the same time. Info about each running operation will be shown to a user through the API. For virtual tasks, task manager imitates a regular task covering each operation, but task_manager::tasks aren't actually created in the memory. Instead, information (e.g. status) about the operation is retrieved from associated service and passed to a user. To hide most of the differences from user, task_handler class is created. Task handler performs appropriate actions depending on task's kind. However, users need to stay conscious about the kind of task, because: - get_task_status and wait_task do not unregister virtual tasks; - time for which a virtual tasks stays in task manager depends on associated service and tasks' implementation; - number of virtual task's children shown by get_tasks doesn't have to be monotonous. API is modified to use task_handler. API-specific classes are moved to task_handler.{cc,hh}.	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	abde7ba271	tasks: modify invoke_on_task Modify task_manager::invoke_on_task to also check virtual tasks.	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	6029936665	tasks: implement task_manager::virtual_task::impl::get_children Return a vector of task_identity of all children of a virtual task in a cluster.	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	9de8d4b5b0	tasks: keep virtual tasks in task manager Virtual tasks are kept in task manager together with regular tasks. All virtual tasks are stored on shard 0. task_manager::module::make_task is modified to consider virtual tasks as possible parents.	2024-07-23 13:35:01 +02:00
Aleksandra Martyniuk	00cfc49d18	tasks: introduce task_manager::virtual_task A virtual task is a new kind of task supported by task manager, which covers cluster-wide operations. From users' perspective virtual tasks behave similarly to task_manager::tasks. The API side of virtual tasks will be covered in the following patches. Contrary to task_manager::task, virtual task does not update its fields proactively. Moreover, no object is kept in memory for each individual virtual task's operation. Instead a service (or services) is queried on API user's demand to learn about the status of running operation. Hence the name. task_manager::virtual_task is responsible for a whole group of virtual tasks, i.e. for tracking and generating statuses of all operations of similar type. To enable tracking of some kind of operations, one needs to override task_manager::virtual_task::impl and provide implementations of the methods returning appropriate information about the operations. task_manager::virtual_task must be kept on shard 0. Similarly to task_manager::tasks, virtual tasks can have child tasks, responsible for tracking suboperations' progress. But virtual tasks cannot have parents - they are always roots in task trees. Some methods and structs will be implemented in later patches.	2024-07-23 13:35:01 +02:00
Nadav Har'El	bac7c33313	alternator: fix "/localnodes" to not return nodes still joining Alternator's "/localnodes" HTTP request is supposed to return the list of nodes in the local DC to which the user can send requests. The existing implementation incorrectly used gossiper::is_alive() to check for which nodes to return - but "alive" nodes include nodes which are still joining the cluster and not really usable. These nodes can remain in the JOINING state for a long time while they are copying data, and an attempt to send requests to them will fail. The fix for this bug is trivial: change the call to is_alive() to a call to is_normal(). But the hard part of this patch is the testing: 1. An existing multi-node test for "/localnodes" assumed that right after a new node was created, it appears on "/localnodes". But after this patch, it may take a bit more time for the bootstrapping to complete and the new node to appear in /localnodes - so I had to add a retry loop. 2. I added a test that reproduces the bug fixed here, and verifies its fix. The test is in the multi-node topology framework. It adds an injection which delays the bootstrap, which leaves a new node in JOINING state for a long time. The test then verifies that the new node is alive (as checked by the REST API), but is not returned by "/localnodes". 3. The new injection for delaying the bootstrap is unfortunately not very pretty - I had to do it in three places because we have several code paths of how bootstrap works without repair, with repair, without Raft and with Raft - and I wanted to delay all of them. Fixes #19694. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19725	2024-07-23 13:51:16 +03:00
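The retry loop mentioned in the "/localnodes" commit above can be sketched roughly as follows. This is only an illustration of the polling pattern, not the code from the patch: `wait_until`, `fetch_localnodes` and the addresses are hypothetical names and values, and the real test talks to a running cluster over HTTP.

```python
import time


def wait_until(predicate, timeout=60.0, interval=0.1):
    """Poll `predicate` until it returns True or `timeout` seconds pass.

    Hypothetical helper for illustration; not part of the test framework.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before the deadline")
        time.sleep(interval)


# Stand-in for the HTTP request: the new node only shows up on the third
# poll, imitating a node that is still bootstrapping (JOINING) at first.
_responses = iter([["10.0.0.1"], ["10.0.0.1"], ["10.0.0.1", "10.0.0.2"]])
_last_seen = []


def fetch_localnodes():
    global _last_seen
    _last_seen = next(_responses, _last_seen)
    return _last_seen


wait_until(lambda: "10.0.0.2" in fetch_localnodes(), timeout=5.0, interval=0.01)
```

Polling with a deadline keeps the test robust against bootstrap taking a variable amount of time, while still failing if the node never appears.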
Pavel Emelyanov	65565a56c3	Merge 's3/client: add client::upload_file()' from Kefu Chai this member function prepares for the backup feature, where the object to be stored in the object storage is already persisted as a file on local filesystem. this brings us two benefits: - with the file, we don't need to accumulate the payloads in memory and send them in batch, as we do in upload_sink and in upload_jumbo_sink. this puts less pressure on the memory subsystem. - with the file, we can read multiple parts in parallel if multipart upload applies to it, this helps to improve the throughput. so, this new helper is introduced to help upload an sstable from local filesystem to the object storage. Fixes https://github.com/scylladb/scylladb/issues/16287 Closes scylladb/scylladb#16387 * github.com:scylladb/scylladb: s3/client: add client::upload_file() s3/client: move constants related to aws constraints out	2024-07-23 12:39:27 +03:00
Kefu Chai	061def001d	s3/client: add client::upload_file() this member function prepares for the backup feature, where the object to be stored in the object storage is already persisted as a file on local filesystem. this brings us two benefits: - with the file, we don't need to accumulate the payloads in memory and send them in batch, as we do in upload_sink and in upload_jumbo_sink. this puts less pressure on the memory subsystem. - with the file, we can read multiple parts in parallel if multipart upload applies to it, this helps to improve the throughput. so, this new helper is introduced to help upload an sstable from local filesystem to the object storage. Fixes #16287 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-23 14:39:30 +08:00
Kefu Chai	6701ce50a5	s3/client: move constants related to aws constraints out minimum_part_size and aws_maximum_parts_in_piece are AWS S3 related constraints, they can be reused out of client::upload_sink and client::upload_jumbo_sink, so in this change * extract them out. * use the user-defined literal with IEC prefix for better readability to define minimum_part_size * add "aws_" prefix to `minimum_part_size` to be more consistent. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-23 14:33:54 +08:00
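To illustrate how a file on disk can be split into independently readable parts under the S3 multipart constraints discussed in the two commits above, here is a minimal sketch. The constant values are the publicly documented AWS limits (5 MiB minimum part size, 10,000 parts per upload), not values read from the Scylla source, and `plan_parts` is a hypothetical helper, not a function in `s3/client`.

```python
MIB = 1 << 20

# Publicly documented AWS S3 multipart limits (assumed here, not taken
# from the commits): every part except the last must be at least 5 MiB,
# and one upload may contain at most 10,000 parts.
AWS_MINIMUM_PART_SIZE = 5 * MIB
AWS_MAXIMUM_PARTS = 10_000


def plan_parts(file_size, part_size=AWS_MINIMUM_PART_SIZE):
    """Return (offset, length) pairs covering a file of `file_size` bytes.

    Each pair describes one part that could be read from the file and
    uploaded independently of the others, which is what makes parallel
    reads possible.
    """
    if file_size <= 0:
        raise ValueError("file_size must be positive")
    if part_size < AWS_MINIMUM_PART_SIZE:
        raise ValueError("part_size is below the minimum part size")
    parts = [
        (offset, min(part_size, file_size - offset))
        for offset in range(0, file_size, part_size)
    ]
    if len(parts) > AWS_MAXIMUM_PARTS:
        raise ValueError("too many parts; use a larger part_size")
    return parts


# A 12 MiB file yields two full 5 MiB parts and a 2 MiB tail.
parts = plan_parts(12 * MIB)
```

Because each part is described only by an offset and a length, nothing needs to be buffered in memory before the upload starts, which matches the first benefit listed in the commit message.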
Takuya ASADA	c3bea539b6	dist: support nonroot and offline mode for scylla-housekeeping Introduce support nonroot and offline mode for scylla-housekeeping. Closes #13084 Closes scylladb/scylladb#13088	2024-07-23 07:57:32 +03:00
Aleksandra Martyniuk	dfe3af40ed	test: tasks: adjust tests to new wait_task behavior After `c1b2b8cb2c` /task_manager/wait_task/ does not unregister tasks anymore. Delete the check if the task was unregistered from test_task_manager_wait. Check task status in drain_module_tasks to ensure that the task is removed from task manager. Fixes: #19351. Closes scylladb/scylladb#19834	2024-07-22 18:24:54 +03:00
Nadav Har'El	9eb47b3ef0	Merge 'config: round-trip boolean configuration variables' from Avi Kivity When you SELECT a boolean from system.config, it reads as true/false, but this isn't accepted on UPDATE (instead, we accept 1/0). This is surprising and annoying, so accept true/false in both directions. Not a regression, so a backport isn't strictly necessary. Closes scylladb/scylladb#19792 * github.com:scylladb/scylladb: config: specialize from-string conversion for bool config: wrap boost::lexical_cast<> when converting from strings	2024-07-22 17:53:02 +03:00
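The round-trip property described in the merge above can be modelled with a small sketch. The actual change is in Scylla's C++ config code; the Python below only illustrates the intended behaviour, and both function names as well as the case-insensitive matching are assumptions made for the example.

```python
def render_config_bool(value: bool) -> str:
    """Render a boolean the way a SELECT from system.config shows it."""
    return "true" if value else "false"


def parse_config_bool(text: str) -> bool:
    """Accept both the rendered spelling (true/false) and the older 1/0.

    Case-insensitive matching is an assumption of this sketch.
    """
    normalized = text.strip().lower()
    if normalized in ("true", "1"):
        return True
    if normalized in ("false", "0"):
        return False
    raise ValueError(f"not a boolean: {text!r}")


# Whatever is read back can be written again unchanged.
for value in (True, False):
    assert parse_config_bool(render_config_bool(value)) is value
```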
Botond Dénes	d3135db457	Merge 'commitlog: Add optional max lifetime parameter to cl instance' from Calle Wilund If set, any remaining segment that has data older than this threshold will request flushing, regardless of data pressure. I.e. even a system where nothing happens will after X seconds flush data to free up the commit log. Related to #15820 The functionality here is to prevent pathological/test cases where a silent system cannot fully process stuff like compaction, GC etc due to things like CL forcing smaller GC windows etc. Closes scylladb/scylladb#15971 * github.com:scylladb/scylladb: commitlog: Make max data lifetime runtime-configurable db::config: Expose commitlog_max_data_lifetime_in_s parameter commitlog: Add optional max lifetime parameter to cl instance	2024-07-22 17:21:33 +03:00
Botond Dénes	3ff33e9c70	Update ./tools/java submodule * ./tools/java dbaf7ba7...0b4accdd (1): > cassandra-stress: Make default repl. strategy NetworkTopologyStrategy Closes scylladb/scylladb#19818	2024-07-22 17:12:09 +03:00
Kamil Braun	8ec90a0e60	docs: extend "forbidden operations" section for Raft-topology upgrade The Raft-topology upgrade procedure must not be run concurrently with version upgrade. Closes scylladb/scylladb#19746	2024-07-22 12:45:38 +03:00
Botond Dénes	591876b44e	Merge 'sstables: do not reload components of unlinked sstables' from Lakshmi Narayanan Sreethar The SSTable is removed from the reclaimed memory tracking logic only when its object is deleted. However, there is a risk that the Bloom filter reloader may attempt to reload the SSTable after it has been unlinked but before the SSTable object is destroyed. Prevent this by removing the SSTable from the reclaimed list maintained by the manager as soon as it is unlinked. The original logic that updated the memory tracking in `sstables_manager::deactivate()` is left in place as (a) the variables have to be updated only when the SSTable object is actually deleted, as the memory used by the filter is not freed as long as the SSTable is alive, and (b) the `_reclaimed.erase(sst)` is still useful during shutdown, for example, when the SSTable is not unlinked but just destroyed. Fixes https://github.com/scylladb/scylladb/issues/19722 Closes scylladb/scylladb#19717 github.com:scylladb/scylladb: boost/bloom_filter_test: add testcase to verify unlinked sstables are not reloaded sstables: do not reload components of unlinked sstables sstables/sstables_manager: introduce on_unlink method	2024-07-22 12:08:25 +03:00
Avi Kivity	358147959e	Merge 'keep table directory open for flushing' from Laszlo Ersek `filesystem_storage` methods frequently call `sync_directory()`, for the sake of flushing (sync'ing) a directory. `sync_directory()` always brackets the sync with open and close, and given that most `sync_directory()` calls target the sstable base directory, those repeated opens and closes are considered wasteful. Rework the `filesystem_storage::_dir` member (from a mere pathname) so that it stand for an `opened_directory` object, which keeps the sstable base directory open, for the purpose of repeated sync'ing. Resolves #2399. Closes scylladb/scylladb#19624 * github.com:scylladb/scylladb: sstables/storage: synch "dst_dir" more leanly in create_links_common() sstables/storage: close previous directory asynchronously upon dir change sstables/storage: futurize change_dir_for_test() sstables/storage: sync through "opened_directory" in filesystem...::move() sstables/storage: sync through "opened_directory" in the "easy" cases sstables/storage: introduce "opened_directory" class	2024-07-21 17:07:44 +03:00
Yaron Kaikov	d3cbe04130	.github/mergify.yml: update conf to support `6.1` Modify Mergify configuration to support `6.1` instead of `5.2` which is EOL Closes scylladb/scylladb#19810	2024-07-21 17:02:19 +03:00
Łukasz Paszkowski	781eb7517c	api/system: add highest_supported_sstable_format path Current upgrade dtest relies on a ccm node function to get_highest_supported_sstable_version() that looks for r'Feature (.*)_SSTABLE_FORMAT is enabled' in the log files. Starting from scylla-6.0 ME_SSTABLE_FORMAT is enabled by default and there is no cluster feature for it. Thus get_highest_supported_sstable_version() returns an empty list resulting in the upgrade tests failures. This change introduces a separate API path that returns the highest supported sstable format (one of la, mc, md, me) by a scylla node. Fixes scylladb/scylladb#19772 Backports to 6.0 and 6.1 required. The current upgrade test in dtest checks scylla upgrades up to version 5.4 only. This patch is a prerequisite to backport the upgrade tests fix in dtest. Closes scylladb/scylladb#19787	2024-07-21 17:00:19 +03:00
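The failure mode described in the commit above can be reproduced in isolation with the regular expression it quotes. The log lines and the `formats_from_log` helper below are made up for illustration; only the pattern itself comes from the commit message, and the real ccm function may differ in its details.

```python
import re

# Pattern quoted in the commit message.
FEATURE_RE = re.compile(r'Feature (.*)_SSTABLE_FORMAT is enabled')


def formats_from_log(lines):
    """Collect the sstable format names announced as cluster features."""
    return [m.group(1) for line in lines if (m := FEATURE_RE.search(line))]


# Hypothetical log excerpts: an older node logs the feature being enabled,
# a newer one has the format on by default and logs nothing about it.
old_log = [
    "INFO  gossip - Feature MC_SSTABLE_FORMAT is enabled",
    "INFO  gossip - Feature MD_SSTABLE_FORMAT is enabled",
]
new_log = [
    "INFO  init - starting storage service",
]

found_old = formats_from_log(old_log)   # names found in the old log
found_new = formats_from_log(new_log)   # empty: nothing to match
```

With no matching line in the newer log, the scraping approach has nothing to report, which is why a dedicated API path is a more reliable source for this information.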
Avi Kivity	36b57f3432	Merge 'token: inline optimizations' from Benny Halevy This series contains several optimizations for dht::token around its comparison functions as well as minimum_token and maximum_token definitions, by moving them inline into dht/token.hh This results in a nice improvement in perf-simple-query: ``` ==> perf-simple-query.pre <== (`21c67a5a64`) throughput: mean=95774.01 standard-deviation=1129.83 median=96243.64 median-absolute-deviation=1090.08 maximum=96864.09 minimum=94471.19 instructions_per_op: mean=41813.68 standard-deviation=16.27 median=41809.29 median-absolute-deviation=7.02 maximum=41841.64 minimum=41799.41 cpu_cycles_per_op: mean=22383.19 standard-deviation=331.01 median=22254.53 median-absolute-deviation=332.26 maximum=22744.11 minimum=21996.73 ==> perf-simple-query.post.0 <== (token: move ordering operator inline) throughput: mean=96350.01 standard-deviation=640.10 median=96228.88 median-absolute-deviation=621.45 maximum=96988.16 minimum=95478.51 instructions_per_op: mean=41627.13 standard-deviation=37.55 median=41627.06 median-absolute-deviation=2.43 maximum=41679.44 minimum=41573.31 cpu_cycles_per_op: mean=22184.65 standard-deviation=151.03 median=22163.05 median-absolute-deviation=120.83 maximum=22348.49 minimum=21967.30 ==> perf-simple-query.post.1 <== (token: operator<=>: optimize the common case) throughput: mean=96778.29 standard-deviation=1719.34 median=97021.72 median-absolute-deviation=1059.56 maximum=98300.99 minimum=93893.75 instructions_per_op: mean=41590.25 standard-deviation=5.53 median=41589.50 median-absolute-deviation=4.17 maximum=41598.39 minimum=41584.57 cpu_cycles_per_op: mean=22135.33 standard-deviation=471.98 median=21969.30 median-absolute-deviation=244.89 maximum=22905.24 minimum=21685.33 ==> perf-simple-query.post.3 <== (token: always initialize data member) throughput: mean=98264.33 standard-deviation=998.49 median=98533.02 median-absolute-deviation=780.45 maximum=99075.40 minimum=96656.51 
instructions_per_op: mean=41657.61 standard-deviation=22.53 median=41648.49 median-absolute-deviation=12.89 maximum=41696.81 minimum=41642.07 cpu_cycles_per_op: mean=21808.57 standard-deviation=93.63 median=21794.56 median-absolute-deviation=75.41 maximum=21949.46 minimum=21719.55 ==> perf-simple-query.post.4 <== (token: constexpr ctors, methods, and minimum/maximum_token) throughput: mean=98095.05 standard-deviation=1333.32 median=98930.22 median-absolute-deviation=906.80 maximum=99209.38 minimum=96194.25 instructions_per_op: mean=41572.28 standard-deviation=6.04 median=41574.49 median-absolute-deviation=4.76 maximum=41579.56 minimum=41564.72 cpu_cycles_per_op: mean=21831.35 standard-deviation=169.56 median=21732.86 median-absolute-deviation=102.93 maximum=22091.66 minimum=21689.63 ==> perf-simple-query.post.5 <== (token: initialize non-key tokens with min() value) throughput: mean=99502.32 standard-deviation=1003.70 median=99744.03 median-absolute-deviation=388.87 maximum=100482.95 minimum=97813.42 instructions_per_op: mean=41593.48 standard-deviation=17.27 median=41585.25 median-absolute-deviation=8.46 maximum=41619.41 minimum=41575.86 cpu_cycles_per_op: mean=21545.90 standard-deviation=86.66 median=21578.01 median-absolute-deviation=43.17 maximum=21612.41 minimum=21395.42 ``` Optimization only. No backport required Closes scylladb/scylladb#19782 * github.com:scylladb/scylladb: token: initialize non-key tokens with min() value token: make kind-based ctor private token: constexpr ctors, methods, and minimum/maximum_token token: always initialize data member everywhere: use dht::token is_{minimum,maximum} token: operator<=>: optimize the common case token: move ordering operator inline partitioner_test: add more token-level tests	2024-07-21 15:07:36 +03:00
Benny Halevy	365e1fb1b9	token: initialize non-key tokens with min() value We already have code to return min() for the minimum and maximum tokens in long_token() and raw(), so instead of using code to return it, just make sure to set it in the _data member. Note that although this change affects serialization, the existing codebase ignores the deserialized bytes and places a constant (0 before this patch, or min() with it) in _data for non-key (minimum or maximum) tokens. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-07-20 21:21:42 +03:00
Benny Halevy	9f05072527	token: make kind-based ctor private Users outside of the token module don't need to mess with the token::kind. They can only create key tokens, never minimum or maximum tokens with a particular data value. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-07-20 21:21:42 +03:00
Benny Halevy	6806112189	token: constexpr ctors, methods, and minimum/maximum_token sizeof(dht::token) is only 16 bytes and therefore it can be passed with 2 registers. There is no sense in defining minimum_token and maximum_token out of line, returning a token& to statically allocated values that require memory access/copy, while the only call sites that need to point to the static min/max tokens are in dht::ring_position_view. Instead, they can be defined inline as constexpr functions and return their const values. Respectively, define token ctors and methods as constexpr where applicable (and noexcept while at it where applicable) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-07-20 21:21:42 +03:00
Benny Halevy	e509ccd184	token: always initialize data member Make sure to always initialize the _data member to 0 for non-key (minimum or maximum) tokens. This allows simplifying the equality operator, which now doesn't need to rely on `operator<=>` Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-07-20 21:21:42 +03:00
Benny Halevy	850f298ccd	everywhere: use dht::token is_{minimum,maximum} The is_minimum/is_maximum predicates are more efficient than comparing against the {minimum,maximum}_token values, respectively, since the is_* functions need to check only the token kind. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-07-20 21:21:42 +03:00
Benny Halevy	5a60ba5c5f	token: operator<=>: optimize the common case Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-07-20 21:21:42 +03:00
Benny Halevy	adc1d7f68f	token: move ordering operator inline Token comparisons are abundant. The equality operator is defined inline in dht/token.hh by calling `t1 <=> t2`, and so is `tri_compare_raw`, which `operator<=>` calls in the common path, but `operator<=>` itself is defined out of line, losing the benefits of inlining. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-07-20 21:21:42 +03:00
Benny Halevy	7e745d31ed	partitioner_test: add more token-level tests Before changing how minimum and maximum tokens are represented in memory. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-07-20 21:21:37 +03:00
Kamil Braun	ad68a7f799	Merge 'test: raft: fix the flaky `test_raft_recovery_stuck`' from Emil Maskovsky Use the rolling restart to avoid spurious driver reconnects. This can be eventually reverted once the scylladb/python-driver#295 is fixed. Fixes scylladb/scylladb#19154 Closes scylladb/scylladb#19771 * github.com:scylladb/scylladb: test: raft: fix the flaky `test_raft_recovery_stuck` test: raft: code cleanup in `test_raft_recovery_stuck`	2024-07-19 19:34:43 +02:00
Piotr Dulikowski	4571262e46	Merge 'Improve constness of functions schema code' from Marcin Maliszkiewicz In v4 of scylladb/scylladb#19598 the last commit of the patch was replaced but this change missed merge so submitting it in a separate patch. In the current patch, the original functions class correctly marks methods as const where appropriate, and the instance() method now returns a const object. This ensures protection against accidental modifications, as all changes must go through the change_batch object. Since the functions_changer class was intended to serve the same purpose, it is now redundant. Therefore, we are reverting the commit that introduced it. Relates scylladb/scylladb#19153 Closes scylladb/scylladb#19647 * github.com:scylladb/scylladb: cql3: functions: replace template with std::function in with_udf_iter() cql3: functions: improve functions class constness handling Revert "cql3: functions: make modification functions accessible only via batch class"	2024-07-19 19:23:11 +02:00
Emil Maskovsky	9ab25e5cbf	test: raft: replace the use of read_barrier work-around Replaced the old `read_barrier` helper from "test/pylib/util.py" by the new helper from "test/pylib/rest_client.py" that is calling the newly introduced direct REST API. Replaced in all relevant tests and decommissioned the old helper. Introduced a new helper `get_host_api_address` to retrieve the host API address - which in some cases can be different from the host address (e.g. if the RPC address is changed). Fixes: scylladb/scylladb#19662 Closes scylladb/scylladb#19739	2024-07-19 19:20:44 +02:00
Laszlo Ersek	680403d2cd	sstables/storage: synch "dst_dir" more leanly in create_links_common() filesystem_storage::create_links_common() runs on directories that generally differ from "_dir", thus, we can't replace its sync_directory() calls with _dir.sync(). We can still use a common (temporary) "opened_directory" object for synching "dst_dir" three times, saving two open and two close operations. This patch is best viewed with "git show -W". Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-07-19 15:46:31 +02:00
Laszlo Ersek	0057ee2431	sstables/storage: close previous directory asynchronously upon dir change In "filesystem_storage", change_dir_for_test() and move() replace "_dir" with "opened_directory(new_dir)" using the move assignment operator. Consequently, the file descriptor underlying "_dir" is closed synchronously as a part of object destruction. Expose the async file::close() function through "opened_directory". Introduce filesystem_storage::change_dir() as a common async workhorse for both change_dir_for_test() and move(). In change_dir(), close the old directory asynchronously. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-07-19 15:43:19 +02:00
Laszlo Ersek	6711574646	sstables/storage: futurize change_dir_for_test() Currently change_dir_for_test() is synchronous. Make it return a future, so that we can use async operations in change_dir_for_test() overrides. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-07-19 15:43:19 +02:00
Laszlo Ersek	ef446c4da0	sstables/storage: sync through "opened_directory" in filesystem...::move() Near the end of filesystem_storage::move(), we sync both the old directory, and the new directory, if "delay_commit" is null. At that point, the new directory is just "_dir"; call _dir.sync() instead of sync_directory(). This patch is best viewed with "git show -W". Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-07-19 15:14:46 +02:00
Laszlo Ersek	4d33640481	sstables/storage: sync through "opened_directory" in the "easy" cases Replace sst.sstable_write_io_check(sync_directory, _dir.native()) with _dir.sync(sst._write_error_handler) Also replace the explicit (but still relatively "easy") open_checked_directory() + flush() + flush() operations in filesystem_storage::seal() with two _dir.sync() calls. Because filesystem_storage::create_links_common() is marked "const", we need to declare "_dir" mutable. Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-07-19 15:14:46 +02:00
Laszlo Ersek	2c01171a4d	sstables/storage: introduce "opened_directory" class "filesystem_storage::_dir" is currently of type "std::filesystem::path". Introduce a new class called "opened_directory", and change the type of "_dir" to the new class "opened_directory". "opened_directory" keeps the directory open, and offers synchronization on that open directory (i.e., without having to reopen the directory every time). In subsequent patches, that will be put to use. The opening and closing of the wrapped directory cannot easily be handled explicitly in the "filesystem_storage" member functions. ( Namely, test::store() and test::rewrite_toc_without_scylla_component() -- both in "test/lib/sstable_utils.hh" -- perform "open -> ... -> seal" sequences, and such a sequence may be executed repeatedly. For example, sstable_directory_shared_sstables_reshard_correctly() [test/boost/sstable_directory_test.cc] does just that; it "reopens" the "filesystem_storage" object repeatedly. ) Rather than trying to restrict the order of "filesystem_storage" member function calls, replace the "opened_directory" object with a new one whenever the directory pathname is re-set; namely in filesystem_storage::change_dir_for_test() and filesystem_storage::move(). Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>	2024-07-19 15:14:46 +02:00
Piotr Dulikowski	204a479e82	Merge 'db/hints: Test `manager::too_many_in_flight_hints_for()`' from Dawid Mędrek In `6e79d64`, the behavior of `manager::too_many_in_flight_hints_for()` was accidentally modified. It remained unnoticed for some time and was then fixed. In this commit, we add a test verifying that the concurrency of hints being written to disk is indeed limited and the limitations are imposed properly. Refs scylladb/scylladb#17636 Fixes scylladb/scylladb#17660 Closes scylladb/scylladb#19741 * github.com:scylladb/scylladb: db/hints: Verify that Scylla limits the concurrency of written hints db/hints: Coroutinize `hint_endpoint_manager::store_hint()` db/hints: Move a constant value to the TU it's used in	2024-07-19 13:26:34 +02:00
Lakshmi Narayanan Sreethar	0615c8a46b	boost/bloom_filter_test: add testcase to verify unlinked sstables are not reloaded Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-07-19 13:15:57 +05:30
Lakshmi Narayanan Sreethar	31ff69a13c	sstables: do not reload components of unlinked sstables The SSTable is removed from the reclaimed memory tracking logic only when its object is deleted. However, there is a risk that the Bloom filter reloader may attempt to reload the SSTable after it has been unlinked but before the SSTable object is destroyed. Prevent this by removing the SSTable from the reclaimed list maintained by the manager as soon as it is unlinked. The original logic that updated the memory tracking in `sstables_manager::deactivate()` is left in place as (a) the variables have to be updated only when the SSTable object is actually deleted, as the memory used by the filter is not freed as long as the SSTable is alive, and (b) the `_reclaimed.erase(*sst)` is still useful during shutdown, for example, when the SSTable is not unlinked but just destroyed. Fixes #19722 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-07-19 13:15:57 +05:30
Lakshmi Narayanan Sreethar	dbf22848a8	sstables/sstables_manager: introduce on_unlink method Added a new method, on_unlink() to the sstable_manager. This method is now used by the sstable to notify the manager when it has been unlinked, enabling the manager to update its bookkeeping as required. The on_unlink method doesn't do anything yet but will be updated by the next patch. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-07-19 13:15:55 +05:30
Kefu Chai	c52f49facb	build: cmake: do not mark cqlsh noarch in `3c7af287`, cqlsh's reloc package was marked as "noarch", and its filename was updated accordingly in `configure.py`, so let's update the CMake building system accordingly. this change should address the build failure of ``` 08:48:14 [3325/4124] Generating ../Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz 08:48:14 FAILED: Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz 08:48:14 cd /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist && /usr/bin/cmake -E copy /jenkins/workspace/scylla-master/scylla-ci/scylla/tools/cqlsh/build/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz 08:48:14 Error copying file "/jenkins/workspace/scylla-master/scylla-ci/scylla/tools/cqlsh/build/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz" to "/jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz". ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19710	2024-07-19 08:00:17 +03:00
Kefu Chai	34bf10050b	build: cmake: bump up the minimal required fmt to 10.0.0 in `cccec07581`, we started using a feature introduced by {fmt} v10. so we need to bump up the required version in CMake as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19709	2024-07-19 07:58:31 +03:00
Botond Dénes	79567c1c98	scripts/open-coredump.sh: allow complete bypass of S3 server In some cases, the S3 server will not know about a certain build and any attempt to open a coredump which was generated by this build will fail, because the S3 server returns an empty/illegal response. There is already a bypass for missing package-url in the S3 server response, but this doesn't help in the case when the response is also missing other metadata, like build-id and version info. Extend this existing mechanism with a new --scylla-package-url flag, which provides complete bypass. When provided, the S3 server will not be queried at all, instead the package is downloaded from the link and version metadata is extracted from the package itself. Closes scylladb/scylladb#19769	2024-07-18 21:43:53 +03:00
Avi Kivity	58a8fd6f19	Update tools/python3 submodule (install umask, selinux) * tools/python3 18fa79e...fbf12d0 (1): > install.sh: fix incorrect permission on strict umask Ref https://github.com/scylladb/scylladb/issues/8589 Ref https://github.com/scylladb/scylladb/issues/19775	2024-07-18 21:36:50 +03:00
Avi Kivity	7984e595ce	Update tools/java submodule (install selinux context) * tools/java 33938ec16f...dbaf7ba7db (1): > install.sh: apply correct security context on offline installer Ref https://github.com/scylladb/scylladb/issues/8589	2024-07-18 21:03:32 +03:00
Kefu Chai	4fbfecbb3e	Update seastar submodule * seastar 908ccd93...67065040 (44): > metrics: Use this_shard_id unconditionally > sstring: prevent fmt from formatting sstring as a sequence > coding style: allow lines up to 160 chars in length > src/core: remove unnecessary includes > when_all: stop using deprecated std::aligned_union_t > reactor: respect preempt requests in debug mode > core: fix -Wunused-but-set-variable > gate: add try_hold > sstring: declare nested type with typename > rpc: pass start time to `wait_for_reply()` which accepts `no_wait_type` > scripts/perftune.py: get rid of "SyntaxWarning: invalid escape sequence" > scripts/perftune.py: add support for tweaking VLAN interfaces > scripts/perftune.py: improve discovery of bond device slaves > scripts/perftune.py: refactor __learn_slaves() function > code-cleanup: add missing header guards > code-cleanup: remove redundant includes of 'reactor.hh' > code-cleanup: explicitly depend on io_desc.hh > scripts/perftune.py: aRFS should be disabled by default in non-MQ mode > code-cleanup: remove unneeded includes of fair_queue.hh > docker: fix mount of install-dependencies > code-cleanup: remove redundant includes of linux-aio.hh > fstream: reformat the doxygen comment of make_file_input_stream() > iostream: use new-style consumer to implement copy() > stall-analyser: use 0 for default value of --minimum > reactor: fix crash during metrics gathering > build: run socket test with linux-aio reactor backend > test: Add testing of connect()-ion abort ability > linux_perf_event: exclude_idle only on x86_64 > linux_perf_event: add make_linux_perf_event > stall-analyser: gracefully handle empty input > shared_token_bucket: resolve FIXME > io_tester: ensure that file object is valid when closing it > tutorial.md: fix typo in Dan Kegel's name > test,rpc: Extend simple ping-pong case > rpc: Calculate delay and export it via metrics > rpc: Exchange handler duration with server responses > rpc: Track handler execution 
time > rpc: Fix hard-coded constants when sending unknown verb reply > reactor: Unfriend alien and smp queues > reactor: Add and use stopped() getter > reactor: Generalize wakeup() callers > file: Use lighter access to map of fs-info-s > file: Fix indentation after previous patch > file: Don't return chain of ready futures from make_file_impl Closes scylladb/scylladb#19780	2024-07-18 20:00:15 +03:00
Avi Kivity	f7e24cf0b1	Update tools/jmx submodule (umask fix) * tools/jmx 3328a22...89308b7 (1): > install.sh: fix incorrect permission on strict umask Ref scylladb/scylladb#14383 Ref scylladb/scylladb#8589	2024-07-18 19:37:57 +03:00
Avi Kivity	c3b9e64713	Merge 'sstable::open_sstable: pass origin from the writer' from Lakshmi Narayanan Sreethar Pass origin when opening the sstable from the writer and store it in the sstable object. This will make the origin available for the entire write path. Closes scylladb/scylladb#19721 * github.com:scylladb/scylladb: sstables: use _origin in write path sstable::open_sstable: pass and store origin	2024-07-18 19:30:32 +03:00
Avi Kivity	926a02451e	Merge 'sstables/index_reader: abort reading during shutdown' from Lakshmi Narayanan Sreethar This PR adds support for aborting index reads from within `index_consume_entry_context::consume_input` when the server is being stopped. The abort source is now propagated down to the `index_consume_entry_context`, making it available for `consume_input` to check if an abort has been requested. If an abort is detected, `consume_input` will throw an exception to stop the index read operation. Closes scylladb/scylladb#19453 * github.com:scylladb/scylladb: test/boost: test abort behaviour during index read sstables/index_reader: stop consuming index when abort has been requested sstables::index_consume_entry_context: store abort_source sstable: drop old filter only after the new filter is built during rebuild sstables/sstables_manager: store abort_source in sstable_manager replica/database: pass abort_source to database constructor	2024-07-18 19:26:22 +03:00
Avi Kivity	0780228aa2	config: specialize from-string conversion for bool The yaml/json representation for bool is true/false, but boost::lexical_cast is 1/0. Specialize bool conversion to accept true/false (for yaml/json compatibility) and 1/0 (for backward compatibility). This provides round-trip conversion for bool configs in system.config.	2024-07-18 18:38:22 +03:00
Avi Kivity	33eaa61cdd	config: wrap boost::lexical_cast<> when converting from strings Configuration uses boost::lexical_cast to convert strings to native values (e.g. bools/ints). However, boost::lexical_cast doesn't recognize true/false for bool. Since we can't change boost::lexical_cast, replace it with a wrapper that forwards directly to boost::lexical_cast. In the next step, we'll specialize it for bool.	2024-07-18 18:38:19 +03:00
Piotr Dulikowski	5ec8c06561	test: regression test for MV crash with tablets during decommission Regression test for scylladb/scylladb#19439. Co-authored-by: Kamil Braun <kbraun@scylladb.com>	2024-07-18 16:00:26 +02:00
Anna Mikhlin	cd007123c3	Update ScyllaDB version to: 6.2.0-dev	2024-07-18 16:07:07 +03:00
Avi Kivity	47e99f4e04	Merge 'Fix lwt semaphore guard accounting' from Gleb Natapov Currently the guard does not account correctly for ongoing operation if semaphore acquisition fails. It may signal a semaphore when it is not held. Should be backported to all supported versions. Closes scylladb/scylladb#19699 * github.com:scylladb/scylladb: test: add test to check that coordinator lwt semaphore continues functioning after locking failures paxos: do not signal semaphore if it was not acquired	2024-07-18 14:58:31 +03:00
Dawid Medrek	8b6e887e02	db/hints: Verify that Scylla limits the concurrency of written hints In `6e79d64`, the behavior of `manager::too_many_in_flight_hints_for()` was accidentally modified. It remained unnoticed for some time and was then fixed. In this commit, we add a test verifying that the concurrency of hints being written to disk is indeed limited and the limitations are imposed properly.	2024-07-18 13:49:29 +02:00
Kefu Chai	db56af2e41	replication_strategy: mark fmt::formatter<..>::format() const since fmt 11, format() is required to be const, otherwise its caller in the fmt library would not be able to call it, and compilation would fail like: ``` /home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/abseil -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. 
-march=westmere -Xclang -fexperimental-assignment-tracking=disabled -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT locator/CMakeFiles/scylla_locator.dir/RelWithDebInfo/abstract_replication_strategy.cc.o -MF locator/CMakeFiles/scylla_locator.dir/RelWithDebInfo/abstract_replication_strategy.cc.o.d -o locator/CMakeFiles/scylla_locator.dir/RelWithDebInfo/abstract_replication_strategy.cc.o -c /home/kefu/dev/scylladb/locator/abstract_replication_strategy.cc In file included from /home/kefu/dev/scylladb/locator/abstract_replication_strategy.cc:9: In file included from /home/kefu/dev/scylladb/locator/abstract_replication_strategy.hh:16: In file included from /home/kefu/dev/scylladb/gms/inet_address.hh:11: In file included from /usr/include/fmt/ostream.h:23: In file included from /usr/include/fmt/chrono.h:23: In file included from /usr/include/fmt/format.h:41: /usr/include/fmt/base.h:1393:23: error: no matching member function for call to 'format' 1393 \| ctx.advance_to(cf.format(static_cast<qualified_type>(arg), ctx)); \| ~~~^~~~~~ /usr/include/fmt/base.h:1374:21: note: in instantiation of function template specialization 'fmt::detail::value<fmt::context>::format_custom_arg<locator::vnode_effective_replication_map::factory_key, fmt::formatter<locator::vnode_effective_replication_map::factory_key>>' requested here 1374 \| custom.format = format_custom_arg< \| ^ /home/kefu/dev/scylladb/seastar/include/seastar/util/log.hh:299:33: note: in instantiation of function template specialization 'fmt::format_to<seastar::internal::log_buf::inserter_iterator &, locator::vnode_effective_replication_map::factory_key &, const void *, 0>' requested here 299 \| return fmt::format_to(it, fmt.format, std::forward<Args>(args)...); \| ^ /home/kefu/dev/scylladb/seastar/include/seastar/util/log.hh:428:9: note: in instantiation of function template specialization 'seastar::logger::log<locator::vnode_effective_replication_map::factory_key &, const 
void *>' requested here 428 \| log(log_level::debug, std::move(fmt), std::forward<Args>(args)...); \| ^ /home/kefu/dev/scylladb/locator/abstract_replication_strategy.cc:561:18: note: in instantiation of function template specialization 'seastar::logger::debug<locator::vnode_effective_replication_map::factory_key &, const void *>' requested here 561 \| rslogger.debug("create_effective_replication_map: found {} [{}]", key, fmt::ptr(erm.get())); \| ^ /home/kefu/dev/scylladb/locator/abstract_replication_strategy.hh:471:10: note: candidate function template not viable: 'this' argument has type 'const fmt::formatter<locator::vnode_effective_replication_map::factory_key>', but method is not marked const 471 \| auto format(const locator::vnode_effective_replication_map::factory_key& key, FormatContext& ctx) { \| ^ 1 error generated. ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19768	2024-07-18 13:52:36 +03:00
Avi Kivity	c93e2662ae	build: regenerate toolchain for optimized clang Generate a profile-guided-optimization build of clang and install it. See `bd34f2fe46`. The optimized clang package can be found in https://devpkg.scylladb.com/clang/clang-18.1.6-Fedora-40-x86_64.tar.gz https://devpkg.scylladb.com/clang/clang-18.1.6-Fedora-40-aarch64.tar.gz Closes scylladb/scylladb#19685	2024-07-18 12:57:45 +03:00
Botond Dénes	8cc99973eb	Merge 'Apply sstable io error handler to exceptions generated when opening file' from Calle Wilund Fixes #19753 SSTable file open provides an `io_error_handler` instance which is applied to a file-wrapper to process any IO errors happening during read/write via the handler in `storage_service`, which in turn will effectively disable the node. However, this is not applied to the actual open operation itself, i.e. any exception generated by the file open call itself will instead just escape to the caller. This PR adds filtering via the `error_handler` to sstable open + makes `storage_service` "isolate" mechanism non-module-static (thus making it testable) and adds tests to check we exhibit the same behaviour in both cases. The main motivation for this issue is discussions that secondary level IO issues (i.e. caused by extensions) should trigger the same behaviour as, for example, running out of disk space. Closes scylladb/scylladb#19766 * github.com:scylladb/scylladb: memtable_test: Add test for isolate behaviour on exceptions during flush cql_test_env: Expose storage service storage_service: Make isolate guard non-static and add test accessor sstable: apply error_handler on open exceptions	2024-07-18 08:14:40 +03:00
Avi Kivity	d5af86bd8a	test: cql-pytest: config_value_context: remove strange ast.literal_eval call cql-pytest's config_value_context is used to run a code sequence with different ScyllaDB configuration applied for a while. When it reads the original value (in order to restore it later), it applies ast.literal_eval() to it. This is strange, since the config variable isn't a Python literal. It was added in `8c464b2ddb` ("guardrails: restrict replication strategy (RS)"). Presumably, as a workaround for #19604 - it sufficiently massaged the input we read via SELECT to be acceptable later via UPDATE. Now that #19604 is fixed, we can remove the call to ast.literal_eval, but have to fix up the parameters to config_value_context to something that will be accepted without further massaging. This is a step towards fixing #15559, where we want to run some tests with a boolean configuration variable changed, and literal_eval is transforming the string representation of integers to integers and confusing the driver. Closes scylladb/scylladb#19696	2024-07-18 08:11:26 +03:00
Dawid Medrek	414ea68cac	exceptions/exceptions.hh: Wrap `#include <concepts>` within an `#ifdef` `GitHub Actions / Analyze #includes in source files` keeps reporting that the include shouldn't be present in the file. The reason is that we use FMT with version >10, so the fragment of the code that uses the include is not compiled. We move the include to a place where it's used, which should fix the warnings. Closes scylladb/scylladb#19776	2024-07-17 22:09:41 +03:00
Yaron Kaikov	ddcc6ec1e4	dist/docker/debian/build_docker.sh: Build container based on Ubuntu24.04 Now that we added support for Ubuntu24.04 and are also migrating our images to be based on it (https://github.com/scylladb/scylla-machine-image/pull/530), we should also modify our docker image Fixes: https://github.com/scylladb/scylladb/issues/19738 Closes scylladb/scylladb#19764	2024-07-17 18:45:48 +03:00
Calle Wilund	91b1be6736	memtable_test: Add test for isolate behaviour on exceptions during flush Tests that certain exceptions thrown during flush to sstable do not crash the node, but do trigger io_error_handler and cause node isolation	2024-07-17 09:36:28 +00:00
Calle Wilund	f996dfc4fa	cql_test_env: Expose storage service So tests can play with it.	2024-07-17 09:36:28 +00:00
Calle Wilund	de728958d1	storage_service: Make isolate guard non-static and add test accessor Makes storage service isolate repeatable in same process and more testable. Note, since the test var now is shard-local we need to check twice: once on error, once on reaching shard zero for actual shutdown.	2024-07-17 09:36:28 +00:00
Calle Wilund	7918ec2e39	sstable: apply error_handler on open exceptions	2024-07-17 09:36:27 +00:00
Emil Maskovsky	a89facbc74	test: raft: fix the flaky `test_raft_recovery_stuck` Use the rolling restart to avoid spurious driver reconnects. This can be eventually reverted once the scylladb/python-driver#295 is fixed. Fixes scylladb/scylladb#19154	2024-07-17 09:16:06 +02:00
Emil Maskovsky	ef3393bd36	test: raft: code cleanup in `test_raft_recovery_stuck` Cleaning up the imports.	2024-07-17 09:09:46 +02:00
Lakshmi Narayanan Sreethar	7b58fa2534	sstables: use _origin in write path Now that the origin is available inside the sstable object, no need to pass it to the methods called in the write path. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-07-16 20:44:28 +05:30
Lakshmi Narayanan Sreethar	b762a09dcd	sstable::open_sstable: pass and store origin Pass origin when opening the sstable from the writer and store it in the sstable object. This will make the origin available for the entire write path. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-07-16 20:43:30 +05:30
Lakshmi Narayanan Sreethar	7d0f3ace4a	test/boost: test abort behaviour during index read Added a new boost test, index_reader_test, with a testcase to verify the abort behaviour during an index read using index_consume_entry_context. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-07-16 20:42:50 +05:30
Lakshmi Narayanan Sreethar	64dadd5ec2	sstables/index_reader: stop consuming index when abort has been requested When an abort is requested, stop further reading of the index file and throw an exception from index_consume_entry_context::process_state(). Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-07-16 20:42:50 +05:30
Lakshmi Narayanan Sreethar	c2524337a2	sstables::index_consume_entry_context: store abort_source Store abort source inside sstables::index_consume_entry_context, so that the next patch can implement cancelling the index read when abort is requested. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-07-16 20:42:50 +05:30
Lakshmi Narayanan Sreethar	587da62686	sstable: drop old filter only after the new filter is built during rebuild sstable::maybe_rebuild_filter_from_index drops the existing filter first and then rebuilds the new filter as the method is only called before the sstable is sealed. But to make the index read abortable, the old filter can be dropped only after the new filter is built so that if the index consumer gets aborted, we still have the old filter to write to disk. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-07-16 20:42:47 +05:30
Lakshmi Narayanan Sreethar	6a3e7a5e7a	sstables/sstables_manager: store abort_source in sstable_manager Add a new member that stores the abort_source. This can later be used by the sstables to check if an abort has been requested. Also implement sstables_manager::get_abort_source() that returns a const reference to the abort source. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-07-16 20:36:06 +05:30
Lakshmi Narayanan Sreethar	e2142974f8	replica/database: pass abort_source to database constructor This is in preparation for the following patch that adds abort_source variable to the sstables_manager. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-07-16 20:36:06 +05:30
Piotr Dulikowski	6af7882c59	db/view: drop view updates to replaced node marked as left When a node that is permanently down is replaced, it is marked as "left" but it still can be a replica of some tablets. We also don't keep IPs of nodes that have left and the `node` structure for such node returns an empty IP (all zeros) as the address. This interacts badly with the view update logic. The base replica paired with the left node might decide to generate a view update. Because storage proxy still uses IPs and not host IDs, it needs to obtain the view replica's IP and tell the storage proxy to write a view update to that node - so, it chooses 0.0.0.0. Apparently, storage proxy decides to write a hint towards this address - hinted handoff on the other hand operates on host IDs and not IPs, so it attempts to translate the IP back, which triggers an assertion as there is no replica with IP 0.0.0.0. As a quick workaround for this issue just drop view updates towards nodes which seem to have IPs that are all zeros. It would be more proper to keep the view updates as hints and replay them later to the new paired replica, but achieving this right now would require much more significant changes. For now, fixing a crash is more important than keeping views consistent with base replicas. Fixes: scylladb/scylladb#19439	2024-07-16 15:50:11 +02:00
Emil Maskovsky	21c67a5a64	test: raft: fix the flaky `test_change_ip` The python driver might currently trigger spurious reconnects that cause the `NoHostAvailable` to be thrown, which is not expected. This patch adds a retry mechanism to the test to skip this failure if it occurs, as a work-around. The proper fix is expected to be done in scylladb/python-driver#295; once fixed there, this work-around can be reverted. Fixes: scylladb/scylla#18547 Closes scylladb/scylladb#19759	2024-07-16 15:46:16 +02:00
Botond Dénes	1be6cfb16e	Update tools/java submodule * tools/java 01ba3c19...33938ec1 (1): > cassandra-stress: delay before retry	2024-07-16 16:29:51 +03:00
Gleb Natapov	4178589826	test: add test to check that coordinator lwt semaphore continues functioning after locking failures	2024-07-16 12:32:25 +03:00
Gleb Natapov	87beebeed0	paxos: do not signal semaphore if it was not acquired The guard signals a semaphore during destruction if it is marked as locked, but currently it may be marked as locked even if locking failed. Fix this by using semaphore_units instead of managing the locked flag manually. Fixes: https://github.com/scylladb/scylladb/issues/19698	2024-07-16 12:32:25 +03:00
Avi Kivity	dde209390f	Merge 'sstables: fix some mixups between the writer's schema and the sstable's schema' from Michał Chojnowski There are two schemas associated with a sstable writer: the sstable's schema (i.e. the schema of the table at the time when the sstable object was created), and the writer's schema (equal to the schema of the reader which is feeding into the writer). It's easy to mix up the two and break something as a result. The writer's schema is needed to correctly interpret and serialize the data passing through the writer, and to populate the on-disk metadata about the on-disk schema. The sstables's schema is used to configure some parameters for newly created sstable, such as bloom filter false positive ratio, or compression. This series fixes the known mixups between the two — when setting up compression, and when setting up the bloom filters. Fixes #16065 The bug is present in all supported versions, so the patch has to be backported to all of them. Closes scylladb/scylladb#19695 * github.com:scylladb/scylladb: sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's sstables: for i_filter downcasts, use dynamic_cast instead of static_cast	2024-07-16 12:17:41 +03:00
Raphael S. Carvalho	c061ec8d1c	test: Fix max_ongoing_compaction_test test ``` DEBUG 2024-07-03 00:59:58,291 [shard 0:main] compaction_manager - Compaction task 0x51800002a480 for table tests.3 compaction_group=0 [0x503000062050]: switch_state: none -> pending: pending=2 active=0 done=0 errors=0 DEBUG 2024-07-03 01:00:02,868 [shard 0:main] compaction - Checking droppable sstables in tests.3, candidates=0 DEBUG 2024-07-03 01:00:02,868 [shard 0:main] compaction - time_window_compaction_strategy::newest_bucket: now 1720314000000000 buckets = { key=1720314000000000, size=2 key=1720310400000000, size=2 1720314000000000: GMT: Sunday, July 7, 2024 1:00:00 AM 1720310400000000: GMT: Sunday, July 7, 2024 12:00:00 AM ``` the test failed to complete when ran across different clock hours, as it expected all sstables produced to belong to same window of 1h size. let's fix it by reusing timestamps, so it's always consistent. Fixes #13280. Fixes #18564. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#19749	2024-07-16 07:29:10 +03:00
Kefu Chai	c911832ed9	github: do not run clang-tidy as a cron job we already run it for every pull request, so no need to run it periodically. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-15 19:19:49 +08:00
Kefu Chai	dc189c67a6	github: disable scheduled workflow on forks as these workflows are scheduled periodically, and if they fail, notifications are sent to the repo's owner. to minimize the surprises to the contributors using github, let's disable these workflows on fork repos. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-15 19:19:28 +08:00
Emil Maskovsky	144794a952	raft: Fix crash in leader_host API handler The leader_host API handler was eventually using the `req` unique_ptr after it had already been destroyed (passed down to the future lambda by reference). This was causing an occasional crash in some tests. Reworked the leader_host handler to use the req only outside of the future lambda. Also updated the code to handle the possibility that the non-default leader group (other than Group 0) might reside on a different shard than the shard 0 - using the same concept of calling on all shards via `invoke_on_all()` as done for the other requests. Fixes scylladb/scylladb#19714 Closes scylladb/scylladb#19715	2024-07-15 11:06:56 +02:00
Marcin Maliszkiewicz	395dec35c1	cql3: functions: replace template with std::function in with_udf_iter() Templates are slower to compile and more difficult to read, in this case generalization is not needed and can be replaced by std::function.	2024-07-15 09:39:20 +02:00
Marcin Maliszkiewicz	85d38e013c	cql3: functions: improve functions class constness handling Declares getters as const methods. Makes instance() function return const object so that it may only be modified via change_batch class.	2024-07-15 09:39:20 +02:00
Marcin Maliszkiewicz	b9861c0bb7	Revert "cql3: functions: make modification functions accessible only via batch class" This reverts commit `3f1c2fecc2`. This access control property will be implemented differently (by using const) in subsequent commit hence revert.	2024-07-15 09:39:20 +02:00
Dawid Medrek	7301a96ff4	db/hints: Coroutinize `hint_endpoint_manager::store_hint()`	2024-07-15 04:15:25 +02:00
Avi Kivity	c11f2c9bcd	Merge 'scylla-housekeeping: fix exception on parsing version string v2' from Takuya ASADA This reverts `65fbf72ed0` and introduce new version of the patch which fixes SCT breakage after the commit merged. ---- Since Python 3.12, version parsing becomes strict, parse_version() does not accept the version string like '6.1.0~dev'. To fix this, we need to pass acceptable version string to parse_version() like '6.1.0.dev0', which is allowed on Python version scheme. reference: https://packaging.python.org/en/latest/specifications/version-specifiers/ Fixes https://github.com/scylladb/scylladb/issues/19564 Closes https://github.com/scylladb/scylladb/pull/19572 Closes scylladb/scylladb#19670 * github.com:scylladb/scylladb: scylla-housekeeping: fix exception on parsing version string Revert "scylla-housekeeping: fix exception on parsing version string"	2024-07-14 16:24:41 +03:00
Raphael S. Carvalho	8df7f78969	replica: rename for_each_const_compaction_group() use same name as non-const-qualified variant, by relying on overloading. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-07-12 16:33:34 -03:00
Raphael S. Carvalho	518677d7f9	replica: Fix comment about compaction group there's not a 1:1 relationship between compaction group count and tablet count. a tablet replica has a storage group instance, which may map to multiple compaction groups during split mode. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-07-12 16:24:51 -03:00
Raphael S. Carvalho	f139aa1df6	replica: remove unused compaction_group_vector Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-07-12 16:16:47 -03:00
Botond Dénes	53a6ec05ed	Merge 'replica: remove rwlock for protecting iteration over storage group map' from Raphael "Raph" Carvalho rwlock was added to protect iterations against concurrent updates to the map. the updates can happen when allocating a new tablet replica or removing an old one (tablet cleanup). the rwlock is very problematic because it can result in topology changes blocked, as updating token metadata takes the exclusive lock, which is serialized with table wide ops like split / major / explicit flush (and those can take a long time). to get rid of the lock, we can copy the storage group map and guard individual groups with a gate (not a problem since map is expected to have a maximum of ~100 elements). so cleanup can close that gate (carefully closed after stopping individual groups such that migrations aren't blocked by long-running ops like major), and ongoing iterations (e.g. triggered by nodetool flush) can skip a group that was closed, as such a group is being migrated out. Fixes #18821. 
``` WRITE ===== ./build/release/scylla perf-simple-query --smp 1 --memory 2G --initial-tablets 10 --tablets --write - BEFORE 65559.52 tps ( 59.6 allocs/op, 16.4 logallocs/op, 14.3 tasks/op, 52841 insns/op, 30946 cycles/op, 0 errors) 67408.05 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53018 insns/op, 30874 cycles/op, 0 errors) 67714.72 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53026 insns/op, 30881 cycles/op, 0 errors) 67825.57 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53015 insns/op, 30821 cycles/op, 0 errors) 67810.74 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53009 insns/op, 30828 cycles/op, 0 errors) throughput: mean=67263.72 standard-deviation=967.40 median=67714.72 median-absolute-deviation=547.02 maximum=67825.57 minimum=65559.52 instructions_per_op: mean=52981.61 standard-deviation=79.09 median=53014.96 median-absolute-deviation=36.54 maximum=53025.79 minimum=52840.56 cpu_cycles_per_op: mean=30869.90 standard-deviation=50.23 median=30874.06 median-absolute-deviation=42.11 maximum=30945.94 minimum=30820.89 - AFTER 65448.76 tps ( 59.5 allocs/op, 16.4 logallocs/op, 14.3 tasks/op, 52788 insns/op, 31013 cycles/op, 0 errors) 67290.83 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53025 insns/op, 30950 cycles/op, 0 errors) 67646.81 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53025 insns/op, 30909 cycles/op, 0 errors) 67565.90 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 53058 insns/op, 30951 cycles/op, 0 errors) 67537.32 tps ( 59.3 allocs/op, 16.0 logallocs/op, 14.3 tasks/op, 52983 insns/op, 30963 cycles/op, 0 errors) throughput: mean=67097.93 standard-deviation=931.44 median=67537.32 median-absolute-deviation=467.97 maximum=67646.81 minimum=65448.76 instructions_per_op: mean=52975.85 standard-deviation=108.07 median=53024.55 median-absolute-deviation=49.45 maximum=53057.99 minimum=52788.49 cpu_cycles_per_op: mean=30957.17 standard-deviation=37.43 median=30951.31 
median-absolute-deviation=7.51 maximum=31013.01 minimum=30908.62 READ ===== ./build/release/scylla perf-simple-query --smp 1 --memory 2G --initial-tablets 10 --tablets - BEFORE 79423.36 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41840 insns/op, 26820 cycles/op, 0 errors) 81076.70 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41837 insns/op, 26583 cycles/op, 0 errors) 80927.36 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41829 insns/op, 26629 cycles/op, 0 errors) 80539.44 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41841 insns/op, 26735 cycles/op, 0 errors) 80793.10 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41864 insns/op, 26662 cycles/op, 0 errors) throughput: mean=80551.99 standard-deviation=661.12 median=80793.10 median-absolute-deviation=375.37 maximum=81076.70 minimum=79423.36 instructions_per_op: mean=41842.20 standard-deviation=13.26 median=41840.14 median-absolute-deviation=5.68 maximum=41864.50 minimum=41829.29 cpu_cycles_per_op: mean=26685.88 standard-deviation=93.31 median=26662.18 median-absolute-deviation=56.47 maximum=26820.08 minimum=26582.68 - AFTER 79464.70 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41799 insns/op, 26761 cycles/op, 0 errors) 80954.58 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41803 insns/op, 26605 cycles/op, 0 errors) 81160.90 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41811 insns/op, 26555 cycles/op, 0 errors) 81263.10 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41814 insns/op, 26527 cycles/op, 0 errors) 81162.97 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 41806 insns/op, 26549 cycles/op, 0 errors) throughput: mean=80801.25 standard-deviation=755.54 median=81160.90 median-absolute-deviation=361.72 maximum=81263.10 minimum=79464.70 instructions_per_op: mean=41806.47 standard-deviation=5.85 median=41806.05 median-absolute-deviation=4.05 maximum=41813.86 minimum=41799.36 cpu_cycles_per_op: mean=26599.22 
standard-deviation=94.84 median=26554.54 median-absolute-deviation=50.51 maximum=26761.06 minimum=26527.05 ``` Closes scylladb/scylladb#19469 * github.com:scylladb/scylladb: replica: remove rwlock for protecting iteration over storage group map replica: get rid of fragile compaction group intrusive list	2024-07-12 15:45:36 +03:00
Dawid Medrek	3e02e66ca8	db/hints: Move a constant value to the TU it's used in Until now, the constant `HINT_FILE_WRITE_TIMEOUT` was declared as a static member of `db::hints::manager`. However, the constant is only ever used in one translation unit, so it makes more sense to move it there and not include boilerplate in a header.	2024-07-12 13:08:33 +02:00
Piotr Dulikowski	3cdf549da2	Merge 'remove utils::in' from Avi Kivity utils::in uses std::aligned_storage, which is deprecated. Rather than fixing it, replace its only user with simpler code and remove it. No backport needed as this isn't fixing a bug. Closes scylladb/scylladb#19683 * github.com:scylladb/scylladb: utils: remove utils/in.hh gossiper: remove initializer-list overload of add_local_application_state()	2024-07-12 12:06:09 +02:00
Takuya ASADA	373a7825b5	scylla-housekeeping: fix exception on parsing version string Since Python 3.12, version parsing becomes strict, parse_version() does not accept the version string like '6.1.0~dev'. To fix this, we need to pass acceptable version string to parse_version() like '6.1.0.dev0', which is allowed on Python version scheme. Also, release candidate version like '6.0.0~rc3' has same issue, it should be replaced to '6.0.0rc3' to compare in parse_version(). reference: https://packaging.python.org/en/latest/specifications/version-specifiers/ Fixes #19564 Closes scylladb/scylladb#19572	2024-07-12 03:23:34 +09:00
Takuya ASADA	db04f8b16e	Revert "scylla-housekeeping: fix exception on parsing version string" This reverts commit `65fbf72ed0`, since it breaks scylla-housekeeping and SCT because the patch modified version string. We shouldn't modify the version string directly; we need to pass a modified string just for parse_version() instead.	2024-07-12 03:23:34 +09:00
Emil Maskovsky	b9abad0515	test: raft: fix the topology failure recovery test flakiness Setting the error condition for all nodes in the cluster to avoid having to check which one is the coordinator. This should make the test more stable and avoid the flakiness observed when the coordinator node is the one that got the error condition injected. Randomizing the retrieved running servers to reproduce the issue more frequently and to avoid making any assumptions about the order of the servers. Note that only the "raft_topology_barrier_fail" needs to run on a non-coordinator node, the other error "stream_ranges_fail" can be injected on any node (including the coordinator). Fixes: scylladb/scylladb#18614 Closes scylladb/scylladb#19663	2024-07-11 16:23:26 +02:00
Piotr Dulikowski	188b4ac0fc	Merge 'service_level_controller: update configuration on raft change' from Michał Jadwiszczak This patch is a follow-up to scylladb/scylladb#16585. Once we have service levels on raft, we can get rid of update loop, which updates the configuration in a configured interval (default is 10s). Instead, this PR introduces methods to `group0_state_machine` which look through table ids in mutations in `write_mutation` and update submodules based on that ids. Fixes: scylladb/scylladb#18060 Closes scylladb/scylladb#18758 * github.com:scylladb/scylladb: test: remove `sleep()`s which were required to reload service levels configuration test/cql_test_env: remove unit test service levels data accessors service/storage_service: reload SL cache on topology_state_load() service/qos/service_level_controller: move semaphore breaking to stop service/qos/service_level_controller: maybe start and stop legacy update loop service/qos/service_level_controller: make update loop legacy raft/group0_state_machine: update submodules based on table_id service/storage_service: add a proxy method to reload sl cache	2024-07-11 16:18:48 +02:00
Kefu Chai	2a1c9ed7cb	github: use needs.read-toolchain.outputs.image for iwyu's container in `9a71543fd2`, we introduced a regression, which failed to use the proper value for the container image in which the iwyu workflow is run. in this change, we pass the correct value, as we do in clang-tidy.yaml workflow. Refs `9a71543fd2` Fixes scylladb/scylladb#19704 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19697	2024-07-11 17:17:37 +03:00
Michał Chojnowski	1a8ee69a43	sstables/mx/writer: when creating local_compression, use the sstables's schema, not the writer's There are two schemas associated with a sstable writer: the sstable's schema (i.e. the schema of the table at the time when the sstable object was created), and the writer's schema (equal to the schema of the reader which is feeding into the writer). It's easy to mix up the two and break something as a result. The writer's schema is needed to correctly interpret and serialize the data passing through the writer, and to populate the on-disk metadata about the on-disk schema. The sstables's schema is used to configure some parameters for newly created sstable, such as bloom filter false positive ratio, or compression. The problem fixed by this patch is that the writer was wrongly creating the compressor objects based on its own schema, but using them based on the sstable's schema. This patch forces the writer to use the sstable's schema for both.	2024-07-11 12:53:54 +02:00
Michał Chojnowski	d10b38ba5b	sstables/mx/writer: when creating filter, use the sstables's schema, not the writer's There are two schemas associated with a sstable writer: the sstable's schema (i.e. the schema of the table at the time when the sstable object was created), and the writer's schema (equal to the schema of the reader which is feeding into the writer). It's easy to mix up the two and break something as a result. The writer's schema is needed to correctly interpret and serialize the data passing through the writer, and to populate the on-disk metadata about the on-disk schema. The sstables's schema is used to configure some parameters for newly created sstable, such as bloom filter false positive ratio, or compression. The problem fixed by this patch is that the writer was wrongly creating the filter based on its own schema, while the layer outside the writer was interpreting it as if it was created with the sstable's schema. This patch forces the writer to pick the filter's parameters based on the sstable's schema instead.	2024-07-11 12:53:54 +02:00
Michał Chojnowski	a1834efd82	sstables: for i_filter downcasts, use dynamic_cast instead of static_cast As of this patch, those static_casts are actually invalid in some cases (they cast to the wrong type) because of an oversight. A later patch will fix that. But to even write a reliable reproducer for the problem, we must force the invalid casts to manifest as a crash (instead of weird results). This patch both allows writing a reproducer for the bug and serves as a bit of defensive programming for the future.	2024-07-11 12:53:54 +02:00
Tomas Nozicka	26466a3043	Allow configuring default loglevel with args for container images Closes scylladb/scylladb#19671	2024-07-11 12:37:53 +03:00
Piotr Dulikowski	19c5e1807c	Merge 'schema: fix describe of indexes on collections' from Michał Jadwiszczak If the index was created on collection (both frozen or not), its description wasn't a correct create statement. This patch fixes the bug and includes functions like `full()`, `keys()`, `values()`, ... used to create index on collections. Fixes scylladb/scylladb#19278 Closes scylladb/scylladb#19381 * github.com:scylladb/scylladb: cql-pytest/test_describe: add a test for describe indexes schema/schema: fix column names in index description	2024-07-11 09:11:01 +02:00
Kefu Chai	9a71543fd2	github: always use the tools/toolchain/image for lint workflows instead of hardwiring the toolchain image in github workflows, read it from `tools/toolchain/image`. a dedicated reusable workflow is added to read from this file, and expose its content with an output parameter. also, switch iwyu.yaml workflow to this image, more maintainable this way. please note, before this change, we are also using the latest stable build of clang, and since fedora 40 is also using the clang 18, so the behavior does not change. but with this change, we don't have the flexibility of using other clang versions provided https://apt.llvm.org in future. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19655	2024-07-10 23:45:35 +03:00
Avi Kivity	65a7fc9902	Merge 'transport, service: move definition of destructors into .cc' from Kefu Chai this changeset includes two changes: - service: move storage_service::~storage_service() into .cc - transport: move the cql_server::~cql_server() into .cc they intend to address the compile failures when building scylladb with clang-19. clang-19 is more picky when generating the defaulted destructors with incomplete types. but its behavior makes sense with regard to standard compliance. so let's update accordingly. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#19668 * github.com:scylladb/scylladb: transport: move the cql_server::~cql_server() into .cc service: move storage_service::~storage_service() into .cc	2024-07-10 23:43:16 +03:00
Kefu Chai	06ba523818	sstable: extract file_writer out `sstables::write()` has multiple overloads, which are defined in `sstables/writer.hh`. two of these overloads are template functions, which have a template parameter named `W`, which has a type constraint requiring it to fulfill the `Writer` concept. but in `types.hh`, when the compiler tries to instantiate the template function with signature of `write(sstable_version_types v, W& out, const T& t)` with `file_writer` as the template parameter of `W`, `file_writer` is only forward-declared using `class file_writer` in the same header file, so this type is still an incomplete type at that moment. that's why the compiler is not able to determine if `file_writer` fulfills the constraint or not. actually, the declaration of `file_writer` is located in `sstables/writer.hh`, which in turn includes `types.hh`. so they form a cyclic dependency. in this change, in order to break this cycle, we extract file_writer out into a separate header file, so that both `sstables/writer.hh` and `sstables/types.hh` can include it. this addresses the build failure. Fixes scylladb/scylladb#19667 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19669	2024-07-10 23:32:47 +03:00
Michał Chojnowski	fdd8b03d4b	scylla-gdb.py: add $coro_frame() Adds a convenience function for inspecting the coroutine frame of a given seastar task. Short example of extracting a coroutine argument: ``` (gdb) p $coro_frame(seastar::local_engine->_current_task) $1 = { __resume_fn = 0x2485f80 <sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&)>, ... PointerType_7 = 0x601008e67880, ... __coro_index = 0 '\000' ... (gdb) p $downcast_vptr($->PointerType_7) $2 = (schema *) 0x601008e67880 ``` Closes scylladb/scylladb#19479	2024-07-10 21:46:27 +03:00
Avi Kivity	45e27c0da2	config, enum_option: allow round-trip string conversion The default configuration for replication_strategy_warn_list is ["SimpleStrategy"], but one cannot set this via CQL: cqlsh> select * from system.config where name = 'replication_strategy_warn_list'; name \| source \| type \| value --------------------------------+---------+---------------------------+-------------------- replication_strategy_warn_list \| default \| replication strategy list \| ["SimpleStrategy"] (1 rows) cqlsh> update system.config set value = '[NetworkTopologyStrategy]' where name = 'replication_strategy_warn_list'; cqlsh> select * from system.config where name = 'replication_strategy_warn_list'; name \| source \| type \| value --------------------------------+--------+---------------------------+----------------------------- replication_strategy_warn_list \| cql \| replication strategy list \| ["NetworkTopologyStrategy"] (1 rows) cqlsh> update system.config set value = '["NetworkTopologyStrategy"]' where name = 'replication_strategy_warn_list'; WriteFailure: Error from server: code=1500 [Replica(s) failed to execute write] message="Operation failed for system.config - received 0 responses and 1 failures from 1 CL=ONE." info={'consistency': 'ONE', 'required_responses': 1, 'received_responses': 0, 'failures': 1} Fix by allowing quotes in enum_set parsing. Bug present since `8c464b2ddb` ("guardrails: restrict replication strategy (RS)", 6.0). Fixes #19604. Closes scylladb/scylladb#19605	2024-07-10 20:39:01 +03:00
Yaron Kaikov	e33126fc3e	.github/script/label_promoted_commit.py: add label only if ref is PR we got a failure during check-commit action: ``` Run python .github/scripts/label_promoted_commits.py --commit_before_merge `30e82a81e8` --commit_after_merge `f31d5e3204` --repository scylladb/scylladb --ref refs/heads/master Commit sha is: `d5a149fc01` Commit sha is: `415457be2b` Commit sha is: `d3b1ccd03a` Commit sha is: `1fca341514` Commit sha is: `f784be6a7e` Commit sha is: `80986c17c3` Commit sha is: `492d0a5c86` Commit sha is: `7b3f55a65f` Commit sha is: `78d6471ce4` Commit sha is: `7a69d9070f` Commit sha is: `a9e985fcc9` master branch, pr number is: 19213 Traceback (most recent call last): File "/home/runner/work/scylladb/scylladb/.github/scripts/label_promoted_commits.py", line 87, in <module> main() File "/home/runner/work/scylladb/scylladb/.github/scripts/label_promoted_commits.py", line 81, in main pr = repo.get_pull(pr_number) File "/usr/lib/python3/dist-packages/github/Repository.py", line 2746, in get_pull headers, data = self._requester.requestJsonAndCheck( File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck return self.__check( File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check raise self.__createException(status, responseHeaders, output) github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/pulls/pulls#get-a-pull-request", "status": "404"} Error: Process completed with exit code 1. ``` The reason for this failure is since in one of the promoted commits (`a9e985fcc9`) had a reference of `Closes` to an issue. Fixes: https://github.com/scylladb/scylladb/issues/19677 Closes scylladb/scylladb#19678	2024-07-10 15:27:12 +03:00
Botond Dénes	9bdcba7a46	Merge 'conf: scylla.yaml: update documentation for tablets' from Benny Halevy Tablets are no longer in experimental_features since `83d491a`, so remove them from the experimental_features section documentation. Also, expand the documentation for the `enable_tablets` option. Fixes #19456 Needs backport to 6.0 Closes scylladb/scylladb#19516 * github.com:scylladb/scylladb: conf: scylla.yaml: enable_tablets: expand documentation conf: scylla.yaml: remove tablets from experimental_features doc comment	2024-07-10 14:32:40 +03:00
Avi Kivity	8b7a2661c1	utils: remove utils/in.hh It uses deprecated std::aligned_storage and had only one user (now removed) rather than maintain it, remove.	2024-07-10 14:11:27 +03:00
Avi Kivity	d50ba03965	gossiper: remove initializer-list overload of add_local_application_state() The initializer_list overload uses a too-clever technique to avoid copies. While copies here are unlikely to pose any real problem (we're allocating map nodes anyway), it's simple enough to provide a copy-less replacement that doesn't require questionable tricks. We replace the initializer_list<..., in<>> overload with a variadic template that constructs a temporary map.	2024-07-10 14:11:27 +03:00
Michał Jadwiszczak	375499b727	test: remove `sleep()`s which were required to reload service levels configuration Previously, some service levels tests required sleeping in order to ensure the in-memory configuration of service levels was updated. Now, when we are updating the configuration as the raft log is applied, doing read barrier (for instance to execute `DROP TABLE IF EXISTS non_existing_table`) is enough and the sleeps are not needed.	2024-07-10 10:42:21 +02:00
Michał Jadwiszczak	23bebb8037	test/cql_test_env: remove unit test service levels data accessors Unit test data accessors were created to avoid starting update loop in unit test and to update controller's configuration directly. With raft data accessor and configuration updates on applying raft log, we can get rid of unit test data accessors and use the raft one. This also makes the unit test env a bit more like a real Scylla environment.	2024-07-10 10:42:21 +02:00
Michał Jadwiszczak	de857d9ce3	service/storage_service: reload SL cache on topology_state_load() Since SL cache is no longer updated in a loop, it needs to be initialized on startup and because we are updating the cache while applying raft commands, we can initialize it on topology_state_load().	2024-07-10 10:42:20 +02:00
Jadw1	cf29242962	service/qos/service_level_controller: move semaphore breaking to stop Before this, the notification semaphore was broken() in do_abort(), which was triggered by early abort source. However we are going to reload sl cache on topology state reload and it can happen after the early abort source is triggered, so it may throw broken_semaphore exception. We can move semaphore breaking to stop() method. Legacy update loop is still stopped in do_abort(), so it doesn't change the order of service level controller shutdown.	2024-07-10 10:33:24 +02:00
Michał Jadwiszczak	85119b90df	service/qos/service_level_controller: maybe start and stop legacy update loop In previous commit, we marked the update loop as legacy. For compatibility reasons, we need to start legacy update loop when the cluster is in recovery mode or it hasn't been upgraded to raft topology. Then, in the update loop we check if all conditions are met and stop the loop. This commit also moves start of update loop later (after topology state is loaded) in main.cc. There is no risk in doing it later.	2024-07-10 10:23:04 +02:00
Michał Jadwiszczak	b0f76db9f2	service/qos/service_level_controller: make update loop legacy Rename the method which started the update loop to better reflect what it does. Previously the method was named `update_from_distributed_data`; however, it doesn't update anything but only starts the update loop, which we are making legacy.	2024-07-10 10:23:04 +02:00
Michał Jadwiszczak	5ddf5e3d7d	raft/group0_state_machine: update submodules based on table_id We want to update the service levels cache when any new mutations are applied to the service levels table. To avoid creating a new raft command type, this commit changes the design of `write_mutations` to update in-memory structures based on the mutations' table_id.	2024-07-10 10:23:04 +02:00
Michał Jadwiszczak	b61047a3f8	service/storage_service: add a proxy method to reload sl cache In this series of patches, we want to reload service levels cache when any changes to SL table are applied. Firstly we need to have a way to trigger reload of the cache from `group0_state_machines`. To not introduce another dependency, we can use `storage_service` (which has access to SL controller) and add a proxy method to it.	2024-07-10 10:23:04 +02:00
Nadav Har'El	c6cffe36dd	Merge 'cql: forbid having counter columns in tablets tables' from Piotr Smaron Counter updates break under tablet migration (#18180), and for this reason counters need to be disabled until the problem is fixed. It's enough to forbid creating a table with counters, as altering a table without counters already cannot result in the table having counters: 1) Adding a counter column to a table without counters: ``` cqlsh> ALTER TABLE temp.cf ADD (col_name counter); ConfigurationException: Cannot add a counter column (col_name) in a non counter column family ``` 2) Altering a column to be of the counter type: ``` cqlsh> ALTER TABLE temp.cf ALTER col_name TYPE counter; ConfigurationException: Cannot change col_name from type int to type counter: types are incompatible. ``` Fixes: #19449 Fixes: https://github.com/scylladb/scylladb/issues/18876 Need to backport to 6.0, as this is broken there. Closes scylladb/scylladb#19518 * github.com:scylladb/scylladb: doc: add notes to feature pages which don't support tablets cql: adjust warning about tablets cql: forbid having counter columns in tablets tables	2024-07-10 10:18:30 +03:00
Michał Jadwiszczak	b65a4c66f0	cql-pytest/test_describe: add a test for describe indexes	2024-07-10 07:14:46 +02:00
Kefu Chai	7e4e685964	transport: move the cql_server::~cql_server() into .cc because transport/server.cc has the complete definition of event_notifier, the compiler can default-generate the destructor of `cql_server` with the necessary information. otherwise, clang-19 would fail to build, like: ``` FAILED: CMakeFiles/scylla.dir/Dev/main.cc.o /home/kefu/.local/bin/clang++ -DBOOST_PROGRAM_OPTIONS_DYN_LINK -DBOOST_PROGRAM_OPTIONS_NO_LIB -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Dev\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -I/home/kefu/dev/scylladb/build -isystem /home/kefu/dev/scylladb/build/rust -isystem /home/kefu/dev/scylladb/abseil -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. 
-march=westmere -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -MD -MT CMakeFiles/scylla.dir/Dev/main.cc.o -MF CMakeFiles/scylla.dir/Dev/main.cc.o.d -o CMakeFiles/scylla.dir/Dev/main.cc.o -c /home/kefu/dev/scylladb/main.cc In file included from /home/kefu/dev/scylladb/main.cc:11: In file included from /usr/include/yaml-cpp/yaml.h:10: In file included from /usr/include/yaml-cpp/parser.h:11: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/memory:78: /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:91:16: error: invalid application of 'sizeof' to an incomplete type 'cql_transport::cql_server::event_notifier' 91 \| static_assert(sizeof(_Tp)>0, \| ^~~~~~~~~~~ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:398:4: note: in instantiation of member function 'std::default_delete<cql_transport::cql_server::event_notifier>::operator()' requested here 398 \| get_deleter()(std::move(__ptr)); \| ^ /home/kefu/dev/scylladb/transport/server.hh:135:7: note: in instantiation of member function 'std::unique_ptr<cql_transport::cql_server::event_notifier>::~unique_ptr' requested here 135 \| class cql_server : public seastar::peering_sharded_service<cql_server>, public generic_server::server { \| ^ /home/kefu/dev/scylladb/transport/server.hh:135:7: note: in implicit destructor for 'cql_transport::cql_server' first required here /home/kefu/dev/scylladb/transport/server.hh:149:11: note: forward declaration of 'cql_transport::cql_server::event_notifier' 149 \| class event_notifier; \| ^ 1 error generated. ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-10 12:52:51 +08:00
Kefu Chai	79ffde063a	service: move storage_service::~storage_service() into .cc as repair/repair.cc has the complete definition of node_ops_meta_data, the compiler can default-generate the destructor of `storage_service` with the necessary information. otherwise, clang-19 would fail to build, like: ``` FAILED: repair/CMakeFiles/repair.dir/Dev/repair.cc.o /home/kefu/.local/bin/clang++ -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Dev\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -O2 -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. 
-march=westmere -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -MD -MT repair/CMakeFiles/repair.dir/Dev/repair.cc.o -MF repair/CMakeFiles/repair.dir/Dev/repair.cc.o.d -o repair/CMakeFiles/repair.dir/Dev/repair.cc.o -c /home/kefu/dev/scylladb/repair/repair.cc In file included from /home/kefu/dev/scylladb/repair/repair.cc:9: In file included from /home/kefu/dev/scylladb/repair/repair.hh:11: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/unordered_map:41: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unordered_map.h:33: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable.h:35: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:34: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/tuple:38: /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_pair.h:291:11: error: field has incomplete type 'service::node_ops_meta_data' 291 \| _T2 second; ///< The second member \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/ext/aligned_buffer.h:93:28: note: in instantiation of template class 'std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>' requested here 93 \| : std::aligned_storage<sizeof(_Tp), __alignof__(_Tp)> \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:334:43: note: in instantiation of template class '__gnu_cxx::__aligned_buffer<std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>>' requested here 334 \| __gnu_cxx::__aligned_buffer<_Value> _M_storage; \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:373:7: note: in instantiation of template class 'std::__detail::_Hash_node_value_base<std::pair<const utils::tagged_uuid<node_ops_id_tag>, 
service::node_ops_meta_data>>' requested here 373 \| : _Hash_node_value_base<_Value> \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable.h:1662:21: note: in instantiation of template class 'std::__detail::_Hash_node_value<std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>, false>' requested here 1662 \| ._M_bucket_index(declval<const __node_value_type&>(), \| ^ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unordered_map.h:109:11: note: in instantiation of member function 'std::_Hashtable<utils::tagged_uuid<node_ops_id_tag>, std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>, std::allocator<std::pair<const utils::tagged_uuid<node_ops_id_tag>, service::node_ops_meta_data>>, std::__detail::_Select1st, std::equal_to<utils::tagged_uuid<node_ops_id_tag>>, std::hash<utils::tagged_uuid<node_ops_id_tag>>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true>>::~_Hashtable' requested here 109 \| class unordered_map \| ^ /home/kefu/dev/scylladb/service/storage_service.hh:109:7: note: forward declaration of 'service::node_ops_meta_data' 109 \| class node_ops_meta_data; \| ^ In file included from /home/kefu/dev/scylladb/repair/repair.cc:9: In file included from /home/kefu/dev/scylladb/repair/repair.hh:11: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/unordered_map:41: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unordered_map.h:33: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable.h:35: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/hashtable_policy.h:34: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/tuple:38: In file included from 
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_pair.h:60: ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-10 12:52:51 +08:00
Michał Jadwiszczak	253feb6811	schema/schema: fix column names in index description Previously description of index didn't include functions for indexes on collections like full(), keys(), values(), etc...	2024-07-09 22:37:05 +02:00
Raphael S. Carvalho	c539b7c861	replica: remove rwlock for protecting iteration over storage group map rwlock was added to protect iterations against concurrent updates to the map. the updates can happen when allocating a new tablet replica or removing an old one (tablet cleanup). the rwlock is very problematic because it can result in topology changes blocked, as updating token metadata takes the exclusive lock, which is serialized with table wide ops like split / major / explicit flush (and those can take a long time). to get rid of the lock, we can copy the storage group map and guard individual groups with a gate (not a problem since map is expected to have a maximum of ~100 elements). so cleanup can close that gate (carefully closed after stopping individual groups such that migrations aren't blocked by long-running ops like major), and ongoing iterations (e.g. triggered by nodetool flush) can skip a group that was closed, as such a group is being migrated out. Check documentation added to compaction_group.hh to understand how concurrent iterations and updates to the map work without the rwlock. Yielding variants that iterate over groups are no longer returning group id since id stability can no longer be guaranteed without serializing split finalization and iteration. Fixes #18821. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-07-09 16:59:24 -03:00
Raphael S. Carvalho	ad5c5bca5f	replica: get rid of fragile compaction group intrusive list It was added to make integration of storage groups easier, but it's complicated since it's another source of truth and we could have problems if it becomes inconsistent with the group map. Fixes #18506. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-07-09 16:53:35 -03:00
Piotr Smaron	531659f8dc	doc: add notes to feature pages which don't support tablets There's already a page which lists which features are not working with tablets: architecture/tablets.html#limitations-and-unsupported-features, but it's also helpful for users to be warned about this when visiting a specific feature doc page.	2024-07-09 18:18:05 +02:00
Avi Kivity	f31d5e3204	Merge 'repair/streaming: enable toggling tombstone gc with a config item' from Botond Dénes We currently disable tombstone GC for compaction done on the read path of streaming and repair, because those expired tombstones can still prevent data resurrection. With time-based tombstone GC, missing a repair for long enough can cause data resurrection because a tombstone is potentially GC'd before it could be spread to every node by repair. So repair disseminating these expired tombstones helps clusters which missed repair for long enough. It is not a guarantee because compaction could have done the GC itself, but it is better than nothing. This last resort is getting less important with repair-based tombstone GC. Furthermore, we have seen this cause huge repair amplification in a cluster, where expired tombstones triggered repair replicating otherwise identical rows. This series makes tombstone GC on the streaming/repair compaction path configurable with a config item. This new config item defaults to `false` (current behaviour), setting it to `true`, will enable tombstone GC. Fixes: https://github.com/scylladb/scylladb/issues/19015 Not a regression, no backport needed Closes scylladb/scylladb#19016 * github.com:scylladb/scylladb: test/topology_custom/test_repair: add test for enable_tombstone_gc_for_streaming_and_repair replica/table: maybe_compact_for_streaming(): toggle tombstone GC based on the control flag replica: propagate enable_tombstone_gc_for_streaming_and_repair to maybe_compact_for_streaming() db/config: introduce enable_tombstone_gc_for_streaming_and_repair	2024-07-09 19:04:11 +03:00
Piotr Smaron	5bfabff9a0	cql: adjust warning about tablets Made it shorter, simpler and mentioned also that counters aren't supported with tablets. Fixes: #18876	2024-07-09 18:01:37 +02:00
Piotr Smaron	c70f321c6f	cql: forbid having counter columns in tablets tables Counter updates break under tablet migration (#18180), and for this reason they need to be disabled until the problem is fixed. It's enough to forbid creating a table with counters, as altering a table without counters already cannot result in the table having counters: 1) Adding a counter column to a table without counters: ``` cqlsh> ALTER TABLE temp.cf ADD (col_name counter); ConfigurationException: Cannot add a counter column (col_name) in a non counter column family ``` 2) Altering a column to be of the counter type: ``` cqlsh> ALTER TABLE temp.cf ALTER col_name TYPE counter; ConfigurationException: Cannot change col_name from type int to type counter: types are incompatible. ``` Fixes: #19449	2024-07-09 18:01:31 +02:00
Patryk Wrobel	a89e3d10af	code-cleanup: add missing header guards The following command had been executed to get the list of headers that did not contain '#pragma once': 'grep -rnw . -e "#pragma once" --include *.hh -L' This change adds missing include guard to headers that did not contain any guard. Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com> Closes scylladb/scylladb#19626	2024-07-09 18:31:35 +03:00
Calle Wilund	8295980d14	commitlog: Make max data lifetime runtime-configurable	2024-07-09 12:30:49 +00:00
Calle Wilund	0c6679e55f	db::config: Expose commitlog_max_data_lifetime_in_s parameter To allow user control of commitlog time based expiry. Set to 24h initially.	2024-07-09 12:30:48 +00:00
Calle Wilund	55d6afda6e	commitlog: Add optional max lifetime parameter to cl instance If set, any remaining segment that has data older than this threshold will request flushing, regardless of data pressure. I.e. even a system where nothing happens will flush data after X seconds to free up the commit log.	2024-07-09 12:30:48 +00:00
Takuya ASADA	cae999c094	toolchain: change optimized clang install method to standard one Previously, the optimized clang installation did not use the standard build script; it overwrote the preinstalled Fedora clang binaries instead. However, this breaks on clang-18.1.8 because of the libLTO versioning convention. To avoid this problem, let's switch to the standard installation method and switch the install prefix to /usr/local. Fixes #19203 Closes scylladb/scylladb#19505	2024-07-09 14:22:42 +03:00
Tomasz Grabiec	252110bc54	Merge 'mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion' from Michał Chojnowski apply_monotonically() is run with reclaim disabled. So with some bad luck, sentinel insertion might fail with bad_alloc even on a perfectly healthy node. We can't deal with the failure of sentinel insertion, so this will result in a crash. This patch prevents the spurious OOM by reserving some memory (1 LSA segment) and only making it available right before the critical allocations. Fixes https://github.com/scylladb/scylladb/issues/19552 Closes scylladb/scylladb#19617 * github.com:scylladb/scylladb: mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion logalloc: add hold_reserve logalloc: generalize refill_emergency_reserve()	2024-07-09 13:09:01 +02:00
Anna Stuchlik	948459b1ac	doc: replace a link on the CDC+Kafka page This commit replaces a link to the installation section with a link to the getting started section. Closes scylladb/scylladb#19658	2024-07-09 12:35:43 +03:00
Michael Litvak	ed33e59714	storage_proxy: remove response handler if no targets When writing a mutation, it might happen that there are no live targets to send the mutation to, yet the request can be satisfied. For example, when writing with CL=ANY to a dead node, the request is completed by storing a local hint. Currently, in that case, a write response handler is created for the request and it remains active until it times out because it is not removed anywhere, even though the write is completed successfully after storing the hint. The response handler is usually removed when responses have been received from all targets, but in this case there are no targets to trigger the removal. In this commit we check whether we have any live targets to send the mutation to. If not, we remove the response handler immediately. Fixes scylladb/scylladb#19529 Closes scylladb/scylladb#19586	2024-07-09 12:11:05 +03:00
Kamil Braun	98c18d8904	Merge 'Add API for read barrier' from Emil Maskovsky Introduce REST API for triggering a read barrier. This is to make sure the database schema is up to date on the node where the read barrier is triggered. One of the use cases is the database backup via the Scylla Manager, which requires that the schema backed up is matching the data or newer (data can be migrated, but an older schema would cause issues). Fixes scylladb/scylladb#19213 Closes scylladb/scylladb#19597 * github.com:scylladb/scylladb: raft: add the read barrier REST API raft: use `raft_timeout` in trigger_snapshot raft: use bad_param_exception for consistency test: raft: verify schema updated after read barrier	2024-07-09 10:58:21 +02:00
Kefu Chai	6af989782c	test: sstable_directory_test: use THREADSAFE_BOOST_REQUIRE_EQUAL when appropriate for better debugging experience. before this change, we have ``` fatal error: in "sstable_directory_test_generation_sanity": critical check sst->generation() == sst1->generation() has failed ``` after this change, we have ``` fatal error: in "sstable_directory_test_generation_sanity": critical check sst->generation() == sst1->generation() has failed [3ghm_0ntw_29vj625yegw7jodysc != 3ghm_0ntw_29vj625yegw7jodysd] ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19639	2024-07-09 10:54:23 +03:00
Kefu Chai	30e82a81e8	test: do not define boost_test_print_type() for types with operator<< before this change, we provide `boost_test_print_type()` for all types which can be formatted using {fmt}. these types includes those who fulfill the concept of range, and their element can be formatted using {fmt}. if the compilation unit happens to include `fmt/ranges.h`. the ranges are formatted with `boost_test_print_type()` as well. this is what we expect. in other words, we use {fmt} to format types which do not natively support {fmt}, but they fulfill the range concept. but `boost::unit_test::basic_cstring` is one of them - it can be formatted using operator<<, but it does not provide fmt::format specialization - it fulfills the concept of range - and its element type is `char const`, which can be formatted using {fmt} that's why it's formatted like: ``` test/boost/sstable_directory_test.cc(317): fatal error: in "sstable_directory_test_generation_sanity": critical check ['s', 's', 't', '-', '>', 'g', 'e', 'n', 'e', 'r', 'a', 't', 'i', 'o', 'n', '(', ')', ' ', '=', '=', ' ', 's', 's', 't', '1', '-', '>', 'g', 'e', 'n', 'e', 'r', 'a', 't', 'i', 'o', 'n', '(', ')'] has failed` ``` where the string is formatted as a sequence-alike container. this is far from readable. so, in this change, we do not define `boost_test_print_type()` for the types which natively support `operator<<` anymore. so they can be printed with `operator<<` when boost::test prints them. Fixes scylladb/scylladb#19637 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19638	2024-07-09 10:34:37 +03:00
Botond Dénes	9544c364be	scylla-gdb.py: introduce scylla large-objects The equivalent of small-objects, but for large objects (spans). Allows listing object of a large-class, and therefore investigating a run-away class, by attempting to identify the owners of the objects in it. Written to investigate #16493 Closes scylladb/scylladb#16711	2024-07-09 10:21:09 +03:00
Emil Maskovsky	a9e985fcc9	raft: add the read barrier REST API This allows triggering the read barrier directly via the API, instead of using workarounds (like dropping a non-existent table). The intended use case is in the Scylla Manager, to make sure that the database schema is up to date after the data has been backed up and before attempting to back up the database schema. The database schema in particular is backed up on just a single node, which might not yet have a schema at least as new as the data (data can be migrated to a newer schema, but not vice versa). The read barrier issued on the node should ensure that the node has a schema at least as new as the data. Closes #19213	2024-07-08 18:16:27 +02:00
Emil Maskovsky	7a69d9070f	raft: use `raft_timeout` in trigger_snapshot Migrate the "trigger_snapshot" to use the standardized `raft_timeout` approach.	2024-07-08 18:13:31 +02:00
Michał Chojnowski	78d6471ce4	mutation_partition_v2: in apply_monotonically(), avoid bad_alloc on sentinel insertion apply_monotonically() is run with reclaim disabled. So with some bad luck, sentinel insertion might fail with bad_alloc even on a perfectly healthy node. We can't deal with the failure of sentinel insertion, so this will result in a crash. This patch prevents the spurious OOM by reserving some memory (1 LSA segment) and only making it available right before the critical allocations. Fixes scylladb/scylladb#19552	2024-07-08 16:08:27 +02:00
Michał Chojnowski	7b3f55a65f	logalloc: add hold_reserve mutation_partition_v2::apply_monotonically() needs to perform some allocations in a destructor, to ensure that the invariants of the data structure are restored before returning. But it is usually called with reclaiming disabled, so the allocations might fail even in a perfectly healthy node with plenty of reclaimable memory. This patch adds a mechanism which allows to reserve some LSA memory (by asking the allocator to keep it unused) and make it available for allocation right when we need to guarantee allocation success.	2024-07-08 16:08:27 +02:00
Wojciech Przytuła	691e245152	storage_proxy: fix uninitialized LWT contention counter When debugging the issue of a high LWT contention metric, we (the drivers team) discovered that at least 3 drivers (Go, Java, Rust) cause high numbers in that metric in LWT workloads - we doubted that all those drivers route LWT queries badly. We tried to understand that metric and its semantics. It took 3 people over 10 hours to figure out what it is supposed to count. People from the core team suspected that it was the drivers sending requests to different shards, causing contention. Then we ran the workload against a single node single shard cluster... and observed contention. Finally, we looked into the Scylla code and saw it. Uninitialized stack value. The core member was shocked. But we, the drivers people, felt we always knew it. It's yet another time that we are blamed for a server-side issue. We rebuilt scylla with the variable initialized to 0 and the metric kept being 0. To prevent such errors in the future, let's consider some lints that warn against uninitialized variables. This is such an obvious feature of e.g. Rust, and yet this has been shown to cause a painful bug in 2024. Closes scylladb/scylladb#19625	2024-07-08 16:55:46 +03:00
Emil Maskovsky	492d0a5c86	raft: use bad_param_exception for consistency Replace the `std::runtime_error` by the `bad_param_exception` that is used in other places.	2024-07-08 14:31:11 +02:00
Takuya ASADA	cbf33aba5c	scylla_coredump_setup: install systemd-coredump before has_zstd() On Ubuntu/Debian, we have to install systemd-coredump before running has_zstd(), since it detects ZSTD support by running coredumpctl. Move pkg_install('systemd-coredump') to the head of the script. Fixes #19643 Closes scylladb/scylladb#19648	2024-07-08 15:04:34 +03:00
Kefu Chai	229250ef3e	.github: use scylla-toolchain for newer fmt in `cccec07581`, we started using a feature introduced by {fmt} v10. but we are still using the {fmt} cooked using seastar, and it is 9.1.0, so this breaks the build when running the clang-tidy workflow. in this change, instead of building on ubuntu jammy, we use the scylladb/scylla-toolchain image based on fedora 40, which provides {fmt} v10.2.1. since we have clang 18 in fedora 40, this change does not sacrifice anything. after this change, clang-tidy workflow should be back to normal. Fixes scylladb/scylladb#19621 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19628	2024-07-08 11:14:02 +02:00
Emil Maskovsky	80986c17c3	test: raft: verify schema updated after read barrier Regression test for #19213.	2024-07-08 10:50:32 +02:00
Piotr Dulikowski	3c535641fd	Merge 'service/storage_proxy: Add metrics keeping track of incoming hints' from Dawid Mędrek Although Scylla already exposes metrics keeping track of various information related to hinted handoff, all of them correspond to either storing or sending hints. However, when debugging, it's also crucial to be aware of how many hints are coming to a given node and what their size is. Unfortunately, the existing metrics are not enough to obtain that information. This PR introduces the following new metrics: * `sent_bytes_total` – the total size of the hints that have been sent from a given shard, * `received_hints_total` – the total number of hints that a given shard has received, * `received_hints_bytes_total` – the total size of the hints a given shard has received. It also renames `hints_manager_sent` to `hints_manager_sent_total` to avoid conflicts of prefixes between that metric and `sent_bytes_total` in tests. Fixes scylladb/scylladb#10987 Closes scylladb/scylladb#18976 * github.com:scylladb/scylladb: db/hints: Add a metric for the size of sent hints service/storage_proxy: Add metrics for received hints	2024-07-08 10:29:53 +02:00
Botond Dénes	56c194e52c	Merge 'compaction: not include unused headers' from Kefu Chai these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#19581 * github.com:scylladb/scylladb: .github: add compaction to iwyu's CLEANER_DIR compaction: not include unused headers	2024-07-08 10:03:51 +03:00
Israel Fruchter	32e6725b8e	Update tools/cqlsh submodule * tools/cqlsh 73bdbeb0...86a280a1 (1): > remove cassandra from the shiv package Ref: scylladb/scylla-cqlsh#96 Closes scylladb/scylladb#19558	2024-07-08 10:00:59 +03:00
Michael Litvak	407274e828	view: drain view builder before database The view builder is doing write operations to the database. In order for the view builder to shutdown gracefully without errors, we need to ensure the database can handle writes while it is drained. The commit changes the drain order, so that view builder is drained before the database shuts down. Fixes scylladb/scylladb#18929 Closes scylladb/scylladb#19609	2024-07-05 22:17:40 +03:00
Botond Dénes	103bd8334a	service/paxos/paxos_state: restore resilience against dropped tables Recently, the code in paxos_state::prepare(), paxos_state::accept() and paxos_state::learn() was coroutinized by `58912c2cc1`, `887a5a8f62` and `2b7acdb32c` respectively. This introduced a regression: the latency histogram updater code, was moved from a finally() to a defer(). Unlike the former, the latter runs in a noexcept context so the possible replica::no_such_column_family raised from the latency update code now crashes the node, instead of failing just the paxos operation as before. Fix by only updating the latency histogram if the table still exists. Fixes: scylladb/scylladb#19620 Closes scylladb/scylladb#19623	2024-07-05 14:58:11 +02:00
Anna Stuchlik	8759dfae96	doc: add Run in Docker page to the documentation The page was missing from the docs. I created the page based on the information in the download center (which will be closed down soon) and other ScyllaDB resources. Closes scylladb/scylladb#19577	2024-07-04 20:20:03 +03:00
Dawid Medrek	0e1cb0dc73	db/hints: Add logging when ignoring hint directories In `2446cce`, we stopped attempting to create endpoint managers for invalid hint directories even when their names represented IP addresses or host IDs. In this commit, we add logging informing the user about it. Refs scylladb/scylladb#19173 Closes scylladb/scylladb#19618	2024-07-04 20:14:52 +03:00
Botond Dénes	155acbb306	reader_concurrency_semaphore: execution_loop(): move maybe_admit_waiters() to the inner loop Now that the CPU concurrency limit is configurable, new reads might be ready to execute right after the current one was executed. So move the poll for admitting new reads into the inner loop, to prevent the situation where the inner loop yields and a concurrent do_wait_admission() finds that there are waiters (queued because at the time they arrived to the semaphore, the _ready_list was not empty) but it is possible to admit a new read. When this happens the semaphore will dump diagnostics to help debug the apparent contradiction, which can generate a lot of log spam. Moving the poll into the inner loop prevents the false-positive contradiction detection from firing. Refs: scylladb/scylladb#19017 Closes scylladb/scylladb#19600	2024-07-04 17:47:52 +03:00
Avi Kivity	0626e0487d	Merge 'Add copy on write to functions schema code' from Marcin Maliszkiewicz This is the first patch from series which would allow us to unify raft command code. Property we want to achieve is that all modifications performed by a single raft command can be made visible atomically. This helps to exclude accidental dependencies across subsystem updates and make easier to reason about state. Here we alter functions schema code so that changes are first applied to a copy of declared functions and then made visible atomically. Later work will apply similar strategy to the whole schema. Relates scylladb/scylladb#19153 Closes scylladb/scylladb#19598 * github.com:scylladb/scylladb: cql3: functions: make modification functions accessible only via batch class db: replica: batch functions schema modifications cql3: functions: introduce class for batching functions modifications cql3: functions: make functions class non-static cql3: functions: remove reduntant class access specifiers cql3: functions: remove unused java snippet	2024-07-04 17:40:23 +03:00
Anna Stuchlik	822a58f964	doc: remove support for Debian 10 This PR removes support for Debian 10, which reached end of life on June 30, 2024. Refs https://github.com/scylladb/scylla-enterprise/issues/4377 Closes scylladb/scylladb#19616	2024-07-04 17:24:57 +03:00
Marcin Maliszkiewicz	3f1c2fecc2	cql3: functions: make modification functions accessible only via batch class This is to assure that all the code is using batching	2024-07-04 13:10:26 +02:00
Marcin Maliszkiewicz	32fe101f9d	db: replica: batch functions schema modifications Before, each function change was immediately visible, because the event notification logic yielded. Now we first gather the modifications and then commit them. Further work will broaden the scope of atomicity to the whole schema and even across other subsystems.	2024-07-04 13:10:26 +02:00
Michał Chojnowski	f784be6a7e	logalloc: generalize refill_emergency_reserve() In the next patch, we will want to do the same thing as refill_emergency_reserve() does, just with a quantity different than _emergency_reserve_max. So we split off the shareable part to a new function, and use it to implement refill_emergency_reserve().	2024-07-04 12:19:01 +02:00
Pavel Emelyanov	9a654730a7	tablet_allocator: Put more info into failed-to-drain exception When the balancer fails to find a node to balance drained tablets into, it throws an exception with the tablet id and node id, but it's also good to know more details about the balancing state that led to the failure. refs: #19504 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19588	2024-07-04 12:18:50 +02:00
Marcin Maliszkiewicz	4d937c5a17	cql3: functions: introduce class for batching functions modifications It will hold a temporary shallow copy of declared functions. Then each modification adds/removes/replaces a stored function object. At the end, the change is committed by moving the temporary copy to the main functions class instance.	2024-07-04 12:14:36 +02:00
Nadav Har'El	96dff367f8	Merge 'storage_proxy: update view update backlog on correct shard when writing' from Wojciech Mitros This series is another approach of https://github.com/scylladb/scylladb/pull/18646 and https://github.com/scylladb/scylladb/pull/19181. In this series we only change where the view backlog gets updated - we do not assure that the view update backlog returned in a response is necessarily the backlog that increased due to the corresponding write, the returned backlog may be outdated up to 10ms. Because this series does not include this change, it's considerably less complex and it doesn't modify the common write patch, so no particular performance considerations were needed in that context. The issue being fixed is still the same, the full description can be seen below. When a replica applies a write on a table which has a materialized view it generates view updates. These updates take memory which is tracked by `database::_view_update_concurrency_sem`, separate on each shard. The fraction of units taken from the semaphore to the semaphore limit is the shard's view update backlog. Based on these backlogs, we want to estimate how busy a node is with its view updates work. We do that by taking the max backlog across all shards. 
To avoid excessive cross-shard operations, the node's (max) backlog isn't calculated each time we need it, but at most once per 10ms (the `_interval`) with an optimization where the backlog of the calculating shard is immediately up-to-date (we don't need cross-shard operations for it): ``` update_backlog node_update_backlog::fetch() { auto now = clock::now(); if (now >= _last_update.load(std::memory_order_relaxed) + _interval) { _last_update.store(now, std::memory_order_relaxed); auto new_max = boost::accumulate( _backlogs, update_backlog::no_backlog(), [] (const update_backlog& lhs, const per_shard_backlog& rhs) { return std::max(lhs, rhs.load()); }); _max.store(new_max, std::memory_order_relaxed); return new_max; } return std::max(fetch_shard(this_shard_id()), _max.load(std::memory_order_relaxed)); } ``` For the same reason, even when we do calculate the new node's backlog, we don't read from the `_view_update_concurrency_sem`. Instead, for each shard we also store an update_backlog atomic which we use for calculation: ``` struct per_shard_backlog { // Multiply by 2 to defeat the prefetcher alignas(seastar::cache_line_size * 2) std::atomic<update_backlog> backlog = update_backlog::no_backlog(); need_publishing need_publishing = need_publishing::no; update_backlog load() const { return backlog.load(std::memory_order_relaxed); } }; std::vector<per_shard_backlog> _backlogs; ``` Due to this distinction, the update_backlog atomic needs to be updated separately, when the `_view_update_concurrency_sem` changes.
This is done by calling `storage_proxy::update_view_update_backlog`, which reads the `_view_update_concurrency_sem` of the shard (in `database::get_view_update_backlog`) and then calls `node_update_backlog::add`, where the read backlog is stored in the atomic: ``` void storage_proxy::update_view_update_backlog() { _max_view_update_backlog.add(get_db().local().get_view_update_backlog()); } void node_update_backlog::add(update_backlog backlog) { _backlogs[this_shard_id()].backlog.store(backlog, std::memory_order_relaxed); _backlogs[this_shard_id()].need_publishing = need_publishing::yes; } ``` For this implementation of calculating the node's view update backlog to work, we need the atomics to be updated correctly when the semaphores of corresponding shards change. The main event where the view update backlog changes is an incoming write request. That's why when handling the request and preparing a response we update the backlog by calling `storage_proxy::get_view_update_backlog` (also because we want to read the backlog and send it in the response): backlog update after local view updates (`storage_proxy::send_to_live_endpoints` in `mutate_begin`) ``` auto lmutate = [handler_ptr, response_id, this, my_address, timeout] () mutable { return handler_ptr->apply_locally(timeout, handler_ptr->get_trace_state()) .then([response_id, this, my_address, h = std::move(handler_ptr), p = shared_from_this()] { // make mutation alive until it is processed locally, otherwise it // may disappear if write timeouts before this future is ready got_response(response_id, my_address, get_view_update_backlog()); }); }; ``` backlog update after remote view updates (`storage_proxy::remote::handle_write`) ``` auto f = co_await coroutine::as_future(send_mutation_done(netw::messaging_service::msg_addr{reply_to, shard}, trace_state_ptr, shard, response_id, p->get_view_update_backlog())); ``` Now assume that on a certain node we have a write request received on shard A, which updates a row on shard B (A!=B).
As a result, shard B will generate view updates and consume units from its `_view_update_concurrency_sem`, but will not update its atomic in `_backlogs` yet. Because both shards in the example are on the same node, shard A will perform a local write calling `lmutate` shown above. In the `lmutate` call, the `apply_locally` will initiate the actual write on shard B and the `storage_proxy::update_view_update_backlog` will be called back on shard A. In no place will the backlog atomic on shard B get updated even though it increased in size due to the view updates generated there. Currently, what we calculate there doesn't really matter - it's only used for the MV flow control delays, so currently, in this scenario, we may only overload a replica causing failed replica writes which will be later retried as hints. However, when we add MV admission control, the calculated backlog will be the difference between an accepted and a rejected request. Fixes: https://github.com/scylladb/scylladb/issues/18542 Without admission control (https://github.com/scylladb/scylladb/pull/18334), this patch doesn't affect much, so I'm marking it as backport/none Closes scylladb/scylladb#19341 * github.com:scylladb/scylladb: test: add test for view backlog not being updated on correct shard test: move auxiliary methods for waiting until a view is built to util mv: update view update backlog when it increases on correct shard	2024-07-04 11:40:09 +03:00
Marcin Maliszkiewicz	16b770ff1a	cql3: functions: make functions class non-static This is done to ease code reuse in the following commit. It would also help should we ever want to properly mount the functions class to the schema object instead of static storage.	2024-07-04 10:24:57 +02:00
Marcin Maliszkiewicz	47033dce7a	cql3: functions: remove reduntant class access specifiers	2024-07-04 10:24:57 +02:00
Marcin Maliszkiewicz	e86191b19f	cql3: functions: remove unused java snippet It doesn't seem to serve any purpose now.	2024-07-04 10:24:57 +02:00
Kefu Chai	cccec07581	db: use format_as() in favor of fmt::streamed() since fedora 38 is EOL. and fedora 39 comes with fmt v10.0.0, also, we've switched to the build image based on fedora 40, which ships fmt-devel v10.2.1, there is no need to use fmt::streamed() when the corresponding format_as() is available. simpler this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19594	2024-07-04 11:10:43 +03:00
Kefu Chai	35e7a0b36f	test/cql-pytest: use offset-aware API to avoid deprecate warning to avoid warning like ``` DeprecationWarning: datetime.datetime.utcfromtimestamp() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.fromtimestamp(timestamp, datetime.UTC). ``` and to be future-proof, let's use the offset-aware timestamp. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19536	2024-07-04 10:48:00 +03:00
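The replacement that the deprecation warning above recommends can be sketched as follows. This is a minimal illustration, not the code from the commit: the helper name `utc_from_timestamp` is hypothetical, and `timezone.utc` is used in place of `datetime.UTC` (an alias added in Python 3.11) so the sketch also runs on older interpreters.

```python
from datetime import datetime, timezone


def utc_from_timestamp(ts: float) -> datetime:
    """Return an offset-aware UTC datetime for a POSIX timestamp.

    Hypothetical helper: the offset-aware counterpart of the deprecated
    datetime.utcfromtimestamp(ts), which returns a naive datetime.
    """
    return datetime.fromtimestamp(ts, tz=timezone.utc)


# The Unix epoch as an offset-aware value.
epoch = utc_from_timestamp(0)
```

Unlike the naive result of `utcfromtimestamp()`, the returned value carries its UTC offset, so comparing it with a naive datetime raises `TypeError` instead of silently mixing time zones.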
Kefu Chai	03e1fce7aa	zstd: include external header with brackets zstd.h is a header provided by libzstd, so let's include it with brackets, more consistent this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19538	2024-07-04 10:42:29 +03:00
Takuya ASADA	09e22690dc	scylla_coredump_setup: enable compress by default when zstd support detected We disabled coredump compression by default because it was too slow, but recent versions of systemd-coredump support faster zstd-based compression, so let's enable compression by default when zstd support is detected. Related scylladb/scylla-machine-image#462 Closes scylladb/scylladb#18854	2024-07-04 10:38:22 +03:00
Botond Dénes	e3e5f8209d	Merge 'alternator: fix "/localnodes" to use broadcast_rpc_address' from Nadav Har'El This short series fixes Alternator's "/localnodes" request to allow a node's external IP address - configured with `broadcast_rpc_address` - to be listed instead of its usual, internal, IP address. The first patch fixes a bug in gossiper::get_rpc_address(), which the second patch needs to implement the feature. The second patch also contains regression tests. Fixes #18711. Closes scylladb/scylladb#18828 * github.com:scylladb/scylladb: alternator: fix "/localnodes" to use broadcast_rpc_address gossiper: fix get_rpc_address() for this node	2024-07-04 10:37:28 +03:00
Takuya ASADA	65fbf72ed0	scylla-housekeeping: fix exception on parsing version string Since Python 3.12, version parsing becomes strict, parse_version() does not accept the version string like '6.1.0~dev'. To fix this, we need to replace version string from '6.1.0~dev' to '6.1.0.dev0', which is allowed on Python version scheme. reference: https://packaging.python.org/en/latest/specifications/version-specifiers/ Fixes #19564 Closes scylladb/scylladb#19572	2024-07-04 10:27:51 +03:00
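The version rewrite described above can be sketched with the standard library alone. This is an assumption-laden illustration rather than the code from the commit: `normalize_version` and `SIMPLE_VERSION` are hypothetical names, and the pattern covers only the small subset of the Python version scheme needed here (release digits plus an optional `.devN` suffix).

```python
import re

# Simplified pattern for this sketch: dotted release digits with an
# optional ".devN" suffix. The real version scheme allows much more.
SIMPLE_VERSION = re.compile(r"\d+(\.\d+)*(\.dev\d+)?")


def normalize_version(version: str) -> str:
    """Rewrite a trailing '~dev' as '.dev0', e.g. '6.1.0~dev' -> '6.1.0.dev0'."""
    return re.sub(r"~dev$", ".dev0", version)


raw = "6.1.0~dev"
fixed = normalize_version(raw)
```

Strings without the `~dev` suffix pass through unchanged, so the helper can be applied unconditionally before handing the string to a strict parser.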
Avi Kivity	69450780a7	docs: explain tuning for a node that is overcommitted at the hypervisor level Closes scylladb/scylladb#19589	2024-07-04 10:23:25 +03:00
Pavel Emelyanov	8809b99736	s3/client: Unmark put-object lambdas from mutable They don't need to modify the captured objects. In fact, they must not do it in the first place, because the request can be called more than once and the buffers must not change between those invocations. For the memory_sink_buffers there must be const method to get the vector of temporary_buffers themselves. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19599	2024-07-04 10:07:48 +03:00
Lakshmi Narayanan Sreethar	c80df8504c	sstables::maybe_rebuild_filter_from_index: log sstable origin Log the sstable origin when its bloom filter is being rebuilt. The origin has to be passed to the method by the caller as it is not available in the sstable object when the filter is rebuilt. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#19601	2024-07-04 10:01:23 +03:00
Wojciech Mitros	1fdc65279d	test: add test for view backlog not being updated on correct shard This patch adds a test for reproducing issue https://github.com/scylladb/scylladb/issues/18542 The test performs writes on a table with a materialized view and checks that the view backlog increases. To get the current view update backlog, a new metric "view_update_backlog" is added to the `storage_proxy` metrics. The metric differs from the metric from `database` metric with the same name by taking the backlog from the max_view_update_backlog which keeps view update backlogs from all shards which may be a bit outdated, instead of taking the backlog by checking the view_update_semaphore which the backlog is based on directly.	2024-07-03 23:18:52 +02:00
Wojciech Mitros	c4f5659c11	test: move auxiliary methods for waiting until a view is built to util In many materialized view tests we need to wait until a view is built before actually working on it, future tests will also need it. In existing tests we use the same, duplicated method for achieving that. In this patch the method is deduplicated and moved to pylib/util.py and existing tests are modified to use it instead.	2024-07-03 23:18:52 +02:00
Wojciech Mitros	fd9c7d4d59	mv: update view update backlog when it increases on correct shard When performing a write, we should update the view update backlog on the shard where the mutation is actually applied. Instead, currently we only update it on the shard that initially received the write request (which didn't change at all) and as a result, the backlog on the correct shard and the aggregated max view update backlog are not updated at all. This patch enables updating the backlog on the correct shard. The update is now performed just after the view generation and propagation finishes, so that all backlog increases are noted and the backlog is ready to be used in the write response. Additionally, after this patch, we no longer (falsely) assume that the backlog is modified on the same shard as where we later read it to attach to a response. However, we still compare the aggregated backlog from all shards and the backlog from the shard retrieving the max, as with a shard-aware driver, it's likely the exact shard whose backlog changed.	2024-07-03 23:18:52 +02:00
Avi Kivity	3fc4e23a36	forward_service: rename to mapreduce_service forward_service is nondescriptive and misnamed, as it does more than forward requests. It's a classic map/reduce algorithm (and in fact one of its parameters is "reducer"), so name it accordingly. The name "forward" leaked into the wire protocol for the messaging service RPC isolation cookie, so it's kept there. It's also maintained in the name of the logger (for "nodetool setlogginglevel") for compatibility with tests. Closes scylladb/scylladb#19444	2024-07-03 19:29:47 +03:00
Avi Kivity	f798217293	Merge 'build: cmake: include the whole archive of zstd.a' from Kefu Chai before this change, when linking scylla-main, the linker discards the unreferenced symbols defined by zstd.cc. but we use constructor of static variable `registerator` to register the zstd compressor, this variable is not used from the linker's point of view. but we do rely on the side effect of its constructor. that's why the rules generated by CMake fails to build tests and scylla executables with zstd support. that's why we have following test failure: ``` boost.sstable_3_x_test.test_uncompressed_collections_read ... [Exception] - no_such_class: unable to find class 'org.apache.cassandra.io.compress.ZstdCompressor' == [File] - seastar/src/testing/seastar_test.cc == [Line] - 43 ``` in this change, we single out zstd.cc and build it as an archive, so that scylla-main can include as a whole. an alternative is to link scylla-main as a whole archive, but that might increase the disk foot print when building lots of tests -- some of them do not use all symbols exposed by scylla-main, and can potentially have smaller size if linker can discard the unused symbols. Refs https://github.com/scylladb/scylladb/issues/2717 --- cmake related change, hence no need to backport. Closes scylladb/scylladb#19539 * github.com:scylladb/scylladb: build: cmake: include the whole archive of zstd.a build: cmake: find libzstd before using it	2024-07-03 17:38:22 +03:00
Botond Dénes	fca0a58674	Merge 'Close output_stream in get_compaction_history() API handler' from Pavel Emelyanov If an httpd body writer is called with output_stream<>, it must close the stream on its own regardless of any exceptions it may generate while working, otherwise the stream destructor may step on the non-closed assertion. Stepped on with a different handler, see #19541 Coroutinize the handler as the first step while at it (though the fix would have been notably shorter if done with a .finally() lambda) Closes scylladb/scylladb#19543 * github.com:scylladb/scylladb: api: Close response stream of get_compaction_history() api: Flush output stream in get_compaction_history() call api: Coroutinize get_compaction_history inner function	2024-07-03 17:00:26 +03:00
Kefu Chai	fd5c04acbb	.github: use the latest dbuild image scylla does not build using scylla-toolchain:fedora-38-20240521, like: ``` FAILED: repair/CMakeFiles/repair.dir/repair.cc.o /usr/bin/clang++ -DBOOST_NO_CXX98_FUNCTION_BASE -DDEVEL -DFMT_SHARED -DSCYLLA_BUILD_MODE=dev -DSCYLLA_ENABLE_ERROR_INJECTION -DSEASTAR_API_LEVEL=7 -DSEASTAR_ENABLE_ALLOC_FAILURE_INJECTION -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DSEASTAR_TYPE_ERASE_MORE -DXXH_PRIVATE_API -I/__w/scylladb/scylladb -I/__w/scylladb/scylladb/build/gen -I/__w/scylladb/scylladb/seastar/include -I/__w/scylladb/scylladb/build/seastar/gen/include -I/__w/scylladb/scylladb/build/seastar/gen/src -isystem /__w/scylladb/scylladb/abseil -O2 -std=gnu++2b -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/__w/scylladb/scylladb=. 
-march=westmere -U_FORTIFY_SOURCE -Werror=unused-result -fstack-clash-protection -MD -MT repair/CMakeFiles/repair.dir/repair.cc.o -MF repair/CMakeFiles/repair.dir/repair.cc.o.d -o repair/CMakeFiles/repair.dir/repair.cc.o -c /__w/scylladb/scylladb/repair/repair.cc In file included from /__w/scylladb/scylladb/repair/repair.cc:10: In file included from /__w/scylladb/scylladb/repair/row_level.hh:14: In file included from /__w/scylladb/scylladb/repair/task_manager_module.hh:14: In file included from /__w/scylladb/scylladb/tasks/task_manager.hh:20: In file included from /__w/scylladb/scylladb/seastar/include/seastar/coroutine/parallel_for_each.hh:24: /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/ranges:6161:14: error: requires clause differs in template redeclaration requires forward_range<_Vp> ^ /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/ranges:5860:14: note: previous template declaration is here requires input_range<_Vp> ^ ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19547	2024-07-03 16:57:22 +03:00
Kefu Chai	a88496318b	alternator: use std::to_underlying() when appropriate now that we can use C++23 features, there is no need to hardcode the underlying type anymore. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19546	2024-07-02 18:51:29 +03:00
Kefu Chai	57def6f1e2	docs: install in `non-package` node when running `make setup`, we could have following failure: ``` Installing the current project: scylla (4.3.0) The current project could not be installed: No file/folder found for package scylla If you do not want to install the current project use --no-root ``` because docs is not a proper python project named "scylla", and do not have a directory structure expected by poetry. what we expect from poetry, is to manage the dependencies for building the document. so, in this change, we install in the `non-package` mode when running `poetry install`, this skips the root package, which does not exist. as an alternative, we could put an empty `scylla.py` under `docs` directory, but that'd be overkill. or we could pass `--no-root` to `poetry install`, but would be ideal if we can keep the settings in a single place. see also https://python-poetry.org/docs/basic-usage/#operating-modes, and https://python-poetry.org/docs/cli/#options-2, for more details on the settings and command line options of poetry. please note this setting was added to poetry 1.8, so the required poetry version is updated. we might need to upgrade poetry in existing installation. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19498	2024-07-02 18:03:20 +03:00
Michael Litvak	08b29460fc	mv: skip building view updates on a pending replica Currently, a pending replica that applies a write on a table that has materialized views, will build all the view updates as a normal replica, only to realize at a late point, in db::view::get_view_natural_endpoint(), that it doesn't have a paired view replica to send the updates to. It will then either drop the view updates, or send them to a pending view replica, if such exists. This work is unnecessary since it may be dropped, and even if there is a pending view replica to send the updates to, the updates that are built by the pending replica may be wrong since it may have incomplete information. This commit fixes the inefficiency by skipping the view update building step when applying an update on a pending replica. The metric total_view_updates_on_wrong_node is added to count the cases that a view update is determined to be unnecessary. The test reproduces the scenario of writing to a table and applying the update on a pending replica, and verifies that the pending replica doesn't try to build view updates. Fixes scylladb/scylladb#19152 Closes scylladb/scylladb#19488	2024-07-02 13:10:18 +02:00
Nadav Har'El	d61513c41c	Merge 'reader_concurrency_semaphore: make CPU concurrency configurable' from Botond Dénes The reader concurrency semaphore restricts the concurrency of reads that require CPU (intention: they read from the cache) to 1, meaning that if there is even a single active read which declares that it needs just CPU to proceed, no new read is admitted. This is meant to keep the concurrency of reads in the cache at 1. The idea is that concurrency in the cache is not useful: it just leads to the reactor rotating between these reads, all of them finishing later than they could if they were the only active read in the cache. This was observed to backfire in the case where the reads from a single table are mostly very fast, but on some keys are very slow (hint: collection full of tombstones). In this case the slow read holds up the fast reads in the queue, increasing the 99th percentile latencies significantly. This series proposes to fix this, by making the CPU concurrency configurable. We don't like tunables like this and this is not a proper fix, but a workaround. The proper fix would be to allow to cut any page early, but we cannot cut a page in the middle of a row. We could maybe have a way of detecting slow reads and excluding them from the CPU concurrency. This would be a heuristic and it would be hard to get right. So in this series a robust and simple configurable is offered, which can be used on those few clusters which do suffer from the too strict concurrency limit. We have seen it in very few cases so far, so this doesn't seem to be wide-spread. Fixes: https://github.com/scylladb/scylladb/issues/19017 This fixes a regression introduced in 5.0, so we have to backport to all currently supported releases Closes scylladb/scylladb#19018 * github.com:scylladb/scylladb: test/boost/reader_concurrency_semaphore_test: add test for live-configurable cpu concurrency test/boost/reader_concurrency_semaphore_test: hoist require_can_admit reader_concurrency_semaphore: wire in the configurable cpu concurrency reader_concurrency_semaphore: add cpu_concurrency constructor parameter db/config: introduce reader_concurrency_semahore_cpu_concurrency	2024-07-02 13:39:00 +03:00
Tzach Livyatan	6ea475ec76	Docs: Fix a typo in sstable-corruption.rst Closes scylladb/scylladb#19515	2024-07-02 11:58:27 +02:00
Kamil Braun	bcfdeda080	Merge 'co-routinize paxos_state functions' from Gleb Co-routinize paxos_state functions to make them more readable. * 'gleb/coroutineze-paxos-state' of github.com:scylladb/scylla-dev: paxos: simplify paxos_state::prepare code to not work with raw futures paxos: co-routinize paxos_state::learn function paxos: remove no longer used with_locked_key functions paxos: co-routinize paxos_state::accept function paxos: co-routinize paxos_state::prepare function paxos: introduce get_replica_lock() function to take RAII guard for local paxos table access	2024-07-02 11:54:13 +02:00
Tzach Livyatan	4938927fc2	Docs: fix typo in config-commands.rst This is a leftover from https://github.com/scylladb/scylladb/pull/19578, which mistakenly updated the "scylla" script name to "ScyllaDB" Closes scylladb/scylladb#19583	2024-07-02 10:54:47 +02:00
Kamil Braun	edeb266fc2	Merge 'docs, config: render logging related options' from Kefu Chai this changeset adds a filter to customize the rendering of default values, and enables the `scylladb_cc_properties` extension to display the logging message related options. it prepares for the further improvements in https://opensource.docs.scylladb.com/master/reference/configuration-parameters.html. this changeset also prepare for the improvements requested by #19463 --- it's an improvement in the document, hence no need to backport. Closes scylladb/scylladb#19483 * github.com:scylladb/scylladb: config: add descriptions for default_log_level and friends config: define log_to_syslog in a different line docs: parse log_legacy_value as declarations of config option	2024-07-02 10:44:50 +02:00
Kefu Chai	aedd145d6b	.github: add compaction to iwyu's CLEANER_DIR to avoid future violations of include-what-you-use. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-02 14:06:42 +08:00
Kefu Chai	e87b64b7bb	compaction: not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-02 14:06:42 +08:00
Tzach Livyatan	91401f7da5	docs: Update Scylla to ScyllaDB in all RST docs files v3 Closes scylladb/scylladb#19578	2024-07-01 18:04:21 +02:00
Andrei Chekun	b6aabca9a7	Add documentation how to use allure reporting Add documentation how to install and basic usage example of the allure reporting tool. Fix typo test/README.md Related: scylladb/qa-tasks#1665 Depends on: scylladb/scylladb#18169 Closes scylladb/scylladb#18710	2024-07-01 16:21:50 +02:00
Gleb Natapov	9ebdb23002	raft: add more raft metrics to make debug easier	2024-07-01 10:55:22 +02:00
Kamil Braun	94bc9d4f5b	Merge 'Do not expire local addres in raft address map since the local node cannot disappear' from Gleb Natapov A node may wait in the topology coordinator queue for a while before being joined. Since the local address is added as an expiring entry to the raft address map, it may expire in the meantime and the bootstrap will fail. The series makes the entry non-expiring. Fixes scylladb/scylladb#19523 Needs to be backported to 6.0 since the bug may cause bootstrap to fail. Closes scylladb/scylladb#19557 * github.com:scylladb/scylladb: test: add test that checks that local address cannot expire between join request placemen and its processing storage_service: make node's entry non expiring in raft address map	2024-07-01 09:12:48 +02:00
Kefu Chai	90be71d959	build: cmake: include the whole archive of zstd.a before this change, when linking scylla-main, the linker discards the unreferenced symbols defined by zstd.cc. but we use constructor of static variable `registerator` to register the zstd compressor, this variable is not used from the linker's point of view. but we do rely on the side effect of its constructor. that's why the rules generated by CMake fails to build tests and scylla executables with zstd support. that's why we have following test failure: ``` boost.sstable_3_x_test.test_uncompressed_collections_read ... [Exception] - no_such_class: unable to find class 'org.apache.cassandra.io.compress.ZstdCompressor' == [File] - seastar/src/testing/seastar_test.cc == [Line] - 43 ``` in this change, we single out zstd.cc and build it as an archive, so that scylla-main can include as a whole. an alternative is to link scylla-main as a whole archive, but that might increase the disk foot print when building lots of tests -- some of them do not use all symbols exposed by scylla-main, and can potentially have smaller size if linker can discard the unused symbols. Refs #2717 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-01 11:51:19 +08:00
Kefu Chai	1e0af0fb7e	build: cmake: find libzstd before using it we use libzstd in zstd.cc. so let's find this library before using it. this helps user to identify problem when preparing the building environment, instead of being greeted by a compile-time failure. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-01 11:51:19 +08:00
Kefu Chai	b71b638b2e	config: add descriptions for default_log_level and friends so that their description can be displayed in `reference/configuration-parameters/` web page. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-01 09:47:28 +08:00
Kefu Chai	b486f4ef01	config: define log_to_syslog in a different line before this change, docs/_ext/scylladb_cc_properties.py parses the options line by line, because `log_to_stdout` and `log_to_syslog` are defined in a single line, this script is not able to parse them, hence fails to display them on the `reference/configuration-parameters/` web page. after this change, these two member variables are defined on different lines. both of them can be displayed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-01 09:47:28 +08:00
Kefu Chai	34cab80103	docs: parse log_legacy_value as declarations of config option before this change, we only consider "named_value<type>" as the declaration of option, and the "Type" field of the corresponding option is displayed if its declaration is found. otherwise, "Type" field is not rendered. but some logging related options are declared using `log_legacy_value`, so they are missing. after this change, they are displayed as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-07-01 09:47:28 +08:00
Kefu Chai	405f624776	cql3: define dtor of modification_statement in .cc file before this change, we rely on the compiler to use the definition of `cql3::attributes` to generate the defaulted destructor in .cc file. but with clang-19, it insists that we should have a complete definition available for defining the defaulted destructor, otherwise it fails the build: ``` /home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -isystem /home/kefu/dev/scylladb/abseil -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. 
-march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT CMakeFiles/scylla-main.dir/RelWithDebInfo/table_helper.cc.o -MF CMakeFiles/scylla-main.dir/RelWithDebInfo/table_helper.cc.o.d -o CMakeFiles/scylla-main.dir/RelWithDebInfo/table_helper.cc.o -c /home/kefu/dev/scylladb/table_helper.cc In file included from /home/kefu/dev/scylladb/table_helper.cc:10: In file included from /home/kefu/dev/scylladb/seastar/include/seastar/core/coroutine.hh:25: In file included from /home/kefu/dev/scylladb/seastar/include/seastar/core/future.hh:30: In file included from /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/memory:78: /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:91:16: error: invalid application of 'sizeof' to an incomplete type 'cql3::attributes' 91 \| static_assert(sizeof(_Tp)>0, \| ^~~~~~~~~~~ /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:398:4: note: in instantiation of member function 'std::default_delete<cql3::attributes>::operator()' requested here 398 \| get_deleter()(std::move(__ptr)); \| ^ /home/kefu/dev/scylladb/cql3/statements/modification_statement.hh:40:7: note: in instantiation of member function 'std::unique_ptr<cql3::attributes>::~unique_ptr' requested here 40 \| class modification_statement : public cql_statement_opt_metadata { \| ^ /home/kefu/dev/scylladb/cql3/statements/modification_statement.hh:40:7: note: in implicit destructor for 'cql3::statements::modification_statement' first required here /home/kefu/dev/scylladb/cql3/statements/modification_statement.hh:28:7: note: forward declaration of 'cql3::attributes' 28 \| class attributes; \| ^ ``` so, in this change, we define the destructor in .cc file, where the complete definition of `cql3::attributes` is available. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19545	2024-06-30 19:35:05 +03:00
Avi Kivity	0ce00ebfbd	Merge 'Close output stream in task manager's API get_tasks handler' from Pavel Emelyanov If client stops reading response early, the server-side stream throws but must be closed anyway. Seen in another endpoint and fixed by #19541 Closes scylladb/scylladb#19542 * github.com:scylladb/scylladb: api: Fix indentation after previous patch api: Close response stream on error api: Flush response output stream before closing	2024-06-30 19:34:00 +03:00
Avi Kivity	3a85d88b68	Merge 'Close output_stream in get_snapshot_details() API handler' from Pavel Emelyanov All streams used by httpd handlers are to be closed by the handler itself, caller doesn't take care of that. fixes: #19494 Closes scylladb/scylladb#19541 * github.com:scylladb/scylladb: api: Fix indentation after previous patch api: Close output_stream on error api: Flush response output stream before closing	2024-06-30 19:33:16 +03:00
Avi Kivity	2fbc532e4d	Update tools/python3 submodule * tools/python3 3e833f1...18fa79e (1): > reloc: use `--add-rpath` and not `--set-rpath`	2024-06-30 19:31:23 +03:00
Kefu Chai	77d2d5821d	build: cmake: do not mark cqlsh noarch in `3c7af287`, cqlsh's reloc package was marked as "noarch", and its filename was updated accordingly in `configure.py`, so let's update the CMake building system accordingly. this change should address the build failure of ``` 08:48:14 [3325/4124] Generating ../Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz 08:48:14 FAILED: Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz 08:48:14 cd /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist && /usr/bin/cmake -E copy /jenkins/workspace/scylla-master/scylla-ci/scylla/tools/cqlsh/build/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz 08:48:14 Error copying file "/jenkins/workspace/scylla-master/scylla-ci/scylla/tools/cqlsh/build/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz" to "/jenkins/workspace/scylla-master/scylla-ci/scylla/build/Debug/dist/tar/scylla-cqlsh-6.1.0~dev-0.20240629.60955ead75ef.noarch.tar.gz". ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19544	2024-06-30 19:26:54 +03:00
Nadav Har'El	44e036c53c	alternator: fix "/localnodes" to use broadcast_rpc_address Alternator's non-standard "/localnodes" HTTP request returns a list of live nodes on this DC, to consider for load balancing. The returned node addresses should be external IP addresses usable by the clients. Scylla has a configuration parameter - broadcast_rpc_address - which defines for a node an external IP address. If such a configuration exists, we need to use those external IP addresses, not the internal ones. Finding these broadcast_rpc_address of all nodes is easy, because the gossiper already gossips them. This patch also tests the new feature: 1. The existing single-node test is extended to verify that without broadcast_rpc_address we get the usual IP address. 2. A new two-node test is added to check that when broadcast_rpc_address is configured, we get that address and not the usual internal IP addresses. Fixes #18711. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-06-30 18:38:15 +03:00
Nadav Har'El	2a2e8167c8	gossiper: fix get_rpc_address() for this node Commit `dd46a92e23` introduced a function gossiper::get_rpc_address() as a shortcut for get_application_state_ptr(endpoint, RPC_ADDRESS) - i.e., it fetches the endpoint's configured broadcast_rpc_address (despite its confusing name, this is the endpoint's external IP address that clients can use to make CQL connections). But strangely, the implementation get_rpc_address() made an exception for asking about the current host - where instead of getting this node's broadcast_rpc_address, it returns its internal address, which is not what this function was supposed to do - it's not useful for it to do one thing for this node, and a different thing for other nodes, and when I wrote code that uses this function (see the next patch), this resulted in wrong results for the current node. The fix is simple - drop the wrong if(), and get the broadcast_rpc_address stored by the gossiper unconditionally - the gossiper knows it for this node just like for other nodes. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-06-30 18:38:15 +03:00
Gleb Natapov	3f136cf2eb	test: add test that checks that local address cannot expire between join request placement and its processing	2024-06-30 15:52:23 +03:00
Gleb Natapov	5d8f08c0d7	storage_service: make node's entry non expiring in raft address map Local address map entry should never expire in the address map.	2024-06-30 15:08:50 +03:00
Kefu Chai	947e28146d	dbuild: pass --tty when running in interactive mode podman does not allocate a tty by default, so without `-t` or `--tty`, one cannot use a functional terminal when interacting with the container. that what one can expect when running `dbuild -i --`, and we are greeted with : ``` bash: cannot set terminal process group (-1): Inappropriate ioctl for device bash: no job control in this shell ``` after this change, one can enjoy the good-old terminal as usual after being dropped to the container provided by `dbuild -i --`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19550	2024-06-30 12:06:55 +03:00
Pavel Emelyanov	d034cde01f	Merge 'build: update C++ standard to C++23' from Avi Kivity Switch the C++ standard from C++20 to C++23. This is straightforward, but there are a few fallouts (mostly due to std::unique_ptr that became constexpr) that need to be fixed first. Internal enhancement - no backport required Closes scylladb/scylladb#19528 * github.com:scylladb/scylladb: build: switch to C++23 config: avoid binding an lvalue reference to an rvalue reference readers: define query::partition_slice before using it in default argument test: define table_for_tests earlier compaction: define compaction_group::table_state earlier compaction: compaction_group: define destructor out-of-line compaction_manager: define compaction_manager::strategy_control earlier	2024-06-28 18:02:33 +03:00
Avi Kivity	cf66f233aa	build: remove aarch64 workarounds In `90a6c3bd7a` ("build: reduce release mode inline tuning on aarch64") we reduced inlining on aarch64, due to miscompiles. In `224a2877b9` ("build: disable -Og in debug mode to avoid coroutine asan breakage") we disabled optimization in debug mode, due to miscompiles. With clang 18.1, it appears the miscompiles are gone, and we can remove the two workarounds. Closes scylladb/scylladb#19531	2024-06-28 17:53:51 +03:00
Pavel Emelyanov	b4f9387a9d	api: Close response stream of get_compaction_history() The function must close the stream even if it throws along the way. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 16:56:53 +03:00
Pavel Emelyanov	6d4ba98796	api: Flush output stream in get_compaction_history() call It's currently implicitly flushed on its close, but in that case close can throw while flushing. Next patch wants close not to throw and that's possible if the stream is flushed in advance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 16:55:58 +03:00
Pavel Emelyanov	acb351f4ee	api: Coroutinize get_compaction_history inner function The handler returns a function which is then invoked with output_stream argument to render the json into. This function is converted into coroutine. It has yet another inner lambda that's passed into compaction_manager::get_compaction_history() as consumer lambda. It's coroutinized too. The indentation looks weird as preparation for future patching. Hopefully it's still possible to understand what's going on. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 16:53:46 +03:00
Pavel Emelyanov	1be8b2fd25	api: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 16:07:21 +03:00
Pavel Emelyanov	986a04cb11	api: Close response stream on error The handler's lambda is called with && stream object and must close the stream on its own regardless of what. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 16:06:41 +03:00
Pavel Emelyanov	4897d8f145	api: Flush response output stream before closing The .close() method flushes the stream, but it may throw doing it. Next patch will want .close() not to throw, for that stream must be flushed explicitly before closing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 16:05:20 +03:00
Pavel Emelyanov	1839030e3b	api: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 15:41:12 +03:00
Pavel Emelyanov	a0c1552cea	api: Close output_stream on error If the get_snapshot_details() lambda throws, the output stream remains non-closed which is bad. Close it regardless of what. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 15:40:42 +03:00
Pavel Emelyanov	d1fd886608	api: Flush response output stream before closing Otherwise close() may throw and this is what next patch will want not to happen. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-28 15:40:00 +03:00
Piotr Dulikowski	f00c4eaf72	Merge '[test.py] add --extra-scylla-cmdline-options argument for test.py' from Artsiom Mishuta this PR has 2 commits - [test: pass Scylla extra CMD args from test.py args](`6b367a04b5`) - [test: adjust scylla_cluster.merge_cmdline_options behavior](`c60b36090a`) the main goal is to solve [test.py: provide an easy-to-remember, univeral way to run scylla with trace level logging](https://github.com/scylladb/scylladb/issues/14960) issue but also can be used to easily apply additional arguments for all UnitTests and PythonTests on the fly from the test.py CMD Closes scylladb/scylladb#19509 * github.com:scylladb/scylladb: test: adjust scylla_cluster.merge_cmdline_options behavior test: pass scylla extra CMD args from test.py args	2024-06-28 11:11:29 +02:00
Kamil Braun	6ec8143e56	Merge 'Remove dead code from migration_manager and schema_tables' from Benny Halevy This short series removed some ancient legacy code from migration_manager and schema_tables, before I make further changes in this area. We have more such code under the cql3 hierarchy but it can be dealt with as a follow up. No backport required Closes scylladb/scylladb#19530 * github.com:scylladb/scylladb: schema_tables: remove dead code migration_manager: remove dead code	2024-06-28 10:59:21 +02:00
Piotr Smaron	88eda47f13	cql: forbid switching from tablets to vnodes in ALTER KS This check is already in place, but isn't fully working, i.e. switching from a vnode KS to a tablets KS is not allowed, but this check doesn't work in the other direction. To fix the latter, `ks_prop_defs::get_initial_tablets()` has been changed to handle 3 states: (1) init_tablets is set, (2) it was skipped, (3) tablets are disabled. These couldn't fit into std::optional, so a new local struct to hold these states has been introduced. Callers of this function have been adjusted to set init_tablets to an appropriate value according to the circumstances, i.e. if tablets are globally enabled, but have been skipped in the CQL, init_tablets is automatically set to 0, but if someone executes ALTER KS and doesn't provide tablets options, they're inherited from the old KS. I tried various approaches and this one resulted in the least lines of code changed. I also provided testcases to explain how the code behaves. Fixes: #18795 Closes scylladb/scylladb#19368	2024-06-28 11:41:41 +03:00
Gleb Natapov	5c72af7a93	paxos: simplify paxos_state::prepare code to not work with raw futures	2024-06-28 07:30:45 +03:00
Gleb Natapov	2b7acdb32c	paxos: co-routinize paxos_state::learn function	2024-06-28 07:30:45 +03:00
Gleb Natapov	6bf307ffe8	paxos: remove no longer used with_locked_key functions	2024-06-28 07:30:45 +03:00
Gleb Natapov	887a5a8f62	paxos: co-routinize paxos_state::accept function	2024-06-28 07:30:45 +03:00
Benny Halevy	b7f00ba4bf	schema_tables: remove dead code Well, even after 10 years, the c++ compilers still do not compile Java... And having that legacy code laying around not only it doesn't help anyone understand what's going on, but on the contrary, it's confusing and distracting. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-27 20:34:02 +03:00
Benny Halevy	5f6c411656	migration_manager: remove dead code Well, even after 10 years, the c++ compilers still do not compile Java... And having that legacy code laying around not only it doesn't help anyone understand what's going on, but on the contrary, it's confusing and distracting. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-27 20:30:33 +03:00
Avi Kivity	4d85db9f39	build: switch to C++23 Set the C++ dialect to C++23, allowing us to use the new features.	2024-06-27 19:36:13 +03:00
Avi Kivity	d14eec8160	config: avoid binding an lvalue reference to an rvalue reference config_file::add_deprecated_options() returns an lvalue reference to a parameter which itself is an rvalue reference. In C++20 this is bad practice (but not a bug in this case) as rvalue references are not expected to live past the call. In C++23, it fails to compile. Fix by accepting an lvalue reference for the parameter, and adjust the caller.	2024-06-27 19:36:13 +03:00
Avi Kivity	ed816afac4	readers: define query::partition_slice before using it in default argument C++23 made std::unique_ptr constexpr. A side effect of this (presumably) is that the compiler compiles it more eagerly, requiring the full definition of the class in std::make_unique, while it previously was content with finding the definition later. One victim of this change is the default argument of make_reversing_reader; define it earlier (by including its header) to build with C++23.	2024-06-27 19:36:13 +03:00
Piotr Dulikowski	f9abe52d3b	Merge 'test: auth: add random tag to resources in test_auth_v2_migration' from Marcin Maliszkiewicz Those tests are sometimes failing on CI and we have two hypotheses: 1. Something wrong with consistency of statements 2. Interruption from another test run (e.g. same queries performed concurrently or data remained after previous run) To exclude or confirm 2. we add a random marker to avoid potential collision; in such a case it will be clearly visible that wrong data comes from a different run. Related scylladb/scylladb#18931 Related scylladb/scylladb#18319 backport: no, just a test fix Closes scylladb/scylladb#19484 * github.com:scylladb/scylladb: test: auth: add random tag to resources in test_auth_v2_migration test: extend unique_name with random suffix	2024-06-27 17:35:14 +02:00
Gleb Natapov	58912c2cc1	paxos: co-routinize paxos_state::prepare function	2024-06-27 18:10:49 +03:00
Gleb Natapov	4f546b8b79	paxos: introduce get_replica_lock() function to take RAII guard for local paxos table access	2024-06-27 18:09:30 +03:00
Avi Kivity	e5807555bd	test: define table_for_tests earlier C++23 made std::unique_ptr constexpr. A side effect of this (presumably) is that the compiler compiles it more eagerly, requiring the full definition of the class in std::make_unique, while it previously was content with finding the definition later. One victim of this change is table_for_tests; define it earlier to build with C++23.	2024-06-27 17:54:12 +03:00
Avi Kivity	d5ba0b4041	compaction: define compaction_group::table_state earlier C++23 made std::unique_ptr constexpr. A side effect of this (presumably) is that the compiler compiles it more eagerly, requiring the full definition of the class in std::make_unique, while it previously was content with finding the definition later. One victim of this change is compaction_group::table_state; define it earlier to build with C++23.	2024-06-27 17:54:12 +03:00
Avi Kivity	9ecf4ada49	compaction: compaction_group: define destructor out-of-line Define compaction_group::~compaction_group() out-of-line to prevent problems instantiating compaction_group::_table_state, which is an std::unique_ptr. In C++23, std::unique_ptr is constexpr, which means its methods (in this case the destructor) require seeing the definition of the class at the point of instantiation.	2024-06-27 17:54:12 +03:00
Avi Kivity	050e7bbd64	compaction_manager: define compaction_manager::strategy_control earlier C++23 made std::unique_ptr constexpr. A side effect of this (presumably) is that the compiler compiles it more eagerly, requiring the full definition of the class in std::make_unique, while it previously was content with finding the definition later. One victim of this change is compaction_manager::strategy_control; define it earlier to build with C++23.	2024-06-27 17:54:12 +03:00
Andrei Chekun	561e88f00e	[test.py] Throw meaningful error when something is wrong with Scylla binary Fixes: https://github.com/scylladb/scylladb/issues/19489 There is already a check that the Scylla binary is executable, but it's done at a later stage. So in the logs for a specific test file there will be a message about something being wrong with the binary, but in the console there will be no signs of that. Moreover, there will be an error that is completely misleading about what actually happened and why the test run failed. With this check the test will fail earlier, providing the correct reason why it failed. Closes scylladb/scylladb#19491	2024-06-27 17:38:32 +03:00
Avi Kivity	581d619572	storage_proxy: trace speculative retries A speculative retry can appear out of the blue[1] and confuse people, as it looks like the consistency level was elevated. Fix by adding such a tracepoint. Sample output: ``` activity \| timestamp \| source \| source_elapsed \| client ---------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+----------- Execute CQL3 query \| 2024-06-27 14:25:58.947000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0] \| 2024-06-27 14:25:58.947918 \| 127.0.0.1 \| 2 \| 127.0.0.1 Processing a statement for authenticated user: anonymous [shard 0] \| 2024-06-27 14:25:58.948025 \| 127.0.0.1 \| 108 \| 127.0.0.1 Creating read executor for token -4069959284402364209 with all: [127.0.0.1, 127.0.0.2] targets: [127.0.0.2] repair decision: NONE [shard 0] \| 2024-06-27 14:25:58.948125 \| 127.0.0.1 \| 209 \| 127.0.0.1 Added extra target 127.0.0.1 for speculative read [shard 0] \| 2024-06-27 14:25:58.948128 \| 127.0.0.1 \| 212 \| 127.0.0.1 Creating speculating_read_executor [shard 0] \| 2024-06-27 14:25:58.948129 \| 127.0.0.1 \| 213 \| 127.0.0.1 read_data: sending a message to /127.0.0.2 [shard 0] \| 2024-06-27 14:25:58.948138 \| 127.0.0.1 \| 222 \| 127.0.0.1 Launching speculative retry for data [shard 0] \| 2024-06-27 14:25:58.948234 \| 127.0.0.1 \| 318 \| 127.0.0.1 read_data: querying locally [shard 0] \| 2024-06-27 14:25:58.948235 \| 127.0.0.1 \| 319 \| 127.0.0.1 Start querying singular range {{-4069959284402364209, pk{000400000001}}} [shard 0] \| 2024-06-27 14:25:58.948246 \| 127.0.0.1 \| 330 \| 127.0.0.1 [reader concurrency semaphore user] admitted immediately [shard 0] \| 2024-06-27 14:25:58.948250 \| 127.0.0.1 \| 334 \| 127.0.0.1 [reader concurrency semaphore user] executing read [shard 0] \| 2024-06-27 14:25:58.948258 \| 127.0.0.1 \| 342 \| 127.0.0.1 Querying cache for range 
{{-4069959284402364209, pk{000400000001}}} and slice [(-inf, +inf)] [shard 0] \| 2024-06-27 14:25:58.948281 \| 127.0.0.1 \| 365 \| 127.0.0.1 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2024-06-27 14:25:58.948311 \| 127.0.0.1 \| 395 \| 127.0.0.1 Querying is done [shard 0] \| 2024-06-27 14:25:58.948320 \| 127.0.0.1 \| 404 \| 127.0.0.1 read_data: message received from /127.0.0.1 [shard 0] \| 2024-06-27 14:25:58.948351 \| 127.0.0.2 \| 12 \| 127.0.0.1 Done processing - preparing a result [shard 0] \| 2024-06-27 14:25:58.948354 \| 127.0.0.1 \| 438 \| 127.0.0.1 Start querying singular range {{-4069959284402364209, pk{000400000001}}} [shard 0] \| 2024-06-27 14:25:58.948370 \| 127.0.0.2 \| 31 \| 127.0.0.1 [reader concurrency semaphore user] admitted immediately [shard 0] \| 2024-06-27 14:25:58.948374 \| 127.0.0.2 \| 35 \| 127.0.0.1 [reader concurrency semaphore user] executing read [shard 0] \| 2024-06-27 14:25:58.948388 \| 127.0.0.2 \| 49 \| 127.0.0.1 Querying cache for range {{-4069959284402364209, pk{000400000001}}} and slice [(-inf, +inf)] [shard 0] \| 2024-06-27 14:25:58.948405 \| 127.0.0.2 \| 66 \| 127.0.0.1 Page stats: 1 partition(s), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead) and 0 range tombstone(s) [shard 0] \| 2024-06-27 14:25:58.948424 \| 127.0.0.2 \| 85 \| 127.0.0.1 Querying is done [shard 0] \| 2024-06-27 14:25:58.948430 \| 127.0.0.2 \| 91 \| 127.0.0.1 read_data handling is done, sending a response to /127.0.0.1 [shard 0] \| 2024-06-27 14:25:58.948436 \| 127.0.0.2 \| 97 \| 127.0.0.1 read_data: got response from /127.0.0.2 [shard 0] \| 2024-06-27 14:25:58.949140 \| 127.0.0.1 \| 1224 \| 127.0.0.1 Request complete \| 2024-06-27 14:25:58.947449 \| 127.0.0.1 \| 449 \| 127.0.0.1 ``` Ref #18988 [1] not completely out of the blue, `ff29f430` indicates that a speculative read can happen. Closes scylladb/scylladb#19520	2024-06-27 17:37:36 +03:00
Botond Dénes	b4f3809ad2	test/boost/reader_concurrency_semaphore_test: add test for live-configurable cpu concurrency	2024-06-27 09:57:11 -04:00
Botond Dénes	9cbdd8ef92	test/boost/reader_concurrency_semaphore_test: hoist require_can_admit This is currently a lambda in a test, hoist it into the global scope and make it into a function, so other tests can use it too (in the next patch).	2024-06-27 09:57:11 -04:00
Botond Dénes	07c0a8a6f8	reader_concurrency_semaphore: wire in the configurable cpu concurrency Before this patch, the semaphore was hard-wired to stop admission, if there is even a single permit, which is in the need_cpu state. Therefore, keeping the CPU concurrency at 1. This patch makes use of the new cpu_concurrency parameter, which was wired in in the last patches, allowing for a configurable amount of concurrent need_cpu permits. This is to address workloads where some small subset of reads are expected to be slow, and can hold up faster reads behind them in the semaphore queue.	2024-06-27 09:57:11 -04:00
Botond Dénes	59faa6d4ff	reader_concurrency_semaphore: add cpu_concurrency constructor parameter In the case of the user semaphore, this receives the new reader_concurrency_semaphore_cpu_limit config item. Not used yet.	2024-06-27 09:57:11 -04:00
Benny Halevy	7f05f95ec4	conf: scylla.yaml: enable_tablets: expand documentation The existing documentation comment for `enable_tablets` is very terse and lacks details about the effect of enabling or disabling tablets. This change adds more details about the impact of `enable_tablets` on newly created keyspaces, and how to disable tablets when keyspaces are created. Also, a note was added to warn about the irreversibility of the tablets enablement per keyspace. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-27 14:41:43 +03:00
Avi Kivity	0d23b8165e	build: update frozen toolchain to Fedora 40 with clang 18.1.6 This refreshes our dependencies to a supported distribution. Closes scylladb/scylladb#19205	2024-06-27 14:27:21 +03:00
Yaron Kaikov	efa94b06c2	.github/scripts/label_promoted_commits.py: fix adding labels when PR is closed `prs = response.json().get("items", [])` will return empty when there are no merged PRs, and this will just skip the all-label replacement process. This is a regression following the work done in #19442 Adding another part to handle closed PRs (which is the majority of the cases we have in Scylla core) Fixes: https://github.com/scylladb/scylladb/issues/19441 Closes scylladb/scylladb#19497	2024-06-27 14:00:44 +03:00
Pavel Emelyanov	6c1e5c248f	main,proxy: Drain proxy in its stop_remote Currently proxy initialization is pretty dispersed, in particular it's stopped in several steps -- first drain_on_shutdown() then stop_remote(). In between there's nothing that needs proxy in any particular state, so those two steps can be merged into one. refs: scylladb/scylladb#2737 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19344	2024-06-27 12:26:51 +02:00
Pavel Emelyanov	1a219c674c	s3/client: Always retry http requests Real S3 server is known to actively close connections, thus breaking S3 storage backend at random places. The recent http client update is more robust against that, but the needed feature is OFF by default. refs: scylladb/seastar#1883 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19461	2024-06-27 13:14:24 +03:00
Artsiom Mishuta	919d44e0c7	test: adjust scylla_cluster.merge_cmdline_options behavior adjust merge_cmdline_options behaviour to append the --logger-log-level option instead of merging it. this behaviour can be changed (if needed) back to the previous version (merge everything): merge_cmdline_options(list1, list2, appending_options=[]) or, to append different cmd options: merge_cmdline_options(list1, list2, appending_options=[option1,option2])	2024-06-27 10:03:31 +02:00
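The difference between merging and appending options can be sketched as follows. This is a hypothetical reimplementation, not the code in scylla_cluster: only the function name and the `appending_options` parameter come from the commit message, while the default value and the assumption that every option takes exactly one value are made up for illustration.

```python
def merge_cmdline_options(base, override, appending_options=("--logger-log-level",)):
    """Merge two flat option lists of the form ['--name', 'value', ...].

    Assumption: every option is followed by exactly one value.
    """
    def pairs(options):
        return list(zip(options[0::2], options[1::2]))

    merged = {}    # ordinary options: the last value wins
    appended = []  # appending options: every occurrence is kept
    for name, value in pairs(base) + pairs(override):
        if name in appending_options:
            appended.append((name, value))
        else:
            merged[name] = value

    result = []
    for name, value in list(merged.items()) + appended:
        result += [name, value]
    return result


base = ["--smp", "2", "--logger-log-level", "raft=trace"]
extra = ["--smp", "4", "--logger-log-level", "gossip=debug"]

appending = merge_cmdline_options(base, extra)
all_merge = merge_cmdline_options(base, extra, appending_options=[])
```

With appending enabled both `--logger-log-level` values survive, so several loggers can be tuned at once; with `appending_options=[]` the later value replaces the earlier one.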
Artsiom Mishuta	440785bc41	test: pass scylla extra CMD args from test.py args this commit introduces a test.py option --extra-scylla-cmdline-options to pass extra scylla cmdline options for all tests. Options should be space-separated: '--logger-log-level raft=trace --default-log-level error'	2024-06-27 10:02:55 +02:00
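A minimal sketch of turning the space-separated option string into an argument list, assuming shell-style splitting is acceptable. The helper name `split_extra_options` is hypothetical; how test.py actually parses the value is not shown in the commit message.

```python
import shlex


def split_extra_options(raw):
    """Split a space-separated option string into argv-style tokens."""
    return shlex.split(raw)


extra = split_extra_options("--logger-log-level raft=trace --default-log-level error")
print(extra)
```

`shlex.split` also honours quoting, which matters if an option value ever contains a space.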
Artsiom Mishuta	677173bf8b	test: generate core dumps on crashes in nodetool tests The nodetool tests do not set the asan/ubsan options to abort on error and create core dumps. Fix by setting the environment variables in nodetool tests. Closes scylladb/scylladb#19503	2024-06-27 10:44:33 +03:00
Marcin Maliszkiewicz	b708c5701f	test: auth: add random tag to resources in test_auth_v2_migration Those tests are sometimes failing on CI and we have two hypotheses: 1. Something wrong with consistency of statements 2. Interruption from another test run (e.g. same queries performed concurrently or data remained after previous run) To exclude or confirm 2. we add a random marker to avoid potential collision; in such a case it will be clearly visible that wrong data comes from a different run. Related scylladb/scylladb#18931 Related scylladb/scylladb#18319	2024-06-27 09:28:27 +02:00
Marcin Maliszkiewicz	d08a80b34f	test: extend unique_name with random suffix This reduces collision risk in an unlikely and incorrect setup where tests would be run concurrently by multiple processes.	2024-06-27 09:28:02 +02:00
Anna Stuchlik	e2994a19d5	doc: update Scylla Doctor installation This commit updates the instructions on how to download and run Scylla Doctor, following the changes in how Scylla Doctor is released. Closes scylladb/scylladb#19510	2024-06-27 10:22:08 +03:00
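One way to picture a name generator with a random suffix is sketched below. The `unique_name` signature, the counter, and the suffix length are assumptions for illustration and are not taken from the test library.

```python
import itertools
import secrets

# A per-process counter keeps names unique inside one process.
_counter = itertools.count(1)


def unique_name(prefix="test_"):
    """Return a name that is unlikely to collide across concurrent processes."""
    # The random hex suffix guards against two processes that happen to
    # produce the same counter value.
    return f"{prefix}{next(_counter)}_{secrets.token_hex(4)}"


first = unique_name()
second = unique_name()
```

The counter alone would repeat across processes started from the same state; the random part is what reduces the cross-process collision risk described in the commit.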
Botond Dénes	2fe50cda22	Merge 'chunked_vector enhancements' from Benny Halevy This short series enhances utils::chunked_vector so it could be used more easily to convert dht::partition_range_vector to chunked_vector, for example. - utils: chunked_vector: document invalidation of iterators on move - utils: chunked_vector: add ctor from std::initializer_list - utils: chunked_vector: add ctor from a single value No backport required Closes scylladb/scylladb#19462 * github.com:scylladb/scylladb: chunked_vector_test: add tests for value-initialization constructor utils: chunked_vector: add ctor from std::initializer_list utils: chunked_vector: document invalidation of iterators on move	2024-06-27 10:20:47 +03:00
Benny Halevy	92f8d219b3	conf: scylla.yaml: remove tablets from experimental_features doc comment tablets are no longer in experimental_features since `83d491af02`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-27 08:55:30 +03:00
Anna Stuchlik	072542a5cc	doc: add a page with ScyllaDB limits This commit adds a page listing the ScyllaDB limits we know today. The page can and should be extended when other limits are confirmed. Closes scylladb/scylladb#19399	2024-06-27 08:28:51 +03:00
Kefu Chai	52f1168a3d	repair: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19508	2024-06-26 21:57:03 +03:00
Israel Fruchter	3c7af28725	cqlsh: update cqlsh submodule this change updates the cqlsh submodule: * tools/cqlsh/ ba83aea3...73bdbeb0 (4): > install.sh: replace tab with spaces > define the the debug packge is empty > tests: switch from using cqlsh bash to the test the python file > package python driver as wheels it also includes follow change to package cqlsh as a regular rpm instead of as a "noarch" rpm: so far cqlsh bundles the python-driver in, but only as source. meaning the package wasn't architecture-specific, and also didn't have the libev eventloop compiled in. Since from python 3.12 and up, that would mean we would fallback into asyncio eventloop (which is still experimental) or into error (once we'll sync with the driver upstream) so to avoid those, we are changing the packaging of cqlsh to be architecture specific, and get cqlsh compiled, and bundle all of its requirements as per architecture installed bundle of wheels. using `shiv`, i.e. one file virtualenv that we'll be packing into our artifacts Ref: https://github.com/scylladb/scylla-cqlsh/issues/90 Ref: https://github.com/scylladb/scylla-cqlsh/pull/91 Ref: https://github.com/linkedin/shiv Closes scylladb/scylladb#19385 * tools/cqlsh ba83aea...242876c (1): > Merge 'package python driver as wheels' from Israel Fruchter Update tools/cqlsh/ submodule in which, the change of `define the the debug packge is empty` should address the build failure like ``` Processing files: scylla-cqlsh-debugsource-6.1.0~dev-0.20240624.c7748f60c0bc.aarch64 error: Empty %files file /jenkins/workspace/scylla-master/next/scylla/tools/cqlsh/build/redhat/BUILD/scylla-cqlsh/debugsourcefiles.list RPM build errors: Empty %files file /jenkins/workspace/scylla-master/next/scylla/tools/cqlsh/build/redhat/BUILD/scylla-cqlsh/debugsourcefiles.list ``` Closes scylladb/scylladb#19473	2024-06-26 12:07:21 +03:00
Botond Dénes	1fca341514	test/topology_custom/test_repair: add test for enable_tombstone_gc_for_streaming_and_repair	2024-06-26 04:05:17 -04:00
Botond Dénes	d3b1ccd03a	replica/table: maybe_compact_for_streaming(): toggle tombstone GC based on the control flag Now enable_tombstone_gc_for_streaming_and_repair is wired in all the way to maybe_compact_for_streaming(), so we can implement the toggling of tombstone GC based on it.	2024-06-26 04:05:17 -04:00
Botond Dénes	415457be2b	replica: propagate enable_tombstone_gc_for_streaming_and_repair to maybe_compact_for_streaming() Just wiring, the new flag will be used in the next patch.	2024-06-26 04:05:17 -04:00
Botond Dénes	d5a149fc01	db/config: introduce enable_tombstone_gc_for_streaming_and_repair To control whether the compacting reader (if enabled) for streaming and repair can garbage-collect tombstones. Default is false (previous behaviour). Not wired yet.	2024-06-26 04:05:17 -04:00
Pavel Emelyanov	263668bc85	transport: Use sharded<>::invoke_on_others() When preparing statement, the server code first does it on non-local shards, then on local one. The former call is done the hard way, while there's a short sugar sharded<> class method doing it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19485	2024-06-25 22:17:59 +03:00
Kamil Braun	13fc2bd854	Merge `notify other nodes on boot` from Gleb The series adds a step during node's boot process, just before completing the initialization, in which the node sends a notification to all other normal nodes in the cluster that it is UP now. Other nodes wait for this node to be UP and in normal state before replying. This ensures that, in a healthy cluster, when a node starts serving queries the entire cluster knows its up-to-date state. The notification is a best effort though. If some nodes are down or do not reply in time the boot process continues. It is somewhat similar to shutdown notification in this regard. * 'gleb/notify-up-v2' of github.com:scylladb/scylla-dev: gossiper: wait for a bootstrapping node to be seen as normal on all nodes before completing initialization Wait for booting node to be marked UP before complete booting. gossiper: move gossip verbs to the idl	2024-06-25 17:58:17 +02:00
Aleksandra Martyniuk	2394e3ee7a	repair: drop timeout from table_sync_and_check Delete 10s timeout from read barrier in table_sync_and_check, so that the function always considers all previous group0 changes. Fixes: #18490. Closes scylladb/scylladb#18752	2024-06-25 17:44:31 +02:00
Avi Kivity	c80dc57156	Merge 'batchlog replay: bypass tombstones generated by past replays' from Botond Dénes The `system.batchlog` table has a partition for each batch that failed to complete. After finally applying the batch, the partition is deleted. Although the table has gc_grace_seconds = 0, tombstones can still accumulate in memory, because we don't purge partition tombstones from either the memtable or the cache. This can lead to the cache and memtable of this table accumulating many thousands or even millions of tombstones, making batchlog replay very slow. We didn't notice this before, because we would only replay all failed batches on unbootstrap, which is rare and a heavy and slow operation in its own right already. With repair-based tombstone-gc however, we do a full batchlog replay at the beginning of each repair, and now this extra delay is noticeable. Fix this by making sure batchlog replays don't have to scan through all the tombstones generated by previous replays: * flush the `system.batchlog` memtable at the end of each batchlog replay, so it is cleared of tombstones * bypass the cache Fixes: https://github.com/scylladb/scylladb/issues/19376 Although this is not a regression -- replay was like this since forever -- now that repair calls into batchlog replay, every release which uses repair-based tombstone-gc should get this fix Closes scylladb/scylladb#19377 * github.com:scylladb/scylladb: db/batchlog_manager: bypass cache when scanning batchlog table db/batchlog_manager: replace open-coded paging with internal one db/batchlog_manager: implement cleanup after all batchlog replay cql3/query_processor: for_each_cql_result(): move func to the coro frame	2024-06-25 16:11:01 +03:00
Avi Kivity	371e37924f	Merge 'Rebuild bloom filters that have bad partition estimates' from Lakshmi Narayanan Sreethar The bloom filters are built with partition estimates because the actual partition count might not be available in all cases. If the estimate is inaccurate, the bloom filters might end up being too large or too small compared to their optimal sizes. This PR rebuilds bloom filters with inaccurate partition estimates using the actual partition count before the filter is written to disk. A bloom filter is considered to have an inaccurate estimate if its false positive rate based on the current bitmap size is either less than 75% or more than 125% of the configured false positive rate. Fixes #19049 A manual test was run to check the impact of rebuild on compaction. Table definition used : CREATE TABLE scylla_bench.simple_table (id int PRIMARY KEY); Setup : 3 billion random rows with id in the range [0, 1e8) were inserted as batches of 5 rows into scylla_bench.simple_table via 80 threads. Compaction statistics : scylla_bench.simple_table : (a) Total number of compactions : `1501` (b) Total time spent in compaction : `9h58m47.269s` (c) Number of compactions which rebuilt bloom filters : `16` (d) Total time taken by these 16 compactions which rebuilt bloom filters : `2h55m11.89s` (e) Total time spent by these 16 compactions to rebuild bloom filters : `8m6.221s` which is - `4.63%` of the total time taken by the compactions which rebuilt filters (d) - `1.35%` of the total compaction time (b). 
(f) Total bytes saved by rebuilding filters : `388 MB` system.compaction_history : (a) Total number of compactions : `77` (b) Total time spent in compaction : `21.24s` (c) Number of compactions which rebuilt bloom filters : `74` (d) Time taken by these 74 compactions which rebuilt bloom filters : `20.48s` (e) Time spent by these 74 compactions to rebuild bloom filters : `377ms` which is - `1.84%` of the total time taken by the compactions which rebuilt filters (d) - `1.77%` of the total compaction time (b). (f) Total bytes saved by rebuilding filters : `20 kB` The following tables also had compactions and the bloom filter was rebuilt in all those compactions. However, the time taken for every rebuild was observed as 0ms from the logs as it completed within a microsecond : system.raft : (a) Total number of compactions : `2` (b) Total time spent in compaction : `106ms` (c) Total bytes saved by rebuilding filters : `960 B` system_schema.tables : (a) Total number of compactions : `1` (b) Total time spent in compaction : `25ms` (c) Total bytes saved by rebuilding filter : `312 B` system.topology : (a) Total number of compactions : `1` (b) Total time spent in compaction : `25ms` (c) Total bytes saved by rebuilding filter : `320 B` Closes scylladb/scylladb#19190 * github.com:scylladb/scylladb: bloom_filter_test: add testcase to verify filter rebuilds test/boost: move bloom filter tests from sstable_datafile_test into a new file sstables/mx/writer: rebuild bloom filters with bad partition estimates sstables/mx/writer: add variable to track number of partitions consumed sstable: introduce sstable::maybe_rebuild_filter_from_index() sstable: add method to return filter format for the given sstable version utils/i_filter: introduce get_filter_size()	2024-06-25 15:35:09 +03:00
Nadav Har'El	35ace0af5c	Merge 'Move some /storage_proxy API endpoints to config.cc' from Pavel Emelyanov API endpoints that need a particular service to get data from are registered next to this service (#2737). In /storage_proxy function there live some endpoints that work with config, so this PR moves them to the existing config.cc with config-related endpoints. The path these endpoints are registered with remains intact, so some tweak in proxy API registration is also here. Closes scylladb/scylladb#19417 * github.com:scylladb/scylladb: api: Use provided db::config, not the one from ctx api: Move some config endpoints from proxy to config api: Split storage_proxy api registration api: Unset config endpoints	2024-06-25 13:55:58 +03:00
Michał Chojnowski	c7dc3b9b58	scylla-gdb.py: add line information to coroutine names in `scylla fiber` For convenience. Note that this line info only points to the function as a whole, not to the current suspend point. I think there's no facility for converting the `__coro_index` to the current suspend point automatically. Before: ``` (gdb) scylla fiber seastar::local_engine->_current_task [shard 1] #0 (task) 0x0000601008e8e970 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is seastar::future<void> sstables::parse<unsigned int, std::pair<sstables::metadata_type, unsigned int> >(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::disk_array<unsigned int, std::pair<sstables::metadata_type, unsigned int> >&) [clone .resume] ) [shard 1] #1 (task) 0x00006010092acf10 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&) [clone .resume] ) [shard 1] #2 (task) 0x0000601008e648d0 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is sstables::sstable::read_simple<(sstables::component_type)8, sstables::statistics>(sstables::statistics&)::{lambda(sstables::sstable_version_types, seastar::file&&, unsigned long)#1}::operator()(sstables::sstable_version_types, seastar::file&&, unsigned long) const [clone .resume] ) ``` After: ``` (gdb) scylla fiber seastar::local_engine->_current_task [shard 1] #0 (task) 0x0000601008e8e970 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (sstables::parse<unsigned int, std::pair<sstables::metadata_type, unsigned int> >(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::disk_array<unsigned int, std::pair<sstables::metadata_type, unsigned int> 
>&) at sstables/sstables.cc:352) [shard 1] #1 (task) 0x00006010092acf10 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&) at sstables/sstables.cc:570) [shard 1] #2 (task) 0x0000601008e648d0 0x000000000047aae0 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (sstables::sstable::read_simple<(sstables::component_type)8, sstables::statistics>(sstables::statistics&)::{lambda(sstables::sstable_version_types, seastar::file&&, unsigned long)#1}::operator()(sstables::sstable_version_types, seastar::file&&, unsigned long) const at sstables/sstables.cc:992) ``` Closes scylladb/scylladb#19478	2024-06-25 13:55:10 +03:00
Kefu Chai	def432617d	docs: print out invalid branch name to help user to understand what the extension is expecting. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19477	2024-06-25 13:17:25 +03:00
Botond Dénes	31c0fa07d8	db/batchlog_manager: bypass cache when scanning batchlog table Scans should not pollute the cache with cold data, in general. In the case of the batchlog table, there is another reason to bypass the cache: this table can have a lot of partition tombstones, which currently are not purged from the cache. So in certain cases, using the cache can make batch replay very slow, because it has to scan past tombstones of already replayed batches.	2024-06-25 06:15:47 -04:00
Botond Dénes	29f610d861	db/batchlog_manager: replace open-coded paging with internal one query_processor has built-in paging support, no need to open-code paging in batchlog manager code.	2024-06-25 06:15:47 -04:00
Botond Dénes	2dd057c96d	db/batchlog_manager: implement cleanup after all batchlog replay We have a commented code snippet from Origin with cleanup and a FIXME to implement it. Origin flushes the memtables and kicks a compaction. We only implement the flush here -- the flush will trigger a compaction check and we leave it up to the compaction manager to decide when a compaction is worthwhile. This method used to be called only from unbootstrap, so a cleanup was not really needed. Now it is also called at the end of repair, if the table is using repair-based tombstone-gc. If the memtable is filled with tombstones, this can add a lot of time to the runtime of each repair. So flush the memtable at the end, so the tombstones can be purged (they aren't purged from memtables yet).	2024-06-25 06:15:47 -04:00
Botond Dénes	4e96e320b4	cql3/query_processor: for_each_cql_result(): move func to the coro frame Said method has a func parameter (called just f), which it receives as rvalue ref and just uses as a reference. This means that if caller doesn't keep the func alive, for_each_cql_result() will run into use-after-free after the first suspension point. This is unexpected for callers, who don't expect to have to keep something alive, which they passed in with std::move(). Adjust the signature to take a value instead, value parameters are moved to the coro frame and survive suspension points. Adjust internal callers (query_internal()) the same way. There are no known vulnerable external callers.	2024-06-25 06:15:25 -04:00
Benny Halevy	3f23016cc0	perf-simple-query: add mean and standard deviation stats Currently, perf-simple-query summarizes the statistics only for the throughput, printing the median, median absolute deviation, minimum, and maximum. But the throughput is typically highly variable and its median is noisy. This patch calculates also the mean and standard deviation and does that also for instructions_per_op and cpu_cycles_per_op to present a fuller picture of the performance metrics. Output example: ``` random-seed=3383668492 enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 95613.97 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42456 insns/op, 22117 cycles/op, 0 errors) 97538.45 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42454 insns/op, 22094 cycles/op, 0 errors) 95883.37 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42438 insns/op, 22268 cycles/op, 0 errors) 96791.45 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42433 insns/op, 22256 cycles/op, 0 errors) 97894.71 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42420 insns/op, 22010 cycles/op, 0 errors) throughput: mean=96744.39 standard-deviation=996.89 median=96791.45 median-absolute-deviation=861.02 maximum=97894.71 minimum=95613.97 instructions_per_op: mean=42440.08 standard-deviation=14.99 median=42437.59 median-absolute-deviation=13.58 maximum=42456.15 minimum=42420.10 cpu_cycles_per_op: mean=22148.98 standard-deviation=110.43 median=22117.04 median-absolute-deviation=106.89 maximum=22267.70 minimum=22010.42 ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#19450	2024-06-25 12:25:59 +03:00
Yaron Kaikov	394cba3e4b	.github/workflow: close and replace label when backport promoted Today after Mergify opened a Backport PR, it will stay open until someone manually closes the backport PR, also we can't track using labels which backport was done or not since there is no indication for that except digging into the PR and looking for a comment or a commit ref The following changes were made in this PR: * trigger add-label-when-promoted.yaml also when the push was made to `branch-x.y` * Replace label `backport/x.y` with `backport/x.y-done` in the original PR (this will automatically update the original Issue as well) * Add a comment on the backport PR and close it Fixes: https://github.com/scylladb/scylladb/issues/19441 Closes scylladb/scylladb#19442	2024-06-25 12:11:28 +03:00
Benny Halevy	8daf755f8a	statement_restrictions: partition_ranges_from_singles: no need to default-initialize result Currently, the returned `ranges` vector is first initialized to `product_size` and then the returned partition ranges are copied into it. Instead, we can simply reserve the vector capacity, without initializing it, and then emplace all partition ranges onto it using std::back_inserter. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#19457	2024-06-25 12:11:28 +03:00
Laszlo Ersek	656a9468bb	HACKING.md: fix typo in "--overprovisioned" option name Grepped the tree for "--overprovisioned" (coming from <https://university.scylladb.com/courses/scylla-essentials-overview/lessons/high-availability/topic/consistency-level-demo-part-1/>), and noticed that this instance was not matched by grep (while another one just below was). Fixes: `4f838a82e2` Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> Closes scylladb/scylladb#19458	2024-06-25 12:11:28 +03:00
Kefu Chai	adca415245	bytes: drop unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. the callers in alternator/streams.cc are updated to use `fmt::print()` to format the `bytes` instances. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19448	2024-06-25 12:11:28 +03:00
Kefu Chai	94e36d4af4	auth: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. this change addresses the leftover of 850ee7e170a. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19467	2024-06-25 12:11:28 +03:00
Benny Halevy	378578b481	chunked_vector_test: add tests for value-initialization constructor Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-25 12:08:11 +03:00
Benny Halevy	5bd2ee7507	utils: chunked_vector: add ctor from std::initializer_list Prepare for using utils::chunked_vector for dht::partition_range_vector Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-25 12:08:06 +03:00
Benny Halevy	7780af2e84	utils: chunked_vector: document invalidation of iterators on move chunked_vector differs from std::vector where the latter's move constructor is required to preserve any iterators to the moved-from vector. In contrast, chunked_vector::iterator keeps a pointer to the chunked_vector::_chunks data, which is a utils::small_vector, and when moved, it might invalidate the iterator since the moved-to _chunks might copy the contents of the internal capacity rather than moving the allocated capacity. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-25 11:44:50 +03:00
Botond Dénes	c7317be09a	db/config: introduce reader_concurrency_semaphore_cpu_concurrency To allow increasing the semaphore's CPU concurrency, which is currently hard-limited to 1. Not wired yet.	2024-06-25 04:00:11 -04:00
Piotr Dulikowski	85219e9294	configure.py: fix the 'configure' rule generated during regeneration The Ninja makefile (build.ninja) generated by the ./configure.py script is smart enough to notice when the configure.py script is modified and re-runs the script in order to regenerate itself. However, this operation is currently not idempotent and quickly breaks because information about the Ninja makefile's name is not passed properly. This is the rule used for makefile's regeneration: ``` rule configure command = {python} configure.py --out={buildfile}.new $configure_args && mv {buildfile}.new {buildfile} generator = 1 description = CONFIGURE $configure_args ``` The `buildfile` variable holds the value of the `--out` option which is set to `build.ninja` if not provided explicitly. Note that regenerating the makefile passes a name with the `.new` suffix added to the end; we want to first write the file in full and then overwrite the old file via a rename. However, notice that the script was called with `--out=build.ninja.new`; the `configure` rule in the regenerated file will have `configure.py --out=build.ninja.new.new` and then `mv build.ninja.new.new build.ninja.new`. So, second regeneration will just leave a build.ninja.new file which is not useful. Fix this by introducing an additional parameter `--out-final-name`. This parameter is only supposed to be used in the regeneration rule and its purpose is to preserve information about the original file name. After this change I no longer see `build.ninja.new` being created after a sequence of `touch configure.py && ninja` calls. Closes scylladb/scylladb#19428	2024-06-24 21:20:32 +03:00
Laszlo Ersek	a4c6ae688a	install-dependencies.sh: set file mode creation mask to 0022 The docs [1] clearly say "install-dependencies.sh" should be run as "root"; however, the script silently assumes that the umask inherited from the calling environment is 0022. That's not necessarily the case, and there's an argument to be made for "root" setting umask 0077 by default. The script behaves unexpectedly under such circumstances; files and directories it creates under /opt and /usr/local are then not accessible to unprivileged users, leading to compilation failures later on. Set the creation mask explicitly to 0022. [1] https://github.com/scylladb/scylladb/blob/master/HACKING.md#dependencies Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com> Closes scylladb/scylladb#19464	2024-06-24 19:46:15 +03:00
Marcin Maliszkiewicz	a4e26585e5	git: add build.ninja.new to .gitignore For some time now, executing our ninja build targets has also generated a build.ninja.new file. Adding it to .gitignore for convenience as we won't commit this file. Closes scylladb/scylladb#19367	2024-06-24 16:48:50 +03:00
Kefu Chai	e61061d19f	test.py: improve help message on tests selection Since `3afbd21f`, we are able to selectively choose a single test in a boost test executable which represents a test suite, and to choose a single test in a pytest script with the syntax of "test_suite::test_case". it's very handy for manual testing. so let's document in the command line help message as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19454	2024-06-24 14:27:02 +03:00
Kefu Chai	e9d8c25e86	alternator: define static variable before this change, when linking an executable referencing `marker`, we could have the following error: ``` 13:58:02 ld.lld: error: undefined symbol: alternator::event_id::marker 13:58:02 >>> referenced by streams.cc 13:58:02 >>> build/dev/alternator/streams.o:(from_string_helper<rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>, alternator::event_id>::Set(rapidjson::GenericValue<rapidjson::UTF8<char>, rjson::internal::throwing_allocator>&, alternator::event_id, rjson::internal::throwing_allocator&)) 13:58:02 clang-16: error: linker command failed with exit code 1 (use -v to see invocation) ``` it turns out `event_id::marker` is only declared, but never defined. please note, the non-inline static member variable in its class definition is not considered as a definition, see [class.static.data](https://eel.is/c++draft/class.static.data#3) > The declaration of a non-inline static data member in its class > definition is not a definition and may be of an incomplete type > other than cv void. so, let's declare it as a `constexpr` instead. it implies `inline`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19452	2024-06-24 13:15:00 +03:00
Kefu Chai	af2b0b030b	test/pylib: use raw string to avoid using escape sequence before this change, when running test like: ```console ./test.py --mode release topology_experimental_raft/test_tablets /home/kefu/dev/scylladb/test/pylib/scylla_cluster.py:333: SyntaxWarning: invalid escape sequence '$' deleted_sstable_re = f"^./{keyspace}/{table}-[0-9a-f]{{32}}/. \(deleted$$" ``` we could have the warning above. because `\(` is not a valid escape sequence, but the Python interpreter accepts it as two separated characters of `\(` after complaining. but it's still annoying. so, let's use a raw string here, as we want to match "(deleted)". Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19451	2024-06-24 11:11:44 +03:00
Lakshmi Narayanan Sreethar	a09556a49f	bloom_filter_test: add testcase to verify filter rebuilds Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-24 12:11:37 +05:30
Lakshmi Narayanan Sreethar	4aa5698f0d	test/boost: move bloom filter tests from sstable_datafile_test into a new file Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-24 12:06:02 +05:30
Lakshmi Narayanan Sreethar	21e463b108	sstables/mx/writer: rebuild bloom filters with bad partition estimates The bloom filters are built with partition estimates, as the actual partition count might not be available in all the cases. If the estimate was bad, the bloom filters might end up too large or too small compared to their optimal sizes. Rebuild such bloom filters with actual partition count before the filter is written to disk and the sstable is sealed. Fixes #19049 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-24 12:06:02 +05:30
Lakshmi Narayanan Sreethar	afc90657d6	sstables/mx/writer: add variable to track number of partitions consumed Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-24 12:06:02 +05:30
Lakshmi Narayanan Sreethar	fccb1a11e5	sstable: introduce sstable::maybe_rebuild_filter_from_index() Add method sstable::maybe_rebuild_filter_from_index() that rebuilds bloom filters which had bad partition estimates when they were built. The method checks the false positive rate based on the current bitset size against the configured false positive rate to decide whether a filter needs to be rebuilt. If the current false positive rate is within 75% to 125% of the configured false positive rate, the bloom filter will not be rebuilt. Otherwise, the filter will be rebuilt from the index entries. This method should only be called before an SSTable is sealed as the bloom filter is updated in-place. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-24 12:06:02 +05:30
Lakshmi Narayanan Sreethar	a7d77f6304	sstable: add method to return filter format for the given sstable version Extract out the filter format computing logic from sstable::read_filter into a separate function. This is done so that the subsequent patches can make use of this function. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-24 12:06:01 +05:30
Botond Dénes	6dd6f0198e	utils/i_filter: introduce get_filter_size() Currently, the only way to get the size of a filter, for certain parameters is to actually create one. This requires a seastar thread context and potentially also allocates a huge amount of memory. Provide a method which just calculates the size, without any of the above mentioned baggage. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-24 12:06:01 +05:30
Kefu Chai	a230ecc4eb	utils/murmur_hash: replace rotl64() with std::rotl() since we are now able to use C++20, there is no need to use the homebrew rotl64(). so in this change, we replace rotl64() with std::rotl(), and remove the former from the source tree. the underlying implementations of these two solutions are equivalent, so no performance changes are expected. all caller sites have been audited: all of them pass `uint64` as the first parameter. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19447	2024-06-24 08:24:43 +03:00
Marcin Maliszkiewicz	794440eb85	test: skip checking default role in test_auth_v2_migration Default role creation in auth-v1 is asynchronous and all nodes race to create it so we'd need to delay the test and wait. Checking this particular role doesn't bring much value to the test as we check other roles to demonstrate correctness. Fixes scylladb/scylladb#19039 Closes scylladb/scylladb#19424	2024-06-23 19:50:55 +03:00
Avi Kivity	0d52f0684a	Merge 'Sanitize gossiper API endpoints management' from Pavel Emelyanov Gossiper has two blocks of endpoints, both are registered in legacy/random place in main. This PR moves them next to gossiper start and adds unregistration for both. refs: #2737 Closes scylladb/scylladb#19425 * github.com:scylladb/scylladb: api: Remove dedicated failure_detector registration method api: Move failure_detector endpoints set/unset to gossiper api: Unset failure detector endpoints method api: (Un)Register gossiper API in correct place api: Unset gossiper endpoints on stop api: Coroutinize set_server_gossip()	2024-06-23 19:35:11 +03:00
Kefu Chai	850ee7e170	auth: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19429	2024-06-23 19:25:23 +03:00
Kefu Chai	72fdee1efb	README.md: add badges for cron jobs these jobs are scheduled to verify the builds of scylla, like if it builds with the latest Seastar, if scylla can generate reproducible builds, and if it builds with the nightly build of clang. the failures of these workflows are not very visible without clicking into the corresponding workflow in https://github.com/scylladb/scylladb/actions. in this change, we add their badges in the testing section of README.md, so one can identify the test failures of them if any. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19430	2024-06-23 19:24:40 +03:00
Kefu Chai	a7e38ada8e	test: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19432	2024-06-23 18:02:52 +03:00
zhouxiang	694014591a	test/alternator/test_projection_expression.py: remove useless comparisons pytest.raises expects a block of code that will raise an exception, not a comparison of results. Closes scylladb/scylladb#19436	2024-06-23 13:53:14 +03:00
Pavel Emelyanov	d8009ed843	api/cache_service: Don't use database to perform map+reduce on The sharded<database> is used as a map_reduce0() method provider, there's no real need in database itself. Simple smp::map_reduce() would work just as good. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19364	2024-06-21 19:47:25 +03:00
Kefu Chai	f781c3babe	.github: add reproducible-build workflow to verify that scylla builds are reproducible. the new workflow builds scylla twice with master HEAD, and compares the md5sums of the built scylla executables. it fails if the md5sum:s do not match. this workflow is triggered at 5AM every Friday. its status can be found at https://github.com/scylladb/scylladb/actions/workflows/reproducible-build.yaml after it's built for the first time. Refs #19225 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19409	2024-06-21 19:39:37 +03:00
Nadav Har'El	81a02f06dd	test/cql-pytest: add more tests for SELECT's LIMIT SELECT's "LIMIT" feature is tested in combination with other features in different test/cql-pytest/*.py source files - for example the combination of LIMIT and GROUP BY is tested in test_group_by.py. This patch adds a new test file, test_limit.py, for testing aspects of the basic usage of LIMIT that weren't already tested in other files. The new file also has a comment saying where we have other tests for LIMIT combined with other features. All the new tests pass (on both Scylla and Cassandra). But they can be useful as regression tests to test patches which modify the behavior of LIMIT - e.g., pull request #18842. This patch also adds another test in test_group_by.py. This adds to one of the tests for the combination of LIMIT and GROUP BY (in this case, GROUP BY of clustering prefix, no aggregation) also a check for paging, that was previously missing. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19392	2024-06-21 19:35:15 +03:00
Pavel Emelyanov	755be887a6	api: Remove dedicated failure_detector registration method It's now empty and can be dropped Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 19:30:54 +03:00
Pavel Emelyanov	2bfa1b3832	api: Move failure_detector endpoints set/unset to gossiper These two api functions both need gossiper service and only it, and thus should have set/unset calls next to each other. It's worth putting them into a single place Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 19:30:54 +03:00
Pavel Emelyanov	88a6094121	api: Unset failure detector endpoints method There's one more set of endpoints that need gossiper -- the failure_detector ones. They are registered, but not unregistered, so here's the method to do it. It's not called by any code yet, because next patch would need to rework the caller anyway. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 19:30:53 +03:00
Pavel Emelyanov	f84694166e	api: (Un)Register gossiper API in correct place Each service's endpoints are to be registered just after the service itself, so should gossiper's Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 19:30:53 +03:00
Pavel Emelyanov	19f3a9805a	api: Unset gossiper endpoints on stop Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 19:30:53 +03:00
Pavel Emelyanov	c7547b9c7e	api: Coroutinize set_server_gossip() One of the next patches will add more async calls here, so not to create then-chains, convert it into a coroutine Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 19:30:53 +03:00
Kefu Chai	eef64a6bb8	build: cmake: do not add "absl::headers" to include dirs `absl::headers` is a library, not the path to its headers. before this change, the command lines of the generated build rules look like: ``` -I/home/kefu/dev/scylladb/repair/absl::headers ``` this does not hurt, as other libraries might add the intended include dir to the compiler command line, but this is just wrong. so let's remove it. please note, `repair` target already links against `absl::headers`. so we don't need to add `absl::headers` to its linkage again. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19384	2024-06-21 19:22:17 +03:00
Kefu Chai	7b10cc8079	treewide: include seastar headers with brackets this change was created in the same spirit of `ebff5f5d`. despite that we include Seastar as a submodule, Seastar is not a part of scylla project. so we'd better include its headers using brackets. `ebff5f5d` addressed this cosmetic issue a while back. but probably clangd's header-insertion helped some contributors to insert the missing headers with `"`. so this style of `include` returned to the tree with these new changes. unfortunately, clangd does not allow us to configure the style of `include` at the time of writing. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19406	2024-06-21 19:20:27 +03:00
Kefu Chai	987fd59f21	test: correct some misspellings fix a typo in source code. this typo was identified by codespell. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19412	2024-06-21 19:16:11 +03:00
Kefu Chai	52693fc21c	Update seastar submodule * seastar 9ce62705...908ccd93 (42): > include/seastar: do not include unused headers > timer-set: Add missing sanity headers > tutorial.md: fix typos > Update tutorial.md to reflect update preemption methods > tutorial.md: remove trailing whitespace > json: Add a test for jsonable objects > json: Make formatter::write(vector/map/umap) copy their arguments > json: Make formatter call write for jsonable > test: futures: verify stream yields the consumed value > build: add pyyaml to install-dependencies.sh > stall-analyser: remove unused variable > stall-analyser: use itertools.dropwhile when appropriate > scripts: sort packages alphanumerically > docker: bind the file instead of copying during the build stage > docker: lint dockerfile > dns: use undeprecated c-ares APIs > stall-analyser: use argparse.FileType when appropriate > http/client: Retry request over fresh connection in case old one failed > http/client: Fix indentation after previous patch > http/client: Pass request and handle by reference > http/client: Introduce make_new_connection() > http/client: Fix parser result checking > http/client: Document max_connections > test/http: Generalize http connection factory > loopback_socket: Shutdown socket on EOF close > loopback_socket: Rename buffer's shutdown() to abort() > test: Add test for sharded<>::invoke_on_...() compilation > net/tls: Added additional error codes > io-tester.md: update available parameters for job description > io_tester: expose extent_allocation_size_hint via job param > file: Unfriend reactor class > memory.cc: fix cross-shard shrinking realloc > sharded: Mark invoke_on_others() helper lambda mutable > scheduling: Unfriend reactor from scheduling_group_key > reactor: Make allocate_scheduling_group_specific_data() accept key_id argument > reactor: Add local key_id variable to allocate_scheduling_group_specific_data() > timer: Unfriend reactor > reactor: Generalize timer removal > timer: Add type alias for timer_set > reactor: Move reactor::complete_timers() to timer_set > tests: test protobuf support in prometheus_test.py > tests: enable prometheus_test.py to test metrics without aggregation Closes scylladb/scylladb#19405	2024-06-21 18:52:58 +03:00
Dawid Medrek	2446cce272	db/hints: Initialize endpoint managers only for valid hint directories Before these changes, it could happen that Scylla initialized endpoint managers for hint directories representing * host IDs before migrating hinted handoff to using host IDs, * IP addresses after the migration. One scenario looked like this: 1. Start Scylla and upgrade the cluster to using host IDs. 2. Create, by hand, a hint directory representing an IP address. 3. Trigger changing the host filter in hinted handoff; it could be achieved by, for example, restricting the set of data centers Scylla is allowed to save hints for. When changing the host filter, we browse the hint directories and create endpoint managers if we can send hints towards the node corresponding to a given hint directory. We only accepted hint directories representing IP addresses and host IDs. However, we didn't check whether the local node has already been upgraded to host-ID-based hinted handoff or not. As a result, endpoint managers were created for both IP addresses and host IDs, no matter whether we were before or after the migration. These changes make sure that any time we browse the hint directories, we take that into account. Fixes scylladb/scylladb#19172 Closes scylladb/scylladb#19173	2024-06-21 15:59:49 +02:00
Avi Kivity	3cfb0503a9	Update tools/cqlsh submodule for v6.0.21-scylla * tools/cqlsh 0d58e5c...ba83aea (1): > requirements: update scylla-driver	2024-06-21 16:04:21 +03:00
Piotr Dulikowski	cf2b4bf721	Merge 'cdc: do not include unused headers' from Kefu Chai also add `auth` and `cdc` to iwyu's `CLEANER_DIR` setting. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#19410 * github.com:scylladb/scylladb: .github: add auth and cdc to iwyu's CLEANER_DIR cdc: do not include unused headers	2024-06-21 13:44:40 +02:00
Pavel Emelyanov	0330640b4d	api: Use provided db::config, not the one from ctx The set_server_config() already has the db::config reference for endpoints to work with, there's no need to obtain one via ctx and database. This change kills two birds with one stone -- less users of database as config provider, less places that need http context -> database dependency. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 13:30:54 +03:00
Pavel Emelyanov	afb48d8ab9	api: Move some config endpoints from proxy to config Those getting (and setting, but these are not implemented) various timeouts work on config, whilst register themselves in storage_proxy function. Since the "service" they need to work with is config, move the endpoints to config endpoints code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 13:29:38 +03:00
Pavel Emelyanov	0aad406a2f	api: Split storage_proxy api registration The set_server_storage_proxy() does two things -- registers storage_proxy "function" and sets proxy routes, that depend on it. Next patches will move some /storage_proxy/... endpoints registration to earlier stage, so the function should be ready in advance. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 13:28:29 +03:00
Pavel Emelyanov	473cb62a9a	api: Unset config endpoints The set_server_config() needs the stop-time peer, here it is. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-21 13:28:06 +03:00
Kefu Chai	c429a8d8ae	sstables: use "me" sstable format by default in `7952200c`, we changed the `selected_format` from `mc` to `me`, but to be backward compatible the cluster starts with "md", so when the nodes in cluster agree on the "ME_SSTABLE_FORMAT" feature, the format selector believes that the node is already using "ME", which is specified by `_selected_format`, even though it is actually still using "md", which is specified by `sstable_manager::_format`, as changed by `54d49c04`. as explained above, it was specified to "md" in hope to be backward compatible when upgrading from an existing installation which might be still using "md". but after a second thought, since we are able to read sstables persisted with older formats, this concern is not valid. in other words, `7952200c` introduced a regression which changed the "default" sstable format from `me` to `md`. to address this, we just change `sstable_manager::_format` to "me", so that all sstables are created using "me" format. a test is added accordingly. Fixes #18995 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19293	2024-06-21 12:56:01 +03:00
Yaron Kaikov	57428d373b	[actions] fix sync label from PR to linked issue in `b8c705bc54` I modified the event name to `pull_request_target`. This caused the sync process to be skipped when a PR label was added/removed. Fixing it Closes scylladb/scylladb#19408	2024-06-21 11:39:44 +03:00
Kamil Braun	627d566811	Merge 'join_token_ring, gossip topology: recalculate sync nodes in wait_alive' from Patryk Jędrzejczak The node booting in gossip topology waits until all NORMAL nodes are UP. If we removed a different node just before, the booting node could still see it as NORMAL and wait for it to be UP, which would time out and fail the bootstrap. This issue caused scylladb/scylladb#17526. Fix it by recalculating the nodes to wait for in every step of the `wait_alive` loop. Although the issue fixed by this PR caused only test flakiness, it could also manifest in real clusters. It's best to backport this PR to 5.4 and 6.0. Fixes scylladb/scylladb#17526 Closes scylladb/scylladb#19387 * github.com:scylladb/scylladb: join_token_ring, gossip topology: update obsolete comment join_token_ring, gossip topology: fix indentation after previous patch join_token_ring, gossip topology: recalculate sync nodes in wait_alive	2024-06-21 10:22:32 +02:00
Piotr Dulikowski	c3536015e4	Merge 'cql3/statement/select_statement: do not parallelize single-partition aggregations' from Michał Jadwiszczak This patch adds a check for whether an aggregation query is doing a single-partition read and, if so, makes the query not use forward_service and not parallelize the request. Fixes scylladb/scylladb#19349 Closes scylladb/scylladb#19350 * github.com:scylladb/scylladb: test/boost/cql_query_test: add test for single-partition aggregation cql3/select_statement: do not parallelize single-partition aggregations	2024-06-21 08:50:00 +02:00
Kefu Chai	694fe58d6e	.github: add auth and cdc to iwyu's CLEANER_DIR to avoid future violations of include-what-you-use. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-21 14:29:48 +08:00
Kefu Chai	1a4740ddc0	cdc: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-21 14:29:48 +08:00
Avi Kivity	fdc1449392	treewide: rename flat_mutation_reader_v2 to mutation_reader flat_mutation_reader_v2 was introduced in a pair of commits in 2021: `e3309322c3` "Clone flat_mutation_reader related classes into v2 variants" `08b5773c12` "Adapt flat_mutation_reader_v2 to the new version of the API" as a replacement for flat_mutation_reader, using range_tombstone_change instead of range_tombstone to represent range tombstones. See those commits for more information. The transition was incremental; the last use of the original flat_mutation_reader was removed in 2022 in commit `026f8cc1e7` "db: Use mutation_partition_v2 in mvcc" In turn, flat_mutation_reader was introduced in 2017 in commit `748205ca75` "Introduce flat_mutation_reader" To transition from a mutation_reader that nested rows within a partition in a separate stream, to a flat reader that streamed partitions and rows in the same stream. Here, we reclaim the original name and rename the awkward flat_mutation_reader_v2 to mutation_reader. Note that mutation_fragment_v2 remains since we still use the original for compatibility, sometimes. Some notes about the transition: - files were also renamed. In one case (flat_mutation_reader_test.cc), the rename target already existed, so we rename to mutation_reader_another_test.cc. - a namespace 'mutation_reader' with two definitions existed (in mutation_reader_fwd.hh). Its contents were folded into the mutation_reader class. As a result, a few #includes had to be adjusted. Closes scylladb/scylladb#19356	2024-06-21 07:12:06 +03:00
Avi Kivity	185338c8cf	Merge 'Reduce TWCS off-strategy space overhead' from Raphael "Raph" Carvalho Normally, the space overhead for TWCS is 1/N, where N is the number of windows. But during off-strategy, the overhead is 100% because input sstables cannot be released earlier. Reshaping a TWCS table that takes ~50% of available space can result in the system running out of space. That's fixed by restricting every TWCS off-strategy job to 10% of free space on disk. Tables that aren't big will not be penalized with increased write amplification, as all input (disjoint) sstables can still be compacted in a single round. Fixes #16514. Closes scylladb/scylladb#18137 * github.com:scylladb/scylladb: compaction: Reduce twcs off-strategy space overhead to 10% of free space compaction: wire storage free space into reshape procedure sstables: Allow to get free space from underlying storage replica: don't expose compaction_group to reshape task	2024-06-20 18:51:25 +03:00
Kefu Chai	42b9784650	build: cmake: mark wasm "ALL" so that "wasm" target is built. "wasm" generates the text format of wasm code. and these wasm applications are used by the test_wasm tests. the rules generated by `configure.py` adds these .wat files as a dependency of `{mode}-build`, which is in turn a dependency of `{mode}`. in this change, let's mirror this behavior by making `wasm` ALL, so it is built by the default target. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19391	2024-06-20 18:45:31 +03:00
Kefu Chai	caf1149f11	cql-pytest/test_sstable: do not import unused modules Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19389	2024-06-20 17:14:28 +03:00
Avi Kivity	02cf17f4dc	Merge 'Sanitize load_meter API handlers management' from Pavel Emelyanov The service in question is pretty small one, but it has its API endpoint that lives in /storage_service group. Currently when a service starts and has any endpoints that depend on it, the endpoint registration should follow it (#2737). Here's the PR that does it for load meter. Another goal of this change is that http context now has one less dependency onboard. Closes scylladb/scylladb#19390 * github.com:scylladb/scylladb: api: Remove ctx->load_meter dependency api: Use local load_meter reference in handlers api: Fix indentation after previous patch api: Coroutinize load_meter::get_load_map handler api: Move load meter handlers api: Add set/unset methods for load_meter	2024-06-20 17:07:19 +03:00
Gleb Natapov	7bc05c3880	gossiper: wait for a bootstrapping node to be seen as normal on all nodes before completing initialization When a node bootstraps it may happen that some nodes still see it as bootstrapping while the node itself already is in normal state and ready to serve queries. We want to delay the bootstrap completion until all nodes see the new node as normal. Piggyback on the UP notification to do so and wait for the node that sent the notification to be seen as normal. Fixes #18678	2024-06-20 16:37:56 +03:00
Anna Stuchlik	027cf3f47d	doc: remove the link to Scylladb Google group The group is no longer active and should be removed from resources. Closes scylladb/scylladb#19379	2024-06-20 15:31:03 +02:00
Yaron Kaikov	f2705b3887	[action] add github context info for better debugging It seems that we skip the sync label process between PR and linked Issues Adding those debug prints will allow us to understand why Closes scylladb/scylladb#19393	2024-06-20 16:17:04 +03:00
Gleb Natapov	28c0a27467	Wait for booting node to be marked UP before completing booting. Currently a node does not wait to be marked UP by other nodes before completing booting, which creates a usability issue: during a rolling restart it is not enough to wait for the local CQL port to be opened before restarting the next node; it is also necessary to check that all other nodes already see this node as alive, otherwise if the next node is restarted some nodes may see two nodes as dead instead of one. This patch improves the situation by making sure that the boot process does not complete until all other nodes see the booting one as alive. This is still a best effort thing: if some nodes are unreachable or gossiper propagation takes too much time the boot process continues anyway. Fixes scylladb/scylladb#19206	2024-06-20 14:55:40 +03:00
Pavel Emelyanov	de80094815	Merge 'treewide: remove unused operator<<' from Kefu Chai since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. there are more occurrences of unused operator<< in the tree, but let's do the cleanup piecemeal. --- this is a cleanup, so no need to backport Closes scylladb/scylladb#19346 * github.com:scylladb/scylladb: types: remove unused operator<< node_ops: remove unused operator<< lang: remove unused operator<< gms: remove unused operator<< dht: remove unused operator<< test: do not use operator<< for std::optional	2024-06-20 13:18:59 +03:00
Pavel Emelyanov	873d76c02b	api: Remove ctx->load_meter dependency Now the API uses captured reference and the explicit dependency is not needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-20 12:38:28 +03:00
Pavel Emelyanov	d85e70ef98	api: Use local load_meter reference in handlers Now it uses ctx.lm dependency, but the idiomatic way for API is to use the argument one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-20 12:37:48 +03:00
Pavel Emelyanov	bc5e360066	api: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-20 12:37:39 +03:00
Pavel Emelyanov	e54f651beb	api: Coroutinize load_meter::get_load_map handler Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-20 12:37:18 +03:00
Pavel Emelyanov	40c178bee2	api: Move load meter handlers Now they are in storage service set/unset helper, but there's the dedicated set/unset pair for meter's endpoints. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-20 12:36:38 +03:00
Pavel Emelyanov	724d62aa87	api: Add set/unset methods for load_meter The meter is a pretty small service and its API is also tiny. Still, it's a standalone top-level service, and its API should come next to it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-20 12:35:58 +03:00
Botond Dénes	b09196ac49	Merge 'tasks: fix tasks abort' from Aleksandra Martyniuk Currently if task_manager::task::impl::abort preempts before children are recursively aborted and then the task gets unregistered, we hit use after free since abort uses children vector which is no longer alive. Modify abort method so that it goes over all tasks in task manager and aborts those with the given parent. Fixes: #19304. Requires backport to all versions containing task manager Closes scylladb/scylladb#19305 * github.com:scylladb/scylladb: test: add test for abort while a task is being unregistered tasks: fix tasks abort	2024-06-20 12:09:30 +03:00
Kefu Chai	1a724f22f9	mutation: silence false alarm from clang-tidy before this change, because it seems that we move away from `p2` in each iteration, so the succeeding iterations are moving from an empty `p2`, clang-tidy warns at seeing this. but we only move from `p2._static_row` in the first iteration when the dest `mutation_partition` instance's static row is empty. and in the succeeding iterations, the dest `mutation_partition` instance's static row is not empty anymore if it is set. so, this is a false alarm. in this change, we silence this warning. another option is to extract the single-shot mutation out of the loop, and pass the `std::move(p2)` only for the single-shot mutation, but that'd be a much more intrusive change. we can revisit this later. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19331	2024-06-20 12:05:20 +03:00
Kefu Chai	9f0b60c7a0	rust: disable incremental build for release build so that the release build is reproducible. a reproducible build helps developers to perform postmortem debugging. Fixes #19225 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19374	2024-06-20 12:01:14 +03:00
Patryk Jędrzejczak	bcc0a352b7	join_token_ring, gossip topology: update obsolete comment The code mentioned in the comment has already been added. We change the comment to prevent confusion.	2024-06-20 10:59:50 +02:00
Patryk Jędrzejczak	7735bd539b	join_token_ring, gossip topology: fix indentation after previous patch	2024-06-20 10:59:50 +02:00
Patryk Jędrzejczak	017134fd38	join_token_ring, gossip topology: recalculate sync nodes in wait_alive Before this patch, if we booted a node just after removing a different node, the booting node may still see the removed node as NORMAL and wait for it to be UP, which would time out and fail the bootstrap. This issue caused scylladb/scylladb#17526. Fix it by recalculating the nodes to wait for in every step of the `wait_alive` loop.	2024-06-20 10:59:49 +02:00
Anna Stuchlik	680405b465	doc: separate Enterprise- from OSS-only content This commit adds files that contain Open Source-specific information and includes these files with the .. scylladb_include_flag:: directive. The files include a) a link and b) Table of Contents. The purpose of this update is to enable adding Open Source/Enterprise-specific information in the Reference section. Closes scylladb/scylladb#19362	2024-06-20 11:58:32 +03:00
Piotr Dulikowski	75441ee120	Merge 'mv: fix value of the gossiped view update backlog' from Wojciech Mitros Currently, when calculating the view update backlog for gossip, we start with `db::view::update_backlog()` and compare it to backlogs from all shards. However, this backlog can't be compared to other backlogs - it has size 0 and we compare the fraction current/size when comparing backlogs, causing us to compare with `NaN`. This patch fixes it by starting the comparisons with an empty backlog. The patch introducing this issue (`f70f774e40`) wasn't backported, so this one doesn't need to be either Closes scylladb/scylladb#19247 * github.com:scylladb/scylladb: mv: make the view update backlog unmodifiable mv: fix value of the gossiped view update backlog	2024-06-20 06:27:11 +02:00
Piotr Dulikowski	78a40dbe2c	Merge 'cql: remove global_req_id from schema_altering_statement' from Marcin Maliszkiewicz Such field is no longer needed as the information comes directly from group0_batch. Fixes scylladb/scylladb#19365 Backport: no, we don't backport code cleanups Closes scylladb/scylladb#19366 * github.com:scylladb/scylladb: cql: remove global_req_id from schema_altering_statement cql: switch alter keyspace prepare_schema_mutations to use group0_batch	2024-06-20 06:21:48 +02:00
Dawid Medrek	c56de90a26	test/boost/hint_test.cc: Add missing parse() callback Before these changes, compilation was failing with the following error: In file included from test/boost/hint_test.cc:12: /usr/include/fmt/ranges.h:298:7: error: no member named 'parse' in 'fmt::formatter<db::hints::sync_point::host_id_or_addr>' 298 \| f.parse(ctx); \| ~ ^ We add the missing callback. Closes scylladb/scylladb#19375	2024-06-19 23:19:33 +02:00
Wojciech Mitros	cde14a5788	mv: make the view update backlog unmodifiable Currently, a view update backlog may reach an invalid state, when its max is 0 and its relative_size() is NaN as a result. This can be achieved either by constructing the backlog with a 0 max or by modifying the max of an existing backlog. In particular, this happens when creating the backlog using the default constructor. In this patch the default constructor is deleted and a check that the max is different than 0 is added to its constructor - if the check fails, we construct an empty backlog instead, to handle the possibility of getting an invalid backlog sent from a node with a version that's missing this check. Additionally, we make the backlog's members private, exposing them only through const getters.	2024-06-19 19:44:57 +02:00
Pavel Emelyanov	5fe4290f66	gitattributes: Mark swagger .js files as binary The goal is the same as in `29768a2d02` (gitattributes: Mark *.svg as binary) -- prevent grep from searching patterns in those files. Although those files are, in fact, javascript code, the way they are formatted is not suitable for human reading, so it's unlikely that anyone would be interested in grep-ing patterns in them. At the same time, those files consist of very long lines, so if a grep finds a pattern in one of those, the output is spoiled. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19357	2024-06-19 15:07:56 +03:00
Botond Dénes	9d1fa828be	Merge 'utils/large_bitset: replace reserve_partial with utils::reserve_gently' from Lakshmi Narayanan Sreethar Replace the reserve_partial loop in large_bitset constructor with a new function - reserve_gently() that can reserve memory without stalling by repeatedly calling reserve_partial() method of the passed container. Closes scylladb/scylladb#19361 * github.com:scylladb/scylladb: utils/large_bitset: replace reserve_partial with utils::reserve_gently utils/stall_free: introduce reserve_gently	2024-06-19 14:31:59 +03:00
Michał Jadwiszczak	8eb5ca8202	test/boost/cql_query_test: add test for single-partition aggregation	2024-06-19 09:24:17 +02:00
Piotr Dulikowski	7567b87e72	Merge 'auth: reuse roles select query during cache population' from Marcin Maliszkiewicz With big number of shards in the cluster (e.g. 500+) due to cache periodic refresh we experience high load on role_permissions table (e.g. 1k op/s). The load on roles table is amplified because to populate single entry in the cache we do several selects on roles table. Some of this can't be avoided because roles are arranged in a tree-like structure where permissions can be inherited. This patch tries to reuse queries which are simply duplicated. It should reduce the load on roles table by up to 50%. Fixes scylladb/scylladb#19299 Closes scylladb/scylladb#19300 * github.com:scylladb/scylladb: auth: reuse roles select query during cache population auth: coroutinize service::get_uncached_permissions auth: coroutinize service::has_superuser	2024-06-19 07:53:47 +02:00
Marcin Maliszkiewicz	56707e2965	cql: remove global_req_id from schema_altering_statement Such field is no longer needed as the information comes directly from group0_batch. Fixes scylladb/scylladb#19365	2024-06-18 20:26:09 +02:00
Lakshmi Narayanan Sreethar	9ad800cfb9	utils/large_bitset: replace reserve_partial with utils::reserve_gently Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-18 23:36:30 +05:30
Lakshmi Narayanan Sreethar	31414f54c6	utils/stall_free: introduce reserve_gently Add reserve_gently() that can reserve memory without stalling by repeatedly calling reserve_partial() method of the passed container. Update the comments of existing reserve_partial() methods to mention this newly introduced reserve_gently() wrapper. Also, add test to verify the functionality. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-18 23:36:30 +05:30
Marcin Maliszkiewicz	685aecde61	cql: switch alter keyspace prepare_schema_mutations to use group0_batch This is needed to simplify the code in the following commit.	2024-06-18 19:54:55 +02:00
Michał Jadwiszczak	e9ace7c203	cql3/select_statement: do not parallelize single-partition aggregations Currently reads with WHERE clause which limits them to be single-partition reads, are unnecessarily parallelized. This commit checks this condition and the query doesn't use forward_service in single-partition reads.	2024-06-18 19:21:32 +02:00
Pavel Emelyanov	f7d5d4877c	Merge '[test.py] Fix several issues in log gathering' from Andrei Chekun Related: https://github.com/scylladb/scylladb/issues/17851 Fix the issue that test logs were not deleted Fix the issue that the URL to the failed test directory was incorrectly shown even when artifacts_dir_url option was not provided Fix the issue that there were no node logs when it failed to join the cluster Closes scylladb/scylladb#19115 * github.com:scylladb/scylladb: [test.py] Fix logs had multiplication of lines [test.py] Fix log not deleted [test.py] Fix log for failed node was not added to failed directory [test.py] Fix URL for failed logs directory in CI	2024-06-18 15:37:29 +03:00
Aleksandra Martyniuk	50cb797d95	test: add test for abort while a task is being unregistered	2024-06-18 13:41:51 +02:00
Aleksandra Martyniuk	3463f495b1	tasks: fix tasks abort Currently if task_manager::task::impl::abort preempts before children are recursively aborted and then the task gets unregistered, we hit use after free since abort uses children vector which is no longer alive. Modify abort method so that it goes over all tasks in task manager and aborts those with the given parent. Fixes: #19304.	2024-06-18 13:39:29 +02:00
Botond Dénes	2123b22526	Merge 'doc: add 6.x.y to 6.x.z and remove 5.x.y to 5.x.z upgrade guide' from Anna Stuchlik This PR removes the 5.x.y to 5.x.z upgrade guide and adds the 6.x.y to 6.x.z upgrade guide. The previous maintenance upgrade guides, such as from 5.x.y to 5.x.z, consisted of several documents - separate for each platform. The new 6.x.y to 6.x.z upgrade guide is one document - there are tabs to include platform-specific information (we've already done it for other upgrade guides as one generic document is more convenient to use and maintain). I did not modify the procedures. At some point, they have been reviewed for previous upgrade guides. Fixes https://github.com/scylladb/scylladb/issues/19322 - This PR must be backported to branch-6.0, as it adds 6.x specific content. Closes scylladb/scylladb#19340 * github.com:scylladb/scylladb: doc: remove the 5.x.y to 5.x.z upgrade guide doc: add the 6.x.y to 6.x.z upgrade guide-6	2024-06-18 14:24:38 +03:00
Wojciech Mitros	1de5566cfa	mv: fix value of the gossiped view update backlog Currently, when calculating the view update backlog for gossip, we start with `db::view::update_backlog()` and compare it to backlogs from all shards. However, this backlog can't be compared to other backlogs - it has size 0 and we compare the fraction current/size when comparing backlogs, causing us to compare with `NaN`. This patch fixes it by starting the comparisons with an empty backlog.	2024-06-18 13:15:18 +02:00
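The NaN problem described in the commit above can be sketched in a few lines of Python. This is only an illustrative model, not the Scylla code: the `Backlog` class, `relative_size()`, and both `max_backlog_*` helpers are hypothetical names, and the "empty backlog" is assumed here to be a backlog with zero current size and a non-zero capacity.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Backlog:
    """Hypothetical stand-in for a view update backlog."""
    current: int
    size: int

    def relative_size(self) -> float:
        # Python raises ZeroDivisionError for 0 / 0, so the IEEE 754
        # result (NaN) is modeled explicitly for a zero-size backlog.
        if self.size == 0:
            return math.nan
        return self.current / self.size


def max_backlog_buggy(shards):
    # Starts from a zero-size backlog: its ratio is NaN, and every
    # ordered comparison against NaN is False, so nothing replaces it.
    best = Backlog(0, 0)
    for b in shards:
        if b.relative_size() > best.relative_size():
            best = b
    return best


def max_backlog_fixed(shards):
    # Starts from an "empty" backlog with a well-defined ratio of 0.0
    # (assumption: non-zero size, zero current).
    best = Backlog(0, 1)
    for b in shards:
        if b.relative_size() > best.relative_size():
            best = b
    return best


shards = [Backlog(10, 100), Backlog(50, 100)]
print(max_backlog_buggy(shards))
print(max_backlog_fixed(shards))
```

In this model the buggy variant never leaves its NaN starting point, while the fixed variant picks the shard with the largest current/size fraction.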
Kefu Chai	87247c6542	.github: add workflow to build with latest seastar so we can be aware of whether scylla builds with seastar master HEAD, and be prepared if a build failure is found. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19135	2024-06-18 13:34:43 +03:00
Andrei Chekun	6a4b441bf2	[test.py] Fix logs had multiplication of lines Since the test name was not unique across the run, using the --repeat option created several handlers for the same file. With this change, the test name, and accordingly the log name, differs between repeats of the same test. Remove mode from the test name since it is already in the mode directory.	2024-06-18 11:14:07 +02:00
Andrei Chekun	b01a5f9bd9	[test.py] Fix log not deleted One of the created log files was not deleted at all, because there was no delete command. The unlink is moved to a later stage, explicitly after removing the handler that writes to this file, to avoid the possibility that something is added after the file is removed.	2024-06-18 11:14:01 +02:00
Kefu Chai	0a74d45425	build: cmake: add commitlog_cleanup_test in `94cdfcaa94`, we added commitlog_cleanup_test to `configure.py`, but didn't add it to the CMake building system. in this change, let's add it to the CMake building system. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19314	2024-06-18 12:12:28 +03:00
Kefu Chai	68ef7dda79	config: correct the comment on printable_to_json() seastar::format() does not use operator<< under the hood, it uses {fmt}, so update the comment accordingly. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19315	2024-06-18 12:08:59 +03:00
Nadav Har'El	2ec1e0f0d5	test/cql-pytest: tests verifying UUID sort order In issue #15561 some doubts were raised regarding the way ScyllaDB sorts UUID values. This patch adds a heavily-commented cql-pytest test that helps understand - and verify that understanding - of the way Scylla sorts UUIDs, and shows there is some reason in the madness (in particular, Version 1 UUIDs (time uuids) are sorted like timeuuids, and not as byte arrays). The new tests check the different cases (see the comments in the test), and as usual for cql-pytest tests - they also pass on Cassandra, which allows us to confirm that the sort order we used is identical to the one used by Cassandra and not something that Scylla mis-implemented. Having this test in our suite will also ensure that the UUID ordering never changes accidentally in the future. If it ever changes, it can break access to existing tables that use UUID clustering keys, so it shouldn't change. Fixes #15561 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19343	2024-06-18 12:05:30 +03:00
Pavel Emelyanov	147552c34a	Merge 'configurable maintenance (streaming) semaphore count resource limit' from Botond Dénes Making the count resources on the maintenance (streaming) semaphore live update via config. This will allow us to improve repair speed on mixed-shard clusters, where we suspect that reader thrashing -- due to the combination of high number of readers on each shard and very conservative reader count limit (10) -- is the main cause of the slowness. Making this count limit configurable allows us to start experimenting with this fix, without committing to a count limit increase (or removal), addressing the pain in the field. Refs: #18269 No OSS backport needed. Closes scylladb/scylladb#19248 * github.com:scylladb/scylladb: replica/database: wire in maintenance_reader_concurrency_semaphore_count_limit db/config: introduce maintenance_reader_concurrency_semaphore_count_limit reader_concurrency_semaphore: make count parameter live-update	2024-06-18 12:02:24 +03:00
Gleb Natapov	fb764720d3	topology coordinator: add more trace level logging for debugging Add more logging that provides more visibility into what happens during topology loading. Message-ID: <ZnE5OAmUbExVZMWA@scylladb.com>	2024-06-18 10:34:03 +02:00
Botond Dénes	1acc57e19d	Merge 'schema: Make "describe" use extensions to string' from Calle Wilund Fixes #19334 Current impl uses hardcoded printing of a few extensions. Instead, use extension options to string and print all. Note: required to make enterprise CI happy again. Closes scylladb/scylladb#19337 * github.com:scylladb/scylladb: schema: Make "describe" use extensions to string schema_extensions: Add an option to string method	2024-06-18 11:28:11 +03:00
Botond Dénes	495f7160da	Update tools/jmx submodule * tools/jmx 53696b13...3328a229 (1): > scylla-apiclient: add missing license for SBOM report	2024-06-18 11:11:57 +03:00
Kefu Chai	fd0de02b81	types: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-18 15:55:22 +08:00
Kefu Chai	2c1a3e7191	node_ops: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-18 15:55:22 +08:00
Kefu Chai	84f0fd6823	lang: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-18 15:55:22 +08:00
Kefu Chai	ec5f0fccce	gms: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-18 15:55:22 +08:00
Kefu Chai	51d686ea9f	dht: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-18 11:26:20 +08:00
Kefu Chai	ef0f4eaef2	test: do not use operator<< for std::optional we don't provide it anymore, and if any of existing type provides constructor accepting an `optional<>`, and hence can be formatted using operator<< after converting it, neither shall we rely on this behavior, as it is fragile. so, in this change, we switch to `fmt::print()` to use {fmt} to print `optional<>`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-18 10:41:48 +08:00
Andrei Chekun	3c921d5712	Add allure pytest adaptor to the toolchain Add allure-pytest pip dependency to be able to use it for generating the allure report later. Main benefits of the allure report: 1. Group test failures 2. Possibility to attach log files to the test itself 3. Timeline of test run 4. Test description on the report 5. Search by test name or tag [avi: regenerate toolchain] Closes scylladb/scylladb#19335	2024-06-17 23:17:01 +03:00
Nadav Har'El	4faceeaa33	Merge 'treewide: drop thrift support' from Kefu Chai thrift support was deprecated since ScyllaDB 5.2 > Thrift API - legacy ScyllaDB (and Apache Cassandra) API is > deprecated and will be removed in followup release. Thrift has > been disabled by default. so let's drop it. in this change, * thrift protocol support is dropped * all references to thrift support in document are dropped * the "thrift_version" column in system.local table is preserved for backward compatibility, as we could load from an existing system.local table which still contains this column, so we need to write this column as well. * "/storage_service/rpc_server" is only preserved for backward compatibility with java-based nodetool. Fixes #3811 Fixes #18416 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> - [x] not a fix, no need to backport Closes scylladb/scylladb#18453 * github.com:scylladb/scylladb: config: expand on rpc_keepalive's description api: s/rpc/thrift/ db/system_keyspace: drop thrift_version from system.local table transport: do not return client_type from cql_server::connection::make_client_key() treewide: drop thrift support	2024-06-17 22:36:49 +03:00
Andrei Chekun	8845978ec5	[test.py] Unbreak cql-pytest and alternator Provide possibility to run pytest without explicitly providing mode parameter Closes scylladb/scylladb#19342	2024-06-17 21:41:09 +03:00
Piotr Dulikowski	85128c5b10	Merge 'cql3: always return created event in create keyspace statement' from Marcin Maliszkiewicz cql3: always return created event in create ks/table/type/view statement In case multiple clients issue concurrently CREATE KEYSPACE IF NOT EXISTS and later USE KEYSPACE it can happen that schema in driver's session is out of sync because it syncs when it receives special message from CREATE KEYSPACE response. Similar situation occurs with other schema change statements. In this patch we fix only create keyspace/table/type/view statements by always sending created event. Behavior of any other schema altering statements remains unchanged. Fixes https://github.com/scylladb/scylladb/issues/16909 backport: no, it's not a regression Closes scylladb/scylladb#18819 * github.com:scylladb/scylladb: cql3: always return created event in create ks/table/type/view statement cql3: auth: move auto-grant closer to resource creation code cql3: extract create ks/table/type/view event code	2024-06-17 19:58:38 +02:00
Anna Stuchlik	ea35982764	doc: remove the 5.x.y to 5.x.z upgrade guide This commit removes the upgrade guide from 5.x.y to 5.x.z. It is redundant in version 6.x.	2024-06-17 17:28:39 +02:00
Anna Stuchlik	ead201496d	doc: add the 6.x.y to 6.x.z upgrade guide-6 This commit adds the upgrade guide from 6.x.y to 6.x.z.	2024-06-17 17:23:00 +02:00
Marcin Maliszkiewicz	95673907ca	auth: reuse roles select query during cache population With big number of shards in the cluster (e.g. 500+) due to cache periodic refresh we experience high load on role_permissions table (e.g. 1k op/s). The load on roles table is amplified because to populate single entry in the cache we do several selects on roles table. Some of this can't be avoided because roles are arranged in a tree-like structure where permissions can be inherited. This patch tries to reuse queries which are simply duplicated. It should reduce the load on roles table by up to 50%. Fixes scylladb/scylladb#19299	2024-06-17 16:46:33 +02:00
Marcin Maliszkiewicz	547eb6d59b	auth: coroutinize service::get_uncached_permissions	2024-06-17 16:46:28 +02:00
Marcin Maliszkiewicz	00a24507cb	auth: coroutinize service::has_superuser	2024-06-17 16:46:22 +02:00
Kefu Chai	a5a5ca0785	auth: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19312	2024-06-17 17:33:55 +03:00
Yaniv Michael Kaul	9b0eb82175	dist/common/scripts/scylla_coredump_setup: fix typo Does not able -> Unable Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#19328	2024-06-17 17:33:46 +03:00
Kefu Chai	b64126fe1c	db: remove unused operator<< since we've switched almost all callers of the operator<< to {fmt}, let's drop the unused operator<<:s. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19313	2024-06-17 17:33:31 +03:00
Calle Wilund	73abc56d79	schema: Make "describe" use extensions to string Fixes #19334 Current impl uses hardcoded printing of a few extensions. Instead, use extension options to string and print all.	2024-06-17 13:30:24 +00:00
Calle Wilund	d27620e146	schema_extensions: Add an option to string method Allow an extension to describe itself as the CQL property string that created it (and is serialized to schema tables) Only paxos extension requires override.	2024-06-17 13:30:10 +00:00
Gleb Natapov	09556bff0e	gossiper: move gossip verbs to the idl	2024-06-17 12:47:17 +03:00
Kefu Chai	7e9550e9f9	test/py/minio_server.py: do not reference non-existent old_env in `51c53d8db6`, we check `self.old_env[env]` for None, but there are chances `self.old_env` does not contain a value with `env`. in that case, we'd have following failure: ``` Traceback (most recent call last): File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 307, in <module> asyncio.run(main()) File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run return runner.run(main) ^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.12/asyncio/base_events.py", line 687, in run_until_complete return future.result() ^^^^^^^^^^^^^^^ File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 304, in main await server.stop() File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 274, in stop self._unset_environ() File "/home/kefu/dev/scylladb/test/pylib/minio_server.py", line 211, in _unset_environ if self.old_env[env] is not None: ~~~~~~~~~~~~^^^^^ KeyError: 'S3_CONFFILE_FOR_TEST' ``` this happens if we run `pylib/minio_server.py` as a standalone application. in this change, instead of getting the value with index, we use `dict.get()`, so that it does not throw when the dict does not have the given key. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19291	2024-06-17 12:42:43 +03:00
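The difference between indexing and `dict.get()` that the commit above relies on can be shown with a small sketch. The `restore_environ` helper and its arguments are hypothetical and only loosely modeled on the description; a plain dict stands in for `os.environ` so the example has no side effects.

```python
def restore_environ(old_env: dict, names: list, environ: dict) -> None:
    """Restore saved variables; drop the ones that had no saved value.

    Hypothetical helper: old_env may lack some of the names entirely,
    which is the situation described in the commit message.
    """
    for name in names:
        # old_env[name] would raise KeyError for a missing key;
        # old_env.get(name) returns None instead.
        previous = old_env.get(name)
        if previous is not None:
            environ[name] = previous
        else:
            environ.pop(name, None)


environ = {"S3_CONFFILE_FOR_TEST": "/tmp/new.conf", "OTHER": "new"}
old_env = {"OTHER": "old"}  # S3_CONFFILE_FOR_TEST was never saved
restore_environ(old_env, ["S3_CONFFILE_FOR_TEST", "OTHER"], environ)
print(environ)
```

With indexing, the first iteration would fail with `KeyError: 'S3_CONFFILE_FOR_TEST'`; with `get()`, the missing key is treated the same way as a saved value of `None`.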
Andrei Chekun	293cf355df	[test.py] Fix log for failed node was nod added to failed directory If something happens while a node is being added to the cluster, it will not be registered as a part of the cluster. As a result, the logs for such a node will be missing during log gathering.	2024-06-17 11:16:55 +02:00
Andrei Chekun	7bbb8d9260	[test.py] Fix URl for failed logs directory in CI Incorrect passing of the artifacts_dir_url parameter from test.py to pytest leads to the situation when it will pass None as a string and pytest will generate incorrect URL.	2024-06-17 11:16:48 +02:00
Aleksandra Martyniuk	fb3153d253	api: task_manager: delete module from full_task_status Delete module field from full_task_status as it is unused. Closes scylladb/scylladb#18853	2024-06-17 09:03:19 +03:00
Nadav Har'El	9fc70a28ca	test: unflake test test_alternator_ttl_scheduling_group This test in topology_experimental_raft/test_alternator.py wants to check that during Alternator TTL's expiration scans, ALL of the CPU was used in the "streaming" scheduling group and not in the "statement" scheduling group. But to allow for some fluke requests (e.g., from the driver), the test actually allows work in the statement group to be up to 1% of the work. Unfortunately, in one test run - a very slow debug+aarch64 run - we saw the work on the statement group reach 1.4%, failing the test. I don't know exactly where this work comes from, perhaps the driver, but before this bug was fixed we saw more than 58% of the work in the wrong scheduling group, so neither 1% nor 1.4% is a sign that the bug came back. In fact, let's just change the threshold in the test to 10%, which is also much lower than the pre-fix value of 58%, so is still a valid regression test. Fixes #19307 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19323	2024-06-17 08:39:38 +03:00
Yaron Kaikov	996be2e235	dbuild: update toolchain to get latest scylla-api-client a new Scylla-api-client was released to get a proper license information in our SBOM report, Refs: https://github.com/scylladb/scylla-jmx/issues/237 Closes scylladb/scylladb#19324	2024-06-17 08:37:49 +03:00
Dawid Medrek	670830091c	db/hints: Use dedicated functions to lock a shared mutex Seastar has functions implementing locking a `seastar::shared_mutex`. We should use those now instead of reimplementing them in Scylla. Closes scylladb/scylladb#19253	2024-06-14 20:31:37 +02:00
Kamil Braun	bbb424a757	Merge '[test.py] Add uniqueness to the test name' from Andrei Chekun In CI, tests are always executed with the option --repeat=3, which generates 3 test results with the same name. The Junit plugin in CI cannot correctly distinguish between these results. When we have two passes and one fail, the link to the test result will sometimes be redirected to the incorrect one because the test name is the same. To fix this, a ReportPlugin is added that modifies the test case name during junit report generation, adding the mode and run id to the test name. Fixes: https://github.com/scylladb/scylladb/issues/17851 Fixes: https://github.com/scylladb/scylladb/issues/15973 Closes scylladb/scylladb#19235 * github.com:scylladb/scylladb: [test.py] Add uniqueness to the test name [test.py] Refactor alternator, nodetool, rest_api	2024-06-14 17:59:07 +02:00
Botond Dénes	5b87fa4cea	Merge 'doc: document `keyspace` and `table` for `nodetool ring`' from Kefu Chai these two arguments are critical when tablets are enabled. Fixes https://github.com/scylladb/scylladb/issues/19296 --- 6.0 is the first release with tablets support. and `nodetool ring` is an important tool to understand the data distribution. so we need to backport this document change to 6.0 Closes scylladb/scylladb#19297 * github.com:scylladb/scylladb: doc: document `keyspace` and `table` for `nodetool ring` doc: replace tab with space	2024-06-14 16:04:23 +03:00
Kefu Chai	ea3b8c5e4f	doc: document `keyspace` and `table` for `nodetool ring` these two arguments are critical when tablets are enabled. Fixes #19296 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-14 21:01:14 +08:00
Botond Dénes	c563acdbe9	Merge 'build: cmake: use path to be compatible with CI' from Kefu Chai this change is created in the same spirit of `1186ddef16`, which updated the rule for generating the stripped dist pkg, but it failed to update the one for generating the unstripped dist pkg. that's why we have a build failure when the workflow is looking for the unstripped tar.gz: ``` 08:02:47 ++ ls /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz 08:02:47 ls: cannot access '/jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz': No such file or directory` ``` so, in this change, we fix the path. Refs #2717 --- * cmake related change, hence no need to backport. Closes scylladb/scylladb#19290 * github.com:scylladb/scylladb: build: cmake: use per-mode path for building unstripped_dist_pkg build: cmake: use path to be compatible with CI	2024-06-14 15:35:26 +03:00
Kefu Chai	d498ca3afa	test: randomized_nemesis_test: use BOOST_REQUIRE_* when appropriate for better debuggability. Refs #17030 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19282	2024-06-14 15:33:07 +03:00
Kefu Chai	d887fd2402	build: use default modes when no modes are selected when `--use-cmake` option is passed to `configure.py`, - before this change, all modes are selected if no `--mode` options are passed to `configure.py`. - after this change, only the modes whose `build_by_default` is `True` are selected, if no `--mode` options are specified. the new behavior matches the existing behavior. otherwise, `ninja -C build mode_list` would list the mode which is not built by default. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19292	2024-06-14 15:31:58 +03:00
Botond Dénes	b2ebc172d0	Merge 'Fix usage of utils/chunked_vector::reserve_partial' from Lakshmi Narayanan Sreethar utils/chunked_vector::reserve_partial: fix usage in callers The method reserve_partial(), when used as documented, quits before the intended capacity can be reserved fully. This can lead to overallocation of memory in the last chunk when data is inserted to the chunked vector. The method itself doesn't have any bug but the way it is being used by the callers needs to be updated to get the desired behaviour. Instead of calling it repeatedly with the value returned from the previous call until it returns zero, it should be repeatedly called with the intended size until the vector's capacity reaches that size. This PR updates the method comment and all the callers to use the right way. Fixes #19254 Closes scylladb/scylladb#19279 * github.com:scylladb/scylladb: utils/large_bitset: remove unused includes identified by clangd utils/large_bitset: use thread::maybe_yield() test/boost/chunked_managed_vector_test: fix testcase tests_reserve_partial utils/lsa/chunked_managed_vector: fix reserve_partial() utils/chunked_vector: return void from reserve_partial and make_room test/boost/chunked_vector_test: fix testcase tests_reserve_partial utils/chunked_vector::reserve_partial: fix usage in callers	2024-06-14 15:31:00 +03:00
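The calling convention described in the merge above (call `reserve_partial()` with the intended size until the capacity reaches it) can be sketched as follows. This is a toy Python model under stated assumptions, not the real `utils::chunked_vector`: the `ChunkedVector` class, its one-chunk-per-call growth, and the `reserve_in_steps` helper are all hypothetical.

```python
class ChunkedVector:
    """Toy stand-in that only tracks capacity, not elements."""

    def __init__(self, max_chunk_capacity: int):
        self.max_chunk_capacity = max_chunk_capacity
        self.capacity = 0

    def reserve_partial(self, n: int) -> None:
        # Assumed behavior: grow by at most one chunk toward n and
        # return nothing, so the caller must re-check the capacity.
        if self.capacity < n:
            self.capacity = min(n, self.capacity + self.max_chunk_capacity)


def reserve_in_steps(vec: ChunkedVector, n: int, maybe_yield=lambda: None) -> int:
    """Reserve n elements in bounded steps; returns the number of steps."""
    steps = 0
    while vec.capacity < n:
        vec.reserve_partial(n)   # always pass the intended final size
        maybe_yield()            # placeholder for a preemption point
        steps += 1
    return steps


vec = ChunkedVector(max_chunk_capacity=4)
steps = reserve_in_steps(vec, 10)
print(vec.capacity, steps)
```

The loop condition is the vector's own capacity rather than a value returned by the previous call, which is the change in usage the merge describes.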
Kefu Chai	5c41073e00	tools/scylla-sstable: format error message with compile-time check before this change, we use runtime format string to format error messages. but it does not have the compile time format check. if we pass arguments which are not formattable, {fmt} throws at runtime, instead of erroring out at compile-time. this could be very annoying, because we format error messages at the error handling path. but if user ends up seeing an exception for {fmt} instead of a nice error message, it would be far from helpful. in this change, we - use compile-time format string - fix two caller sites, where we pass `std::exception_ptr` to {fmt}, but `std::exception_ptr` is not formattable by {fmt} at the time of writing. we do have operator<< based formatter for it though. so we delegate to `fmt::streamed` to format it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19294	2024-06-14 15:30:19 +03:00
Kefu Chai	aef1718833	doc: replace tab with space more consistent this way, also easier to format in a regular editor without additional setup. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-14 18:46:09 +08:00
Kamil Braun	982fa31250	Merge 'test: servers_add: fix the expected_error parameter' from Patryk Jędrzejczak This PR fixes two problems with the `expected_error` parameter in `server_add` and `servers_add`. 1. It didn't work in `server_add` if the cluster was empty because of an incorrect attempt to connect the driver. 2. It didn't work in `servers_add` completely because the `seeds` parameter was handled incorrectly. This PR only adds improvements in the testing framework, no need to backport it. Closes scylladb/scylladb#19255 * github.com:scylladb/scylladb: test: manager_client, scylla_cluster: fix type annotations in add_servers test: manager_client: don't connect driver after failed server_{add, start} test: scylla_cluster: pass seeds to add_servers	2024-06-14 11:33:21 +02:00
Wojciech Mitros	d31437b589	mv: replicate the gossiped backlog to all shards On each shard of each node we store the view update backlogs of other nodes to, depending on their size, delay responses to incoming writes, lowering the load on these nodes and helping them get their backlog to normal if it were too high. These backlogs are propagated between nodes in two ways: the first one is adding them to replica write responses. The second one is gossiping any changes to the node's backlog every 1s. The gossip becomes useful when writes stop to some node for some time and we stop getting the backlog using the first method, but we still want to be able to select a proper delay for new writes coming to this node. It will also be needed for the mv admission control. Currently, the backlog is gossiped from shard 0, as expected. However, we also receive the backlog only on shard 0 and only update this shard's backlogs for the other node. Instead, we'd want to have the backlogs updated on all shards, allowing us to use proper delays also when requests are received on shards different than 0. This patch changes the backlog update code, so that the backlogs on all shards are updated instead. This will only be performed up to once per second for each other node, and is done with a lower priority, so it won't severely impact other work. Fixes: scylladb/scylladb#19232 Closes scylladb/scylladb#19268	2024-06-14 11:24:20 +02:00
Andrei Chekun	8d1d206aff	[test.py] Add uniqueness to the test name In CI, tests are always executed with the option --repeat=3, which generates 3 test results with the same name. The Junit plugin in CI cannot correctly distinguish between these results. When we have two passes and one fail, the link to the test result will sometimes be redirected to the incorrect one because the test name is the same. To fix this, a ReportPlugin is added that modifies the test case name during junit report generation, adding the mode and run id to the test name. Fixes: https://github.com/scylladb/scylladb/issues/17851 Fixes: https://github.com/scylladb/scylladb/issues/15973	2024-06-14 11:23:04 +02:00
Wojciech Mitros	9bae1814ab	test: add test for failed view building write For various reasons, a view building write may fail. When that happens, the view building should not finish until these writes are successfully retried and they should not interfere with any writes that are performed to the base table while the view is building. The test introduced in this patch confirms that this is the case. Refs scylladb/scylladb#19261 Closes scylladb/scylladb#19263	2024-06-14 10:38:21 +02:00
Lakshmi Narayanan Sreethar	c49f6391ab	utils/large_bitset: remove unused includes identified by clangd Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-14 13:47:10 +05:30
Lakshmi Narayanan Sreethar	83190fa075	utils/large_bitset: use thread::maybe_yield() Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-14 13:47:10 +05:30
Lakshmi Narayanan Sreethar	310c5da4bb	test/boost/chunked_managed_vector_test: fix testcase tests_reserve_partial Update the maximum size tested by the testcase. The test always created only one chunk as the maximum size tested by it (1 << 12 = 4KB) was less than the default max chunk size (12.8 KB). So, use twice the max_chunk_capacity as the test size distribution upper limit to verify that reserve_partial can reserve multiple chunks. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-14 13:47:10 +05:30
Lakshmi Narayanan Sreethar	d4f8b91bd6	utils/lsa/chunked_managed_vector: fix reserve_partial() Fix the method comment and return types of chunked_managed_vector's reserve_partial() similar to chunked_vector's reserve_partial() as it has the same issues mentioned in #19254. Also update the usage in the chunked_managed_vector_test. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-14 13:47:10 +05:30
Lakshmi Narayanan Sreethar	0a22759c2a	utils/chunked_vector: return void from reserve_partial and make_room Since reserve_partial does not depend on the number of remaining capacity to be reserved, there is no need to return anything from it and the make_room method. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-14 13:43:07 +05:30
Lakshmi Narayanan Sreethar	29f036a777	test/boost/chunked_vector_test: fix testcase tests_reserve_partial Fix the usage of reserve_partial in the testcase. Also update the maximum chunk size used by the testcase. The test always created only one chunk as the maximum size tested by it (1 << 12 = 4KB) was less than the default max chunk size (128 KB). So, use a smaller chunk size, 512 bytes, to verify that reserve_partial can reserve multiple chunks. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-14 13:43:07 +05:30
Kefu Chai	df094061e3	test: randomized_nemesis_test: define static variable before this change, when linking randomized_nemesis_test with ld.lld: ``` [4/4] Linking CXX executable test/raft/RelWithDebInfo/randomized_nemesis_test FAILED: test/raft/RelWithDebInfo/randomized_nemesis_test : && /home/kefu/.local/bin/clang++ -ffunction-sections -fdata-sections -O3 -g -gz -Xlinker --build-id=sha1 --ld-path=ld.lld -dynamic-linker=/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////lib64/ld-linux-x86-64.so.2 -Xlinker --gc-sections test/raft/CMakeFiles/test-raft-helper.dir/RelWithDebInfo/helpers.cc.o test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o -o test/raft/RelWithDebInfo/randomized_nemesis_test -L/home/kefu/dev/scylladb/idl/absl::headers -Wl,-rpath,/home/kefu/dev/scylladb/idl/absl::headers test/lib/RelWithDebInfo/libtest-lib.a seastar/RelWithDebInfo/libseastar.a /usr/lib64/libxxhash.so seastar/RelWithDebInfo/libseastar_testing.a test/lib/RelWithDebInfo/libtest-lib.a -Xlinker --push-state -Xlinker --whole-archive auth/RelWithDebInfo/libscylla_auth.a -Xlinker --pop-state /usr/lib64/libcrypt.so cdc/RelWithDebInfo/libcdc.a compaction/RelWithDebInfo/libcompaction.a mutation_writer/RelWithDebInfo/libmutation_writer.a -Xlinker --push-state -Xlinker --whole-archive dht/RelWithDebInfo/libscylla_dht.a -Xlinker --pop-state types/RelWithDebInfo/libtypes.a index/RelWithDebInfo/libindex.a -Xlinker --push-state -Xlinker --whole-archive locator/RelWithDebInfo/libscylla_locator.a -Xlinker --pop-state 
message/RelWithDebInfo/libmessage.a gms/RelWithDebInfo/libgms.a sstables/RelWithDebInfo/libsstables.a readers/RelWithDebInfo/libreaders.a schema/RelWithDebInfo/libschema.a -Xlinker --push-state -Xlinker --whole-archive tracing/RelWithDebInfo/libscylla_tracing.a -Xlinker --pop-state RelWithDebInfo/libscylla-main.a abseil/absl/strings/RelWithDebInfo/libabsl_cord.a abseil/absl/strings/RelWithDebInfo/libabsl_cordz_info.a abseil/absl/strings/RelWithDebInfo/libabsl_cord_internal.a abseil/absl/strings/RelWithDebInfo/libabsl_cordz_functions.a abseil/absl/strings/RelWithDebInfo/libabsl_cordz_handle.a abseil/absl/crc/RelWithDebInfo/libabsl_crc_cord_state.a abseil/absl/crc/RelWithDebInfo/libabsl_crc32c.a abseil/absl/crc/RelWithDebInfo/libabsl_crc_internal.a abseil/absl/crc/RelWithDebInfo/libabsl_crc_cpu_detect.a abseil/absl/strings/RelWithDebInfo/libabsl_str_format_internal.a /usr/lib64/libz.so service/RelWithDebInfo/libservice.a node_ops/RelWithDebInfo/libnode_ops.a service/RelWithDebInfo/libservice.a node_ops/RelWithDebInfo/libnode_ops.a -lsystemd raft/RelWithDebInfo/libraft.a repair/RelWithDebInfo/librepair.a streaming/RelWithDebInfo/libstreaming.a replica/RelWithDebInfo/libreplica.a db/RelWithDebInfo/libdb.a mutation/RelWithDebInfo/libmutation.a data_dictionary/RelWithDebInfo/libdata_dictionary.a cql3/RelWithDebInfo/libcql3.a transport/RelWithDebInfo/libtransport.a cql3/RelWithDebInfo/libcql3.a transport/RelWithDebInfo/libtransport.a lang/RelWithDebInfo/liblang.a /usr/lib64/liblua-5.4.so -lm /usr/lib64/libsnappy.so.1.1.10 abseil/absl/container/RelWithDebInfo/libabsl_raw_hash_set.a abseil/absl/hash/RelWithDebInfo/libabsl_hash.a abseil/absl/hash/RelWithDebInfo/libabsl_city.a abseil/absl/types/RelWithDebInfo/libabsl_bad_variant_access.a abseil/absl/hash/RelWithDebInfo/libabsl_low_level_hash.a abseil/absl/types/RelWithDebInfo/libabsl_bad_optional_access.a abseil/absl/container/RelWithDebInfo/libabsl_hashtablez_sampler.a 
abseil/absl/profiling/RelWithDebInfo/libabsl_exponential_biased.a abseil/absl/synchronization/RelWithDebInfo/libabsl_synchronization.a abseil/absl/debugging/RelWithDebInfo/libabsl_stacktrace.a abseil/absl/synchronization/RelWithDebInfo/libabsl_graphcycles_internal.a abseil/absl/synchronization/RelWithDebInfo/libabsl_kernel_timeout_internal.a abseil/absl/debugging/RelWithDebInfo/libabsl_symbolize.a abseil/absl/debugging/RelWithDebInfo/libabsl_debugging_internal.a abseil/absl/base/RelWithDebInfo/libabsl_malloc_internal.a abseil/absl/debugging/RelWithDebInfo/libabsl_demangle_internal.a abseil/absl/time/RelWithDebInfo/libabsl_time.a abseil/absl/strings/RelWithDebInfo/libabsl_strings.a abseil/absl/strings/RelWithDebInfo/libabsl_strings_internal.a abseil/absl/strings/RelWithDebInfo/libabsl_string_view.a abseil/absl/base/RelWithDebInfo/libabsl_throw_delegate.a abseil/absl/numeric/RelWithDebInfo/libabsl_int128.a abseil/absl/base/RelWithDebInfo/libabsl_base.a abseil/absl/base/RelWithDebInfo/libabsl_raw_logging_internal.a abseil/absl/base/RelWithDebInfo/libabsl_log_severity.a abseil/absl/base/RelWithDebInfo/libabsl_spinlock_wait.a -lrt abseil/absl/time/RelWithDebInfo/libabsl_civil_time.a abseil/absl/time/RelWithDebInfo/libabsl_time_zone.a rust/RelWithDebInfo/libwasmtime_bindings.a rust/librust_combined.a /usr/lib64/libdeflate.so utils/RelWithDebInfo/libutils.a /usr/lib64/libxxhash.so /usr/lib64/libcryptopp.so /usr/lib64/libboost_regex.so.1.83.0 /usr/lib64/libicui18n.so /usr/lib64/libicuuc.so /usr/lib64/libboost_unit_test_framework.so.1.83.0 seastar/RelWithDebInfo/libseastar_testing.a seastar/RelWithDebInfo/libseastar.a /usr/lib64/libboost_program_options.so /usr/lib64/libboost_thread.so /usr/lib64/libboost_chrono.so /usr/lib64/libboost_atomic.so /usr/lib64/libcares.so /usr/lib64/libfmt.so.10.2.1 /usr/lib64/liblz4.so -ldl /usr/lib64/libgnutls.so -latomic /usr/lib64/libsctp.so /usr/lib64/libprotobuf.so /usr/lib64/libyaml-cpp.so /usr/lib64/libhwloc.so //usr/lib64/liburing.so 
/usr/lib64/libnuma.so /usr/lib64/libboost_unit_test_framework.so && : ld.lld: error: undefined symbol: append_seq::magic >>> referenced by impl.hpp:92 (/usr/include/boost/test/tools/old/impl.hpp:92) >>> test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o:(__cxx_global_var_init.38) >>> referenced by impl.hpp:92 (/usr/include/boost/test/tools/old/impl.hpp:92) >>> test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o:(__cxx_global_var_init.38) >>> referenced by impl.hpp:92 (/usr/include/boost/test/tools/old/impl.hpp:92) >>> test/raft/CMakeFiles/randomized_nemesis_test.dir/RelWithDebInfo/randomized_nemesis_test.cc.o:(append_seq::append(int) const) >>> referenced 5 more times clang++: error: linker command failed with exit code 1 (use -v to see invocation) ``` it turns out `append_seq::magic` is only declared, but never defined. please note, the non-inline static member variable in its class definition is not considered as a definition, see [class.static.data](https://eel.is/c++draft/class.static.data#3) > The declaration of a non-inline static data member in its class > definition is not a definition and may be of an incomplete type > other than cv void. so, let's declare it as a `constexpr` instead. it implies `inline`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19283	2024-06-14 10:00:21 +03:00
Kefu Chai	4c1006a5bb	dist: s/SafeConfigParser/ConfigParser/ `SafeConfigParser` was renamed to `ConfigParser` in Python 3.2, and Python warns us: > scylla-housekeeping:183: DeprecationWarning: The SafeConfigParser > class has been renamed to ConfigParser in Python 3.2. This alias will > be removed in Python 3.12. Use ConfigParser directly instead. see https://docs.python.org/3.2/library/configparser.html#configparser.ConfigParser and https://docs.python.org/3.1/library/configparser.html#configparser.SafeConfigParser Fixes #13046 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19285	2024-06-14 09:59:22 +03:00
Kefu Chai	3a5898880e	alternator: drop unused friend declaration in `57c408ab`, we dropped operator<< for `parsed::path`, but we forgot to drop the friend declaration for it along with the operator. so in this change, let's drop the friend declaration. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19287	2024-06-14 09:58:09 +03:00
Kefu Chai	83c6ae10c4	sstables/compress: put type constraints into template type param more compact this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19284	2024-06-14 09:50:55 +03:00
Kefu Chai	6556cd684e	cql3: remove unused operator<< as these operators are not used anymore. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19288	2024-06-14 09:45:35 +03:00
Botond Dénes	d50688efee	Merge 'api: do not include unused headers' from Kefu Chai these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. also, add api to iwyu github workflow's CLEANER_DIR, to prevent future violations. --- it's a cleanup, hence no need to backport. Closes scylladb/scylladb#19269 * github.com:scylladb/scylladb: .github: add api to iwyu's CLEANER_DIR api: do not include unused headers	2024-06-14 09:34:13 +03:00
Kefu Chai	28a4298005	build: cmake: use per-mode path for building unstripped_dist_pkg before this change, we use "scylla" as the dependency of unstripped_dist_pkg, but that implies the scylla built with the default mode. if the build rules are generated using the multi-config generator, the default mode is not necessarily identical to the current `$<CONFIG>`, so let's be more explicit. otherwise, we could run into a build failure like ``` FAILED: dist/RelWithDebInfo/scylla-unstripped-6.1.0~dev-0.20240614.5f36888e7fbd.x86_64.tar.gz /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist/RelWithDebInfo/scylla-unstripped-6.1.0~dev-0.20240614.5f36888e7fbd.x86_64.tar.gz cd /jenkins/workspace/scylla-master/scylla-ci/scylla && scripts/create-relocatable-package.py --build-dir /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo --node-exporter-dir /jenkins/workspace/scylla-master/scylla-ci/scylla/build/node_exporter --debian-dir /jenkins/workspace/scylla-master/scylla-ci/scylla/build/debian /jenkins/workspace/scylla-master/scylla-ci/scylla/build/dist/RelWithDebInfo/scylla-unstripped-6.1.0~dev-0.20240614.5f36888e7fbd.x86_64.tar.gz ldd: /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/scylla: No such file or directory Traceback (most recent call last): File "/jenkins/workspace/scylla-master/scylla-ci/scylla/scripts/create-relocatable-package.py", line 109, in <module> libs.update(ldd(exe)) ^^^^^^^^ File "/jenkins/workspace/scylla-master/scylla-ci/scylla/scripts/create-relocatable-package.py", line 37, in ldd for ldd_line in subprocess.check_output( ^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.11/subprocess.py", line 466, in check_output return run(*popenargs, stdout=PIPE, timeout=timeout, check=True, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.11/subprocess.py", line 571, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['ldd', 
'/jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/scylla']' returned non-zero exit status 1. ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-14 13:27:26 +08:00
Kefu Chai	b94420a9dd	build: cmake: use path to be compatible with CI this change is created in the same spirit of `1186ddef16`, which updated the rule for generating the stripped dist pkg, but it failed to update the one for generating the unstripped dist pkg. that's why we have a build failure when the workflow is looking for the unstripped tar.gz: ``` 08:02:47 ++ ls /jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz 08:02:47 ls: cannot access '/jenkins/workspace/scylla-master/scylla-ci/scylla/build/RelWithDebInfo/dist/tar/scylla-unstripped-6.1.0~dev-0.20240613.d5bdddaeb40b.x86_64.tar.gz': No such file or directory ``` so, in this change, we fix the path. Refs #2717 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-14 13:27:26 +08:00
Botond Dénes	ea40567bbc	Merge 'Some cleanups for replica table' from Raphael "Raph" Carvalho backport not needed, these are just cleanups. Closes scylladb/scylladb#19260 * github.com:scylladb/scylladb: replica: simplify perform_cleanup_compaction() replica: return storage_group by reference on storage_group_for*() replica: devirtualize storage_group_of()	2024-06-14 08:14:58 +03:00
Botond Dénes	bf429695b6	Merge 'test_tablets: add test_tablet_storage_freeing' from Michał Chojnowski Before work on tablets was completed, it was noticed that — due to some missing pieces of implementation — Scylla doesn't properly close sstables for migrated-away tablets. Because of this, disk space wasn't being reclaimed properly. Since the missing pieces of implementation were added, the problem should be gone now. This patch adds a test which was used to reproduce the problem earlier. It's expected to pass now, validating that the issue was fixed. Should be backported to branch-6.0, because the tested problem was also affecting that branch. Fixes #16946 Closes scylladb/scylladb#18906 * github.com:scylladb/scylladb: test_tablets: add test_tablet_storage_freeing test: pylib: add get_sstables_disk_usage()	2024-06-14 08:08:54 +03:00
Raphael S. Carvalho	f143f5b90d	replica: remove linear search when picking memtable_list for range scan with tablets with tablets, we're expected to have a worst case of ~100 tablets in a given table and shard, so let's avoid linear search when looking for the memtable_list in a range scan. we're bounded by ~100 elements, so it shouldn't be a big problem, but it's an inefficiency we can easily get rid of. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#19286	2024-06-14 08:00:17 +03:00
Benny Halevy	fb3db7d81f	perf-simple-query: add cpu_cycles / op metric Example output: ``` bhalevy@[] scylla$ build/release/scylla perf-simple-query --default-log-level=error -c 1 --duration 10 random-seed=4058714023 enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=read, frontend=cql, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 86912.75 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42346 insns/op, 22811 cycles/op, 0 errors) 91348.29 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42306 insns/op, 22362 cycles/op, 0 errors) 87965.84 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42338 insns/op, 22966 cycles/op, 0 errors) 90793.67 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42351 insns/op, 22783 cycles/op, 0 errors) 90104.27 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42358 insns/op, 22875 cycles/op, 0 errors) 90397.13 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42355 insns/op, 22735 cycles/op, 0 errors) 89142.39 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42363 insns/op, 22996 cycles/op, 0 errors) 90410.40 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 42363 insns/op, 22725 cycles/op, 0 errors) 88173.10 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42366 insns/op, 23160 cycles/op, 0 errors) 88416.51 tps ( 63.1 allocs/op, 0.0 logallocs/op, 14.2 tasks/op, 42379 insns/op, 23102 cycles/op, 0 errors) median 90104.26849997675 median absolute deviation: 1244.02 maximum: 91348.29 minimum: 86912.75 ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#18818	2024-06-14 07:42:09 +03:00
Lakshmi Narayanan Sreethar	64768b58e5	utils/chunked_vector::reserve_partial: fix usage in callers The method reserve_partial(), when used as documented, quits before the intended capacity can be reserved fully. This can lead to overallocation of memory in the last chunk when data is inserted to the chunked vector. The method itself doesn't have any bug but the way it is being used by the callers needs to be updated to get the desired behaviour. Instead of calling it repeatedly with the value returned from the previous call until it returns zero, it should be repeatedly called with the intended size until the vector's capacity reaches that size. This commit updates the method comment and all the callers to use the right way. Fixes #19254 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>	2024-06-13 21:42:11 +05:30
Raphael S. Carvalho	ace4e5111e	compaction: Reduce twcs off-strategy space overhead to 10% of free space TWCS off-strategy suffers from 100% space overhead, so a big TWCS table can cause scylla to run out of disk space during node ops. To avoid penalizing TWCS tables that take a small percentage of disk with increased write amplification, TWCS off-strategy will be restricted to 10% of free disk space. Then small tables can still compact all disjoint sstables in a single round. Fixes #16514. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-13 13:06:51 -03:00
Raphael S. Carvalho	0ce8ee03f1	compaction: wire storage free space into reshape procedure After this, TWCS reshape procedure can be changed to limit job to 10% of available space. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-13 12:53:27 -03:00
Raphael S. Carvalho	51c7ee889e	sstables: Allow to get free space from underlying storage That will be used in turn to restrict reshape to 10% of available space in underlying storage. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-13 12:43:14 -03:00
Raphael S. Carvalho	b8bd4c51c2	replica: don't expose compaction_group to reshape task compaction_group sits in replica layer and compaction layer is supposed to talk to it through compaction::table_state only. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-13 12:43:14 -03:00
Andrei Chekun	93b9b85c12	[test.py] Refactor alternator, nodetool, rest_api Make alternator, nodetool and rest_api test directories as python packages. Move scylla-gdb to scylla_gdb and make it python package.	2024-06-13 13:56:10 +02:00
Avi Kivity	f1819419cc	Merge 'scylla-sstable: add method to load the schema from the sstable itself' from Botond Dénes As it turns out, each sstable carries its own schema in its serialization header (Statistics component). This schema is incomplete -- the names of the key columns are not stored, just their type. Static and regular columns do have names and types stored however. This bare-bones schema is enough to parse and display the content of the sstable. Another thing missing is schema options (the stuff after the `WITH` keyword, except the clustering order). The only options stored are the compression options (in the CompressionInfo component), this is actually needed to read the Data component. This series adds a new method to `tools/schema_loader.cc` to extract the schema stored in the sstable itself. This new schema load method is used as the last fall-back for obtaining the schema, in case scylla-sstable is trying to autodetect the schema of the sstable. Although, right now this bare-bones schema is enough for everything scylla-sstable does, it is more future proof to stick to the "full" schema if possible, so this new method is the last resort for now. Fixes: https://github.com/scylladb/scylladb/issues/17869 Fixes: https://github.com/scylladb/scylladb/issues/18809 New functionality, no backport needed. Closes scylladb/scylladb#19169 * github.com:scylladb/scylladb: tools/scylla-sstable: log loaded schema with trace level tools/scylla-sstable: load schema from the sstable as fallback tools/schema_loader: introduce load_schema_from_sstable() test/lib/random_schema: remove assert on min number of regular columns sstables: introduce load_metadata()	2024-06-13 12:21:09 +03:00
Benny Halevy	34dfa4d3a3	storage_service: join_token_ring: reject replace on different dc or rack Do not allow replacing a node on one dc/rack with a node on a different dc/rack as this violates the assumption of replace node operation that all token ranges previously owned by the dead node would be rebuilt on the new node. Fixes scylladb/scylladb#16858 Refs scylladb/scylla-enterprise#3518 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#16862	2024-06-13 11:19:47 +02:00
Botond Dénes	6868add228	replica/database: wire in maintenance_reader_concurrency_semaphore_count_limit Making the count resources on the maintenance (streaming) semaphore live update via config. This will allow us to improve repair speed on mixed-shard clusters, where we suspect that reader thrashing -- due to the combination of high number of readers on each shard and very conservative reader count limit (10) -- is the main cause of the slowness. Making this count limit configurable allows us to start experimenting with this fix, without committing to a count limit increase (or removal), addressing the pain in the field.	2024-06-13 01:59:21 -04:00
Botond Dénes	665fdd6ce4	db/config: introduce maintenance_reader_concurrency_semaphore_count_limit To control the amount of count resources of the maintenance (streaming) semaphore. Not wired yet.	2024-06-13 01:59:21 -04:00
Botond Dénes	ba0cc29d82	reader_concurrency_semaphore: make count parameter live-update So that the amount of count resources can be changed at run-time, triggered by e.g. a config change. Previous constant-count based constructor is left intact, to avoid patching all clients, as only a small subset will want the new functionality.	2024-06-13 01:59:21 -04:00
Nadav Har'El	44ea1993ba	test/cql-pytest: tests CREATE/DROP INDEX during paged query This patch includes extensive testing for what happens to an ongoing paged query when a secondary index is suddenly added or dropped. Issue #18992 was opened suggesting that this would be broken, and the tests included here show that it is indeed broken. The four tests included in this patch are heavily commented to explain what they are testing and why, but here is a short summary of what is being tested by each of them: 1. A paged query filtering on v=17 continues correctly even if an index is created on v. 2. A paged query filtering on v1 and v2 where v2 is indexed, continues correctly even if an index is created on v1 (remember that Scylla prefers to use the first index mentioned in the query). 3. A paged query using an index on v continues correctly even if that index is deleted. 4. However, if the query doesn't say "ALLOW FILTERING", it cannot be continued after the index is deleted. All these tests pass on Cassandra, but all of them except the fourth fail on Scylla, reproducing issue #18992. Somewhat to my surprise, the failure of the query in all the failed tests is silent (i.e., trying to fetch the next page just fetches nothing and says the iteration is done). I was expecting more dramatic failures ("marshaling error" messages, crashes, etc.) but didn't get them. Refs #18992 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19000	2024-06-13 08:39:22 +03:00
Botond Dénes	145a67f77c	tools/scylla-sstable: log loaded schema with trace level The schema of the sstable can be interesting, so log it with trace level. Unfortunately, this is not the nice CQL statement we are used to (that requires a database object), but the not-nearly-so-nice CFMetadata printout. Still, it is better than nothing.	2024-06-13 01:32:17 -04:00
Botond Dénes	43c44f0af5	tools/scylla-sstable: load schema from the sstable as fallback When auto-detecting the schema of the sstable, if all other methods failed, load the schema from the sstable's serialization header. This schema is incomplete. It is just enough to parse and display the content of the sstable. Although parsing and displaying the content of the sstable is all scylla-sstable does, it is more future-compatible to use the full schema when possible. So the always-available but minimal schema that each sstable has on itself is used just as a fallback. The test which tested the case when all schema load attempts fail doesn't work now, because loading the serialization header always succeeds. So convert this test into two positive tests, testing the serialization header schema fallback instead.	2024-06-13 01:32:17 -04:00
Botond Dénes	8f2ba03465	tools/schema_loader: introduce load_schema_from_sstable() Allows loading the schema from an sstable's serialization header. This schema is incomplete, but it is enough to parse and display the content of the sstable.	2024-06-13 01:32:17 -04:00
Botond Dénes	0d7335dd27	test/lib/random_schema: remove assert on min number of regular columns It is legal for a schema to have 0 regular columns, so remove the assert on the schema specification's regular column count.	2024-06-13 01:32:17 -04:00
Piotr Dulikowski	0b5a0c969a	Merge 'hinted handoff: migrate sync point to host ID' from Michael Litvak Change the format of sync points to use host IDs instead of IPs, to be consistent with the use of host IDs in the hinted handoff module. Introduce sync point v3 format which is the same as v2 except it stores host IDs instead of IPs. The decoding supports both formats with host IDs and IPs, so a sync point now contains a variant of either type, and in the case of the new type the translation is avoided. Fixes #18653 Closes scylladb/scylladb#19134 * github.com:scylladb/scylladb: db/hints: migrate sync point to host ID db/hints: rename sync point structures with _v1 suffix to _v1_v2	2024-06-13 06:16:00 +02:00
Kefu Chai	9d8d9168e6	.github: add api to iwyu's CLEANER_DIR to avoid future violations of include-what-you-use. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-13 09:32:51 +08:00
Kefu Chai	c03141b4b2	api: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-13 09:32:51 +08:00
Anna Stuchlik	603c662049	doc: remove an entry about seeds from FAQ This commit removes a useless entry from the FAQ page. It contains a false recommendation to configure multiple seeds. Closes scylladb/scylladb#19259	2024-06-12 19:11:52 +02:00
Dawid Medrek	dc41086c57	db/hints: Add a metric for the size of sent hints In this commit, we add a new metric `sent_total_size` keeping track of how many bytes of hints a node has sent. The metric is supposed to complement its counterpart in storage proxy that counts how many bytes of hints a node has received. That information should prove useful in analyzing statistics of a cluster -- load on given nodes and where it comes from. We also change the name of the metric `sent` to `sent_total` to avoid the conflict of prefixes between the two metrics.	2024-06-12 18:20:08 +02:00
Raphael S. Carvalho	f3a1f5df83	replica: simplify perform_cleanup_compaction() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-12 12:44:21 -03:00
Raphael S. Carvalho	6214dda506	replica: return storage_group by reference on storage_group_for*() those functions cannot return nullptr, will throw when group is not found, so better return ref instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-12 11:53:06 -03:00
Patryk Jędrzejczak	a7ab9a015a	test: manager_client, scylla_cluster: fix type annotations in add_servers	2024-06-12 16:51:20 +02:00
Patryk Jędrzejczak	1eb25d22c6	test: manager_client: don't connect driver after failed server_{add, start} If adding or starting a server fails expectedly, there is no reason to update or connect the driver. Moreover, before this patch, we couldn't use `server_add` and `servers_add` with `expected_error` if the cluster was empty. After expected bootstrap failures, we tried to connect the driver, which rightfully failed on `assert len(hosts) > 0` in `cluster_con`.	2024-06-12 16:51:20 +02:00
Patryk Jędrzejczak	8f486de8d3	test: scylla_cluster: pass seeds to add_servers This parameter was incorrectly missing. For this reason, `expected_error` was passed from `add_servers` to `add_server` as `seeds`, which caused strange crashes.	2024-06-12 16:51:19 +02:00
Botond Dénes	435c01d1e6	sstables: introduce load_metadata() Loads just the metadata components. No validation. Split off from load(), to allow scylla-sstable to partially load an sstable.	2024-06-12 10:46:38 -04:00
Botond Dénes	aa27f8f365	Merge 'Improve handling of outdated --experimental-features' from Pavel Emelyanov Some time ago it turned out that if unrecognized feature name is met in scylla.yaml, the whole experimental features list is ignored, but scylla continues to boot. There's UNUSED feature which is the proper way to deprecate a feature, and this PR improves its handling in several ways. 1. The recently removed "tablets" feature is partially brought back, but marked as UNUSED 2. Any UNUSED features met while parsing are printed into logs 3. The enum_option<> helper is enlightened along the way refs: #18968 Closes scylladb/scylladb#19230 * github.com:scylladb/scylladb: config: Mark tablets feature as unused main: Warn unused features enum_option: Carry optional key on board enum_option: Remove on-board _map member	2024-06-12 17:33:14 +03:00
Botond Dénes	d2a4cd9cae	Merge 'Register API endpoints next to corresponding services' from Pavel Emelyanov The API endpoints are registered for particular services (with rare exceptions), and once the corresponding service is ready, its endpoints section can be registered too. Same but reversed is for shutdown, and it's automatic with deferred actions. refs: #2737 Closes scylladb/scylladb#19208 * github.com:scylladb/scylladb: main: Register task manager API next to task manager itself main: Register messaging API next to messaging service main: Register repair API next to repair service	2024-06-12 17:31:30 +03:00
Kefu Chai	2eca8b54de	auth/role_or_anonymous: drop operator<< for role_or_anonymous its declaration was removed in `84a9d2fa`, which failed to remove the implementation from .cc file. in this change, let's remove operator<< for role_or_anonymous completely. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19243	2024-06-12 17:30:20 +03:00
Raphael S. Carvalho	9c1d3bcc02	replica: devirtualize storage_group_of() can be made private to tablet_storage_group_manager. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-06-12 11:29:49 -03:00
Kamil Braun	a441d06d6c	raft: fsm: add details to on_internal_error_noexcept message If we receive a message in the same term but from a different leader than we expect, we print: ``` Got append request/install snapshot/read_quorum from an unexpected leader ``` For some reason the message did not include the details (who the leader was and who the sender was) which requires almost zero effort and might be useful for debugging. So let's include them. Ref: scylladb/scylla-enterprise#4276 Closes scylladb/scylladb#19238	2024-06-12 17:29:42 +03:00
Pavel Emelyanov	4400f9082e	lang: Return context as future, not via reference argument Commit `882b2f4e9f` (cql3, schema_tables: Generalize function creation) erroneously says that optional<context> is not suitable for future<> type, but in fact it is. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19204	2024-06-12 16:54:46 +03:00
Kefu Chai	8c99d9e721	.github: use libstdc++-13 since gcc-13 is packaged by ppa:ubuntu-toolchain-r, and GCC-13 was released 1 year ago, let's use it instead. less warnings, as the standard library from GCC-13 is more standard compliant. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19162	2024-06-12 16:52:05 +03:00
Botond Dénes	e91f82fd5c	Merge '.github: add workflow to build with clang nightly' from Kefu Chai to be prepared for changes from clang, and enjoy the new warnings/errors from this compiler. * it is an improvement in our CI, no need to backport. Closes scylladb/scylladb#19164 * github.com:scylladb/scylladb: .github: add workflow to build with clang nightly .github: rename clang-tidy-matcher.json to clang-matcher.json	2024-06-12 16:50:21 +03:00
Pavel Emelyanov	24c818453d	main: Start view builder earlier Commit `47dbf23773` (Rework view services and system-distributed-keyspace dependencies) made streaming and repair services depend on view builder, but missed the fact that the builder itself starts much later. Move view builder earlier, that's safe, no activity is started upon that, real building is kicked much later when invoke_on_all(start) happens. Other than that, start system distributed keyspace earlier, which also looks safe, as it's also started "for real" later, by storage service when it joins the ring. fixes: #19133 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19250	2024-06-12 16:46:55 +03:00
Anna Stuchlik	3f9cc0ec3f	doc: reorganize ToC of the Reference section This commit adds a proper ToC to the Reference section to improve how it renders. Closes scylladb/scylladb#18901	2024-06-12 16:16:04 +03:00
Kefu Chai	da59710fb9	doc: remove unused documents upgrade/_common are document fragments included by other documents. but quite a few of the documents previously including these fragments were removed, and we didn't remove these fragments along with them. in this change, we drop them. Fixes #19245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19251	2024-06-12 16:14:57 +03:00
Botond Dénes	cd05de6cfb	Merge 'test: memtable_test: increase unspooled_dirty_soft_limit ' from Kefu Chai before this change, when performing memtable_test, we expect that the memtables of ks.cf are the only memtables being flushed. and we inject 4 failures in the code path of flush, and wait until 4 of them are triggered. but in the background, `dirty_memory_manager` performs flush on all tables when necessary. so, the total number of failures is not necessarily the total number of failures triggered when flushing ks.cf, some of them could be triggered when flushing system tables. that's why we have sporadic test failures from this test, as we might check `t.min_memtable_timestamp()` too soon. after this change, we increase `unspooled_dirty_soft_limit` setting, in order to disable `dirty_memory_manager`, so that the only flush is performed by the test. Fixes https://github.com/scylladb/scylladb/issues/19034 --- the issue applies to both 5.4 and 6.0, and this issue hurts the CI stability, hence we should backport it. Closes scylladb/scylladb#19252 * github.com:scylladb/scylladb: test: memtable_test: increase unspooled_dirty_soft_limit test: memtable_test: replace BOOST_ASSERT with BOOST_REQURE	2024-06-12 16:14:05 +03:00
Dawid Medrek	23bea50de0	service/storage_proxy: Add metrics for received hints In this commit, we add two new metrics to storage proxy: * `received_hints_total`, * `received_hints_bytes_total`. Before these changes, we had to rely solely on other metrics indicating how many hints nodes have written, rejected, sent, etc. Because hints are subject to many more or less controllable factors, e.g. a target node still being a replica for a mutation, it was very difficult to approximate how many hints a given node might have received or what part of its load they were. The newly introduced metrics are supposed to help reason about those.	2024-06-12 14:44:47 +02:00
Kefu Chai	223fba3243	test: memtable_test: increase unspooled_dirty_soft_limit before this change, when performing memtable_test, we expect that the memtables of ks.cf are the only memtables being flushed. and we inject 4 failures in the code path of flush, and wait until 4 of them are triggered. but in the background, `dirty_memory_manager` performs flush on all tables when necessary. so, the total number of failures is not necessarily the total number of failures triggered when flushing ks.cf, some of them could be triggered when flushing system tables. that's why we have sporadic test failures from this test, as we might check `t.min_memtable_timestamp()` too soon. after this change, we increase `unspooled_dirty_soft_limit` setting, in order to disable `dirty_memory_manager`, so that the only flush is performed by the test. Fixes #19034 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-12 19:17:27 +08:00
Kefu Chai	2df4e9cfc2	test: memtable_test: replace BOOST_ASSERT with BOOST_REQURE before this change, we verify the behavior of design under test using `BOOST_ASSERT()`, which is a wrapper around `assert()`, so if a test fails, the test just aborts. this is not very helpful for postmortem debugging. after this change, we use `BOOST_REQUIRE` macro for verifying the behavior, so that Boost.Test prints out the condition if it does not hold when we test it. Refs #19034 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-12 19:17:27 +08:00
Pavel Emelyanov	c752bda0a2	Merge '.github: change severity to error in clang-include-cleaner ' from Kefu Chai in this changeset, we tighten the clang-include-cleaner workflow, and address the warnings in two more subdirectories in the source tree. * it's a cleanup, no need to backport Closes scylladb/scylladb#19155 * github.com:scylladb/scylladb: .github: add alternator to iwyu's CLEANER_DIR alternator: do not include unused headers .github: change severity to error in clang-include-cleaner exceptions: do not include unused headers	2024-06-12 10:16:17 +03:00
Kefu Chai	0c9ea654f5	service/paxos: drop operator<< for proposal since we stopped using the generic container formatters, which in turn use operator<< for formatting the elements, we can drop more operator<< operators. so, in this change, we drop operator<< for proposal. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19156	2024-06-12 10:14:47 +03:00
Dawid Medrek	431ec55f6c	service/storage_proxy: Move a comment to its relevant place In `b92fb35`, we put a comment in the wrong place. These changes move it to the right one. Closes scylladb/scylladb#19215	2024-06-12 10:10:02 +03:00
Avi Kivity	dffd0901b3	dist: scylla_util: sysconfig_parser: replace deprecated ConfigParser.readfp ConfigParser.readfp was deprecated in Python 3.2 and removed in Python 3.12. Under Fedora 40, the container fails to launch because it cannot parse its configuration. Fix by using the newer read_file(). Closes scylladb/scylladb#19236	2024-06-12 10:07:10 +03:00
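The `readfp` to `read_file` migration described in the commit above can be sketched as follows. This is a minimal illustration, not the code in `dist/scylla_util`: the helper name `parse_sysconfig`, the dummy `[global]` section header, and the sample keys are assumptions made for the example.

```python
import configparser
import io


def parse_sysconfig(text: str) -> dict[str, str]:
    """Parse KEY="value" lines from a sysconfig-style file (hypothetical helper)."""
    # Interpolation is disabled so that '%' in values is taken literally.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    # Keep option names case-sensitive; the default lowercases them.
    parser.optionxform = str
    # sysconfig files have no section header, so prepend a dummy one
    # (the section name "global" is an assumption for this sketch).
    # read_file() is the replacement for readfp(), which was removed in Python 3.12.
    parser.read_file(io.StringIO("[global]\n" + text))
    return {key: value.strip('"') for key, value in parser.items("global")}


sample = 'SCYLLA_ARGS="--log-to-syslog 1"\nCPUSET=""\n'
conf = parse_sysconfig(sample)
print(conf)
```

`read_file()` accepts any iterable of lines, so an open file object can be passed in place of the `io.StringIO` wrapper.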
Benny Halevy	2ed81cbf84	locator/topology: update_node: format also shard_count in debug log message The format string is missing `shard_count={}` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#19242	2024-06-12 10:04:23 +03:00
Kefu Chai	4175e02d9d	clustering_bounds_comparator: drop operator<< for bound_kind turns out operator<< for bound_kind is not used anymore, so let's drop it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19159	2024-06-11 18:01:06 +02:00
Avi Kivity	6608f49718	Merge 'make enable_compacting_data_for_streaming_and_repair truly live-update' from Botond Dénes This config item is propagated to the table object via table::config. Although the field in `table::config`, used to propagate the value, was `utils::updateable_value<T>`, it was assigned a constant and so the live-update chain was broken. This series fixes this and adds a test which fails before the patch and passes after. The test needed new test infrastructure, around the failure injection api, namely the ability to exfiltrate the value of internal variable. This infrastructure is also added in this series. Fixes: https://github.com/scylladb/scylladb/issues/18674 - [x] This patch has to be backported because it fixes broken functionality Closes scylladb/scylladb#18705 * github.com:scylladb/scylladb: test/topology_custom: add test for enable_compacting_data_for_streaming_and_repair live-update test/pylib: rest_client: add get_injection() api/error_injection: add getter for error_injection utils/error_injection: add set_parameter() replica/database: fix live-update enable_compacting_data_for_streaming_and_repair	2024-06-11 15:53:19 +03:00
Kefu Chai	d05db52d11	build: remove coverage compiling options from the cxx_flags in `44e85c7d`, we remove coverage compiling options from the cflags when building abseil. but in `535f2b21`, these options were brought back as parts of cxx_flags. so we need to remove them again from cxx_flags. Fixes #19219 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19220	2024-06-11 14:58:27 +03:00
Pavel Emelyanov	b2520b8185	config: Mark tablets feature as unused This feature used to be there for a while, but then it was removed by `83d491af02`. This patch partially takes it back, but maps to UNUSED, so that if met in config, it's warned, but other features are parsed as well. refs: #18968 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-11 12:58:19 +03:00
Pavel Emelyanov	b85a02a3fe	main: Warn unused features When seeing an UNUSED feature -- print it into log. This is where the enum_option::key is in use. The thing is that experimental features map different unused feature names into the single UNUSED feature enum value, so once the feature is parsed its configured name only persists in the option's key member (saved by previous patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-11 12:56:51 +03:00
Pavel Emelyanov	0c0a7d9b9a	enum_option: Carry optional key on board It facilitates option formatting, but the main purpose is to be able to find out the exact keys, not values, later (see next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-11 12:55:14 +03:00
Pavel Emelyanov	f56cdb1cac	enum_option: Remove on-board _map member The map in question is immutable and can be obtained from the Mapper type at any time, so there's no need to keep a copy of it on each enum_option Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-11 12:54:39 +03:00
Michael Litvak	afc9a1a8a6	db/hints: migrate sync point to host ID Change the format of sync points to use host ID instead of IPs, to be consistent with the use of host IDs in hinted handoff module. Introduce sync point v3 format which is the same as v2 except it stores host IDs instead of IPs. The encoding of sync points now always uses the new v3 format with host IDs. The decoding supports both formats with host IDs and IPs, so a sync point contains now a variant of either types, and in the case of the new format the translation from IP to host ID is avoided.	2024-06-11 11:07:00 +02:00
Michael Litvak	b824e73418	db/hints: rename sync point structures with _v1 suffix to _v1_v2 rename sync point types and variables to have v1/v2 suffix according to their use.	2024-06-11 11:05:59 +02:00
Avi Kivity	03e776ce3e	Update tools/java submodule * tools/java 88809606c8...01ba3c196f (3): > Revert "build: don't add nonexistent directory 'lib' to relocatable packages" > build: run antlr in a separate process > build: don't add nonexistent directory 'lib' to relocatable packages	2024-06-11 11:58:56 +03:00
Botond Dénes	8ef4fbdb87	test/topology_custom: add test for enable_compacting_data_for_streaming_and_repair live-update Prevent the live-update feature of this config item from breaking silently.	2024-06-11 04:17:48 -04:00
Botond Dénes	0c61b1822c	test/pylib: rest_client: add get_injection() The /v2/error_injection/{injection} endpoint now has a GET method too, expose this.	2024-06-11 04:17:48 -04:00
Botond Dénes	feea609e37	api/error_injection: add getter for error_injection Allow external code to obtain information about an error injection point, including whether it is enabled, and importantly, what its parameters are. Together with the `set_parameter()` added in the previous patch, this allows tests to read out the values of internal parameters, via a set_parameter() injection point.	2024-06-11 04:17:48 -04:00
Botond Dénes	4590026b38	utils/error_injection: add set_parameter() Allow injection points to write values into the parameter map, which external code can then examine. This allows exfiltrating the values of internal variables, to be examined by tests, without exposing these variables via an "official" path.	2024-06-11 04:17:48 -04:00
Pavel Emelyanov	1b9cedb3f3	test: Reduce failure detector timeout for failed tablets migration test Most of the time this test spends waiting for a node to die. Helps 3x times Was real 9m21,950s user 1m11,439s sys 1m26,022s Now real 3m37,780s user 0m58,439s sys 1m13,698s refs: #17764 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19222	2024-06-11 09:55:06 +02:00
Calle Wilund	dfd996e7c1	describe_statement: Filter out "extension internal" keyspaces in DESC SCHEMA Fixes /scylladb/scylla-enterprise#4168 Unless listing all (including system) keyspaces, filter out "extension internal" keyspaces. These are to be considered "system" for the purposes of exposing to end user. Closes scylladb/scylladb#19214	2024-06-11 10:01:42 +03:00
Botond Dénes	dbccb61636	replica/database: fix live-update enable_compacting_data_for_streaming_and_repair This config item is propagated to the table object via table::config. Although the field in table::config, used to propagate the value, was utils::updateable_value<T>, it was assigned a constant and so the live-update chain was broken. This patch fixes this.	2024-06-11 01:15:20 -04:00
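The broken live-update chain described in the commit above can be illustrated with a small model. This is a sketch of the general pitfall only, not Scylla's `utils::updateable_value<T>`: the classes `ConfigItem` and `UpdateableValue` and their methods are hypothetical names invented for the example.

```python
class ConfigItem:
    """Hypothetical stand-in for a live-updateable config option."""

    def __init__(self, value):
        self.value = value


class UpdateableValue:
    """Hypothetical wrapper that yields the current value on demand."""

    def __init__(self, getter):
        self._getter = getter

    def get(self):
        return self._getter()

    @classmethod
    def constant(cls, value):
        # Captures a snapshot: later changes to the source are never seen.
        return cls(lambda: value)

    @classmethod
    def tracking(cls, item):
        # Reads through to the source on every access.
        return cls(lambda: item.value)


option = ConfigItem(True)
broken = UpdateableValue.constant(option.value)  # the type looks live, the value is frozen
fixed = UpdateableValue.tracking(option)         # the chain back to the option is intact

option.value = False  # simulate a live config update
print(broken.get(), fixed.get())
```

Both wrappers have the same type, which is why the breakage is easy to miss in review: only a test that changes the option at runtime and then reads the propagated value can tell them apart.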
Raphael S. Carvalho	7b41630299	replica: Refresh mutation source when allocating tablet replicas Consider the following: 1) table A has N tablets and views 2) migration starts for a tablet of A from node 1 to 2. 3) migration is at write_both_read_old stage 4) coordinator will push writes to both nodes (pending and leaving) 5) A has view, so writes to it will also result in reads (table::push_view_replica_updates()) 6) tablet's update_effective_replication_map() is not refreshing tablet sstable set (for new tablet migrating in) 7) so read on step 5 is not being able to find sstable set for tablet migrating in Causes the following error: "tablets - SSTable set wasn't found for tablet 21 of table mview.users" which means loss of write on pending replica. The fix will refresh the table's sstable set (tablet_sstable_set) and cache's snapshot. It's not a problem to refresh the cache snapshot as long as the logical state of the data hasn't changed, which is true when allocating new tablet replicas. That's also done in the context of compactions for example. Fixes #19052. Fixes #19033. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#19099	2024-06-11 06:59:04 +03:00
Calle Wilund	51c53d8db6	main/minio_server.py: Respect any preexisting AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY vars Fixes scylladb/scylla-pkg#3845 Don't overwrite (or rather change) AWS credentials variables if already set in enclosing environment. Ensures EAR tests for AWS KMS can run properly in CI. v2: * Allow environment variables in reading obj storage config - allows CI to use real credentials in env without risking putting them into less secure files * Don't write credentials info from miniserver into config, instead use said environment vars to propagate creds. v3: * Fix python launch scripts to not clear environment, thus retaining above aws envs. Closes scylladb/scylladb#19086	2024-06-11 06:59:04 +03:00
Nadav Har'El	73dfa4143a	cql-pytest: translate Cassandra's tests for SELECT DISTINCT This is a translation of Cassandra's CQL unit test source file DistinctQueryPagingTest.java into our cql-pytest framework. The 5 tests did not reproduce any previously-unknown bug, but did provide additional reproducers for one already-known issue: Refs #10354: SELECT DISTINCT should allow filter on static columns, not just partition keys Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18971	2024-06-11 06:59:04 +03:00
Michał Chojnowski	823da140dd	test_tablets: add test_tablet_storage_freeing Tests that tablet storage is freed after it is migrated away. Fixes #16946	2024-06-10 14:25:37 +02:00
Michał Chojnowski	7741491b47	test: pylib: add get_sstables_disk_usage() Adds an util for measuring the disk usage of the given table on the given node. Will be used in a follow-up patch for testing that sstables are freed properly.	2024-06-10 14:25:37 +02:00
Pavel Emelyanov	b10ddcfd18	main: Register task manager API next to task manager itself Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-10 12:49:11 +03:00
Pavel Emelyanov	02c36ebd2e	main: Register messaging API next to messaging service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-10 12:49:02 +03:00
Pavel Emelyanov	f7e4724770	main: Register repair API next to repair service Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-10 12:48:51 +03:00
Anna Stuchlik	55ed18db07	doc: mark tablets as GA in the CREATE KEYSPACE section This commit removes the information that tablets are an experimental feature from the CREATE KEYSPACE section. In addition, it removes the notes and cautions that are redundant when a feature is GA, especially the information and warnings about the future plans. Fixes https://github.com/scylladb/scylladb/issues/18670 Closes scylladb/scylladb#19063	2024-06-10 12:36:36 +03:00
Kefu Chai	069be01451	lang: remove redundant std::move() C++ standard enforces copy elision in this case. and copy elision is more performant than constructing the return value with a move constructor, so no need to use `std:move()` here. and GCC-14 rightfully points this out: ``` /home/kefu/dev/scylladb/lang/lua.cc: In member function ‘data_value {anonymous}::from_lua_visitor::operator()(const utf8_type_impl&)’: /var/ssd/scylladb/lang/lua.cc:797:25: error: redundant move in return statement [-Werror=redundant-move] 797 \| return std::move(s); \| ~~~~~~~~~^~~ /home/kefu/dev/scylladb/lang/lua.cc:797:25: note: remove ‘std::move’ call ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19187	2024-06-10 07:41:25 +03:00
Botond Dénes	7b2aad56c4	test/boost/sstable_datafile_test: remove unused semaphores The tests use the ones from test_env, the explicitly created ones are unused. Closes scylladb/scylladb#19167	2024-06-09 20:43:59 +03:00
Kefu Chai	535f2b2134	build: populate cxxflags to abseil before this change, when building abseil, we don't pass cxxflags to compiler, and abseil libraries are build with the default optimization level. in the case of clang, its default optimization level is `-O0`, it compiles the fastest, but the performance of the emitted code is not optimized for runtime performance. but we expect good performance for the release build. a typical command line for building abseil looks like ``` clang++ -I/home/kefu/dev/scylladb/master/abseil -ffile-prefix-map=/home/kefu/dev/scylladb/master=. -march=westmere -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o -MF absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o.d -o absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o -c /home/kefu/dev/scylladb/master/abseil/absl/base/internal/scoped_set_env.cc ``` so, in this change, we populate cxxflags to abseil, so that the per-mode `-O` option can be populated when building abseil. 
after this change, the command line building abseil in release mode looks like ``` clang++ -I/home/kefu/dev/scylladb/master/abseil -ffunction-sections -fdata-sections -O3 -mllvm -inline-threshold=2500 -fno-slp-vectorize -DSCYLLA_BUILD_MODE=release -g -gz -ffile-prefix-map=/home/kefu/dev/scylladb/master=. -march=westmere -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o -MF absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o.d -o absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o -c /home/kefu/dev/scylladb/master/abseil/absl/flags/internal/commandlineflag.cc ``` Refs `0b0e661a85` Fixes #19161 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19160	2024-06-09 20:01:50 +03:00
Tomasz Grabiec	c8f71f4825	test: tablets: Fix flakiness of test_removenode_with_ignored_node due to read timeout The check query may be executed on a node which doesn't yet see that the downed server is down, as it is not shut down gracefully. The query coordinator can choose the down node as a CL=1 replica for read and time out. To fix, wait for all nodes to notice the node is down before executing the checking query. Fixes #17938 Closes scylladb/scylladb#19137	2024-06-09 19:39:57 +03:00
Kefu Chai	b5dce7e3d0	docs: correct the link pointing to Scylla U before this change it points to https://university.scylladb.com/courses/scylla-operations/lessons/change-data-capture-cdc/ which then redirects the browser to https://university.scylladb.com/courses/scylla-operations/, but it should have pointed to https://university.scylladb.com/courses/data-modeling/lessons/change-data-capture-cdc/ in this change, the hyperlink is corrected. Fixes #19163 Refs `6e97b83b60` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19182	2024-06-09 19:37:21 +03:00
Avi Kivity	7b301f0cb9	Merge 'Encapsulate wasm and lua management in lang::manager service' from Pavel Emelyanov After wasm udf appeared, code in main, create_function_statement and schema_tables got some involvements into details of wasm engine management. Also, even prior to this, there was duplication in how function context is created by statement code and schema_tables code. This PR generalizes function context creation and encapsulates the management in sharded<lang::manager> service. Also it removes the wasm::startup_context thing and makes wasm start/stop be "classical" (see #2737) Closes scylladb/scylladb#19166 * github.com:scylladb/scylladb: code: Enlighten wasm headers usage lang: Unfriend wasm context from manager lang, cql3, schema_tables: Don't mess with db::config lang: Don't use db::config to create lua context lang: Don't use db::config to create wasm context lang: Drop manager::precompile() method cql3, schema_tables: Generalize function creation wasm: Replace startup_context with wasm_config lang: Add manager::start() method lang: Move manager to lang namespace lang: Move wasm::manager to its .cc/.hh files	2024-06-09 19:32:26 +03:00
Kefu Chai	9318d21a22	sstables: change const_iterator::value_type to uint64_t in general, the value_type of a `const_iterator` is `T` instead of `const T`, what has the const specifier is `reference`. because, when dereferencing an iterator, the value type does not matter any more, as it is always a copy. and GCC-14 points this out: ``` /home/kefu/dev/scylladb/sstables/compress.hh:224:13: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers] 224 \| value_type operator*() const { \| ^~~~~~~~~~ /home/kefu/dev/scylladb/sstables/compress.hh:228:13: error: type qualifiers ignored on function return type [-Werror=ignored-qualifiers] 228 \| value_type operator[](ssize_t i) const { \| ^~~~~~~~~~ ``` so, in this change, let's change the value_type to `uint64_t`. please note, it's not typical to return `value_type` from `operator*` or `operator[]` of an iterator. but due to the design of segmented_offsets, we cannot return a reference, so let's keep it this way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19186	2024-06-09 19:21:16 +03:00
Avi Kivity	b2a500a9a1	Merge 'alternator: keep TTL work in the maintenance scheduling group' from Botond Dénes Alternator has a custom TTL implementation. This is based on a loop, which scans existing rows in the table, then decides whether each row has reached its end-of-life and deletes it if it did. This work is done in the background, and therefore it uses the maintenance (streaming) scheduling group. However, it was observed that part of this work leaks into the statement scheduling group, competing with user workloads, negatively affecting its latencies. This was found to be caused by the reads and writes done on behalf of the alternator TTL, which loses its maintenance scheduling group when these have to go to a remote node. This is because the messaging service was not configured to recognize the streaming scheduling group, when statement verbs like read or writes are invoked. The messaging service currently recognizes two statement "tenants": the user tenant (statement scheduling group) and system (default scheduling group), as we used to have only user-initiated operations and system (internal) ones. With alternator TTL, there is now a need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group). This series adds a streaming tenant to the messaging service configuration and it adds a test which confirms that with this change, alternator TTL is entirely contained in the maintenance scheduling group. Fixes: #18719 - [x] Scans executed on behalf of alternator TTL are running in the statement group, disturbing user-workloads, this PR has to be backported to fix this. Closes scylladb/scylladb#18729 * github.com:scylladb/scylladb: alternator, scheduler: test reproducing RPC scheduling group bug main: add maintenance tenant to messaging_service's scheduling config	2024-06-09 19:20:18 +03:00
Kefu Chai	58edee8d93	mutation/mutation_rebuilder: remove redundant std::move() GCC-14 rightfully points out: ``` /var/ssd/scylladb/mutation/mutation_rebuilder.hh: In member function ‘const mutation& mutation_rebuilder::consume_new_partition(const dht::decorated_key&)’: /var/ssd/scylladb/mutation/mutation_rebuilder.hh:24:36: error: redundant move in initialization [-Werror=redundant-move] 24 \| _m = mutation(_s, std::move(dk)); \| ~~~~~~~~~^~~~ /var/ssd/scylladb/mutation/mutation_rebuilder.hh:24:36: note: remove ‘std::move’ call ``` as `dk` is passed with a const reference, `std::move()` does not help the callee to consume from it. so drop the `std::move()` here. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19188	2024-06-09 19:19:37 +03:00
Nadav Har'El	13cf6c543d	test/alternator: fix flaky test test_item_latency The Alternator test test_metrics.py::test_item_latency confirms that for several operation types (PutItem, GetItem, DeleteItem, UpdateItem) we did not forget to measure their latencies. The test checked that a latency was updated by checking that two metrics increase: scylla_alternator_op_latency_count scylla_alternator_op_latency_sum However, it turns out that the "sum" is only an approximate sum of all latencies, and when the total sum grows large it sometimes does not increase when a short latency is added to the statistics. When this happens, this test fails on the assertion that the "sum" increases after an operation. We saw this happening sometimes in CI runs. The simple fix is to stop checking _sum at all, and only verify that the _count increases - this is really an integer counter that unconditionally increases when a latency is added to the histogram. Don't worry that the strength of this test is reduced - this test was never meant to check the accuracy or correctness of the histograms - we should have different (and better) tests for that, unrelated to Alternator. The purpose of this test is only to verify that for some specific operation like PutItem, Alternator didn't forget to measure its latency and update the histogram. We want to avoid a bug like we had in counters in the past (#9406). Fixes #18847. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19080	2024-06-09 19:19:09 +03:00
Botond Dénes	37fd568139	sstables/compress.hh: remove unused forward declaration struct compress is forward declared right before its definition. At some point in the past there was probably some code there using it, but now it's gone so remove it. Closes scylladb/scylladb#19168	2024-06-09 17:52:05 +03:00
Guilherme Nogueira	cf157e4423	Remove comma that breaks CQL DML on tablets.rst The current sample reads: ```cql CREATE KEYSPACE my_keyspace WITH replication = { 'class': 'NetworkTopologyStrategy', 'replication_factor': 3, } AND tablets = { 'enabled': false }; ``` The additional comma after `'replication_factor': 3` breaks the query execution. Closes scylladb/scylladb#19177	2024-06-09 14:58:13 +03:00
Botond Dénes	6e3b997e04	docs: nodetool status: document keyspace and table arguments Also fix the example nodetool status invocation. Fixes: #17840 Closes scylladb/scylladb#18037	2024-06-09 00:37:12 +02:00
Kefu Chai	f4706be8a8	test: test_topology_ops: adapt to tablets in `e7d4e080`, we reenabled the background writes in this test, but when running with tablets enabled, background writes are still disabled because of #17025, which was fixed last week. so we can enable background writes with tablets. in this change, * background writes are enabled with tablets. * increase the number of nodes by 1 so that we have enough nodes to fulfill the needs of tablets, which enforces that the number of replicas should always satisfy RF. * pass rf to `start_writes()` explicitly, so we have less magic numbers in the test, and make the data dependencies more obvious. Fixes #17589 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18707	2024-06-08 17:46:37 +02:00
Dawid Medrek	a5528a2093	db/hints: Log when ignoring invalid hint directories In `58784cd`, `aa4b06a` and other commits migrating hinted handoff from IPs to host IDs (scylladb/scylladb#15567), we started ignoring hint directories of invalid names, i.e. those that represent neither an IP address, nor a host ID. They remain on disk and are taken into account while computing e.g. the total size of hints, but they're not used in any way. These changes add logs informing the user when Scylla encounters such a directory. Closes scylladb/scylladb#17566	2024-06-07 19:19:15 +02:00
Michał Chojnowski	fee48f67ef	storage_proxy: avoid infinite growth of _throttled_writes storage_proxy has a throttling mechanism which attempts to limit the number of background writes by forcefully raising CL to ALL (it's not implemented exactly like that, but that's the effect) when the amount of background and queued writes is above some fixed threshold. If this is applied to a write, it becomes "throttled", and its ID is appended to _throttled_writes. Whenever the amount of background and queued writes falls below the threshold, writes are "unthrottled": some IDs are popped from _throttled_writes and the writes represented by these IDs (if their handlers still exist) have their CL lowered back. The problem here is that IDs are only ever removed from _throttled_writes if the number of queued and background writes falls below the threshold. But this doesn't have to happen in any finite time, if there's constant write pressure. And in fact, in one load test, it hasn't happened in 3 hours, eventually causing the buffer to grow into gigabytes and trigger OOM. This patch is intended to be a good-enough-in-practice fix for the problem. Fixes scylladb/scylladb#17476 Fixes scylladb/scylladb#1834 Closes scylladb/scylladb#19136	2024-06-07 15:56:23 +02:00
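The growth pattern described in the commit above can be reproduced with a toy model. This is a simplified sketch of the problem only, not storage_proxy's code and not the fix: the class `ThrottleModel`, its fields, and the threshold of 2 are assumptions made for the illustration.

```python
from collections import deque


class ThrottleModel:
    """Toy model: IDs leave the queue only when load drops below the threshold."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.in_flight = 0        # background + queued writes
        self.throttled = deque()  # stands in for _throttled_writes

    def start_write(self, write_id: int) -> None:
        self.in_flight += 1
        if self.in_flight > self.threshold:
            # The write becomes "throttled" and its ID is remembered.
            self.throttled.append(write_id)

    def finish_write(self) -> None:
        self.in_flight -= 1
        # IDs are popped only while the load is below the threshold.
        while self.throttled and self.in_flight < self.threshold:
            self.throttled.popleft()


model = ThrottleModel(threshold=2)
for write_id in range(3):
    model.start_write(write_id)  # the third write crosses the threshold

# Constant write pressure: every finished write is immediately replaced.
for write_id in range(3, 1003):
    model.finish_write()
    model.start_write(write_id)
queued_under_pressure = len(model.throttled)

# Only once the pressure stops does the queue drain.
for _ in range(3):
    model.finish_write()
queued_after_drain = len(model.throttled)
print(queued_under_pressure, queued_after_drain)
```

In this model the queue gains one entry per write for as long as the load stays at or above the threshold, including entries for writes that have long since finished, which matches the unbounded growth the commit message describes.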
Gleb Natapov	34cf5c81f6	group0, topology coordinator: run group0 and the topology coordinator in gossiper scheduling group Currently they both run in streaming group and it may become busy during repair/mv building and affect group0 functionality. Move it to the gossiper group where it should have more time to run. Fixes scylladb/scylladb#18863 Closes scylladb/scylladb#19138	2024-06-07 15:31:44 +02:00
Pavel Emelyanov	bebd121936	code: Enlighten wasm headers usage Now when function context creation is encapsulated in lang::manager, some .cc files can stop using wasm-specific headers and just go with the lang/manager.hh one. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	ceebbc5948	lang: Unfriend wasm context from manager The friendship was needed to get engine and instance cache from manager, but there's a shorter way to create context with the info it needs. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	b0ffc03599	lang, cql3, schema_tables: Don't mess with db::config Now function context creation is encapsulated in lang::manager so it's possible to patch-out a few more places that use database as config provider. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	b854bf4b83	lang: Don't use db::config to create lua context Similarly to previous patch, lua context needs db::config for creation. It's better to get the configurables via lang::manager::config. One thing to note -- lua config carries updateable_values on board, but respective db::config options are _not_ LiveUpdate-able, so the lua config could just use simple data types. This patch keeps updateable values intact for brevity. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	783ccc0a74	lang: Don't use db::config to create wasm context The manager needs to get two "fuel" configurables from db::config in order to create context. Instead of carrying db config from callers, keep the options on existing lang::manager::config and use them. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	f277bd89f5	lang: Drop manager::precompile() method It's not helping much any longer. Manager can call wasm:: stuff directly with less code involved. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	882b2f4e9f	cql3, schema_tables: Generalize function creation When a function is created with the CREATE FUNCTION statement, the statement handler does all the necessary preparations on its own. The very same code exists in schema_tables, when the function is loaded on boot. This patch generalizes both and keeps function language-specific context creation inside lang/ code. The creation function returns context via argument reference. It would have been nicer if it was returned via future<>, but it's not suitable for future<T> type :( Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 13:07:05 +03:00
Pavel Emelyanov	fe7ff7172d	wasm: Replace startup_context with wasm_config The lang::manager starts with the help of a context because it needs to have std::shared_ptr<> pointing to cross-shard shared wasm engine and runner thread. For that a context is created in advance, that then helps sharing the engine and runner across manager instances. This patch removes the "context" and replaces it with classical manager::config. With it, it's lang::manager who's now responsible for initializing itself. In order to have cross-shard engine and thread pointers, the start() method uses invoke_on_others() facility to share the pointer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 12:35:57 +03:00
Pavel Emelyanov	0dad72b736	lang: Add manager::start() method Just like any other sharded<> service, the lang::manager now starts and stops in a classical sequence of await sharded<manager>::start() defer([] { await sharded<manager>::stop() }) await sharded<manager>::invoke_on_all(&manager::start) For now the method is no-op, next patches will start using it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 12:35:57 +03:00
Pavel Emelyanov	f950469af5	lang: Move manager to lang namespace And, while at it, rename local variable to refer to it to as "manager" not "wasm". Query processor and database also have getters named "wasm()", these are not renamed yet to keep patch smaller (and those getters are going to be reworked further anyway). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 12:35:57 +03:00
Pavel Emelyanov	1dec79e97d	lang: Move wasm::manager to its .cc/.hh files It's going to become a facade in front of both -- wasm and lua, so keep it in files with language independent names. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-07 12:35:57 +03:00
Marcin Maliszkiewicz	c13fea371c	cql3: always return created event in create ks/table/type/view statement In case multiple clients concurrently issue CREATE KEYSPACE IF NOT EXISTS and later USE KEYSPACE, it can happen that the schema in the driver's session is out of sync, because it syncs when it receives a special message from the CREATE KEYSPACE response. A similar situation occurs with other schema change statements. In this patch we fix only create keyspace/table/type/view statements by always sending the created event. Behavior of any other schema altering statements remains unchanged.	2024-06-07 10:36:40 +02:00
Marcin Maliszkiewicz	f6108a72d3	cql3: auth: move auto-grant closer to resource creation code This should reduce the risk of re-introducing issue similar to the one fixed in `ab6988c52f` When grant code is closer to actual creation code (announcing mutations) there is lower chance of those two effects being triggered differently, if we ever call grant_permissions_to_creator and not announce mutations that's very likely a security vulnerability. Additionally comment was rewritten to be more accurate.	2024-06-07 10:26:32 +02:00
Piotr Dulikowski	e18aeb2486	Merge 'mv: gossip the same backlog if a different backlog was sent in a response' from Wojciech Mitros Currently, there are 2 ways of sharing a backlog with other nodes: through a gossip mechanism, and with responses to replica writes. In gossip, we check each second if the backlog changed, and if it did we update other nodes with it. However if the backlog for this node changed on another node with a write response, the gossiped backlog is currently not updated, so if after the response the backlog goes back to the value from the previous gossip round, it will not get sent and the other node will stay with an outdated backlog - this can be observed in the following scenario: 1. Cluster starts, all nodes gossip their empty view update backlog to one another 2. On node N, `view_update_backlog_broker` (the backlog gossiper) performs an iteration of its backlog update loop, sees no change (backlog has been empty since the start), schedules the next iteration after 1s 3. Within the next 1s, coordinator (different than N) sends a write to N causing a remote view update (which we do not wait for). As a result, node N replies immediately with an increased view update backlog, which is then noted by the coordinator. 4. Still within the 1s, node N finishes the view update in the background, dropping its view update backlog to 0. 5. 
In the next and following iterations of `view_update_backlog_broker` on N, backlog is empty, as it was in step 2, so no change is seen and no update is sent due to the check ``` auto backlog = _sp.local().get_view_update_backlog(); if (backlog_published && backlog_published == backlog) { sleep_abortable(gms::gossiper::INTERVAL, _as).get(); continue; } ``` After this scenario happens, the coordinator keeps information about an increased view update backlog on N even though it's actually already empty. This patch fixes the issue by notifying the gossip that a different backlog was sent in a response, causing it to send an unchanged backlog to other nodes in the following gossip round. Fixes: https://github.com/scylladb/scylladb/issues/18461 Similarly to https://github.com/scylladb/scylladb/pull/18646, without admission control (https://github.com/scylladb/scylladb/pull/18334), this patch doesn't affect much, so I'm marking it as backport/none Tests: manual. Currently this patch only affects the length of MV flow control delay, which is not reliable to base a test on. A proper test will be added when MV admission control is added, so we'll be able to base the test on rejected requests Closes scylladb/scylladb#18663 github.com:scylladb/scylladb: mv: gossip the same backlog if a different backlog was sent in a response node_update_backlog: divide adding and fetching backlogs	2024-06-07 10:20:21 +02:00
Marcin Maliszkiewicz	281c06ba2e	cql3: extract create ks/table/type/view event code So that the code in subsequent commit is cleaner. Create function/aggregate code was not changed as it would require bigger refactor.	2024-06-07 10:07:50 +02:00
Wojciech Mitros	4aa7ada771	exceptions: make view update timeouts inherit from timed_out_error Currently, when generating and propagating view updates, if we notice that we've already exceeded the time limit, we throw an exception inheriting from `request_timeout_exception`, to later catch and log it when finishing request handling. However, when catching, we only check timeouts by matching the `timed_out_error` exception, so the exception thrown in the view update code is not registered as a timeout exception, but an unknown one. This can cause tests which were based on the log output to start failing, as in the past we were noticing the timeout at the end of the request handling and using the `timed_out_error` to keep processing it and now, even though we do notice the timeout even earlier, due to its type we log an error to the log, instead of treating it as a regular timeout. In this patch we make the error thrown on timeout during view updates inherit from `timed_out_error` instead of the `request_timeout_exception` (it is also moved from the "exceptions" directory, where we define exceptions returned to the user). Aside from helping with the issue described above, we also improve our metrics, as the `request_timeout_exception` is also not checked for in the `is_timeout_exception` method, and because we're using it to check whether we should update write timeout metrics, they will only start getting updated after this patch. Closes scylladb/scylladb#19102	2024-06-07 09:54:48 +02:00
Kefu Chai	01568a36a5	.github: add workflow to build with clang nightly to be prepared for changes from clang, and enjoy the new warnings/errors from this compiler. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 14:23:06 +08:00
Kefu Chai	bbeabe2989	.github: rename clang-tidy-matcher.json to clang-matcher.json as the matcher actually applies to all warnings from clang frontend, and hence can be reused when building the tree with clang, so let's rename it before using it in the clang build workflows. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 14:23:06 +08:00
Anna Stuchlik	582bafabb3	doc: set 6.0 as the latest stable version This commit updates the configuration for ScyllaDB documentation so that: 6.0 is the latest version. 6.0 is removed from the list of unstable versions. It must be merged when ScyllaDB 6.0 is released. No backport is required. Closes scylladb/scylladb#19003	2024-06-07 09:13:56 +03:00
Kefu Chai	571ab9f5f0	config: expand on rpc_keepalive's description before this change, we use "RPC or native". but before thrift support is removed "RPC" implies "thrift", now that we've dropped thrift support, "RPC" could be confusing here, so let's be more specific, and put all connection types in place of "RPC or native". Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 09:23:10 +08:00
Kefu Chai	c75442bc2a	api: s/rpc/thrift/ replace all occurrences of "rpc" in function names and debugging messages to "thrift", as "rpc" is way too general, and since we are removing "thrift" support, let's take this opportunity to use a more specific name. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 09:23:10 +08:00
Kefu Chai	36239ec592	db/system_keyspace: drop thrift_version from system.local table so we don't create new sstables with this unused column, but we can still open old sstables of this table which was created with the old schema. Refs #3811 Refs #18416 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 09:23:10 +08:00
Kefu Chai	f688fa16bc	transport: do not return client_type from cql_server::connection::make_client_key() since we've dropped the thrift support, the `client_type` is always `cql`, there is no need to differentiate different clients anymore. so, we change `make_client_key()` so that it only returns the IP address and port. Refs #3811 Refs #18416 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 09:23:06 +08:00
Kefu Chai	0e04a033af	.github: add alternator to iwyu's CLEANER_DIR to avoid future violations of include-what-you-use. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 07:45:00 +08:00
Kefu Chai	a2f54ded80	alternator: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 07:45:00 +08:00
Kefu Chai	0ff66bf564	.github: change severity to error in clang-include-cleaner since we've addressed all warnings, we are ready to tighten the standards of this workflow, so that contributors are made aware of violations of the include-what-you-use policy. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 07:28:52 +08:00
Kefu Chai	d33ab21ef8	exceptions: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 07:28:52 +08:00
Kefu Chai	ad649be1bf	treewide: drop thrift support thrift support was deprecated since ScyllaDB 5.2 > Thrift API - legacy ScyllaDB (and Apache Cassandra) API is > deprecated and will be removed in followup release. Thrift has > been disabled by default. so let's drop it. in this change, * thrift protocol support is dropped * all references to thrift support in document are dropped * the "thrift_version" column in system.local table is preserved for backward compatibility, as we could load from an existing system.local table which still contains this column, so we need to write this column as well. * "/storage_service/rpc_server" is only preserved for backward compatibility with java-based nodetool. * `rpc_port` and `start_rpc` options are preserved, but they are marked as "Unused". so that the new release of scylladb can consume existing scylla.yaml configurations which might contain these settings. by making them deprecated, users will be warned, and can update their configurations before we actually remove them in the next major release. Fixes #3811 Fixes #18416 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-07 06:44:59 +08:00
Avi Kivity	cd553848c1	Merge 'auth-v2: use a single transaction in auth related statements ' from Marcin Maliszkiewicz Due to gradual raft introduction into statements code in cases when single statement modified more than one table or mutation producing function was composed out of simpler ones we violated transactional logic and statement execution was not atomic as whole. This patch changes that, so now either all changes resulting from statement execution are applied or none. Affected statements types are: - schema modification - auth modifications - service levels modifications Fixes https://github.com/scylladb/scylladb/issues/17738 Closes scylladb/scylladb#17910 * github.com:scylladb/scylladb: raft: rename mutations_collector to group0_batch raft: rename announce to commit cql3: raft: attach description to each mutations collector group auth: unify mutations_generator type auth: drop redundant 'this' keyword auth: remove no longer used code from standard_role_manager::legacy_modify_membership cql3: auth: use mutation collector for service levels statements cql3: auth: use mutation collector for alter role cql3: auth: use mutation collector for grant role and revoke role cql3: auth: use mutation collector for drop role and auto-revoke auth: add refactored modify_membership func in standard_role_manager auth: implement empty revoke_all in allow_all_authorizer auth: drop request_execution_exception handling from default_authorizer::revoke_all Revert "Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks" cql3: auth: use mutation collector for grant and revoke permissions cql3: extract changes_tablets function in alter_keyspace_statement cql3: auth: use mutation collector for create role statement auth: move create_role code into service auth: add a way to announce mutations having only client_state ref auth: add collect_mutations common helper auth: remove unused header in common.hh auth: add class for gathering mutations 
without immediate announce auth: cql3: use auth facade functions consistently on write path auth: remove unused is_enforcing function	2024-06-06 17:31:26 +03:00
Yaniv Michael Kaul	82875095e9	Raft: improve descriptions of metrics 1. Fixed a single typo (send -> sent) 2. Rephrase 'How many' to 'Number of' and use less passive tense. 3. Be more specific in the description of the different metrics instead of the more generic descriptions. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#19067	2024-06-06 15:18:47 +03:00
Kefu Chai	bac7e1e942	doc: document "enable_tablets" option it sets the cluster feature of tablets, and is a prerequisite for using tablets. Refs #18670 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19090	2024-06-06 15:06:32 +03:00
Marcin Maliszkiewicz	63e6334a64	raft: rename mutations_collector to group0_batch	2024-06-06 13:26:34 +02:00
Kamil Braun	57e810c852	Merge 'Serialize repair with tablet migration' from Tomasz Grabiec We want to exclude repair with tablet migrations to avoid races between repair reads and writes with replica movement. Repair is not prepared to handle topology transitions in the middle. One reason why it's not safe is that repair may successfully write to a leaving replica post streaming phase and consider all replicas to be repaired, but in fact they are not, the new replica would not be repaired. Other kinds of races could result in repair failures. If repair writes to a leaving replica which was already cleaned up, such writes will fail, causing repair to fail. Excluding works by keeping effective_replication_map_ptr in a version which doesn't have table's tablets in transitions. That prevents later transitions from starting because topology coordinator's barrier will wait for that erm before moving to a stage later than allow_write_both_read_old, so before any requests start using the new topology. Also, if transitions are already running, repair waits for them to finish. A blocked tablet migration (e.g. due to down node) will block repair, whereas before it would fail. Once admin resolves the cause of blocked migration, repair will continue. Fixes #17658. Fixes #18561. Closes scylladb/scylladb#18641 * github.com:scylladb/scylladb: test: pylib: Do not block async reactor while removing directories repair: Exclude tablet migrations with tablet repair repair_service: Propagate topology_state_machine to repair_service main, storage_service: Move topology_state_machine outside storage_service storage_srvice, toplogy: Extract topology_state_machine::await_quiesced() tablet_scheduler: Make disabling of balancing interrupt shuffle mode tablet_scheduler: Log whether balancing is considered as enabled	2024-06-06 11:27:03 +02:00
Kamil Braun	256517b570	Merge 'tablets: Filter-out left nodes in get_natural_endpoints()' from Tomasz Grabiec The API already promises this, the comment on effective_replication_map says: "Excludes replicas which are in the left state". Tablet replicas on the replaced node are rebuilt after the node already left. We may no longer have the IP mapping for the left node so we should not include that node in the replica set. Otherwise, storage_proxy may try to use the empty IP and fail: storage_proxy - No mapping for :: in the passed effective replication map It's fine to not include it, because storage proxy uses keyspace RF and not replica list size to determine quorum. The node is not coming up, so no one should need to contact it. Users which need replica list stability should use the host_id-based API. Fixes #18843 Closes scylladb/scylladb#18955 * github.com:scylladb/scylladb: tablets: Filter-out left nodes in get_natural_endpoints() test: pylib: Extract start_writes() load generator utility	2024-06-06 11:23:27 +02:00
Wojciech Mitros	f70f774e40	mv: gossip the same backlog if a different backlog was sent in a response Currently, there are 2 ways of sharing a backlog with other nodes: through a gossip mechanism, and with responses to replica writes. In gossip, we check each second if the backlog changed, and if it did we update other nodes with it. However if the backlog for this node changed on another node with a write response, the gossiped backlog is currently not updated, so if after the response the backlog goes back to the value from the previous gossip round, it will not get sent and the other node will stay with an outdated backlog. This patch changes this by notifying the gossip that the backlog changed since the last gossip round so a different backlog could have been sent through the response piggyback mechanism. With that information, gossip will send an unchanged backlog to other nodes in the following gossip round. Fixes: https://github.com/scylladb/scylladb/issues/18461	2024-06-06 10:45:15 +02:00
Wojciech Mitros	272e80fe0a	node_update_backlog: divide adding and fetching backlogs Currently, we only update the backlogs in node_update_backlog at the same time when we're fetching them. This is done using storage_proxy's method get_view_update_backlog, which is confusing because it's a getter with side-effects. Additionally, we don't always want to update the backlog when we're reading it (as in gossip which is only on shard 0) and we don't always want to read it when we're updating it (when we're not handling any writes but the backlog drops due to background work finish). This patch divides the node_view_backlog::add_fetch as well the storage_proxy::get_view_update_backlog both into two methods; one for updating and one for reading the backlog. This patch only replaces the places where we're currently using the view backlog getter, more situations where we should get/update the backlog should be considered in a following patch.	2024-06-06 10:45:13 +02:00
Botond Dénes	8ff1742182	Merge 'Relax production_snitch_base's property file parsing' from Pavel Emelyanov It consists of reading method and parsing one and it uses class fields to carry data between those two. The former is additionally built with curly continuation chains, while it's naturally linear, so turn it into a coroutine while at it Closes scylladb/scylladb#18994 * github.com:scylladb/scylladb: snitch: Remove production_snitch_base::_prop_file_contents snitch: Remove production_snitch_base::_prop_file_size snitch: Coroutinize load_property_file()	2024-06-06 09:14:33 +03:00
Botond Dénes	cd10beb89d	Merge 'Don't use db::config by gossiper' from Pavel Emelyanov All sharded<service>'s are supposed to have their own config and not use global db::config one. The service config, in turn, is to be created by main/cql_test_env/whatever out of db::config and, maybe, other data. Gossiper is almost there, but it still uses db::config in few places. Closes scylladb/scylladb#19051 * github.com:scylladb/scylladb: gossiper: Stop using db::config gossiper: Move force_gossip_generation on gossip_config gossiper: Move failure_detector_timeout_ms on gossip_config main: Fix indentation after previous patch main: Make gossiper config a sharded parameter main: Add local variable for set of seeds main: Add local variable for group0 id main: Add local variable for cluster_name	2024-06-06 09:12:51 +03:00
Botond Dénes	44975abe18	Merge 'Sanitize start-stop of protocol servers' from Pavel Emelyanov Protocol servers are started last, and are registered in storage_service, which stops them. Also there are deferred actions scheduled to stop protocol servers on aborted start and a FIXME asking to make even this case rely on storage_service. Also, there's a (rather rare) aborted-start bug in alternator and redis. Yet, thrift can be left started in some weird circumstances. This patch fixes it all. As a side effect, the start-stop code becomes shorter and a bit better structured. refs: #2737 Closes scylladb/scylladb#19042 * github.com:scylladb/scylladb: main: Start alternator expiration service earlier main: Start redis transparently main: Start alternator transparently main: Start thrift transparently main: Start native transport transparently storage_service: Make register_protocol_server() start the server storage_service: Turn register_protocol_server() async method storage_service: Outline register_protocol_server() main: Schedule deferred drain_on_shutdown() prior to protocol servers main: Move some trailing startup earlier	2024-06-06 09:08:05 +03:00
Botond Dénes	db5c23491e	Merge '.github: annotate the report from clang-include-cleaner' from Kefu Chai this series * add annotation to the github pull request when extraneous `#include` processor macros are identified * add `exceptions` subdirectory to `CLEANER_DIRS` to demonstrate the annotation. we will fix the identified issue in a follow-up change. --- * This is a CI workflow improvement. No backporting is required. Closes scylladb/scylladb#19037 * github.com:scylladb/scylladb: .github: add exception to CLEANER_DIRS .github: annotate the report from clang-include-cleaner .github: build headers before running clang-include-cleaner	2024-06-06 09:02:26 +03:00
Pavel Emelyanov	acc438e98b	view-update-generator: Start in provided scheduling group Currently it gets the streaming/maintenance one from database, but it can as well just assume that it's already running in the correct one, and the main code fulfils this assumption. This removes one more place that uses database as sched groups provider. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19078	2024-06-06 08:58:05 +03:00
Tzach Livyatan	c30f81c389	Docs: fix start command in Update replace-dead-node.rst Fix #18920 Closes scylladb/scylladb#18922	2024-06-06 08:56:07 +03:00
Botond Dénes	7aa9bfa661	Merge 'util/result_try: pass template arg list explicitly' from Kefu Chai clang-19 introduced a change which enforces the change proposed by [CWG 96](https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#96), which was accepted by C++20 in [P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html), as [[temp.names]p5](https://eel.is/c++draft/temp.names#6). so, to be future-proof and to be standard compliant, let's pass the template arguments. otherwise we'd have build failure like ``` error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw] ``` --- no need to backport. as this change only addresses a FTBFS with a recent build of clang-19. but our CI is not a clang built from llvm's main HEAD. Closes scylladb/scylladb#19100 * github.com:scylladb/scylladb: util/result_try: pass template arg list explicitly util/result_try: pass func as `const F&` instead of `F&&`	2024-06-06 08:54:42 +03:00
Nadav Har'El	b5fd854c77	cql-pytest: be more forgiving to ancient versions of Scylla We recently added to cql-pytest tests the ability to check if tablets are enabled or not (for some tablet-specific tests). When running tests against Cassandra or old pre-tablet versions of Scylla, this fact is detected and "False" is returned immediately. However, we still look at a system table which didn't exist on really ancient versions of Scylla, and tests couldn't run against such versions. The fix is trivial: if that system table is missing, just ignore the error and return False (i.e., no tablets). There were no tablets on such ancient versions of Scylla. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19098	2024-06-06 08:53:26 +03:00
Pavel Emelyanov	4606302ead	distributed_loader: Remove base_path from populator It's unused, populator uses it to print debugging messages, but it can as well use table->dir() for it, just as sstable_directory does. One message looks useless and is removed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19113	2024-06-06 08:49:41 +03:00
Pavel Emelyanov	84f0bab27c	hints/manager: Simplify hints dir evaluation Currently the code wraps simple "if" with std::invoke over a lambda. Also, the local variable that gets the result, is declared as const one, which prevents it from being std::move()-d in the very next line. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19106	2024-06-06 08:31:30 +03:00
Pavel Emelyanov	ad0e6b79fc	replica: Remove all_datadir from keyspace config This vector of paths is only used to generate the same vector of paths for table config, but the latter already has all the needed info. It's the part of the plan to stop using paths/directories in keyspaces and tables, because with storage-options tables no longer keep their data in "files on disk", so this information goes to sstables storage manager (refs #12707) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19119	2024-06-06 08:30:34 +03:00
Kefu Chai	4a36918989	topology_coordinator: handle/wait futures when stopping topology_coordinator before this change, unlike other services in scylla, topology_coordinator is not properly stopped when it is aborted, because the scylla instance is no longer a leader or is being shut down. its `run()` method just stops the grand loop and bails out before topology_coordinator is destroyed. but we are tracking the migration state of tablets using a bunch of futures, which might not be handled yet, and some of them could carry failures. in that case, when the `future` instances with failure state get destroyed, seastar calls `report_failed_future`. and seastar considers this practice a source of bugs -- as one just fails to handle an error. that's why we have the following error: ``` WARN 2024-05-19 23:00:42,895 [shard 0:strm] seastar - Exceptional future ignored: seastar::rpc::unknown_verb_error (unknown verb), backtrace: /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56c14e /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56c770 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56ca58 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x38c6ad 0x29cdd07 0x29b376b 0x29a5b65 0x108105a /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff1df /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x400367 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff838 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36de58 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36d092 0x1017cba 0x1055080 0x1016ba7 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27b89 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27c4a 0x1015524 ``` and the backtrace looks like: ``` seastar::current_backtrace_tasklocal() at ??:? 
seastar::current_tasktrace() at ??:? seastar::current_backtrace() at ??:? seastar::report_failed_future(seastar::future_state_base::any&&) at ??:? service::topology_coordinator::tablet_migration_state::~tablet_migration_state() at topology_coordinator.cc:? service::topology_coordinator::~topology_coordinator() at topology_coordinator.cc:? service::run_topology_coordinator(seastar::sharded<db::system_distributed_keyspace>&, gms::gossiper&, netw::messaging_service&, locator::shared_token_metadata&, db::system_keyspace&, replica::database&, service::raft_group0&, service::topology_state_machine&, seastar::abort_source&, raft::server&, seastar::noncopyable_function<seastar::future<service::raft_topology_cmd_result> (utils::tagged_tagged_integer<raft::internal::non_final, raft::term_tag, unsigned long>, unsigned long, service::raft_topology_cmd const&)>, service::tablet_allocator&, std::chrono::duration<long, std::ratio<1l, 1000l> >, service::endpoint_lifecycle_notifier&) [clone .resume] at topology_coordinator.cc:? seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() at main.cc:? seastar::reactor::run_some_tasks() at ??:? seastar::reactor::do_run() at ??:? seastar::reactor::run() at ??:? seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ??:? ``` and even worse, these futures are indirectly owned by `topology_coordinator`. so there are chances that they could be used even after `topology_coordinator` is destroyed. this is a use-after-free issue. because the `run_topology_coordinator` fiber exits when the scylla instance retires from the leader's role, this use-after-free could be fatal to a running instance due to undefined behavior of use after free. so, in this change, we handle the futures in `_tablets`, and note down the failures carried by them if any. Fixes #18745 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18991	2024-06-06 07:55:03 +03:00
Israel Fruchter	1fd600999b	Update tools/cqlsh submodule v6.0.20 * tools/cqlsh c8158555...0d58e5ce (6): > cqlsh.py: fix server side describe after login command > cqlsh: try server-side DESCRIBE, then client-side > Refactor tests to accept both client and server side describe > github actions: support testing with enterprise release > Add the tab-completion support of SERVICE_LEVEL statements > reloc/build_reloc.sh: don't use `--no-build-isolation` Closes scylladb/scylladb#18990	2024-06-06 07:32:05 +03:00
Tomasz Grabiec	2c3f7c996f	test: pylib: Fetch all pages by default in run_async Fetching only the first page is not the intuitive behavior expected by users. This causes flakiness in some tests which generate variable amount of keys depending on execution speed and verify later that all keys were written using a single SELECT statement. When the amount of keys becomes larger than page size, the test fails. Fixes #18774 Closes scylladb/scylladb#19004	2024-06-05 18:07:24 +03:00
Tomasz Grabiec	5ca54a6e88	test: pylib: Do not block async reactor while removing directories This fixes a problem where suite cleanup schedules lots of uninstall() tasks for servers started in the suite, which schedules lots of tasks, which synchronously call rmtree(). These take over a minute to finish, which blocks other tasks for tests which are still executing. In particular, this was observed to cause ManagerClient.server_stop_gracefully() to time out. It has a timeout of 60 seconds. The server was stopped quickly, but the RESTful API response was not processed in time and the call timed out when it got the async reactor.	2024-06-05 16:11:22 +02:00
Tomasz Grabiec	98323be296	repair: Exclude tablet migrations with tablet repair We want to exclude repair with tablet migrations to avoid races between repair reads and writes with replica movement. Repair is not prepared to handle topology transitions in the middle. One reason why it's not safe is that repair may successfully write to a leaving replica post streaming phase and consider all replicas to be repaired, but in fact they are not, the new replica would not be repaired. Other kinds of races could result in repair failures. If repair writes to a leaving replica which was already cleaned up, such writes will fail, causing repair to fail. Excluding works by keeping effective_replication_map_ptr in a version which doesn't have table's tablets in transitions. That prevents later transitions from starting because topology coordinator's barrier will wait for that erm before moving to a stage later than allow_write_both_read_old, so before any requests start using the new topology. Also, if transitions are already running, repair waits for them to finish. Fixes #17658. Fixes #18561.	2024-06-05 16:11:22 +02:00
Tomasz Grabiec	e97acf4e30	repair_service: Propagate topology_state_machine to repair_service	2024-06-05 16:11:22 +02:00
Tomasz Grabiec	c45ce41330	main, storage_service: Move topology_state_machine outside storage_service It will be propagated to repair_service to avoid cyclic dependency: storage_service <-> repair_service	2024-06-05 16:11:22 +02:00
Tomasz Grabiec	476c076a21	storage_service, topology: Extract topology_state_machine::await_quiesced() Will be used later in a place which doesn't have access to storage_service but has access to topology_state_machine. It's not necessary to start group0 operation around polling because the busy() state can be checked atomically and if it's false it means the topology is no longer busy.	2024-06-05 16:11:22 +02:00
Tomasz Grabiec	1513d6f0b0	tablet_scheduler: Make disabling of balancing interrupt shuffle mode Tests will rely on that, they will run in shuffle mode, and disable balancing around section which otherwise would be infinitely blocked by ongoing shuffling (like repair).	2024-06-05 16:11:22 +02:00
Tomasz Grabiec	6c64cf33df	tablet_scheduler: Log whether balancing is considered as enabled	2024-06-05 16:11:22 +02:00
Benny Halevy	b2fa954d82	gms: endpoint_state: get_dc_rack: do not assign to uninitialized memory Assigning to a member of an uninitialized optional does not initialize the object before assigning to it. This resulted in the AddressSanitizer detecting an attempt to double-free when the uninitialized string apparently contained a bogus pointer. The change emplaces the returned optional when needed without resorting to the copy-assignment operator. So it's not susceptible to assigning to uninitialized memory, and it's more efficient as well... Fixes scylladb/scylladb#19041 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#19043	2024-06-05 13:09:01 +03:00
Kamil Braun	18f5d6fd89	Merge 'Fail bootstrap if ip mapping is missing during double write stage' from Gleb Natapov If a node restarts just before it stores the bootstrapping node's IP, it will not have an ID to IP mapping for the bootstrapping node, which may cause a failure on the write path. Detect this and fail bootstrapping if it happens. Closes scylladb/scylladb#18927 * github.com:scylladb/scylladb: raft topology: fix indentation after previous commit raft topology: do not add bootstrapping node without IP as pending test: add test of bootstrap where the coordinator crashes just before storing IP mapping schema_tables: remove unused code	2024-06-05 11:15:15 +02:00
Raphael S. Carvalho	3983f69b2d	topology_experimental_raft/test_tablets: restore usage of check_with_down `e7246751b6` incorrectly dropped its usage in test_tablet_missing_data_repair. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#19092	2024-06-05 10:11:02 +02:00
Kefu Chai	b7994ee4f6	util/result_try: pass template arg list explicitly clang-19 introduced a change which enforces the change proposed by [CWG 96](https://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#96), which was accepted by C++20 in [P1787R6](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1787r6.html), as [[temp.names]p5](https://eel.is/c++draft/temp.names#6). so, to be future-proof and to be standard compliant, let's pass the template arguments. otherwise we'd have build failure like ``` error: a template argument list is expected after a name prefixed by the template keyword [-Wmissing-template-arg-list-after-template-kw] ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-05 13:19:45 +08:00
Kefu Chai	e2158a0c72	util/result_try: pass func as `const F&` instead of `F&&` as the functor passed to `invoke()` is not an rvalue, if we specify the template parameter explicitly, clang errors out like: ``` /home/kefu/.local/bin/clang++ -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -I/home/kefu/dev/scylladb/build -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/build/rust -isystem /home/kefu/dev/scylladb/abseil -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. 
-march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT transport/CMakeFiles/transport.dir/RelWithDebInfo/server.cc.o -MF transport/CMakeFiles/transport.dir/RelWithDebInfo/server.cc.o.d -o transport/CMakeFiles/transport.dir/RelWithDebInfo/server.cc.o -c /home/kefu/dev/scylladb/transport/server.cc In file included from /home/kefu/dev/scylladb/transport/server.cc:39: /home/kefu/dev/scylladb/utils/result_try.hh:210:28: error: no matching function for call to 'invoke' 210 \| return Converter::template invoke<const Cb, const Ex&>(_cb, ex); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/kefu/dev/scylladb/utils/result_try.hh:194:143: note: while substituting into a lambda expression here 194 \| return [this, cont = std::forward<Continuation>(cont)] (bool& already_caught) mutable -> typename Converter::template wrapped_type<R> { \| ^ /home/kefu/dev/scylladb/utils/result_try.hh:327:40: note: in instantiation of function template specialization 'utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>::wrap_in_catch<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, (lambda at /home/kefu/dev/scylladb/utils/result_try.hh:518:76)>' requested here 327 \| first_handler.template wrap_in_catch<R, Converter, Continuation>(std::forward<Continuation>(cont)), \| ^ /home/kefu/dev/scylladb/utils/result_try.hh:518:54: note: in instantiation of function template specialization 'utils::internal::try_catch_chain_impl<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, 
utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>::invoke_in_try_catch<(lambda at /home/kefu/dev/scylladb/utils/result_try.hh:518:76)>' requested here 518 \| result_type res = try_catch_chain_type::template 
invoke_in_try_catch<>([&fun] (bool&) { return fun(); }, handlers...); \| ^ /home/kefu/dev/scylladb/transport/server.cc:484:83: note: in instantiation of function template specialization 'utils::result_try<(lambda at /home/kefu/dev/scylladb/transport/server.cc:484:94), utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>' requested here 484 \| return utils::result_into_future<result_with_foreign_response_ptr>(utils::result_try([&] () -> result_with_foreign_response_ptr 
{ \| ^ /home/kefu/dev/scylladb/utils/result_try.hh:33:5: note: candidate function template not viable: expects an rvalue for 1st argument 33 \| invoke(F&& f, Args&&... args) { \| ^ ~~~~~ /home/kefu/dev/scylladb/utils/result_try.hh:210:28: error: no matching function for call to 'invoke' 210 \| return Converter::template invoke<const Cb, const Ex&>(_cb, ex); \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /home/kefu/dev/scylladb/utils/result_try.hh:194:143: note: while substituting into a lambda expression here 194 \| return [this, cont = std::forward<Continuation>(cont)] (bool& already_caught) mutable -> typename Converter::template wrapped_type<R> { \| ^ /home/kefu/dev/scylladb/utils/result_try.hh:327:40: note: in instantiation of function template specialization 'utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>::wrap_in_catch<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, (lambda at /home/kefu/dev/scylladb/utils/result_try.hh:194:16)>' requested here 327 \| first_handler.template wrap_in_catch<R, Converter, Continuation>(std::forward<Continuation>(cont)), \| ^ /home/kefu/dev/scylladb/utils/result_try.hh:326:79: note: in instantiation of function template specialization 'utils::internal::try_catch_chain_impl<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, 
utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>::invoke_in_try_catch<(lambda at /home/kefu/dev/scylladb/utils/result_try.hh:194:16)>' requested here 326 \| return try_catch_chain_impl<R, Converter, CatchHandlers...>::template invoke_in_try_catch<>( \| ^ /home/kefu/dev/scylladb/utils/result_try.hh:518:54: note: in instantiation of function template specialization 'utils::internal::try_catch_chain_impl<boost::outcome_v2::basic_result<seastar::foreign_ptr<std::unique_ptr<cql_transport::response>>, utils::exception_container<exceptions::mutation_write_timeout_exception, 
exceptions::read_timeout_exception, exceptions::read_failure_exception, exceptions::rate_limit_exception>, utils::exception_container_throw_policy>, utils::internal::noop_converter, utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>::invoke_in_try_catch<(lambda at /home/kefu/dev/scylladb/utils/result_try.hh:518:76)>' requested here 518 \| result_type res = try_catch_chain_type::template invoke_in_try_catch<>([&fun] (bool&) { return fun(); }, handlers...); \| ^ 
/home/kefu/dev/scylladb/transport/server.cc:484:83: note: in instantiation of function template specialization 'utils::result_try<(lambda at /home/kefu/dev/scylladb/transport/server.cc:484:94), utils::internal::result_catcher<exceptions::unavailable_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:521:68)>, utils::internal::result_catcher<exceptions::read_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:526:69)>, utils::internal::result_catcher<exceptions::read_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:531:69)>, utils::internal::result_catcher<exceptions::mutation_write_timeout_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:536:79)>, utils::internal::result_catcher<exceptions::mutation_write_failure_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:541:79)>, utils::internal::result_catcher<exceptions::already_exists_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:546:71)>, utils::internal::result_catcher<exceptions::prepared_query_not_found_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:551:81)>, utils::internal::result_catcher<exceptions::function_execution_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:556:75)>, utils::internal::result_catcher<exceptions::rate_limit_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:561:67)>, utils::internal::result_catcher<exceptions::cassandra_exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:566:66)>, utils::internal::result_catcher<std::exception, (lambda at /home/kefu/dev/scylladb/transport/server.cc:578:49)>, utils::internal::result_catcher_dots<(lambda at /home/kefu/dev/scylladb/transport/server.cc:591:38)>>' requested here 484 \| return utils::result_into_future<result_with_foreign_response_ptr>(utils::result_try([&] () -> result_with_foreign_response_ptr { \| ^ /home/kefu/dev/scylladb/utils/result_try.hh:33:5: note: candidate 
function template not viable: expects an rvalue for 1st argument 33 \| invoke(F&& f, Args&&... args) { \| ^ ~~~~~ ``` so to prepare for the change to pass template parameter explicitly, let's pass `f` as a `const` reference, instead of as an rvalue reference. also, this parameter type matches with our usage case -- we always pass a member variable `_cb` to `invoke`, and we don't expect that `invoke()` would move it away. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-05 13:19:40 +08:00
Kefu Chai	cfd6084edd	Update seastar submodule * seastar 914a4241...9ce62705 (18): > github: do not set --dpdk-machine haswell > io_tester: correct calculation of writes count > io-tester.md: update information about file size > reactor: align used hint for extent size to 128KB for XFS > Fix compilation failure on Ubuntu 22.04 > io_tester: align the used file size to 1MB > circular_buffer_fixed_capacity: arrow operator instead of . operator > posix-file-impl: Do not keep device-id on board > github: s/clang++-18/clang++/ > include: include used headers > include: include used headers > iotune: allow user to set buffer size for random IO > abort_source: add method to get exception pointer > github: cancel a job if it takes longer than 40 minutes > std-compat: remove #include:s which were added for pre C++17 > perf_tests: measure and report also cpu cycles > linux_perf_events: add user_cpu_cycles_retired > linux_perf_event: user_instructions_retired: exclude_idle Closes scylladb/scylladb#19019	2024-06-05 08:13:55 +03:00
Michał Chojnowski	c901139d07	scylla-gdb.py: print coroutine names in `scylla fiber` Enriches the output of `scylla fiber` with resolved names of coroutine resume functions. Before: ``` [shard 2] #0 (task) 0x0000602004c9fbf0 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 [shard 2] #1 (task) 0x0000602000344c90 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 [shard 2] #2 (task) 0x0000602004b30c50 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 ``` After: ``` [shard 2] #0 (task) 0x0000602004c9fbf0 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is seastar::future<void> sstables::parse<unsigned int, std::pair<sstables::metadata_type, unsigned int> >(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::disk_array<unsigned int, std::pair<sstables::metadata_type, unsigned int> >&) [clone .resume] ) [shard 2] #1 (task) 0x0000602000344c90 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is sstables::parse(schema const&, sstables::sstable_version_types, sstables::random_access_reader&, sstables::statistics&) [clone .resume] ) [shard 2] #2 (task) 0x0000602004b30c50 0x0000000000642880 vtable for seastar::internal::coroutine_traits_base<void>::promise_type + 16 (.resume is sstables::sstable::read_simple<(sstables::component_type)8, sstables::statistics>(sstables::statistics&)::{lambda(sstables::sstable_version_types, seastar::file&&, unsigned long)#1}::operator()(sstables::sstable_version_types, seastar::file&&, unsigned long) const [clone .resume] ) ``` Closes scylladb/scylladb#19091	2024-06-04 22:32:17 +03:00
Pavel Emelyanov	dcc083110d	gossiper: Stop using db::config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:19:47 +03:00
Pavel Emelyanov	00d8590d7e	gossiper: Move force_gossip_generation on gossip_config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:19:47 +03:00
Pavel Emelyanov	e3abc5d2fd	gossiper: Move failure_detector_timeout_ms on gossip_config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:19:47 +03:00
Pavel Emelyanov	53906aa431	main: Fix indentation after previous patch Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:19:47 +03:00
Pavel Emelyanov	fcab847f31	main: Make gossiper config a sharded parameter Next patches will put updateable_value's on it, but plain copy of them across shards doesn't work (see #7316) Indentation is deliberately left broken Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:19:26 +03:00
Pavel Emelyanov	77361e1661	main: Add local variable for set of seeds Next patch will do seeds assignment to gossiper config on each shard, so it's good to have it once, then copy around Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:18:47 +03:00
Pavel Emelyanov	9c719a0a02	main: Add local variable for group0 id Next patch will do group0_id assignment to gossiper config on each shard, so it's good to have it once, then copy around Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:17:58 +03:00
Pavel Emelyanov	b069544d16	main: Add local variable for cluster_name It's modified if it's empty, next patch will make this code be called on each shard, so modification must happen only once Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-04 20:17:58 +03:00
Marcin Maliszkiewicz	ac0e164a6b	raft: rename announce to commit Old wording was derived from existing code which originated from schema code. Name commit better describes what we do here.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	370a5b547e	cql3: raft: attach description to each mutations collector group This description is readable from raft log table. Previously single description was provided for the whole announce call but since it can contain mutations from various subsystems now description was moved to add_mutation(s)/add_generator function calls.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	3289fbd71e	auth: unify mutations_generator type mutation_collector supports generators but it was added to /service/raft code so it couldn't depend on /auth/ but once it's added we can remove generator type from /auth/ as it can depend on /service/raft.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	64b635bb58	auth: drop redundant 'this' keyword	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	b639350933	auth: remove no longer used code from standard_role_manager::legacy_modify_membership Since we gradually switched all auth-v2 code paths to use modify_membership it's now safe to delete unused code.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	a88b7fc281	cql3: auth: use mutation collector for service levels statements This is done to achieve single transaction semantics.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	97a5da5965	cql3: auth: use mutation collector for alter role This is done to achieve single transaction semantics.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	a12c8ebfce	cql3: auth: use mutation collector for grant role and revoke role This is done to achieve single transaction semantics.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	5ba7d1b116	cql3: auth: use mutation collector for drop role and auto-revoke The main theme of this commit is executing drop keyspace/table/aggregate/function statements in a single transaction together with auth auto-revoke logic. This is the logic which cleans related permissions after resource is deleted. It contains several parts which couldn't easily be split into separate commits mainly because mutation collector related paths can't be mixed together. It would require holding multiple guards which we don't support. Another reason is that with mutation collector the changes are announced in a single place, at the end of statement execution, if we'd announce something in the middle then it'd lead to raft concurrent modification infinite loop as it'd invalidate our guard taken at the beginning of statement execution. So this commit contains: - moving auto-revoke code to statement execution from migration_listener * only for auth-v2 flow, to not break the old one * it's now executed during statement execution and not merging schemas, which means it produces mutations once as it should and not on each node separately * on_before callback family wasn't used because I consider it much less readable code. Long term we want to remove auth_migration_listener. - adding mutation collector to revoke_all * auto-revoke uses this function so it had to be changed, auth::revoke_all free function wrapper was added as cql3 layer should not use underlying_authorizer() directly. - adding mutation collector to drop_role * because it depends on revoke_all and we can't mix old and new flows * we need to switch all functions auth::drop_role call uses * gradual use of previously introduced modify_membership, otherwise we would need to switch even more code in this commit	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	9ca15a3ada	auth: add refactored modify_membership func in standard_role_manager The new function is simplified and handles only auth-v2 flow with mutation_collector (single transaction logic). It's not used in this commit and we'll switch code paths gradually in subsequent commits.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	f67761f5b6	auth: implement empty revoke_all in allow_all_authorizer There is no need to throw an exception because it was always ignored later with an empty catch block.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	75ccab9693	auth: drop request_execution_exception handling from default_authorizer::revoke_all The change applies only to auth-v2 code path. It seems nothing in the code except cdc and truncate throws this exception so it's probably dead code. I'll keep it for now in other places to not accidentally break things in auth-v1, in auth-v2 even if this exception is used it should likely fail the query because otherwise data consistency is silently violated.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	01fb43e35f	Revert "Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks" This reverts commit `80ed442be2`. This logic was replaced in previous commit by dynamic cast. Hopefully even this cast will be eliminated in the future.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	0573fee2a9	cql3: auth: use mutation collector for grant and revoke permissions This is done to achieve single transaction semantics. The change includes auto-grant feature. In particular for schema related auto-grant we don't use normal mutation collector announce path but follow migration manager, this may be unified in the future.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	9ddfc2ce4b	cql3: extract changes_tablets function in alter_keyspace_statement It will be used outside this class in the following commit	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	2a6cfbfb33	cql3: auth: use mutation collector for create role statement This is done to achieve single transaction semantics. grant_permissions_to_creator is logically part of create role but its change will be included in following commits as it spans multiple usages. Additionally we disabled rollback during create role as it won't work and is not needed with single transaction logic.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	e4a83008b6	auth: move create_role code into service We need this later as we'll add condition based on legacy_mode(qp) and free function doesn't have access to qp. Moreover long term we should get rid of this weird free function pattern bloat.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	6f654675c6	auth: add a way to announce mutations having only client_state ref Statement code only has access to client_state, from which it takes auth::service. It has neither abort_source nor group0_client, so we need to add them to auth::service. Additionally, since abort_source can't be const, the whole announce_mutations method needs a non-const auth::service, so we need to remove const from the getter function.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	47864b991a	auth: add collect_mutations common helper It will be used in subsequent commits.	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	b2cbcb21e8	auth: remove unused header in common.hh	2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz	7e0a801f53	auth: add class for gathering mutations without immediate announce To achieve write atomicity across different tables we need to announce mutations in a single transaction. So instead of each function doing a separate announce we need to collect mutations and announce them once at the end.	2024-06-04 15:43:04 +02:00
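The collect-then-announce pattern this commit describes can be sketched in a few lines. This is an illustration only, not the C++ class added to the auth code: `MutationCollector`, `add_mutation`, `add_mutations`, and `commit` are hypothetical names here, and mutations are modeled as plain strings.

```python
from typing import Callable, Iterable


class MutationCollector:
    """Gathers mutations so they can be announced in one transaction."""

    def __init__(self) -> None:
        self._mutations: list[str] = []

    def add_mutation(self, mutation: str) -> None:
        # Nothing is announced here; the mutation is only remembered.
        self._mutations.append(mutation)

    def add_mutations(self, mutations: Iterable[str]) -> None:
        self._mutations.extend(mutations)

    def commit(self, announce: Callable[[list[str]], None]) -> None:
        """Announce everything collected so far in a single call."""
        if self._mutations:
            announce(list(self._mutations))
            self._mutations.clear()
```

Because `announce` runs once at the end, writes that touch several tables are either all applied or not applied at all, which is the atomicity the commit message is after.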
Piotr Dulikowski	01ff8108c1	Merge 'db/hints: Use host ID to IP mappings to choose the ep manager to drain when node is leaving' from Dawid Mędrek In `d0f5873`, we introduced mappings IP–host ID between hint directories and the hint endpoint managers managing them. As a consequence, it may happen that one hint directory stores hints towards multiple nodes at the same time. If any of those nodes leaves the cluster, we should drain the hint directory. However, before these changes that doesn't happen – we only drain it when the node of the same host ID as the hint endpoint manager leaves the cluster. This PR fixes that draining issue in the pre-host-ID-based hinted handoff. Now no matter which of the nodes corresponding to a hint directory leaves the cluster, the directory will be drained. We also introduce error injections to be able to test that it indeed happens. Fixes scylladb/scylladb#18761 Closes scylladb/scylladb#18764 * github.com:scylladb/scylladb: db/hints: Introduce an error injection to test draining db/hints: Ensure that draining happens	2024-06-04 10:17:14 +02:00
Botond Dénes	d120f0d7d3	Merge 'tasks: introduce task manager's task folding' from Aleksandra Martyniuk Task manager's tasks stay in memory after they are finished. Moreover, even if a child task is unregistered from task manager, it is still alive since its parent keeps a foreign pointer to it. Also, when a task has finished successfully there is no point in keeping all of its descendants in memory. The patch introduces folding of task manager's tasks. Whenever a task which has a parent is finished it is unregistered from task manager and foreign_ptr to it (kept in its parent) is replaced with its status. Children's statuses of the task are dropped unless they or one of their descendants failed. So for each operation we keep a tree of tasks which contains: - a root task and its direct children (status if they are finished, a task otherwise); - running tasks and their direct children (same as above); - a statuses path from root to failed tasks. /task_manager/wait_task/ does not unregister tasks anymore. Refs: #16694. - [ ] Backport reason (please explain below if this patch should be backported or not) Requires backport to 6.0 as task number exploded with tablets. Closes scylladb/scylladb#18735 * github.com:scylladb/scylladb: docs: describe task folding test: rest_api: add test for task tree structure test: rest_api: modify new_test_module tasks: test: modify test_task methods api: task_manager: do not unregister task in /task_manager/wait_task/ tasks: unregister tasks with parents when they are finished tasks: fold finished tasks into their parents tasks: make task_manager::task::impl::finish_failed noexcept tasks: change _children type	2024-06-04 08:43:44 +03:00
Pavel Emelyanov	9e65434692	main: Start alternator expiration service earlier Prior to registering drain_on_shutdown and all the protocol servers. To keep the natural sequence - start core - register drain-on-shutdown - start transport(s) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 23:01:17 +03:00
Pavel Emelyanov	d7c231ede9	main: Start redis transparently It's now possible to start protocol server when registered. It will also be stopped automatically on shutdown / aborted shutdown. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 23:01:17 +03:00
Pavel Emelyanov	4204d7f4f9	main: Start alternator transparently It's now possible to start protocol server when registered. It will also be stopped automatically on shutdown / aborted shutdown. Also move the controller variable lower to keep it all next to each other. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 23:01:17 +03:00
Pavel Emelyanov	d3e1121793	main: Start thrift transparently It's now possible to start protocol server when registered. It will also be stopped automatically on shutdown / aborted shutdown. It also fixes a rare bug. If thrift is not asked to be started on boot, its deferred shutdown action isn't created, so if it's later started via the API, it won't be stopped on shutdown. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 23:01:17 +03:00
Pavel Emelyanov	830a87e862	main: Start native transport transparently It's now possible to start protocol server when registered. It will also be stopped automatically on shutdown / aborted shutdown. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 23:01:17 +03:00
Marcin Maliszkiewicz	09b26208e9	auth: cql3: use auth facade functions consistently on write path Auth interface is quite mixed-up but the general rule is that cql statements code calls auth::* free functions from auth/service.hh to execute auth logic. There are many exceptions where underlying_authorizer or underlying_role_manager or auth::service method is used instead. Service should not leak its internal APIs to upper layers so functions like underlying_role_manager should not exist. In this commit we fix a tiny fragment related to auth write path.	2024-06-03 14:27:13 +02:00
Marcin Maliszkiewicz	126c82a6f5	auth: remove unused is_enforcing function	2024-06-03 14:27:13 +02:00
Wojciech Mitros	2cafa573df	mv: update the backlogs when view updates finish Currently, the backlog used for MV flow control is only updated after we generate view updates as a result of a write request. However, when the resources are no longer used, we should also notice that to prevent excessive slowdowns caused by the MV flow control calculating the delays based on an outdated, large backlog. This patch makes it so the backlogs are updated every time a view update finishes, and not only when the updates start. Fixes #18783 Closes scylladb/scylladb#18804	2024-06-03 14:10:49 +03:00
Avi Kivity	f133ae945a	Merge 'repair: Introduce new primary replica selection algorithm for tablets' from Benny Halevy Tablet allocation does not guarantee fairness of the first replica in the replicas set across dcs. The lack of this fix causes the following dtest to fail: repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_multi_dc Use the tablet_map get_primary_replica or get_primary_replica_within_dc, respectively to see if this node is the primary replica for each tablet or not. Fixes https://github.com/scylladb/scylladb/issues/17752 No backport is required before 6.0 as tablets (and tablet repair) are introduced in 6.0 Closes scylladb/scylladb#18784 * github.com:scylladb/scylladb: repair: repair_tablets: use get_primary_replica repair: repair_tablets: no need to check ranges_specified per tablet locator: tablet_map: add get_primary_replica_within_dc locator: tablet_map: get_primary_replica: do not copy tablet info locator: tablet_map: get_primary_replica: return tablet_replica	2024-06-03 13:16:49 +03:00
Kefu Chai	0da0461668	build: cmake: do not scan for C++20 modules when creating the build rules using CMake 3.28 and up, it generates the rules to scan for C++20 modules for C++20 projects by default. but this slows down the compilation, and introduces unnecessary dependencies for each of the targets when building .cc files. also, it prevents the static analysis tools from running from a repo which only has its building system generated, but not yet built. as these tools would need to process the source files just like a compiler does, and if any of the included header files is missing, they just fail. so, before we migrate to C++20 modules, let's disable this feature. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#19038	2024-06-03 12:51:40 +03:00
Pavel Emelyanov	9292d326b7	storage_service: Make register_protocol_server() start the server After a protocol server is registered, it can be instantly started by the main code. It makes sense to generalize this sequence by teaching register_protocol_server() to start it. For now it's a no-op change, as "start_instantly" is false by default, but next patches will make use of it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 12:12:03 +03:00
Pavel Emelyanov	2aab9f6340	storage_service: Turn register_protocol_server() async method To make the next patch shorter Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 12:12:03 +03:00
Pavel Emelyanov	eb033e3c5f	storage_service: Outline register_protocol_server() To make next patch shorter Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 12:12:03 +03:00
Pavel Emelyanov	315ef4c484	main: Schedule deferred drain_on_shutdown() prior to protocol servers Next patches will remove protocol servers' deferred stops and will rely on drain_on_shutdown -> stop_transport to do it, so the drain deferred action should come before protocol servers' registration. This also fixes a bug. Currently alternator and redis both rely on protocol servers to stop them on shutdown. However, when startup is aborted prior to drain_on_shutdown() registration, protocol servers are not stopped and alternator and redis can remain stopped. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 12:11:04 +03:00
Pavel Emelyanov	2fa89d8696	main: Move some trailing startup earlier The set_abort_on_ebadf() call and some api endpoints registration come after protocol servers. The latter is going to be shuffled, so move the former earlier not to hang around. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-06-03 12:01:24 +03:00
Kefu Chai	c6691d3217	.github: add exception to CLEANER_DIRS to cover more directories to prevent regressions of violating the "include what you use" policy in this directory. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-03 12:45:04 +08:00
Kefu Chai	21bdda550a	.github: annotate the report from clang-include-cleaner before this change, user has to click into the "Details" link to access the report from clang-include-cleaner. but this is neither convenient nor obvious. after this change, the report is annotated in the github web interface, this helps the reviewers and contributors to use this tool in a more efficient way. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-03 12:45:04 +08:00
Kefu Chai	3d056a0cf2	.github: build headers before running clang-include-cleaner clang-include-cleaner actually interprets the preprocessor macros, and looks at the symbols. so we have to prepare the included headers before using it. but in ScyllaDB, we don't have a single target for building all the used headers, so we have to build them either in batch or separately. in this change, we build the included headers before running clang-include-cleaner. this allows us to run clang-include-cleaner on more source files. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-06-03 11:30:31 +08:00
Nadav Har'El	95db1c60d6	test/alternator: fix a test failing on Amazon DynamoDB The test test_table.py::test_concurrent_create_and_delete_table failed on Amazon DynamoDB because of a silly typo - "false" instead of "False". A function detecting Scylla tried to return false when noticing this isn't Scylla - but had a typo, trying to return "false" instead of "False". This patch fixes this typo, and the test now works on DynamoDB: test/alternator/run --aws test_table.py::test_concurrent_create_and_delete_table Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#17799	2024-06-02 22:25:56 +03:00
Avi Kivity	79d0711c7e	Merge 'tablets: load balancer: Use random selection of candidates when moving tablets' from Tomasz Grabiec In order to avoid per-table tablet load imbalance from forming in the cluster after adding nodes, the load balancer now picks the candidate tablet at random. This should keep the per-table distribution on the target node similar to the distribution on the source nodes. Currently, candidate selection picks the first tablet in the unordered_set, so the distribution depends on hashing in the unordered set. Due to the way hash is calculated, table id dominates the hash and a single table can be chosen more often for migration away. This can result in imbalance of tablets for any given table after bootstrapping a new node. For example, consider the following results of a simulation which starts with a 6-node cluster and does a sequence of node bootstraps and decommissions. One table has 4096 tablets and RF=1, and the other has 256 tablets and RF=2. Before the patch, the smaller table has node overcommit of 2.34 in the worst topology state, while after the patch it has overcommit of 1.65. 
overcommit is calculated as max load (tablet count per node) divided by perfect average load (all tablets / nodes): Run #861, params: {iterations=6, nodes=6, tablets1=4096 (10.7/sh), tablets2=256 (1.3/sh), rf1=1, rf2=2, shards=64} Overcommit : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}} Overcommit : worst: {table1={shard=1.23, node=1.10}, table2={shard=9.85, node=1.65}} Overcommit (old) : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}} Overcommit (old) : worst: {table1={shard=1.31, node=1.12}, table2={shard=64.00, node=2.34}} The worst state before the patch had the following distribution of tablets for the smaller table: Load on host ba7f866d...: total=171, min=1, max=7, spread=6, avg=2.67, overcommit=2.62 Load on host 4049ae8d...: total=102, min=0, max=6, spread=6, avg=1.59, overcommit=3.76 Load on host 3b499995...: total=89, min=0, max=4, spread=4, avg=1.39, overcommit=2.88 Load on host ad33bede...: total=63, min=0, max=3, spread=3, avg=0.98, overcommit=3.05 Load on host 0c2e65dc...: total=57, min=0, max=3, spread=3, avg=0.89, overcommit=3.37 Load on host 3f2d32d4...: total=27, min=0, max=2, spread=2, avg=0.42, overcommit=4.74 Load on host 9de9f71b...: total=3, min=0, max=1, spread=1, avg=0.05, overcommit=21.33 One node has as many as 171 tablets of that table and another one has as few as 3. 
After the patch, the worst distribution looks like this: Load on host 94a02049...: total=121, min=1, max=6, spread=5, avg=1.89, overcommit=3.17 Load on host 65ac6145...: total=87, min=0, max=5, spread=5, avg=1.36, overcommit=3.68 Load on host 856a66d1...: total=80, min=0, max=5, spread=5, avg=1.25, overcommit=4.00 Load on host e3ac4a41...: total=77, min=0, max=4, spread=4, avg=1.20, overcommit=3.32 Load on host 81af623f...: total=66, min=0, max=4, spread=4, avg=1.03, overcommit=3.88 Load on host 4a038569...: total=47, min=0, max=2, spread=2, avg=0.73, overcommit=2.72 Load on host c6ab3fe9...: total=34, min=0, max=3, spread=3, avg=0.53, overcommit=5.65 Most-loaded node has 121 tablets and least loaded node has 34 tablets. It's still not good, a better distribution is possible, but it's an improvement. Refs #16824 Closes scylladb/scylladb#18885 * github.com:scylladb/scylladb: tablets: load balancer: Use random selection of candidates when moving tablets test: perf: Add test for tablet load balancer effectiveness load_sketch: Extract get_shard_minmax() load_sketch: Allow populating only for a given table	2024-06-02 22:03:37 +03:00
Benny Halevy	18df36d920	repair: repair_tablets: use get_primary_replica Tablet allocation does not guarantee fairness of the first replica in the replicas set across dcs. The lack of this fix causes the following dtest to fail: repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_multi_dc Use the tablet_map get_primary_replica* functions to get the primary replica for each tablet, possibly within a dc. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-02 20:28:39 +03:00
Benny Halevy	009767455d	repair: repair_tablets: no need to check ranges_specified per tablet The code already turns off `primary_replica_only` if `!ranges_specified.empty()`, so there's no need to check it again inside the per-tablet loop. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-02 20:26:09 +03:00
Benny Halevy	84761acc31	locator: tablet_map: add get_primary_replica_within_dc Will be needed by repair in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-02 20:26:09 +03:00
Benny Halevy	2de79c39dc	locator: tablet_map: get_primary_replica: do not copy tablet info Currently, the function needlessly copies the tablet_info (all tablet replicas in particular) to a local variable. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-02 20:26:09 +03:00
Benny Halevy	c52f70f92c	locator: tablet_map: get_primary_replica: return tablet_replica This is required by repair when it will start using get_primary_replica in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-06-02 20:26:09 +03:00
Tomasz Grabiec	603abddca9	tablets: load balancer: Use random selection of candidates when moving tablets In order to avoid per-table tablet load imbalance from forming in the cluster after adding nodes, the load balancer now picks the candidate tablet at random. This should keep the per-table distribution on the target node similar to the distribution on the source nodes. Currently, candidate selection picks the first tablet in the unordered_set, so the distribution depends on hashing in the unordered set. Due to the way hash is calculated, table id dominates the hash and a single table can be chosen more often for migration away. This can result in imbalance of tablets for any given table after bootstrapping a new node. For example, consider the following results of a simulation which starts with a 6-node cluster and does a sequence of node bootstraps and decommissions. One table has 4096 tablets and RF=1, and the other has 256 tablets and RF=2. Before the patch, the smaller table has node overcommit of 2.34 in the worst topology state, while after the patch it has overcommit of 1.65. 
overcommit is calculated as max load (tablet count per node) divided by perfect average load (all tablets / nodes): Run #861, params: {iterations=6, nodes=6, tablets1=4096 (10.7/sh), tablets2=256 (1.3/sh), rf1=1, rf2=2, shards=64} Overcommit : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}} Overcommit : worst: {table1={shard=1.23, node=1.10}, table2={shard=9.85, node=1.65}} Overcommit (old) : init : {table1={shard=1.03, node=1.00}, table2={shard=1.51, node=1.01}} Overcommit (old) : worst: {table1={shard=1.31, node=1.12}, table2={shard=64.00, node=2.34}} The worst state before the patch had the following distribution of tablets for the smaller table: Load on host ba7f866d...: total=171, min=1, max=7, spread=6, avg=2.67, overcommit=2.62 Load on host 4049ae8d...: total=102, min=0, max=6, spread=6, avg=1.59, overcommit=3.76 Load on host 3b499995...: total=89, min=0, max=4, spread=4, avg=1.39, overcommit=2.88 Load on host ad33bede...: total=63, min=0, max=3, spread=3, avg=0.98, overcommit=3.05 Load on host 0c2e65dc...: total=57, min=0, max=3, spread=3, avg=0.89, overcommit=3.37 Load on host 3f2d32d4...: total=27, min=0, max=2, spread=2, avg=0.42, overcommit=4.74 Load on host 9de9f71b...: total=3, min=0, max=1, spread=1, avg=0.05, overcommit=21.33 One node has as many as 171 tablets of that table and another one has as few as 3. 
After the patch, the worst distribution looks like this: Load on host 94a02049...: total=121, min=1, max=6, spread=5, avg=1.89, overcommit=3.17 Load on host 65ac6145...: total=87, min=0, max=5, spread=5, avg=1.36, overcommit=3.68 Load on host 856a66d1...: total=80, min=0, max=5, spread=5, avg=1.25, overcommit=4.00 Load on host e3ac4a41...: total=77, min=0, max=4, spread=4, avg=1.20, overcommit=3.32 Load on host 81af623f...: total=66, min=0, max=4, spread=4, avg=1.03, overcommit=3.88 Load on host 4a038569...: total=47, min=0, max=2, spread=2, avg=0.73, overcommit=2.72 Load on host c6ab3fe9...: total=34, min=0, max=3, spread=3, avg=0.53, overcommit=5.65 Most-loaded node has 121 tablets and least loaded node has 34 tablets. It's still not good, a better distribution is possible, but it's an improvement. Refs #16824	2024-06-02 14:23:00 +02:00
Tomasz Grabiec	7b1eea794b	test: perf: Add test for tablet load balancer effectiveness	2024-06-02 14:23:00 +02:00
Tomasz Grabiec	c9bcb5e400	load_sketch: Extract get_shard_minmax()	2024-06-02 14:23:00 +02:00
Tomasz Grabiec	3be6120e3b	load_sketch: Allow populating only for a given table	2024-06-02 14:23:00 +02:00
Avi Kivity	db4e4df762	alternator: yield while converting large responses to json text We have two paths for generating the json text representation, one for large items and one for small items, but the large item path is lacking: - it doesn't yield, so a response with many items will stall - it doesn't wait for network sends to be accepted by the network stack, so it will allocate a lot of memory Fix by moving the generation to a thread. This allows us to wait for the network stack, which incidentally also fixes stalls. The cost of the thread is amortized by the fact we're emitting a large response. Fixes #18806 Closes scylladb/scylladb#18807	2024-06-02 13:07:13 +03:00
Michał Jadwiszczak	5b4e688668	docs/procedures/backup-restore: use `DESC SCHEMA WITH INTERNALS` Update docs for backup procedure to use `DESC SCHEMA WITH INTERNALS` instead of plain `DESC SCHEMA`. Add a note to use cqlsh in a proper version (at least 6.0.19). Closes scylladb/scylladb#18953	2024-05-31 15:26:36 +02:00
Aleksandra Martyniuk	beef77a778	docs: describe task folding	2024-05-31 10:40:04 +02:00
Aleksandra Martyniuk	d7e80a6520	test: rest_api: add test for task tree structure Add test which checks whether the tasks are folded into their parent as expected.	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	fc0796f684	test: rest_api: modify new_test_module Remove remaining test tasks when a test module is removed, so that a node could shutdown even if a test fails.	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	30f97ea133	tasks: test: modify test_task methods Wait until the task is done in test_task::finish_failed and test_task::finish to ensure that it is folded into its parent.	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	c1b2b8cb2c	api: task_manager: do not unregister task in /task_manager/wait_task/ If /task_manager/wait_task/ unregisters the task, then there is no way to examine children failures, since their statuses can be checked only through their parent.	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	a82a2f0624	tasks: unregister tasks with parents when they are finished Unregister children that are finished from task manager. They can be examined through their parents.	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	e6c50ad2d0	tasks: fold finished tasks into their parents Currently, when a child task is unregistered, it is still kept by its parent. This leads to excessive memory usage, especially when the tasks are configured to be kept in task manager after they are finished (task_ttl_in_seconds). Introduce task_essentials struct which keeps only data necessary for task manager API. When a task which has a parent is finished, a foreign pointer to it in its parent is replaced with respective task_essentials. Once a parent task is finished it is also folded into its parent (if it has one). Children details of a folded task are lost, unless they (or some of their subtrees) failed. That is, when a task is finished, we keep: - a root task (until it is unregistered); - task_essentials of root's direct children; - a path (of task_essentials) from root to each failed task (so that the reason of a failure could be examined).	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	319e799089	tasks: make task_manager::task::impl::finish_failed noexcept	2024-05-31 10:27:09 +02:00
Aleksandra Martyniuk	6add9edf8a	tasks: change _children type Keep task children in a map. It's a preparation for further changes.	2024-05-31 10:27:09 +02:00
Pavel Emelyanov	273dca6f27	query_processor: Coroutinize stop() This effectively removes "finally" block so if authorized_prepared_cache.stop() resolves with exception, the prepared_cache.stop() is skipped. But that's not a problem -- even if .stop() throws the whole scylla stop aborts so we don't really care if it was clean or not. Also, authorized_prepared_cache.stop() closes the gate and cancels the timer. None of those can resolve with exception. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#19001	2024-05-31 10:22:08 +03:00
Benny Halevy	427acb393e	data_dictionary: keyspace_metadata: format: print also initial_tablets Currently, there is no indication of tablets in the logged KSMetaData. Print the tablets configuration of either the `initial` number of tablets, if enabled, or {'enabled':false} otherwise. For example: ``` migration_manager - Create new Keyspace: KSMetaData{name=tablets_ks, strategyClass=org.apache.cassandra.locator.NetworkTopologyStrategy, strategyOptions={"datacenter1": "1"}, cfMetaData={}, durable_writes=true, tablets={"initial":0}, userTypes=org.apache.cassandra.config.UTMetaData@0x600004d446a8} migration_manager - Create new Keyspace: KSMetaData{name=vnodes_ks, strategyClass=org.apache.cassandra.locator.NetworkTopologyStrategy, strategyOptions={"datacenter1": "1"}, cfMetaData={}, durable_writes=true, tablets={"enabled":false}, userTypes=org.apache.cassandra.config.UTMetaData@0x600004c33ea8} ``` Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#18998	2024-05-31 10:09:58 +03:00
Nadav Har'El	c786621b4c	test/cql-pytest: reproduce bug of secondary index used before built This patch adds a test reproducing the known issue #7963, where after adding a secondary-index to a table, queries might immediately start to use this index - even before it is built - and produce wrong results. The issue is still open and unfixed, so the new test is marked "xfail". Interestingly, even though Cassandra claims to have found and fixed a similar bug in 2015 (CASSANDRA-8505), this test also fails on Cassandra - trying a query right after CREATE INDEX and before it was fully built may cause the query to fail. Refs #7963 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18993	2024-05-31 10:05:00 +03:00
Raphael S. Carvalho	b396b05e20	replica: Fix race of tablet snapshot with compaction tablet snapshot, used by migration, can race with compaction and can find files deleted. That won't cause data loss because the error is propagated back into the coordinator that decides to retry streaming stage. So the consequence is delayed migration, which might in turn reduce node operation throughput (e.g. when decommissioning a node). It should be rare though, so shouldn't have drastic consequences. Fixes #18977. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18979	2024-05-31 09:58:49 +03:00
Lakshmi Narayanan Sreethar	3d7d1fa72a	db/config.cc: increment components_memory_reclaim_threshold config default Incremented the components_memory_reclaim_threshold config's default value to 0.2 as the previous value was too strict and caused unnecessary eviction in otherwise healthy clusters. Fixes #18607 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#18964	2024-05-30 18:03:51 +03:00
Botond Dénes	0ead3570b4	Merge 'Run sstables loader in scheduling group' from Pavel Emelyanov Currently the loader is called via API, which inherits the maintenance scheduling group from API http server. The loader then can either do load_and_stream() or call (legacy) distributed_loader::upload_new_sstables(). The latter first switches into streaming scheduling group, but the former doesn't and continues running in the maintenance one. All this is not really a problem, because streaming sched group and maintenance sched group is one group under two different variable names. However, it's messy and worth delegating the sched group switch (even if it's a no-op) to the sstables-loader. As a nice side effect, this patch removes one place that uses database as proxy object to get configuration parameters. Closes scylladb/scylladb#18928 * github.com:scylladb/scylladb: sstables-loader: Run loading in its scheduling group sstables-loader: Add scheduling group to constructor	2024-05-30 18:03:51 +03:00
Pavel Emelyanov	83d491af02	config: Remove experimental TABLETS feature ... and replace it with boolean enable_tablets option. All the places in the code are patched to check the latter option instead of the former feature. The option is OFF by default, but the default scylla.yaml file sets this to true, so that newly installed clusters turn tablets ON. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18898	2024-05-30 18:03:51 +03:00
Pavel Emelyanov	dc588d1eef	replication_strategy: Remove unused factory_key::to_sstring() declaration Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18908	2024-05-30 18:03:51 +03:00
Anna Stuchlik	8f5c15b78f	doc: add support for Ubuntu 24.04 Closes scylladb/scylladb#18954	2024-05-30 18:03:51 +03:00
Pavel Emelyanov	91f74989ba	snitch: Remove production_snitch_base::_prop_file_contents This field was used to carry string with property file contents into the parse_property_file(), but it can go with an argument just as well Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-30 13:55:14 +03:00
Pavel Emelyanov	1cdeabdc50	snitch: Remove production_snitch_base::_prop_file_size This field was used to carry property file size across then-lambdas, now the code is coroutinized and can live with on-stack variable Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-30 13:54:30 +03:00
Pavel Emelyanov	b62aa276d1	snitch: Coroutinize load_property_file() Cleaner and easier to read this way Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-30 13:54:15 +03:00
Kefu Chai	fb87ab1c75	compress, auth: include used headers before this change, we rely on `seastar/util/std-compat.hh` to include the used headers provided by standard library. this was necessary before we moved to a C++20 compliant standard library implementation. but since Seastar has dropped C++17 support, its `seastar/util/std-compat.hh` is not responsible for providing these headers anymore. so, in this change, we include the used header directly instead of relying on `seastar/util/std-compat.hh`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18986	2024-05-30 09:16:23 +03:00
Kefu Chai	810da830ef	build: add sanitizer compiling options directly before this change, in order to avoid repeating/hardwiring the compiling options set by Seastar, we just inherit the compiling options of Seastar for building Abseil, as the former exposes the options to enable sanitizers. this works fine, despite that, strictly speaking, not all options are necessary for building abseil, as abseil is not a Seastar application -- it is just a C++ library. but when we introduce dependencies which are only generated at build time, and these dependencies are passed to the compiler at build time, this breaks the build of Abseil. because these dependencies are exposed by the Seastar's .pc file, and consumed by Abseil. when building Abseil, apparently, the building process driven by ninja is not started yet, so we are not able to build Abseil with these settings due to missing dependencies. so instead of inheriting the compiling options from Seastar, just set the sanitizer related compiling options directly, to avoid referencing these missing dependencies. the upside is that we pass a much smaller set of compiling options to compiler when building Abseil, the downside is that we hardwire these options related to sanitizer manually, they are also detected by Seastar's building system. but fortunately, these options are relatively stable across the building environments we support. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18987	2024-05-30 09:14:03 +03:00
Aleksandra Martyniuk	8a72324ff1	docs: add docs to task manager Closes scylladb/scylladb#18967	2024-05-30 09:05:02 +03:00
Raphael S. Carvalho	a56664b8e9	readers: combined: Avoid reallocation in prepare_forwardable_readers() reserve() is missing conditional addition of single and galloping readers. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18980	2024-05-30 08:57:27 +03:00
Dawid Medrek	e855794327	db/hints: Introduce an error injection to test draining We want to verify that a hint directory is drained when any of the nodes corresponding to it leaves the cluster. The test scenario should happen before the whole cluster has been migrated to the host-ID-based hinted handoff, so when we still rely on the mappings between hint endpoint managers and the hint directories managed by them. To make such a test possible, in these changes we introduce an error injection rejecting incoming hints. We want to test a scenario when: 1. hints are saved towards a given node -- node N1, 2. N1 changes its IP to a different one, 3. some other node -- node N2 -- changes its IP to the original IP of N1, 4. hints are saved towards N2 and they are stored in the same directory as the hints saved towards N1 before, 5. we start draining N2. Because at some point N2 needs to be stopped, it may happen that some mutations towards a distributed system table generate a hint to N2 BEFORE it has finished changing its IP, effectively creating another hint directory where ALL of the hints towards the node will be stored from there on. That would disturb the test scenario. Hence, this error injection is necessary to ensure that all of the steps in the test proceed as expected.	2024-05-29 19:32:41 +02:00
Dawid Medrek	745a9c6ab8	db/hints: Ensure that draining happens Before hinted handoff is migrated to using host IDs to identify nodes in the cluster, we keep track of mappings between hint endpoint managers identified by host IDs and the hint directories managed by them and represented by IP addresses. As a consequence, it may happen that one hint directory corresponds to multiple nodes -- it's intended. See `64ba620` for more details. Before these changes, we only started the draining process of a hint directory if the node leaving the cluster corresponded to that hint directory AND was identified by the same host ID as the hint endpoint manager managing that directory. As a result, the draining did not always happen when it was supposed to. Draining should start no matter which of the nodes corresponding to a hint directory is leaving the cluster. This commit ensures that it happens.	2024-05-29 19:32:38 +02:00
Wojciech Mitros	0de3a5f3ff	test mv: remove injection delaying shutdown of a node In the test_mv_topology_change case, we use an injection to delay the view updates application, so that the ERMs have a chance to change in the process. This injection was also enabled on a new node in the test, which was later decommissioned. During the shutdown, writes were still being performed, causing view update generation and delays due to the injection which in turn delayed the node shutdown, causing the test to timeout. This patch removes the injection for the node being shut down. At the same time, the force_gossip_topology_changes=True option is also removed from its config, but for that option it's enough to enable on the first node in the cluster and all nodes use it. Fixes: https://github.com/scylladb/scylladb/issues/18941 Closes scylladb/scylladb#18958	2024-05-29 15:29:55 +02:00
Kefu Chai	a415bb07ab	sl_controller: fix a typo in comment s/necessairy/necessary/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18950	2024-05-29 16:23:31 +03:00
Nadav Har'El	4b04ed1360	test/alternator: be more forgiving on authorizer configuration The Alternator test suite usually runs on a specific configuration of Scylla set up by test.py or test/alternator/run. However, we do consider it an important design goal of this test suite that developers should be able to run these tests against any DynamoDB-API implementation, including any version of Scylla manually run by the developer in any way he or she pleases. The recent commit `dc80b5dafe` changed the way we retrieve the configured authentication key, which is needed if Scylla is run with --alternator-enforce-authorization. However, the new code assumed that Scylla was also run with --authenticator PasswordAuthenticator --authorizer CassandraAuthorizer so that the default role of "cassandra" has a valid, non-null, password (namely, "cassandra"). If the developer ran Scylla manually without these options, the test initialization code broke, and all tests in the suite failed. This patch fixes this breakage. You can now run the Alternator test suite against Scylla run manually without any of the aforementioned options, and everything will work except some tests in test_authorization.py will fail as expected. This patch has no effect on the usual test.py or test/alternator/run runs, as they already run Scylla with all the aforementioned options and weren't exposed to the problem fixed here. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18957	2024-05-29 16:22:45 +03:00
Raphael S. Carvalho	578a6c1e07	replica: Only consume memtable of the tablet intersecting with range read storage_proxy is responsible for intersecting the range of the read with tablets, and calling replica with a single tablet range, therefore it makes sense to avoid touching memtables of tablets that don't intersect with a particular range. Note this is a performance issue, not correctness one, as memtable readers that don't intersect with current range won't produce any data, but cpu is wasted until that's realized (they're added to list of readers in mutation_reader_merger, more allocations, more data sources to peek into, etc). That's also important for streaming e.g. after decommission, that will consume one tablet at a time through a reader, so we don't want memtables of streamed tablets (that weren't cleaned up yet) to be consumed. Refs #18904. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18907	2024-05-29 15:58:33 +03:00
Tomasz Grabiec	0d596a425c	tablets: Filter-out left nodes in get_natural_endpoints() The API already promises this, the comment on effective_replication_map says: "Excludes replicas which are in the left state". Tablet replicas on the replaced node are rebuilt after the node already left. We may no longer have the IP mapping for the left node so we should not include that node in the replica set. Otherwise, storage_proxy may try to use the empty IP and fail: storage_proxy - No mapping for :: in the passed effective replication map It's fine to not include it, because storage proxy uses keyspace RF and not replica list size to determine quorum. The node is not coming up, so no one should need to contact it. Users which need replica list stability should use the host_id-based API. Fixes #18843	2024-05-29 14:49:49 +02:00
Anna Stuchlik	888d7601a2	doc: add the tablets information to the nodetool describering command This commit adds an explanation of how the `nodetool describering` command works if tablets are enabled. Closes scylladb/scylladb#18940	2024-05-29 15:31:46 +03:00
Pavel Emelyanov	e74a4b038f	Merge 'tablets: alter keyspace' from Piotr Smaron This change supports changing replication factor in tablets-enabled keyspaces. This covers both increasing and decreasing the number of tablets replicas through first building topology mutations (`alter_keyspace_statement.cc`) and then tablets/topology/schema mutations (`topology_coordinator.cc`). For the limitations of the current solution, please see the docs changes attached to this PR. Fixes: #16129 Closes scylladb/scylladb#16723 * github.com:scylladb/scylladb: test: Do not check tablets mutations on nodes that don't have them test: Fix the way tablets RF-change test parses mutation_fragments test/tablets: Unmark RF-changing test with xfail docs: document ALTER KEYSPACE with tablets Return response only when tablets are reallocated cql-pytest: Verify RF is changes by at most 1 when tablets on cql3/alter_keyspace_statement: Do not allow for change of RF by more than 1 Reject ALTER with 'replication_factor' tag Implement ALTER tablets KEYSPACE statement support Parameterize migration_manager::announce by type to allow executing different raft commands Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks Extend system.topology with 3 new columns to store data required to process alter ks global topo req Allow query_processor to check if global topo queue is empty Introduce new global topo `keyspace_rf_change` req New raft cmd for both schema & topo changes Add storage service to query processor tablets: tests for adding/removing replicas tablet_allocator: make load_balancer_stats_manager configurable by name	2024-05-29 14:17:51 +03:00
Gleb Natapov	f91db0c1e4	raft topology: fix indentation after previous commit	2024-05-29 12:11:28 +03:00
Gleb Natapov	6853b02c00	raft topology: do not add bootstrapping node without IP as pending If there is no mapping from host id to ip while a node is in bootstrap state there is no point adding it to pending endpoint since write handler will not be able to map it back to host id anyway. If the transition state requires double writes though we still want to fail. In case the state is write_both_read_old we fail the barrier that will cause topology operation to rollback and in case of write_both_read_new we assert but this should not happen since the mapping is persisted by this point (or we failed in write_both_read_old state). Fixes: scylladb/scylladb#18676	2024-05-29 12:11:18 +03:00
Gleb Natapov	27445f5291	test: add test of bootstrap where the coordinator crashes just before storing IP mapping On the next boot there is no host ID to IP mapping which causes node to crash again with "No mapping for :: in the passed effective replication map" assertion.	2024-05-29 11:46:23 +03:00
Marcin Maliszkiewicz	1b1bc6f9bb	docs: document if not exists option for create index Closes scylladb/scylladb#18956	2024-05-29 11:35:01 +03:00
Gleb Natapov	1faef47952	schema_tables: remove unused code	2024-05-29 11:30:24 +03:00
Tomasz Grabiec	3e1ba4c859	test: pylib: Extract start_writes() load generator utility	2024-05-29 10:02:56 +02:00
Piotr Smaron	8a77a74d0e	cql: fix a crash lurking in `ks_prop_defs::get_initial_tablets` `tablets_options->erase(it);` invalidates `it`, but it's still referred to later in the code in the last `else`, and when that code is invoked, we get a `heap-use-after-free` crash. Fixes: #18926 Closes scylladb/scylladb#18936	2024-05-28 23:46:43 +03:00
Botond Dénes	aae3cfaff4	readers: compacting_reader: remove unused _ignore_partition_end This member is read-only since `ac44efea11` so remove it. Closes scylladb/scylladb#18726	2024-05-28 20:53:00 +03:00
Kefu Chai	719d53a565	service/storage_proxy: coroutinize handle_paxos_accept() for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18765	2024-05-28 20:51:10 +03:00
Nadav Har'El	00d10aa84a	alternator: clean up target string splitting This patch cleans up a bit the code in Alternator which splits up the operation's X-Amz-Target header (the second part of it is the name of the operation, e.g., CreateTable). The patch doesn't change any functionality or change performance in any meaningful way. I was just reviewing this code and was annoyed by the unnecessary variable and unnecessary creation of strings and vectors for such a simple operation - and wanted to clean it up. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18830	2024-05-28 20:42:47 +03:00
Botond Dénes	d37eca0593	test/boost/mutation_reader_test: compacting_reader_next_partition: fix partition order The test creates two partitions and passes them through the reader, but the partitions are out-of-order. This is benign but best to fix it anyway. Found after bumping validation level inside the compactor. Closes scylladb/scylladb#18848	2024-05-28 20:41:54 +03:00
Aleksandra Martyniuk	b7ae7e0b0e	test: fix test_tombstone_gc.py Tests in test_tombstone_gc.py are parametrized with string instead of bool values. Fix that. Use the value to create a keyspace with or without tablets. Fixes: #18888. Closes scylladb/scylladb#18893	2024-05-28 20:40:15 +03:00
Kefu Chai	f58f6dfe20	data_dictionary: include <variant> otherwise when compiling with the new seastar, which removed `#include <variant>` from `std-compat.hh`, the {mode}-headers target would fail to build, like: ``` ./data_dictionary/storage_options.hh:34:29: error: no template named 'variant' in namespace 'std' 10:45:15 using value_type = std::variant<local, s3>; 10:45:15 ~~~~~^ 10:45:15 ./data_dictionary/storage_options.hh:35:5: error: unknown type name 'value_type'; did you mean 'std::_Bit_const_iterator::value_type'? 10:45:15 value_type value = local{}; 10:45:15 ^~~~~~~~~~ 10:45:15 std::_Bit_const_iterator::value_type ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18921	2024-05-28 20:38:55 +03:00
Anna Stuchlik	cfa3cd4c94	doc: add the tablet limitation to the manual recovery procedure This commit adds the information that the manual recovery procedure is not supported if tablets are enabled. In addition, the content in the Manual Recovery Procedure is reorganized by adding the Prerequisites and Procedure subsections - in this way, we can limit the number of Note and Warning boxes that made the page hard to follow. Fixes https://github.com/scylladb/scylladb/issues/18895 Closes scylladb/scylladb#18935	2024-05-28 18:19:22 +02:00
Nadav Har'El	1fe8f22d89	alternator, scheduler: test reproducing RPC scheduling group bug This patch adds a test for issue #18719: Although the Alternator TTL work is supposedly done in the "streaming" scheduling group, it turned out we had a bug where work sent on behalf of that code to other nodes failed to inherit the correct scheduling group, and was done in the normal ("statement") group. Because this problem only happens when more than one node is involved, the test is in the multi-node test framework test/topology_experimental_raft. The test uses the Alternator API. We already had in that framework a test using the Alternator API (a test for alternator+tablets), so in this patch we move the common Alternator utility functions to a common file, test_alternator.py, where I also put the new test. The test is based on metrics: We write expiring data, wait for it to expire, and then check the metrics on how much CPU work was done in the wrong scheduling group ("statement"). Before #18719 was fixed, a lot of work was done there (more than half of the work done in the right group). After the issue was fixed in the previous patch, the work on the wrong scheduling group went down to zero. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2024-05-28 10:58:08 -04:00
Anna Stuchlik	2bfdb1b583	doc: document RF limitation This commit adds the information that the Replication Factor must be the same or higher than the number of nodes. Closes scylladb/scylladb#18760	2024-05-28 17:14:40 +03:00
Botond Dénes	5d3f7c13f9	main: add maintenance tenant to messaging_service's scheduling config Currently only the user tenant (statement scheduling group) and system (default scheduling group) tenants exist, as we used to have only user-initiated operations and system (internal) ones. Now there is need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group).	2024-05-28 10:08:46 -04:00
Wojciech Mitros	519317dc58	mv: handle different ERMs for base and view table When calculating the base-view mapping while the topology is changing, we may encounter a situation where the base table noticed the change in its effective replication map while the view table hasn't, or vice-versa. This can happen because the ERM update may be performed during the preemption between taking the base ERM and view ERM, or, due to `f2ff701`, the update may have just been performed partially when we are taking the ERMs. Until now, we assumed that the ERMs are synchronized while finding the base-view endpoint mapping, so in particular, we were using the topology from the base's ERM to check the datacenters of all endpoints. Now that the ERMs are more likely to not be the same, we may try to get the datacenter of a view endpoint that doesn't exist in the base's topology, causing us to crash. This is fixed in this patch by using the view table's topology for endpoints coming from the view ERM. The mapping resulting from the call might now be a temporary mapping between endpoints in different topologies, but it still maps base and view replicas 1-to-1. Fixes: #17786 Fixes: #18709 Closes scylladb/scylladb#18816	2024-05-28 16:01:39 +02:00
Botond Dénes	aae263ef0a	Merge 'Harden the repair_service shutdown path' from Benny Halevy This series ignores errors in `load_history()` to prevent `abort_requested_exception` coming from `get_repair_module().check_in_shutdown()` from escaping during `repair_service::stop()`, causing ``` repair_service::~repair_service(): Assertion `_stopped' failed. ``` Fixes https://github.com/scylladb/scylladb/issues/18889 Backport to 6.0 required due to `523895145d` Closes scylladb/scylladb#18890 * github.com:scylladb/scylladb: repair: load_history: warn and ignore all errors repair_service: debug stop	2024-05-28 15:30:39 +03:00
Pavel Emelyanov	66f6001c77	test: Do not check tablets mutations on nodes that don't have them The check is performed by selecting from mutation_fragments(table), but it's known that this query crashes Scylla when there's no tablet replica on that node. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-28 13:56:46 +02:00
Pavel Emelyanov	6e0e2674f0	test: Fix the way tablets RF-change test parses mutation_fragments When the test changes RF from 2 to 3, the extra node executes "rebuild" transition which means that it streams tablets replicas from two other peers. When doing it, the node receives two sets of sstables with mutations from the given tablet. The test part that checks if the extra node received the mutations notices two mutation fragments on the new replica and erroneously fails by seeing that RF=3 is not equal to the number of mutations found, which is 4. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-28 13:56:46 +02:00
Pavel Emelyanov	2567e300d1	test/tablets: Unmark RF-changing test with xfail Now the scaling works and the test must check that it does Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-28 13:56:46 +02:00
Piotr Smaron	1b913dd880	docs: document ALTER KEYSPACE with tablets	2024-05-28 13:56:46 +02:00
Piotr Smaron	39181c4bf2	Return response only when tablets are reallocated Up until now we waited until mutations are in place and then returned directly to the caller of the ALTER statement, but that doesn't imply that tablets were deleted/created, so we must wait until the whole processing is done and return only then.	2024-05-28 13:56:46 +02:00
Dawid Medrek	ec5708bdee	cql-pytest: Verify RF is changes by at most 1 when tablets on This commit adds a test verifying that we can only change the RF of a keyspace for any DC by at most 1 when using tablets. Fixes #18029	2024-05-28 13:56:46 +02:00
Dawid Medrek	951915ed84	cql3/alter_keyspace_statement: Do not allow for change of RF by more than 1 We want to ensure that when the replication factor of a keyspace changes, it changes by at most 1 per DC if it uses tablets. The rationale for that is to make sure that the old and new quorums overlap by at least one node. After these changes, attempts to change the RF of a keyspace in any DC by more than 1 will fail.	2024-05-28 13:56:46 +02:00
Piotr Smaron	b875151405	Reject ALTER with 'replication_factor' tag This patch removes the support for the "wildcard" replication_factor option for ALTER KEYSPACE when the keyspace supports tablets. It will still be supported for CREATE KEYSPACE so that a user doesn't have to know all datacenter names when creating the keyspace, but ALTER KEYSPACE will require that and the user will have to specify the exact change in replication factors they wish to make by explicitly specifying the datacenter names. Expanding the replication_factor option in the ALTER case is unintuitive and it's a trap many users fell into. See #8881, #15391, #16115	2024-05-28 13:56:46 +02:00
Piotr Smaron	fbd75c5c06	Implement ALTER tablets KEYSPACE statement support This commit adds support for executing ALTER KS for keyspaces with tablets and utilizes all the previous commits. The ALTER KS is handled in alter_keyspace_statement, where a global topology request is generated with data attached to system.topology table. Then, once topology state machine is ready, it starts to handle this global topology event, which results in producing mutations required to change the schema of the keyspace, delete the system.topology's global req, produce tablets mutations and additional mutations for a table tracking the lifetime of the whole req. Tracking the lifetime is necessary to not return the control to the user too early, so the query processor only returns the response while the mutations are sent.	2024-05-28 13:56:42 +02:00
Piotr Smaron	7081215552	Parameterize migration_manager::announce by type to allow executing different raft commands Since ALTER KS requires creating topology_change raft command, some functions need to be extended to handle it. RAFT commands are recognized by types, so some functions are just going to be parameterized by type, i.e. made into templates. These templates are instantiated already, so that only 1 instance of each template exists across the whole code base, to avoid compiling it in each translation unit.	2024-05-28 13:55:11 +02:00
Piotr Smaron	80ed442be2	Introduce TABLET_KEYSPACE event to differentiate processing path of a vnode vs tablets ks	2024-05-28 13:55:11 +02:00
Piotr Smaron	59d3fd615f	Extend system.topology with 3 new columns to store data required to process alter ks global topo req Because ALTER KS will result in creating a global topo req, we'll have to pass the req data to topology coordinator's state machine, and the easiest way to do it is through system.topology table, which is going to be extended with 3 extra columns carrying all the data required to execute ALTER KS from within topology coordinator.	2024-05-28 13:55:11 +02:00
Piotr Smaron	6fd0a49b63	Allow query_processor to check if global topo queue is empty With current implementation only 1 global topo req can be executed at a time, so when ALTER KS is executed, we'll have to check if any other global topo req is ongoing and fail the req if that's the case.	2024-05-28 13:55:11 +02:00
Piotr Smaron	c174eee386	Introduce new global topo `keyspace_rf_change` req It will be used when processing ALTER KS statement, but also to create a separate processing path for a KS with tablets (as opposed to a vnode KS).	2024-05-28 13:54:48 +02:00
Kamil Braun	247eb9020b	Merge 'cdc, raft topology: fix and test cdc in the recovery mode' from Patryk Jędrzejczak This PR ensures that CDC keeps working correctly in the recovery mode after leaving the raft-based topology. We update `system.cdc_local` in `topology_state_load` to ensure a node restarting in the recovery mode sees the last CDC generation created by the topology coordinator. Additionally, we extend the topology recovery test to verify that the CDC keeps working correctly during the whole recovery process. In particular, we test that after restarting nodes in the recovery mode, they correctly use the active CDC generation created by the topology coordinator. Fixes scylladb/scylladb#17409 Fixes scylladb/scylladb#17819 Closes scylladb/scylladb#18820 * github.com:scylladb/scylladb: test: test_topology_recovery_basic: test CDC during recovery test: util: start_writes_to_cdc_table: add FIXME to increase CL test: util: start_writes_to_cdc_table: allow restarting with new cql storage_service: update system.cdc_local in topology_state_load	2024-05-28 11:53:28 +02:00
Patryk Jędrzejczak	c44d8eca15	test: test_topology_ops: run correctly without tablets This patch fixes two bugs in `test_topology_ops`: 1. The values of `tablets_enabled` were nonempty strings, so they always evaluated to `True` in the if statement responsible for enabling writing workers only if tablets are disabled. Hence, the writing workers were always disabled. 2. The `topology_experimental_raft suite` uses tablets by default, so we need a config with empty `experimental_features` to disable them. Ensuring this test works with and without tablets is considered a part of 6.0, so we should backport this patch. Closes scylladb/scylladb#18900	2024-05-28 10:08:41 +02:00
Pavel Emelyanov	ae622d711e	sstables-loader: Run loading in its scheduling group Now the loading code has two different paths, and only one of them switches sched group. It's cleaner and more natural to switch the sched group in the loader itself, so that all code paths run in it and don't care switching. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-28 11:07:58 +03:00
Pavel Emelyanov	7fefd57b74	sstables-loader: Add scheduling group to constructor So that it knows in which group to run its code in the future. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-28 11:07:22 +03:00
Nadav Har'El	b7fa5261c8	Merge 'Fix parsing of initial tablets by ALTER' from Pavel Emelyanov If the user wants to change the default initial tablets value, it uses ALTER KEYSPACE statement. However, specifying `WITH tablets = { initial: $value }` will take no effect, because statement analyzer only applies `tablets` parameters together with the `replication` ones, so the working statement should be `WITH replication = $old_parameters AND tablets = ...` which is not very convenient. This PR changes the analyzer so that altering `tablets` happens independently from `replication`. Test included. fixes: #18801 Closes scylladb/scylladb#18899 * github.com:scylladb/scylladb: cql-pytest: Add validation of ALTER KEYSPACE WITH TABLETS cql3: Fix parsing of ALTER KEYSPACE's tablets parameters cql3: Remove unused ks_prop_defs/prepare_options() argument	2024-05-27 23:10:39 +03:00
Kefu Chai	e42d83dc46	treewide: include used headers before this change, we rely on `seastar/util/std-compat.hh` to include the used headers provided by standard library. this was necessary before we moved to a C++20 compliant standard library implementation. but since Seastar has dropped C++17 support, its `seastar/util/std-compat.hh` is not responsible for providing these headers anymore. so, in this change, we include the used headers directly instead of relying on `seastar/util/std-compat.hh`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18883	2024-05-27 17:34:38 +03:00
Anna Stuchlik	806dd5a68a	doc: describe Tablets in ScyllaDB This commit adds the main description of tablets and their benefits. The article can be used as a reference in other places across the docs where we mention tablets. Closes scylladb/scylladb#18619	2024-05-27 15:41:37 +02:00
Botond Dénes	2d79b0106c	Merge 'storage_service: Fix race between tablet split and stats retrieval' from Raphael "Raph" Carvalho Retrieval of tablet stats must be serialized with mutation to token metadata, as the former requires tablet id stability. If tablet split is finalized while retrieving stats, the saved erm, used by all shards, can have a lower tablet count than the one in a particular shard, causing an abort as tablet map requires that any id fed into it is lower than its current tablet count. Fixes #18085. Closes scylladb/scylladb#18287 * github.com:scylladb/scylladb: test: Fix flakiness in topology_experimental_raft/test_tablets service: Use tablet read selector to determine which replica to account table stats storage_service: Fix race between tablet split and stats retrieval	2024-05-27 16:32:54 +03:00
Pavel Emelyanov	1003391ed6	cql-pytest: Add validation of ALTER KEYSPACE WITH TABLETS There's a test that checks how ALTER changes the initial tablets value, but it equips the statement with `replication` parameters because of limitations that parser used to impose. Now the `tablets` parameters can come on their own, so add a new test. The old one is kept for compatibility considerations. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-27 16:27:45 +03:00
Pavel Emelyanov	a172ef1bdf	cql3: Fix parsing of ALTER KEYSPACE's tablets parameters When the `WITH` doesn't include the `replication` parameters, the `tablets` one is ignored, even if it's present in the statement. That's not great, those two parameter sets are pretty much independent and should be parsed individually. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-27 16:25:38 +03:00
Pavel Emelyanov	8a612da155	cql3: Remove unused ks_prop_defs/prepare_options() argument Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-27 16:25:22 +03:00
Benny Halevy	c32c418cd5	repair: load_history: warn and ignore all errors Currently, the call to `get_repair_module().check_in_shutdown()` may throw `abort_requested_exception` that causes `repair_service::stop()` to fail, and trigger assertion failure in `~repair_service`. We already ignore failure from `update_repair_time`, so expand the logic to cover the whole function body. Fixes scylladb/scylladb#18889 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-27 15:57:54 +03:00
Patryk Jędrzejczak	7c1e6ba8b3	test: test_topology_ops: stop a write worker after the first error `test_topology_ops` is flaky, which has been uncovered by gating in scylladb/scylladb#18707. However, debugging it is harder than it should be because write workers can flood the logs. They may send a lot of failed writes before the test fails. Then, the log file can become huge, even up to 20 GB. Fix this issue by stopping a write worker after the first error. This test is important for 6.0, so we can backport this change. Closes scylladb/scylladb#18851	2024-05-27 13:49:30 +02:00
Piotr Dulikowski	fa142a9ce7	Merge 'qos/raft_service_level_distributed_data_accessor: print correct error message when trying to modify a service level in recovery mode' from Michał Jadwiszczak Raft service levels are read-only in recovery mode. This patch adds check and proper error message when a user tries to modify service levels in recovery mode. Fixes https://github.com/scylladb/scylladb/issues/18827 Closes scylladb/scylladb#18841 * github.com:scylladb/scylladb: test/auth_cluster/test_raft_service_levels: try to create sl in recovery service/qos/raft_sl_dda: reject changes to service levels in recovery mode service/qos/raft_sl_dda: extract raft_sl_dda steps to common function	2024-05-27 13:26:06 +02:00
Kefu Chai	cbc83f92d3	.github: add iwyu workflow iwyu is short for "include what you use". this workflow is added to identify missing "#include" and extraneous "#include" in C++ source files. This workflow is triggered when a pull request is created targeting the "master" branch. It uses the clang-include-cleaner tool provided by clang-tools package to analyze all the ".cc" and ".hh" source files. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18122	2024-05-27 14:19:11 +03:00
Kefu Chai	e70b116333	api/api-doc/utils: fix a typo in description s/mintues/minutes/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18869	2024-05-27 14:15:23 +03:00
Kefu Chai	2d7545ade6	test/lib: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18884	2024-05-27 14:13:51 +03:00
Piotr Smaron	06008970fb	New raft cmd for both schema & topo changes Allows executing combined topology & schema mutations under a single RAFT command	2024-05-27 12:48:44 +02:00
Piotr Smaron	cb40f13831	Add storage service to query processor Query processor needs to access storage service to check if global topology request is still ongoing and to be able to wait until it completes.	2024-05-27 12:48:44 +02:00
Paweł Zakrzewski	c888945354	tablets: tests for adding/removing replicas Note we're suppressing a UBSanitizer overflow error in UTs. That's because our linter complains about a possible overflow, which never happens, but tests are still failing because of it.	2024-05-27 12:48:44 +02:00
Paweł Zakrzewski	65deddd967	tablet_allocator: make load_balancer_stats_manager configurable by name This is needed, because the same name cannot be used for 2 separate entities, because we're getting double-metrics-registration error, thus the names have to be configurable, not hardcoded.	2024-05-27 12:48:44 +02:00
Benny Halevy	38845754c4	repair_service: debug stop Seen the following unexplained assertion failure with pytest -s -v --scylla-version=local_tarball --tablets repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_multi_dc ``` INFO 2024-05-27 11:18:05,081 [shard 0:main] init - Shutting down repair service INFO 2024-05-27 11:18:05,081 [shard 0:main] task_manager - Stopping module repair INFO 2024-05-27 11:18:05,081 [shard 0:main] task_manager - Unregistered module repair INFO 2024-05-27 11:18:05,081 [shard 1:main] task_manager - Stopping module repair INFO 2024-05-27 11:18:05,081 [shard 1:main] task_manager - Unregistered module repair scylla: repair/row_level.cc:3230: repair_service::~repair_service(): Assertion `_stopped' failed. Aborting on shard 0. Backtrace: /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3f040c /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x41c7a1 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x3dbaf /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x8e883 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x3dafd /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x2687e /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x2679a /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x36186 0x26f2428 0x10fb373 0x10fc8b8 0x10fc809 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x456c6d /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x456bcf 0x10fc65b 0x10fc5bc 0x10808d0 0x1080800 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff22f /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x4003b7 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff888 
/home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36dea8 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36d0e2 0x101cefa 0x105a390 0x101bde7 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27b89 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27c4a 0x101a764 ``` Decoded: ``` ~repair_service at ./repair/row_level.cc:3230 ~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:491 (inlined by) ~shared_ptr_count_for at ././seastar/include/seastar/core/shared_ptr.hh:491 ~shared_ptr at ././seastar/include/seastar/core/shared_ptr.hh:569 (inlined by) seastar::shared_ptr<repair_service>::operator=(seastar::shared_ptr<repair_service>&&) at ././seastar/include/seastar/core/shared_ptr.hh:582 (inlined by) seastar::shared_ptr<repair_service>::operator=(decltype(nullptr)) at ././seastar/include/seastar/core/shared_ptr.hh:588 (inlined by) operator() at ././seastar/include/seastar/core/sharded.hh:727 (inlined by) seastar::future<void> seastar::futurize<seastar::future<void> >::invoke<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}&>(seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}&) at ././seastar/include/seastar/core/future.hh:2035 (inlined by) seastar::futurize<std::invoke_result<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}>::type>::type seastar::smp::submit_to<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) 
const::{lambda()#1}>(unsigned int, seastar::smp_submit_to_options, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}&&) at ././seastar/include/seastar/core/smp.hh:367 seastar::futurize<std::invoke_result<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}>::type>::type seastar::smp::submit_to<seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}>(unsigned int, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}::operator()(unsigned int) const::{lambda()#1}&&) at ././seastar/include/seastar/core/smp.hh:394 (inlined by) operator() at ././seastar/include/seastar/core/sharded.hh:725 (inlined by) seastar::future<void> std::__invoke_impl<seastar::future<void>, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int>(std::__invoke_other, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:61 (inlined by) std::enable_if<is_invocable_r_v<seastar::future<void>, seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int>, seastar::future<void> >::type std::__invoke_r<seastar::future<void>, 
seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int>(seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}&, unsigned int&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/invoke.h:114 (inlined by) std::_Function_handler<seastar::future<void> (unsigned int), seastar::sharded<repair_service>::stop()::{lambda(seastar::future<void>)#1}::operator()(seastar::future<void>) const::{lambda(unsigned int)#1}>::_M_invoke(std::_Any_data const&, unsigned int&&) at /usr/bin/../lib/gcc/x86_64-redhat-linux/13/../../../../include/c++/13/bits/std_function.h:290 ``` FWIW, gdb crashed when opening the coredump. This commit will help catch the issue earlier when repair_service::stop() fails (and it must never fail) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-05-27 13:02:10 +03:00
Kefu Chai	61b5bfae6d	docs: fix typos in dev documents these typos were identified by codespell. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18871	2024-05-27 12:28:34 +03:00
Botond Dénes	c137f84535	Merge 'Mark prepare_statement as immutable' from Pavel Emelyanov Users of prepared statement reference it with the help of "smart" pointers. None of the users are supposed to modify the object they point to, so mark the respective pointer type as `pointer<const prepared_statement>`. Also mark the fields of prepared statement itself with const's (some of them already are) Closes scylladb/scylladb#18872 * github.com:scylladb/scylladb: cql3: Mark prepared_statement's fields const cql3: Define prepared_statement weak pointer as const	2024-05-27 12:27:54 +03:00
Kefu Chai	f1f3f009e7	docs: fix typos in upgrade document s/Montioring/Monitoring/ Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18870	2024-05-27 12:26:59 +03:00
Patryk Jędrzejczak	2111cb01df	test: test_topology_recovery_basic: test CDC during recovery In topology on raft, management of CDC generations is moved to the topology coordinator. We extend the topology recovery test to verify that the CDC keeps working correctly during the whole recovery process. In particular, we test that after restarting nodes in the recovery mode, they correctly use the active CDC generation created by the topology coordinator. A node restarting in the recovery mode should learn about the active generation from `system.cdc_local` (or from gossip, but we don't want to rely on it). Then, it should load its data from `system.cdc_generations_v3`. Fixes scylladb/scylladb#17409	2024-05-27 10:39:04 +02:00
Patryk Jędrzejczak	388db33dec	test: util: start_writes_to_cdc_table: add FIXME to increase CL	2024-05-27 10:39:04 +02:00
Patryk Jędrzejczak	68b6e8e13e	test: util: start_writes_to_cdc_table: allow restarting with new cql This patch allows us to restart writing (to the same table with CDC enabled) with a new CQL session. It is useful when we want to continue writing after closing the first CQL session, which happens during the `reconnect_driver` call. We must stop writing before calling `reconnect_driver`. If a write started just before the first CQL session was closed, it would time out on the client. We rename `finish_and_verify` - `stop_and_verify` is a better name after introducing `restart`.	2024-05-27 10:39:04 +02:00
Patryk Jędrzejczak	4351eee1f6	storage_service: update system.cdc_local in topology_state_load When the node with CDC enabled and with the topology on raft disabled bootstraps, it reads system.cdc_local for the last generation. Nodes with both enabled use group0 to get the last generation. In the following scenario with a cluster of one node: 1. the node is created with CDC and the topology on raft enabled 2. the user creates table T 3. the node is restarted in the recovery mode 4. the CDC log of T is extended with new entries 5. the node restarts in normal mode The generation created in the step 3 is seen in system_distributed.cdc_generation_timestamps but not in system.cdc_generations_v3, thus there are used streams that the CDC based on raft doesn't know about. Instead of creating a new generation, the node should use the generation already committed to group0. Save the last CDC generation in the system.cdc_local during loading the topology state so that it is visible for CDC not based on raft. Fixes scylladb/scylladb#17819	2024-05-27 10:39:04 +02:00
Kefu Chai	f70e888ed5	build: cmake: pass -fprofile-list to compiler to mirror the behavior of the build.ninja generated by configure.py Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18734	2024-05-27 11:22:55 +03:00
Botond Dénes	47dbf23773	Merge 'Rework view services and system-distributed-keyspace dependencies' from Pavel Emelyanov The system-distributed-keyspace and view-update-generator often go in pair, because streaming, repair and sstables-loader (via distributed-loader) need them both to check if an sstable is staging and register it if it is. The check is performed by messing directly with system_distributed.view_build_status table, and the registration happens via view-update-generator. That's not nice, other services shouldn't know that view status is kept in system table. Also view-update-generator is a service to generate and push view updates, the fact that it keeps staging sstables list is an implementation detail. This PR replaces dependencies on the mentioned pair of services with the single dependency on view-builder (repair, sstables-loader and stream-manager are enlightened) and hides the view building-vs-staging details inside the view_builder. Along the way, some simplification of repair_writer_impl class is done. 
Closes scylladb/scylladb#18706 * github.com:scylladb/scylladb: stream_manager: Remove system_distributed_keyspace and view_update_generator repair: Remove system_distributed_keyspace and view_update_generator streaming: Remove system_distributed_keyspace and view_update_generator sstables_loader: Remove system_distributed_keyspace and view_update_generator distributed_loader: Remove system_distributed_keyspace and view_update_generator view: Make register_staging_sstable() a method of view_builder view: Make check_view_build_ongoing() helper a method of view_builder streaming: Proparage view_builder& down to make_streaming_consumer() repair: Keep view_builder& on repair_writer_impl distributed_loader: Propagate view_builder& via process_upload_dir() stream_manager: Add view builder dependency repair_service: Add view builder dependency sstables_loader: Add view_bulder dependency main: Start sstables loader later repair: Remove unwanted local references from repair_meta	2024-05-27 10:51:11 +03:00
Botond Dénes	e0f4d79f3b	Merge 'Do not export statement scheduling group from database' from Pavel Emelyanov Database used to be (and still is in many ways) an object used to get configuration from. Part of the configuration is the set of pre-configured scheduling groups. That's not nice, services should use each other for some real need, not as proxies to configuration. This patch patches the places that explicitly switch to statement group _not_ to use database to get the group itself. fixes: #17643 Closes scylladb/scylladb#18799 * github.com:scylladb/scylladb: database: Don't export statement scheduling group test: Use async attrs and cql-test-env scheduling groups test: Use get_scheduling_groups() to get scheduling groups api: Don't switch sched group to start/stop protocol servers main: Don't switch sched group to start protocol servers code: Switch to sched group in request_stop_server() code: Switch to server sched group in start() protocol_server: Keep scheduling group on board code: Add scheduling group to controllers redis: Coroutinize start() method	2024-05-27 10:48:33 +03:00
Kefu Chai	46d993a283	test: revert `4c1b6f04` in `4c1b6f04`, we added a concept for fmt::is_formattable<>. but it was not necessary. the fmt::is_formattable<> trait was enough. `4c1b6f04` was actually a leftover of a bigger change which tried to add a trait for the cases which fmt::is_formattable<> was not able to cover. but that was based on the wrong impression that fmt::is_formattable<> should be able to work with container types without including, for instance `fmt/ranges.h`. but in `222dbf2c`, we include `fmt/ranges.h` in tests, where the range-alike formatter is used, that enables `fmt::is_formattable<>` to tell that container types are formattable. in short, `4c1b6f04` was created based on a misunderstanding, and it was a reduced type trait, which is proved to be not necessary. so, in this change, it is dropped. but the type constraint is preserved to make the build failure more explicit, if the fallback formatter does not match with the type to be formatted by Boost.test. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18879	2024-05-27 10:14:59 +03:00
Marcin Maliszkiewicz	2ab143fb40	db: auth: move auth tables to system keyspace Separate keyspace which also behaves as system brings little benefit while creating some compatibility problems like schema digest mismatch during rollback. So we decided to move auth tables into system keyspace. Fixes https://github.com/scylladb/scylladb/issues/18098 Closes scylladb/scylladb#18769	2024-05-26 22:30:42 +03:00
Avi Kivity	56d523b071	Merge 'build, test: disable operator<< for vector and unordered_map' from Kefu Chai this series disables operator<<:s for vector and unordered_map, and drop operator<< for mutation, because we don't have to keep it to work with these operator:s anymore. this change is a follow up of https://github.com/scylladb/seastar/issues/1544 this change is a cleanup. so no need to backport Closes scylladb/scylladb#18866 * github.com:scylladb/scylladb: mutation,db: drop operator<< for mutation and seed_provider_type& build: disable operator<< for vector and unordered_map db/heat_load_balance: include used header test: define a more generic boost_test_print_type test/boost: define fmt::formatter for service_level_controller_test.cc test/boost: include test/lib/test_utils.hh	2024-05-26 19:19:20 +03:00
Kefu Chai	4e9596a5a9	treewide: replace std::result_of_t with std::invoke_result_t in theory, std::result_of_t should have been removed in C++20. and std::invoke_result_t is available since C++17. thanks to libstdc++, the tree is compiling. but we should not rely on this. so, in this change, we replace all `std::result_of_t` with `std::invoke_result_t`. actually, clang + libstdc++ is already warning us like: ``` In file included from /home/runner/work/scylladb/scylladb/multishard_mutation_query.cc:9: In file included from /home/runner/work/scylladb/scylladb/schema/schema_registry.hh:11: In file included from /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/unordered_map:38: Warning: /usr/bin/../lib/gcc/x86_64-linux-gnu/12/../../../../include/c++/12/type_traits:2624:5: warning: 'result_of<void (noop_compacted_fragments_consumer::*(noop_compacted_fragments_consumer &))()>' is deprecated: use 'std::invoke_result' instead [-Wdeprecated-declarations] 2624 \| using result_of_t = typename result_of<_Tp>::type; \| ^ /home/runner/work/scylladb/scylladb/mutation/mutation_compactor.hh:518:43: note: in instantiation of template type alias 'result_of_t' requested here 518 \| if constexpr (std::is_same_v<std::result_of_t<decltype(&GCConsumer::consume_end_of_stream)(GCConsumer&)>, void>) { \| ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18835	2024-05-26 16:45:42 +03:00
Pavel Emelyanov	9108952a52	test/cql-pytest: Add test for token() filter against mutation_fragments() When selecting from mutation_fragments(table) one may want to apply token() filtering against partition key. This doesn't work currently, but used to crash. This patch adds a regression test for that refs: #18637 refs: #18768 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18759	2024-05-26 15:31:20 +03:00
Kefu Chai	125464f2d9	migration_manager: do not reference moved-away smart pointer this change is inspired by clang-tidy. it warns like: ``` [752/852] Building CXX object service/CMakeFiles/service.dir/migration_manager.cc.o Warning: /home/runner/work/scylladb/scylladb/service/migration_manager.cc:891:71: warning: 'view' used after it was moved [bugprone-use-after-move] 891 \| db.get_notifier().before_create_column_family(keyspace, view, mutations, ts); \| ^ /home/runner/work/scylladb/scylladb/service/migration_manager.cc:886:86: note: move occurred here 886 \| auto mutations = db::schema_tables::make_create_view_mutations(keyspace, std::move(view), ts); \| ^ ``` in which, `view` is an instance of view_ptr which is a type with the semantics of shared pointer, it's backed by a member variable of `seastar::lw_shared_ptr<const schema>`, whose move-ctor actually resets the original instance. so we are actually accessing the moved-away pointer in ```c++ db.get_notifier().before_create_column_family(keyspace, view, mutations, ts) ``` so, in this change, instead of moving away from `view`, we create a copy, and pass the copy to `db::schema_tables::make_create_view_mutations()`. this should be fine, as the behavior of `db::schema_tables::make_create_view_mutations()` does not depend on whether the `view` passed to it was moved from or copied. the change which introduced this use-after-move was `88a5ddabce` Refs `88a5ddabce` Fixes #18837 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18838	2024-05-26 12:04:00 +03:00
Kefu Chai	dbfdc71d2d	treewide: fix typos in comment and error messages these typos were identified by codespell Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18868	2024-05-26 11:54:36 +03:00
Kefu Chai	35e1fcde1f	mutation,db: drop operator<< for mutation and seed_provider_type& since we've migrated away from the generic homebrew formatters for range-alike containers, there is no need to keep these operator<< around -- they were preserved in order to work with the container formatters which expect operator<< of the elements. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-26 13:44:55 +08:00
Kefu Chai	9bd9f283f4	build: disable operator<< for vector and unordered_map seastar provides an option named `Seastar_DEPRECATED_OSTREAM_FORMATTERS` to enable the operator<< for `std::vector` and `std::unordered_map`, and this option is enabled by default. but we intend to avoid using them, so that we can use the fmt::formatter specializations when Boost.test prints variables. if we keep these two operator<< enabled, Boost.test would use them to print the variables being compared when the check fails, but if the elements of the vector or unordered_map being compared do not provide operator<<, compiling would fail. so, in this change, let's disable these operator<< implementations. this allows us to ditch the operator<< implementations which are preserved only for testing. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-26 13:44:55 +08:00
Kefu Chai	8e0a6ea021	db/heat_load_balance: include used header in this header, we use `hr_logger.trace("returned _pp={}", p)` to print a `vector<float>`, so we need to include `fmt/ranges.h`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-26 13:44:55 +08:00
Kefu Chai	4c1b6f0476	test: define a more generic boost_test_print_type fmt::is_formattable<T>::value is false, even if * T is a container of U, and * fmt::is_formattable<U>, and * U can be formatted using fmt::formatter so, we have to define a more generic boost_test_print_type() for all the types supported by {fmt}. it will help us to ditch the operator<< for vector and unordered_map in Seastar, and allow us to use the fmt::formatter specialization of the element types. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-26 12:32:43 +08:00
Kefu Chai	bfe918ac9e	test/boost: define fmt::formatter for service_level_controller_test.cc since we are moving away from operator<< based formatters, more and more types now only have {fmt} based formatters. the same will apply to the STL container types after ditching the generic homebrew formatter in to_string.hh, so to be prepared for the change, let's add the fmt::formatter for tests as well. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-26 12:32:43 +08:00
Kefu Chai	222dbf2ce4	test/boost: include test/lib/test_utils.hh this change was created in the same spirit of 505900f18f. because we are deprecating the operator<< for vector and unordered_map in Seastar, some tests do not compile anymore if we disable these operators. so to be prepared for the change disabling them, let's include test/lib/test_utils.hh for accessing the printer dedicated for Boost.test. and also '#include <fmt/ranges.h>' when necessary, because, in order to format the ranges using {fmt}, we need to use fmt/ranges.h. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-05-26 12:32:43 +08:00
Pavel Emelyanov	cf564d7a54	cql3: Mark prepared_statement's fields const Not only users of prepared_statement point to immutable object, but the class itself doesn't assume modifications of its fields, so mark them const too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-25 16:41:30 +03:00
Pavel Emelyanov	828862bdff	cql3: Define prepared_statement weak pointer as const The pointer points to immutable prepared_statement, so tune up the type respectively. Tracing has its own alias for it, fix that one too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-25 16:40:35 +03:00
Michał Chojnowski	de798775fd	test: test_coordinator_queue_management: wait for logs properly The modified lines of code intend to await the first appearance of a log on one of the nodes. But due to misplaced parentheses, instead of creating a list of log-awaiting tasks with a list comprehension, they pass a generator expression to asyncio.create_task(). This is nonsense, and it fails immediately with a type error. But since they don't actually check the result of the await, the test just assumes that the search completed successfully. This was uncovered by an upgrade to Python 3.12, because its typing is stronger and asyncio.create_task() screams when it's passed a regular generator. This patch fixes the bad list comprehension, and also adds an error check on the completed awaitables (by calling `await` on them). Fixes #18740 Closes scylladb/scylladb#18754	2024-05-25 10:54:44 +03:00
Pavel Emelyanov	31edab277a	database: Don't export statement scheduling group Now all the code gets this group from elsewhere and the method can be removed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 18:00:01 +03:00
Pavel Emelyanov	ddc511872e	test: Use async attrs and cql-test-env scheduling groups Continuation of the previous patch, but with its own flavor. There's a manual test that wants to run seastar thread in statement scheduling group and gets one from database. This patch makes it get the group from cql-test-env and, while at it, makes it switch to that group using thread attributes passed to async() method. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 18:00:01 +03:00
Pavel Emelyanov	2e3a057db1	test: Use get_scheduling_groups() to get scheduling groups There's such a helper in cql-test-env that other tests use to get sched groups from. A few other tests (ab)use database for that, this patch fixes those remnants. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 18:00:01 +03:00
Pavel Emelyanov	d86a8252d4	api: Don't switch sched group to start/stop protocol servers All the protocol servers implementations now maintain scheduling group on their own, so the API handler can stop caring Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 18:00:01 +03:00
Pavel Emelyanov	ee0239b2ef	main: Don't switch sched group to start protocol servers Now each of them does this switch on its own Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 18:00:01 +03:00
Pavel Emelyanov	7c76a35e0b	code: Switch to sched group in request_stop_server() This method is used to stop protocol server in the runtime (via the API). Since it's not just "kick it and wait to wrap up", it's needed to perform this in the inherited sched group too. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 18:00:01 +03:00
Pavel Emelyanov	fe349a73c8	code: Switch to server sched group in start() This patch makes all protocol servers implementations use the inherited sched group in their start methods. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 17:56:02 +03:00
Pavel Emelyanov	bf5894cc69	protocol_server: Keep scheduling group on board The group is now mandatory for the real protocol server implementation to initialize. The previous patch made all of them get the sched group as constructor argument, so that's where to take it from. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 17:54:29 +03:00
Pavel Emelyanov	fc3c3e1099	code: Add scheduling group to controllers There are four of them currently -- transport, thrift, alternator and redis. This patch makes main pass to all the statement scheduling group as constructor argument. Next patches will make use of it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 17:53:16 +03:00
Pavel Emelyanov	82511f3c25	redis: Coroutinize start() method Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-24 17:52:48 +03:00
Michał Jadwiszczak	af0b6bcc56	test/auth_cluster/test_raft_service_levels: try to create sl in recovery	2024-05-23 17:49:59 +02:00
Pavel Emelyanov	8906126a2c	stream_manager: Remove system_distributed_keyspace and view_update_generator Now all the code is happy with view_builder and can be shortened Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:56 +03:00
Pavel Emelyanov	84ef6a8179	repair: Remove system_distributed_keyspace and view_update_generator Now all the code is happy with view_builder and can be shortened Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:56 +03:00
Pavel Emelyanov	ae2dcdc7c2	streaming: Remove system_distributed_keyspace and view_update_generator Now all the code is happy with view_builder and can be shortened Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:55 +03:00
Pavel Emelyanov	afa94d2837	sstables_loader: Remove system_distributed_keyspace and view_update_generator Now all the code is happy with view_builder and can be shortened Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:47 +03:00
Pavel Emelyanov	b728857954	distributed_loader: Remove system_distributed_keyspace and view_update_generator Now all the code is happy with view_builder and can be shortened Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:47 +03:00
Pavel Emelyanov	66a8035b64	view: Make register_staging_sstable() a method of view_builder Callers of it had just checked if an sstable still has some views building, so they should talk to view-builder to register the sstable that's now considered to be staging. Effectively, this is to hide the view-update-generator from other services and make them communicate with the builder only. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:47 +03:00
Pavel Emelyanov	92ff0d3fc3	view: Make check_view_build_ongoing() helper a method of view_builder This helper checks if there's an ongoing build of a view, and it's in fact internal to view-builder, who keeps its status in one of its system tables. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:47 +03:00
Pavel Emelyanov	57517d5987	streaming: Proparage view_builder& down to make_streaming_consumer() Continuation of the previous patch. Repair itself doesn't need it, but streaming consumer does. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:46 +03:00
Pavel Emelyanov	5e6893075d	repair: Keep view_builder& on repair_writer_impl Preparation patch, next patches will make use of this new member Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:29 +03:00
Pavel Emelyanov	0d946a5fdf	distributed_loader: Propagate view_builder& via process_upload_dir() Preparation to next patches, they'll make use of this new argument Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:28 +03:00
Pavel Emelyanov	d917b06857	stream_manager: Add view builder dependency Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:28 +03:00
Pavel Emelyanov	f0f1097d0c	repair_service: Add view builder dependency Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:28 +03:00
Pavel Emelyanov	f269a37541	sstables_loader: Add view_bulder dependency Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:28 +03:00
Pavel Emelyanov	ff63f8b1a5	main: Start sstables loader later This service is on its own, nothing depends on it. Nor can it work before system distributed keyspace is started, so move it lower. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:28 +03:00
Pavel Emelyanov	f4341ea088	repair: Remove unwanted local references from repair_meta When constructed, the class copies local references to services just to push them into make_repair_writer() later in the same initializers list. There's no need in keeping those references. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:28 +03:00
Marcin Maliszkiewicz	9adf74ae6c	docs: remove note about performance degradation with default superuser This doesn't apply to auth-v2 as we improved data placement and removed the Cassandra quirk which was setting a different CL for some operations involving the default superuser. Fixes #18773 Closes scylladb/scylladb#18785	2024-05-23 13:16:11 +03:00
Kefu Chai	dfeef4e4e8	build: use f-string when appropriate for better readability Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#18808	2024-05-23 11:19:39 +03:00
Anna Stuchlik	2da25cca1a	doc: enable publishing docs for branch-6.0 This commit enables publishing documentation from branch-6.0. The docs will be published as UNSTABLE (the warning about version 6.0 being unstable will be displayed). Closes scylladb/scylladb#18832	2024-05-23 10:37:55 +03:00
Michał Jadwiszczak	ee08d7fdad	service/qos/raft_sl_dda: reject changes to service levels in recovery mode When a cluster goes into recovery mode and service levels were migrated to raft, service levels become temporarily read-only. This commit adds a proper error message in case a user tries to make any changes.	2024-05-23 08:18:03 +02:00
Michał Jadwiszczak	2b56158d13	service/qos/raft_sl_dda: extract raft_sl_dda steps to common function When setting/dropping a service level using raft data accessor, the same validation steps are executed (this_shard_id = 0 and guard is present). To not duplicate the calls in both functions, they can be extracted to a helper function.	2024-05-23 08:16:00 +02:00
Raphael S. Carvalho	e7246751b6	test: Fix flakiness in topology_experimental_raft/test_tablets One source of flakiness is in test_tablet_metadata_propagates_with_schema_changes_in_snapshot_mode due to gossiper being aborted prematurely, and causing a reconnection storm. Another is test_tablet_missing_data_repair which is flaky due to an issue in python driver that session might not reconnect on rolling restart (tracked by https://github.com/scylladb/python-driver/issues/230) Refs #15356. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-05-22 17:02:29 -03:00
Raphael S. Carvalho	eb8ef38543	replica: Fix tablet's compaction_groups_for_token_range() with unowned range File-based tablet streaming calls every shard to return data of every group that intersects with a given range. After dynamic group allocation, that breaks as the tablet range will only be present in a single shard, so an exception is thrown causing migration to halt during streaming phase. Ideally, only one shard is invoked, but that's out of the scope of this fix and compaction_groups_for_token_range() should return empty result if none of the local groups intersect with the range. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#18798	2024-05-22 20:15:33 +03:00
Anna Stuchlik	6626d72520	doc: replace Raft-disabled with Raft-enabled procedure This commit fixes the incorrect Raft-related information on the Handling Cluster Membership Change Failures page introduced with https://github.com/scylladb/scylladb/pull/17500. The page describes the procedure for when Raft is disabled. Since 6.0, Raft for consistent schema management is enabled and mandatory (cannot be disabled), this commit adds the procedure for Raft-enabled setups. Closes scylladb/scylladb#18803	2024-05-22 17:45:20 +02:00
David Garcia	de2b30fafd	docs: docs: autogenerate metrics Autogenerates metrics documentation using the scripts/get_description.py script introduced in #17479 docs: add beta Closes scylladb/scylladb#18767	2024-05-22 15:49:41 +03:00
Raphael S. Carvalho	551bf9dd58	service: Use tablet read selector to determine which replica to account table stats Since we introduced the ability to revert migrations, we can no longer rely on ordering of transition stages to determine whether to account pending or leaving replica. Let's use read selector instead, which correctly has info which replica type has correct stats info. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-05-22 09:25:29 -03:00
Raphael S. Carvalho	abcc68dbe7	storage_service: Fix race between tablet split and stats retrieval If tablet split is finalized while retrieving stats, the saved erm, used by all shards, will be invalidated. It can either cause incorrect behavior or crash if id is not available. It's worked by feeding local tablet map into the "coordinator" collecting stats from all shards. We will also no longer have a snapshot of erm shared between shards to help intra-node migration. This is simplified by serializing token metadata changes and the retrieval of the stats (latter should complete pretty fast, so it shouldn't block the former for any significant time). Fixes #18085. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-05-22 09:25:29 -03:00
Yaron Kaikov	9cc42c98f5	[Mergify] update configuration for 6.0 Updating mergify conf to support 6.0 release Closes scylladb/scylladb#18823	2024-05-22 14:28:43 +03:00
Yaron Kaikov	219daf3489	Update ScyllaDB version to: 6.1.0-dev	2024-05-22 14:08:56 +03:00
Botond Dénes	2f87bfd634	Update tools/java submodule * tools/java 4ee15fd9...88809606 (2): > Update Scylla Java driver to 3.11.5.3. > install-dependencies.sh: s/python/python3/ [botond: regenerate toolchain image] Closes scylladb/scylladb#18790	2024-05-22 11:39:02 +03:00
Asias He	1a03e3d5ae	repair: Add missing db/config.hh Since commit `952dfc6157` "repair: Introduce repair_partition_count_estimation_ratio config option", get_config() is used. We need to include db/config.hh for that. Spotted when backporting to 5.4 branch. Refs #18615 Closes scylladb/scylladb#18780	2024-05-22 11:00:16 +03:00
Nadav Har'El	dc80b5dafe	test/alternator: do not write to auth tables As part of the Alternator test suite, we check Alternator's support for authentication. Alternator maps Scylla's existing CQL roles to AWS's authentication: * AWS's access_key_id <- the name of the CQL role * AWS's secret_access_key <- the salted hash of the password of the CQL role Before this patch, the Alternator test suite created a new role with a preset salted hash (role "alternator", salted hash "secret_pass") and then used that in the tests. However, with the advent of Raft-based metadata it is wrong to write directly to the roles table, and starting with #17952 such writes will be outright forbidden. But we don't actually need to create a new CQL role! We already have a perfectly good CQL role called "cassandra", and our tests already use it. So what this patch does is to have the Alternator tests (conftest.py) read from the roles system-table the salted hash of the "cassandra" role, and then use that - instead of the hard-coded pair alternator/secret_pass - in the tests. A couple more tests assumed that the role name that was used was "alternator", but now it was changed to "cassandra" so those tests needed minor fixes as well. After this patch, the Alternator tests no longer write to the roles system table. Moreover, after this patch, test/alternator/run and test/alternator/suite.yaml (used when testing with test.py) no longer need to do extra ugly CQL setup before starting the Alternator tests. Fixes #18744 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#18771	2024-05-22 11:00:15 +03:00
Avi Kivity	c37f2c2984	version: bump version to 6.0.0-dev The next release will be called 6.0, not 5.5, so bump the version to reflect that. Closes scylladb/scylladb#18789	2024-05-22 11:00:15 +03:00
Kefu Chai	0610eda1b5	Update seastar submodule * seastar 42f15a5f...914a4241 (33): > sstring: deprecate formatters for vector and unordered_map > github: use fedora:40 image for testing > github: add 2 testing combinations back to the matrix > github: extract test.yaml into a resusable workflow > build: use initial-exec TLS when building seastar as shared library > coroutine: preserve this->container before calling dtor > smp: allocate hugepages eagerly when kernel support is available > shared_mutex: Add tests for std::shared_lock and std::unique_lock > shared_mutex: Add RAII locks > README.md: replace C++17 with C++23 > treewide: do not check for SEASTAR_COROUTINES_ENABLED > build: support enabled options when building seastar-module > treewide: include required header files > build: move add_subdirectory(src) down > README.md: replace CircleCI badge with GitHub badge > weak_ptr: Make it possible to convert to "compatible" pointers > circleci: remove circleci CI tests > build: use DPDK_MACHINE=haswell when testing dpdk build on github-hosted runner > build: add --dpdk-machine option to configure.py > build: stop translating -march option to names recognized by DPDK > github: encode matrix.enables in cache key > doc/prometheus.md: add metrics? in URL exporter URI > tests/unit/metrics_tester: use deferred_stop() when appropriate > httpd: mark http_server_control::stop() noexcept > reactor: print scheduling group along with backtrace > reactor: update lowres_clock when max_task_backlog is exceeded > tests: add test for prometheus exporter > tests: move apps/metrics_tester to tests/unit > apps/metrics_tester: keep metrics with "private" labels > apps/metrics_tester: support "labels" in conf.yaml > apps/metrics_tester: stop server properly > apps/metrics_tester: always start exporter > apps/metrics_tester: fix typo in conf-example.yaml Closes scylladb/scylladb#18800	2024-05-22 11:00:15 +03:00
Pavel Emelyanov	26eda88401	test/tablets: Check that after RF change data is replicated properly There's a test that checks system.tablets contents to see that after changing ks replication factor via ALTER KEYSPACE the tablet map is updated properly. This patch extends this test that also validates that mutations themselves are replicated according to the desired replication factor. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18644	2024-05-22 11:00:15 +03:00
Anna Stuchlik	92bc8053e2	doc: remove outdated MV error from Troubleshooting This commit removes the MV error message, which only affects older versions of ScyllaDB, from the Troubleshooting section. Fixes https://github.com/scylladb/scylladb/issues/17205 Closes scylladb/scylladb#17229	2024-05-21 19:02:31 +03:00
Avi Kivity	2bf2e24fcd	Merge 'Coroutinize some auth and service levels related functions' from Marcin Maliszkiewicz Coroutinization will help improve readability and allow easier changes planned for this code. This work was separated from https://github.com/scylladb/scylladb/pull/17910 to make it smoother to review and merge. Closes scylladb/scylladb#18788 * github.com:scylladb/scylladb: cql3: coroutinize create/alter/drop service levels auth: coroutinize alter_role and drop_role auth: coroutinize grant_permissions and revoke_permissions auth: coroutinize create_role cql3: statements: co-routinize auth related statements cql3: statements: release unused guard explicitly in auth related statements	2024-05-21 17:45:19 +03:00
Botond Dénes	5e41dd28c7	Merge 'Sanitize sl controller draining' from Pavel Emelyanov The sl-controller is stopped in three steps. The first (and instantly the second) is unsubscribing from lifecycle notification and draining. The third is stop itself. First two steps are "out of order" as compared to the desired start-stop sequence of any service, this patch fixes these steps. After this PR the drain_on_shutdown() (the call that drains the node upon stop) finally becomes clean and tidy and is no longer accompanied by ad-hoc fellow drains/stops/aborts/whatever. refs: #2737 Closes scylladb/scylladb#18731 * github.com:scylladb/scylladb: sl_controller: Remove drain() method sl_controller: Move abort kicking into do_abort() main,sl_controller: Subscribe for early abort main: Unsubscribe sl controller next to subscribing	2024-05-21 17:16:23 +03:00
Marcin Maliszkiewicz	570b766e8b	cql3: coroutinize create/alter/drop service levels	2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz	f98cb6e309	auth: coroutinize alter_role and drop_role	2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz	21556c39d3	auth: coroutinize grant_permissions and revoke_permissions	2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz	6709947ccf	auth: coroutinize create_role	2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz	7f5d259b54	cql3: statements: co-routinize auth related statements	2024-05-21 10:37:26 +02:00
Marcin Maliszkiewicz	dee17e5ab6	cql3: statements: release unused guard explicitly in auth related statements Currently guard is released immediately because those functions are based on continuations and guard lifetime is not extended. In the following commit we rewrite those functions to coroutines and lifetime will be automatically extended. This would deadlock the client because we'd try to take second guard inside auth code without releasing this unused one. In the future commits auth guard will be removed and the one from statement will be used but this needs some more code re-arrangements.	2024-05-21 10:37:26 +02:00
Pavel Emelyanov	fed457eb06	sl_controller: Remove drain() method The draining now only consists of waiting for the data update future to resolve. It can be safely moved to .stop() (i.e. -- later) because its stopping had already been initiated by abort-source, and no other services depend on sl-controller to be stopped and drained. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-21 09:42:16 +03:00
Pavel Emelyanov	535e5f4ae7	sl_controller: Move abort kicking into do_abort() Draining sl controller consists of two parts -- first, kicks the wrap-up process by aborting operations, breaking semaphores, etc. It's no-waiting part. At last there goes co_await of the completion future. This part moves the no-waiting part into recently introduced abort subscription, so that wrap-up starts few bits earlier. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-21 09:42:16 +03:00
Pavel Emelyanov	8d4c8711fa	main,sl_controller: Subscribe for early abort There's stop-signal in main that fires an abort source on stop. Lots of other services are subscribed in it, add the sl-controller too. For now it's a no-op, but next patches will make use of it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-20 21:26:31 +03:00
Pavel Emelyanov	5105ee3284	main: Unsubscribe sl controller next to subscribing The subscription only handles on_leave_cluster() and only for local node, so even if controller gets subscribed for longer, it won't do any harm. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-20 21:26:31 +03:00

2011 changed files with 68870 additions and 36548 deletions

209

.clang-format Normal file

@@ -0,0 +1,209 @@
 ---
 Language: Cpp
 AccessModifierOffset: -4
 AlignAfterOpenBracket: DontAlign
 AlignArrayOfStructures: None
 AlignConsecutiveAssignments:
   Enabled: false
   AcrossEmptyLines: false
   AcrossComments: false
   AlignCompound: false
   PadOperators: true
 AlignConsecutiveBitFields:
   Enabled: false
   AcrossEmptyLines: false
   AcrossComments: false
   AlignCompound: false
   PadOperators: false
 AlignConsecutiveDeclarations:
   Enabled: false
   AcrossEmptyLines: false
   AcrossComments: false
   AlignCompound: false
   PadOperators: false
 AlignConsecutiveMacros:
   Enabled: false
   AcrossEmptyLines: false
   AcrossComments: false
   AlignCompound: false
   PadOperators: false
 AlignConsecutiveShortCaseStatements:
   Enabled: false
   AcrossEmptyLines: false
   AcrossComments: false
   AlignCaseColons: false
 AlignEscapedNewlines: Right
 AlignOperands: Align
 AlignTrailingComments:
   Kind: Always
   OverEmptyLines: 0
 AllowAllArgumentsOnNextLine: true
 AllowAllParametersOfDeclarationOnNextLine: true
 AllowShortBlocksOnASingleLine: Never
 AllowShortCaseLabelsOnASingleLine: false
 AllowShortEnumsOnASingleLine: true
 AllowShortFunctionsOnASingleLine: None
 AllowShortIfStatementsOnASingleLine: Never
 AllowShortLambdasOnASingleLine: Empty
 AllowShortLoopsOnASingleLine: false
 AlwaysBreakAfterDefinitionReturnType: None
 AlwaysBreakAfterReturnType: None
 AlwaysBreakBeforeMultilineStrings: false
 AlwaysBreakTemplateDeclarations: Yes
 AttributeMacros:
   - __capability
 BinPackArguments: true
 BinPackParameters: true
 BitFieldColonSpacing: Both
 BraceWrapping:
   AfterCaseLabel: false
   AfterClass: false
   AfterControlStatement: Never
   AfterEnum: false
   AfterExternBlock: false
   AfterFunction: false
   AfterNamespace: false
   AfterObjCDeclaration: false
   AfterStruct: false
   AfterUnion: false
   BeforeCatch: false
   BeforeElse: false
   BeforeLambdaBody: false
   BeforeWhile: false
   IndentBraces: false
   SplitEmptyFunction: true
   SplitEmptyRecord: true
   SplitEmptyNamespace: true
 BreakAfterAttributes: Never
 BreakAfterJavaFieldAnnotations: false
 BreakArrays: true
 BreakBeforeBinaryOperators: None
 BreakBeforeConceptDeclarations: Always
 BreakBeforeBraces: Attach
 BreakBeforeInlineASMColon: OnlyMultiline
 BreakBeforeTernaryOperators: true
 BreakConstructorInitializers: BeforeComma
 BreakInheritanceList: BeforeColon
 BreakStringLiterals: true
 ColumnLimit: 160
 CommentPragmas: '^ IWYU pragma:'
 CompactNamespaces: false
 ConstructorInitializerIndentWidth: 4
 ContinuationIndentWidth: 8
 Cpp11BracedListStyle: true
 DerivePointerAlignment: false
 DisableFormat: false
 EmptyLineAfterAccessModifier: Never
 EmptyLineBeforeAccessModifier: LogicalBlock
 ExperimentalAutoDetectBinPacking: false
 FixNamespaceComments: true
 ForEachMacros:
   - foreach
   - Q_FOREACH
   - BOOST_FOREACH
 IfMacros:
   - KJ_IF_MAYBE
 IndentAccessModifiers: false
 IndentCaseBlocks: false
 IndentCaseLabels: false
 IndentExternBlock: AfterExternBlock
 IndentGotoLabels: true
 IndentPPDirectives: None
 IndentRequiresClause: true
 IndentWidth: 4
 IndentWrappedFunctionNames: false
 InsertBraces: false
 InsertNewlineAtEOF: true
 InsertTrailingCommas: None
 IntegerLiteralSeparator:
   Binary: 0
   BinaryMinDigits: 0
   Decimal: 0
   DecimalMinDigits: 0
   Hex: 0
   HexMinDigits: 0
 JavaScriptQuotes: Leave
 JavaScriptWrapImports: true
 KeepEmptyLinesAtTheStartOfBlocks: true
 KeepEmptyLinesAtEOF: false
 LambdaBodyIndentation: Signature
 LineEnding: DeriveLF
 MacroBlockBegin: ''
 MacroBlockEnd: ''
 MaxEmptyLinesToKeep: 2
 NamespaceIndentation: None
 PackConstructorInitializers: BinPack
 PenaltyBreakAssignment: 2
 PenaltyBreakBeforeFirstCallParameter: 19
 PenaltyBreakComment: 300
 PenaltyBreakFirstLessLess: 120
 PenaltyBreakOpenParenthesis: 0
 PenaltyBreakString: 1000
 PenaltyBreakTemplateDeclaration: 10
 PenaltyExcessCharacter: 1000000
 PenaltyIndentedWhitespace: 0
 PenaltyReturnTypeOnItsOwnLine: 60
 PointerAlignment: Left
 PPIndentWidth: -1
 QualifierAlignment: Leave
 ReferenceAlignment: Pointer
 ReflowComments: true
 RemoveBracesLLVM: false
 RemoveParentheses: Leave
 RemoveSemicolon: false
 RequiresClausePosition: OwnLine
 RequiresExpressionIndentation: OuterScope
 SeparateDefinitionBlocks: Leave
 ShortNamespaceLines: 1
 SortIncludes: Never
 SortJavaStaticImport: Before
 SortUsingDeclarations: Never
 SpaceAfterCStyleCast: false
 SpaceAfterLogicalNot: false
 SpaceAfterTemplateKeyword: true
 SpaceAroundPointerQualifiers: Default
 SpaceBeforeAssignmentOperators: true
 SpaceBeforeCaseColon: false
 SpaceBeforeCpp11BracedList: false
 SpaceBeforeCtorInitializerColon: true
 SpaceBeforeInheritanceColon: true
 SpaceBeforeJsonColon: false
 SpaceBeforeParens: ControlStatements
 SpaceBeforeParensOptions:
   AfterControlStatements: true
   AfterForeachMacros: true
   AfterFunctionDefinitionName: false
   AfterFunctionDeclarationName: false
   AfterIfMacros: true
   AfterOverloadedOperator: false
   AfterRequiresInClause: false
   AfterRequiresInExpression: false
   BeforeNonEmptyParentheses: false
 SpaceBeforeRangeBasedForLoopColon: true
 SpaceBeforeSquareBrackets: false
 SpaceInEmptyBlock: false
 SpacesBeforeTrailingComments: 1
 SpacesInAngles: Never
 SpacesInContainerLiterals: true
 SpacesInLineCommentPrefix:
   Minimum: 1
   Maximum: -1
 SpacesInParens: Never
 SpacesInParensOptions:
   InCStyleCasts: false
   InConditionalStatements: false
   InEmptyParentheses: false
   Other: false
 SpacesInSquareBrackets: false
 Standard: Latest
 TabWidth: 8
 UseTab: Never
 VerilogBreakBetweenInstancePorts: true
 WhitespaceSensitiveMacros:
   - BOOST_PP_STRINGIZE
   - CF_SWIFT_NAME
   - NS_SWIFT_NAME
   - PP_STRINGIZE
   - STRINGIZE
 ...

1

.gitattributes vendored

@@ -1,3 +1,4 @@
 *.cc diff=cpp
 *.hh diff=cpp
 *.svg binary
+docs/_static/api/js/* binary

31

.github/CODEOWNERS vendored

@@ -1,5 +1,5 @@
 # AUTH
-auth/* @elcallio @vladzcloudius
+auth/* @nuivall @ptrsmrn @KrzaQ
 # CACHE
 row_cache* @tgrabiec
@@ -7,9 +7,9 @@ row_cache* @tgrabiec
 test/boost/mvcc* @tgrabiec
 # CDC
-cdc/* @kbr- @elcallio @piodul @jul-stas
-test/cql/cdc_* @kbr- @elcallio @piodul @jul-stas
-test/boost/cdc_* @kbr- @elcallio @piodul @jul-stas
+cdc/* @kbr-scylla @elcallio @piodul
+test/cql/cdc_* @kbr-scylla @elcallio @piodul
+test/boost/cdc_* @kbr-scylla @elcallio @piodul
 # COMMITLOG / BATCHLOG
 db/commitlog/* @elcallio @eliransin
@@ -25,18 +25,18 @@ compaction/* @raphaelsc
 transport/*
 # CQL QUERY LANGUAGE
-cql3/* @tgrabiec
+cql3/* @tgrabiec @nuivall @ptrsmrn @KrzaQ
 # COUNTERS
-counters* @jul-stas
-tests/counter_test* @jul-stas
+counters* @nuivall @ptrsmrn @KrzaQ
+tests/counter_test* @nuivall @ptrsmrn @KrzaQ
 # DOCS
 docs/* @annastuchlik @tzach
-docs/alternator @annastuchlik @tzach @nyh @havaker @nuivall
+docs/alternator @annastuchlik @tzach @nyh @nuivall @ptrsmrn @KrzaQ
 # GOSSIP
-gms/* @tgrabiec @asias
+gms/* @tgrabiec @asias @kbr-scylla
 # DOCKER
 dist/docker/*
@@ -74,8 +74,8 @@ streaming/* @tgrabiec @asias
 service/storage_service.* @tgrabiec @asias
 # ALTERNATOR
-alternator/* @havaker @nuivall
-test/alternator/* @havaker @nuivall
+alternator/* @nyh @nuivall @ptrsmrn @KrzaQ
+test/alternator/* @nyh @nuivall @ptrsmrn @KrzaQ
 # HINTED HANDOFF
 db/hints/* @piodul @vladzcloudius @eliransin
@@ -91,11 +91,14 @@ test/boost/mutation_reader_test.cc @denesb
 test/boost/querier_cache_test.cc @denesb
 # PYTEST-BASED CQL TESTS
-test/cql-pytest/* @nyh
+test/cqlpy/* @nyh
 # RAFT
-raft/* @kbr- @gleb-cloudius @kostja
-test/raft/* @kbr- @gleb-cloudius @kostja
+raft/* @kbr-scylla @gleb-cloudius @kostja
+test/raft/* @kbr-scylla @gleb-cloudius @kostja
 # HEAT-WEIGHTED LOAD BALANCING
 db/heat_load_balance.* @nyh @gleb-cloudius
+# Tools
+tools/* @denesb

									
15

.github/ISSUE_TEMPLATE.md vendored

				@@ -1,15 +0,0 @@

				This is Scylla's bug tracker, to be used for reporting bugs only.

				If you have a question about Scylla, and not a bug, please ask it in

				our mailing-list at scylladb-dev@googlegroups.com or in our slack channel.

				- [] I have read the disclaimer above, and I am reporting a suspected malfunction in Scylla.

				*Installation details*

				Scylla version (or git commit hash):

				Cluster size:

				OS (RHEL/CentOS/Ubuntu/AWS AMI):

				*Hardware details (for performance issues)*          Delete if unneeded

				Platform (physical/VM/cloud instance type/docker):

				Hardware: sockets= cores= hyperthreading= memory=

				Disks: (SSD/HDD, count)

									
86

.github/ISSUE_TEMPLATE/bug_report.yml vendored Normal file

				@@ -0,0 +1,86 @@

				name: "Report a bug"

				description: "File a bug report."

				title: "[Bug]: "

				type: "bug"

				labels: bug

				body:

				  - type: checkboxes

				    id: terms

				    attributes:

				      label: Code of Conduct

				      description: "This is Scylla's bug tracker, to be used for reporting bugs only.

				If you have a question about Scylla, and not a bug, please ask it in

				our forum at https://forum.scylladb.com/ or in our slack channel https://slack.scylladb.com/ "

				      options:

				        - label: I have read the disclaimer above and am reporting a suspected malfunction in Scylla.

				          required: true

				  - type: input

				    id: product-version

				    attributes:

				      label: product version

				      description: Scylla version (or git commit hash)

				      placeholder: ex. scylla-6.1.1

				    validations:

				      required: true

				  - type: input

				    id: cluster-size

				    attributes:

				      label: Cluster Size

				    validations:

				      required: true  

				  - type: input

				    id: os

				    attributes:

				      label: OS

				      placeholder: RHEL/CentOS/Ubuntu/AWS AMI

				    validations:

				      required: true

				  - type: textarea

				    id: additional-data

				    attributes:

				      label: Additional Environmental Data

				      #description: 

				      placeholder: Add additional data

				      value: "Platform (physical/VM/cloud instance type/docker):\n

				Hardware: sockets=   cores=   hyperthreading=   memory=\n

				Disks: (SSD/HDD, count)"

				    validations:

				      required: false

				  - type: textarea

				    id: reproducer-steps

				    attributes:

				      label: Reproduction Steps

				      placeholder: Describe how to reproduce the problem

				      value: "The steps to reproduce the problem are:"

				    validations:

				      required: true

				  - type: textarea

				    id: the-problem

				    attributes:

				      label: What is the problem?

				      placeholder: Describe the problem you found

				      value: "The problem is that"

				    validations:

				      required: true

				  - type: textarea

				    id: what-happened

				    attributes:

				      label: Expected behavior?

				      placeholder: Describe what should have happened

				      value: "I expected that "

				    validations:

				      required: true

				  - type: textarea

				    id: logs

				    attributes:

				      label: Relevant log output

				      description: Please copy and paste any relevant log output. This will be automatically formatted into code, so no need for backticks.

				      render: shell

									
84

.github/actions/setup-build/action.yaml vendored

				@@ -1,84 +0,0 @@

				name: setup-build-env

				description: Setup Building Environment

				inputs:

				  install_clang_tool:

				    description: 'install clang-tool'

				    required: false

				    default: false

				    type: boolean

				  install_clang_tidy:

				    description: 'install clang-tidy'

				    required: false

				    default: false

				    type: boolean

				# use the stable branch

				# should be the same as the one used by the compositing workflow

				env:

				  CLANG_VERSION: 18

				runs:

				  using: 'composite'

				  steps:

				    - name: Add scylla-ppa repo

				      shell: bash

				      run: |

				        sudo add-apt-repository ppa:scylladb/ppa

				    - name: Add clang apt repo

				      if: ${{ inputs.install_clang_tool || inputs.install_clang_tidy }}

				      shell: bash

				      run: |

				        sudo apt-get install -y curl

				        curl -fsSL https://apt.llvm.org/llvm-snapshot.gpg.key | sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc >/dev/null

				        repo_component=llvm-toolchain-jammy

				        # use the development branch if $CLANG_VERSION is empty

				        if [ -n "$CLANG_VERSION" ]; then

				            repo_component+=-$CLANG_VERSION

				        fi

				        echo "deb http://apt.llvm.org/jammy/ $repo_component main" | sudo tee -a /etc/apt/sources.list.d/llvm.list

				        sudo apt-get update

				    - name: Install clang-tools

				      if: ${{ inputs.install_clang_tools }}

				      shell: bash

				      run: |

				        sudo apt-get install -y clang-tools-$CLANG_VERSION

				    - name: Install clang-tidy

				      if: ${{ inputs.install_clang_tidy }}

				      shell: bash

				      run: |

				        sudo apt-get install -y clang-tidy-$CLANG_VERSION

				    - name: Install GCC-12

				      # ubuntu:jammy comes with GCC-11. and libstdc++-11 fails to compile

				      # scylla which defines value type of std::unordered_map in .cc

				      shell: bash

				      run: |

				        sudo add-apt-repository -y ppa:ubuntu-toolchain-r/ppa

				        sudo apt-get install -y libstdc++-12-dev

				    - name: Install more build dependencies

				      shell: bash

				      run: |

				        # - do not install java dependencies, which is not only not necessary,

				        #   and they include "python", which is not EOL and not available.

				        # - replace "scylla-libthrift010" with "libthrift-dev". because

				        #   scylla-libthrift010 : Depends: libssl1.0.0 (>= 1.0.1) but it is not installable

				        # - we don't perform tests, so minio is not necessary.

				        sed -i.orig \

				          -e '/tools\/.*\/install-dependencies.sh/d' \

				          -e 's/scylla-libthrift010-dev/libthrift-dev/' \

				          -e 's/(minio_download_jobs)/(true)/' \

				          ./install-dependencies.sh

				        sudo ./install-dependencies.sh

				        mv ./install-dependencies.sh{.orig,}

				        # for ld.lld

				        sudo apt-get install -y lld-18

				    - name: Install {fmt} using cooking.sh

				      shell: bash

				      run: |

				        sudo apt-get remove -y libfmt-dev

				        seastar/cooking.sh -d build-fmt -p cooking -i fmt

									
20

.github/clang-include-cleaner.json vendored Normal file

				@@ -0,0 +1,20 @@

				{

				    "problemMatcher": [

				        {

				            "owner": "clang-include-cleaner",

				            "severity": "error",

				            "pattern": [

				                {

				                    "regexp": "^([^\\-\\+].*)$",

				                    "file": 1

				                },

				                {

				                    "regexp": "^(-\\s+[^\\s]+)\\s+@Line:(\\d+)$",

				                    "line": 2,

				                    "message": 1,

				                    "loop": true

				                }

				            ]

				        }

				    ]

				}
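The two patterns in this matcher can be tried outside of GitHub Actions. The Python sketch below applies the same regular expressions to a made-up tool output: the first pattern picks up a file name, and the second, looping pattern picks up a message and a line number from each row that follows. The sample lines and the helper name `match_problems` are assumptions for illustration only, and the loop is a simplification of how a multi-line problem matcher behaves, not a copy of it.

```python
import re

# Regular expressions copied from the problemMatcher above (JSON escapes removed).
FILE_RE = re.compile(r"^([^\-\+].*)$")
ENTRY_RE = re.compile(r"^(-\s+[^\s]+)\s+@Line:(\d+)$")


def match_problems(lines):
    """Return (file, line, message) tuples, imitating a looping two-pattern matcher."""
    problems = []
    current_file = None
    for text in lines:
        entry = ENTRY_RE.match(text)
        if entry and current_file is not None:
            # Second pattern: group 1 is the message, group 2 the line number.
            problems.append((current_file, int(entry.group(2)), entry.group(1)))
            continue
        # First pattern: any line not starting with '-' or '+' names a file.
        header = FILE_RE.match(text)
        current_file = header.group(1) if header else None
    return problems


# Hypothetical output, shaped only to satisfy the two patterns.
sample = [
    "utils/example.cc",
    "- seastar/core/sstring.hh @Line:12",
    "- utils/unused.hh @Line:14",
]
for problem in match_problems(sample):
    print(problem)
```

Rows that start with `+` match neither pattern, so under this sketch they are ignored rather than reported.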

									
2

.github/clang-tidy-matcher.json → .github/clang-matcher.json vendored

@@ -1,7 +1,7 @@
 {
     "problemMatcher": [
         {
-            "owner": "clang-tidy",
+            "owner": "clang",
             "pattern": [
                 {
                     "regexp": "^([^:]+):(\\d+):(\\d+):\\s+(warning|error):\\s+(.*?)\\s+\\[(.*?)\\]$",

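As a quick check of what the `regexp` in this matcher accepts, the Python sketch below runs it against one compiler-style diagnostic line. The sample line is hypothetical, and the dictionary keys are labels chosen here: the hunk is cut off before the part of the matcher that maps capture groups to fields, so that mapping is an assumption.

```python
import re

# Pattern copied from the "regexp" entry above (JSON escapes removed).
DIAGNOSTIC_RE = re.compile(
    r"^([^:]+):(\d+):(\d+):\s+(warning|error):\s+(.*?)\s+\[(.*?)\]$"
)


def parse_diagnostic(text):
    """Split one diagnostic line into its six captured groups, or return None."""
    match = DIAGNOSTIC_RE.match(text)
    if match is None:
        return None
    path, line, column, severity, message, code = match.groups()
    # Key names are assumed labels, not taken from the matcher file.
    return {
        "file": path,
        "line": int(line),
        "column": int(column),
        "severity": severity,
        "message": message,
        "code": code,
    }


# Hypothetical diagnostic in the usual file:line:column format.
sample = "db/example.cc:42:9: warning: unused variable 'x' [-Wunused-variable]"
print(parse_diagnostic(sample))
```

Lines with any other severity, such as `note:`, or without a trailing bracketed tag do not match the pattern and come back as `None`.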
									
9

.github/dependabot.yml vendored Normal file

				@@ -0,0 +1,9 @@

				version: 2

				updates:

				- package-ecosystem: "pip"

				  directory: "/docs"

				  schedule:

				    interval: "daily"

				  allow:

				  - dependency-name: "sphinx-scylladb-theme"

				  - dependency-name: "sphinx-multiversion-scylla"

									
83

.github/mergify.yml vendored

@@ -15,7 +15,7 @@ pull_request_rules:
         - closed
     actions:
       delete_head_branch:
-  - name: Automate backport pull request 5.2
+  - name: Automate backport pull request 6.2
     conditions:
       - or:
         - closed
@@ -23,36 +23,11 @@ pull_request_rules:
       - or:
           - base=master
           - base=next
-      - label=backport/5.2 # The PR must have this label to trigger the backport
+      - label=backport/6.2 # The PR must have this label to trigger the backport
       - label=promoted-to-master
     actions:
       copy:
-        title: "[Backport 5.2] {{ title }}"
-        body: |
-          {{ body }}
-          {% for c in commits %}
-          (cherry picked from commit {{ c.sha }})
-          {% endfor %}
-           Refs #{{number}}
-        branches:
-          - branch-5.2
-        assignees:
-          - "{{ author }}"
-  - name: Automate backport pull request 5.4
-    conditions:
-      - or:
-        - closed
-        - merged
-      - or:
-          - base=master
-          - base=next
-      - label=backport/5.4 # The PR must have this label to trigger the backport
-      - label=promoted-to-master
-    actions:
-      copy:
-        title: "[Backport 5.4] {{ title }}"
+        title: "[Backport 6.2] {{ title }}"
         body: |
           {{ body }}
@@ -62,6 +37,56 @@ pull_request_rules:
          Refs #{{number}}
         branches:
-          - branch-5.4
+          - branch-6.2
         assignees:
           - "{{ author }}"
+  - name: Automate backport pull request 6.1
+    conditions:
+      - or:
+        - closed
+        - merged
+      - or:
+          - base=master
+          - base=next
+      - label=backport/6.1 # The PR must have this label to trigger the backport
+      - label=promoted-to-master
+    actions:
+      copy:
+        title: "[Backport 6.1] {{ title }}"
+        body: |
+          {{ body }}
+          {% for c in commits %}
+          (cherry picked from commit {{ c.sha }})
+          {% endfor %}
+           Refs #{{number}}
+        branches:
+          - branch-6.1
+        assignees:
+          - "{{ author }}"
+  - name: Automate backport pull request 6.0
+    conditions:
+      - or:
+        - closed
+        - merged
+      - or:
+          - base=master
+          - base=next
+      - label=backport/6.0 # The PR must have this label to trigger the backport
+      - label=promoted-to-master
+    actions:
+      copy:
+        title: "[Backport 6.0] {{ title }}"
+        body: |
+          {{ body }}
+          {% for c in commits %}
+          (cherry picked from commit {{ c.sha }})
+          {% endfor %}
+           Refs #{{number}}
+        branches:
+          - branch-6.0
+        assignees:
+          - "{{ author }}"
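Each backport rule in this file combines its conditions the same way: the pull request must be closed or merged, must target `master` or `next`, and must carry both the `backport/X.Y` label and the `promoted-to-master` label. The Python sketch below models that combination for the three rules present after this change. The `PullRequest` class and both function names are hypothetical stand-ins written for this illustration; this is not how Mergify itself evaluates rules.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical stand-in for the attributes the rule conditions inspect."""
    number: int
    title: str
    base: str
    closed: bool = False
    merged: bool = False
    labels: set = field(default_factory=set)


def should_backport(pr, version):
    """All four top-level conditions of one rule must hold together."""
    return (
        (pr.closed or pr.merged)
        and pr.base in ("master", "next")
        and f"backport/{version}" in pr.labels
        and "promoted-to-master" in pr.labels
    )


def backport_plan(pr, versions=("6.2", "6.1", "6.0")):
    """List (target branch, copied title) for every rule whose conditions match."""
    return [
        (f"branch-{v}", f"[Backport {v}] {pr.title}")
        for v in versions
        if should_backport(pr, v)
    ]


pr = PullRequest(1, "fix example", "master", merged=True,
                 labels={"backport/6.1", "promoted-to-master"})
print(backport_plan(pr))
```

Under this model a pull request that has a backport label but not `promoted-to-master` yields no copies, which matches the comment in the rules that the label is required to trigger the backport.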

									
181

.github/scripts/auto-backport.py vendored Executable file

@@ -0,0 +1,181 @@
#!/usr/bin/env python3

import argparse
import os
import re
import sys
import tempfile
import logging

from github import Github, GithubException
from git import Repo, GitCommandError

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
try:
    github_token = os.environ["GITHUB_TOKEN"]
except KeyError:
    print("Please set the 'GITHUB_TOKEN' environment variable")
    sys.exit(1)


def is_pull_request():
    return '--pull-request' in sys.argv[1:]


def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--repo', type=str, required=True, help='Github repository name')
    parser.add_argument('--base-branch', type=str, default='refs/heads/master', help='Base branch')
    parser.add_argument('--commits', default=None, type=str, help='Range of promoted commits.')
    parser.add_argument('--pull-request', type=int, help='Pull request number to be backported')
    parser.add_argument('--head-commit', type=str, required=is_pull_request(), help='The HEAD of target branch after the pull request specified by --pull-request is merged')
    return parser.parse_args()


def create_pull_request(repo, new_branch_name, base_branch_name, pr, backport_pr_title, commits, is_draft=False):
    pr_body = f'{pr.body}\n\n'
    for commit in commits:
        pr_body += f'- (cherry picked from commit {commit})\n\n'
    pr_body += f'Parent PR: #{pr.number}'
    try:
        backport_pr = repo.create_pull(
            title=backport_pr_title,
            body=pr_body,
            head=f'scylladbbot:{new_branch_name}',
            base=base_branch_name,
            draft=is_draft
        )
        logging.info(f"Pull request created: {backport_pr.html_url}")
        backport_pr.add_to_assignees(pr.user)
        logging.info(f"Assigned PR to original author: {pr.user}")
        return backport_pr
    except GithubException as e:
        if 'A pull request already exists' in str(e):
            logging.warning(f'A pull request already exists for {pr.user}:{new_branch_name}')
        else:
            logging.error(f'Failed to create PR: {e}')

				def get_pr_commits(repo, pr, stable_branch, start_commit=None):

				    commits = []

				    if pr.merged:

				        merge_commit = repo.get_commit(pr.merge_commit_sha)

				        if len(merge_commit.parents) > 1:  # Check if this merge commit includes multiple commits

				            commits.append(pr.merge_commit_sha)

				        else:

				            if start_commit:

				                promoted_commits = repo.compare(start_commit, stable_branch).commits

				            else:

				                promoted_commits = repo.get_commits(sha=stable_branch)

				            for commit in pr.get_commits():

				                for promoted_commit in promoted_commits:

				                    commit_title = commit.commit.message.splitlines()[0]

				                    # In Scylla-pkg and scylla-dtest, for example,

				                    # we don't create a merge commit for a PR with multiple commits.

				                    # According to the GitHub API, the last commit is then the merge commit,

				                    # which is not what we need when backporting (we need all the commits).

				                    # So here we match each PR commit to its promoted SHA by title so we can cherry-pick it.

				                    if promoted_commit.commit.message.startswith(commit_title):

				                        commits.append(promoted_commit.sha)

				    elif pr.state == 'closed':

				        events = pr.get_issue_events()

				        for event in events:

				            if event.event == 'closed':

				                commits.append(event.commit_id)

				    return commits

				def create_pr_comment_and_remove_label(pr, comment_body):

				    labels = pr.get_labels()

				    pattern = re.compile(r"backport/\d+\.\d+$")

				    for label in labels:

				        if pattern.match(label.name):

				            print(f"Removing label: {label.name}")

				            comment_body += f'- {label.name}\n'

				            pr.remove_from_labels(label)

				    pr.create_issue_comment(comment_body)

				def backport(repo, pr, version, commits, backport_base_branch):

				    new_branch_name = f'backport/{pr.number}/to-{version}'

				    backport_pr_title = f'[Backport {version}] {pr.title}'

				    repo_url = f'https://scylladbbot:{github_token}@github.com/{repo.full_name}.git'

				    fork_repo = f'https://scylladbbot:{github_token}@github.com/scylladbbot/{repo.name}.git'

				    with (tempfile.TemporaryDirectory() as local_repo_path):

				        try:

				            repo_local = Repo.clone_from(repo_url, local_repo_path, branch=backport_base_branch)

				            repo_local.git.checkout(b=new_branch_name)

				            is_draft = False

				            for commit in commits:

				                try:

				                    repo_local.git.cherry_pick(commit, '-m1', '-x')

				                except GitCommandError as e:

				                    logging.warning(f'Cherry-pick conflict on commit {commit}: {e}')

				                    is_draft = True

				                    repo_local.git.add(A=True)

				                    repo_local.git.cherry_pick('--continue')

				            if not repo.private and not repo.has_in_collaborators(pr.user.login):

				                repo.add_to_collaborators(pr.user.login, permission="push")

				                comment = f':warning:  @{pr.user.login} you have been added as collaborator to scylladbbot fork '

				                comment += f'Please check your inbox and approve the invitation, once it is done, please add the backport labels again'

				                create_pr_comment_and_remove_label(pr, comment)

				                return

				            repo_local.git.push(fork_repo, new_branch_name, force=True)

				            create_pull_request(repo, new_branch_name, backport_base_branch, pr, backport_pr_title, commits,

				                                is_draft=is_draft)

				        except GitCommandError as e:

				            logging.warning(f"GitCommandError: {e}")

				def main():

				    args = parse_args()

				    base_branch = args.base_branch.split('/')[2]

				    promoted_label = 'promoted-to-master'

				    repo_name = args.repo

				    if 'scylla-enterprise' in args.repo:

				        promoted_label = 'promoted-to-enterprise'

				    stable_branch = base_branch

				    backport_branch = 'branch-'

				    backport_label_pattern = re.compile(r'backport/\d+\.\d+$')

				    g = Github(github_token)

				    repo = g.get_repo(repo_name)

				    closed_prs = []

				    start_commit = None

				    if args.commits:

				        start_commit, end_commit = args.commits.split('..')

				        commits = repo.compare(start_commit, end_commit).commits

				        for commit in commits:

				            match = re.search(rf"Closes .*#([0-9]+)", commit.commit.message, re.IGNORECASE)

				            if match:

				                pr_number = int(match.group(1))

				                pr = repo.get_pull(pr_number)

				                closed_prs.append(pr)

				    if args.pull_request:

				        start_commit = args.head_commit

				        pr = repo.get_pull(args.pull_request)

				        closed_prs = [pr]

				    for pr in closed_prs:

				        labels = [label.name for label in pr.labels]

				        backport_labels = [label for label in labels if backport_label_pattern.match(label)]

				        if promoted_label not in labels:

				            print(f'no {promoted_label} label: {pr.number}')

				            continue

				        if not backport_labels:

				            print(f'no backport label: {pr.number}')

				            continue

				        commits = get_pr_commits(repo, pr, stable_branch, start_commit)

				        logging.info(f"Found PR #{pr.number} with commit {commits} and the following labels: {backport_labels}")

				        for backport_label in backport_labels:

				            version = backport_label.replace('backport/', '')

				            backport_base_branch = backport_label.replace('backport/', backport_branch)

				            backport(repo, pr, version, commits, backport_base_branch)

				if __name__ == "__main__":

				    main()
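
The label handling in `main()` above can be sketched in isolation. This is a minimal illustration, not code from the commit: the helper name `backport_targets` is hypothetical, and only the regular expression and the `branch-` prefix are taken from the script. Each `backport/X.Y` label yields a version string and a `branch-X.Y` target; labels that do not match the pattern (for example `backport/6.0-done`) are skipped.

```python
import re

# Same pattern auto-backport.py uses to recognise backport labels.
BACKPORT_LABEL = re.compile(r'backport/\d+\.\d+$')


def backport_targets(labels, branch_prefix='branch-'):
    """Return (version, base_branch) pairs, one per backport label.

    Hypothetical helper for illustration; not part of auto-backport.py.
    """
    targets = []
    for label in labels:
        # re.match anchors at the start of the label; the trailing `$`
        # rejects suffixed labels such as `backport/6.0-done`.
        if BACKPORT_LABEL.match(label):
            version = label.replace('backport/', '')
            targets.append((version, branch_prefix + version))
    return targets


if __name__ == '__main__':
    print(backport_targets(['promoted-to-master', 'backport/6.1', 'backport/6.0-done']))
```

Keeping the mapping in a pure function like this would make it easy to test without a GitHub token, although the script as committed performs it inline.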

									
										85

.github/scripts/label_promoted_commits.py
									
										vendored
									
				@@ -1,9 +1,9 @@

				import requests

				from github import Github

				import argparse

				import re

				import sys

				import os

				from github import Github

				from github.GithubException import UnknownObjectException

				try:

				    github_token = os.environ["GITHUB_TOKEN"]

				@@ -16,43 +16,72 @@ def parser():

				    parser = argparse.ArgumentParser()

				    parser.add_argument('--repository', type=str, required=True,

				                        help='Github repository name (e.g., scylladb/scylladb)')

				    parser.add_argument('--commit_before_merge', type=str, required=True, help='Git commit ID to start labeling from ('

				                                                                               'newest commit).')

				    parser.add_argument('--commit_after_merge', type=str, required=True,

				                        help='Git commit ID to end labeling at (oldest '

				                             'commit, exclusive).')

				    parser.add_argument('--update_issue', type=bool, default=False, help='Set True to update issues when backport was '

				                                                                         'done')

				    parser.add_argument('--label', type=str, required=True, help='Label to use')

				    parser.add_argument('--commits', type=str, required=True, help='Range of promoted commits.')

				    parser.add_argument('--label', type=str, default='promoted-to-master', help='Label to use')

				    parser.add_argument('--ref', type=str, required=True, help='PR target branch')

				    return parser.parse_args()

				def add_comment_and_close_pr(pr, comment):

				    if pr.state == 'open':

				        pr.create_issue_comment(comment)

				        pr.edit(state="closed")

				def mark_backport_done(repo, ref_pr_number, branch):

				    pr = repo.get_pull(int(ref_pr_number))

				    label_to_remove = f'backport/{branch}'

				    label_to_add = f'{label_to_remove}-done'

				    current_labels = [label.name for label in pr.get_labels()]

				    if label_to_remove in current_labels:

				        pr.remove_from_labels(label_to_remove)

				    if label_to_add not in current_labels:

				        pr.add_to_labels(label_to_add)

				def main():

				    # This script is triggered by a push event to either the master branch or a branch named branch-x.y (where x and y are version numbers). Depending on the pushed branch, the script does one of the following:

				    # - When the ref branch is `master`, it adds the `promoted-to-master` label, which the auto-backport process needs later.

				    # - When the ref branch is `branch-x.y` (meaning a patch was backported), it replaces the `backport/x.y` label on the original PR with `backport/x.y-done` and closes the backport PR (GitHub only closes PRs that target the default branch).

				    args = parser()

				    pr_pattern = re.compile(r'Closes .*#([0-9]+)')

				    target_branch = re.search(r'branch-(\d+\.\d+)', args.ref)

				    g = Github(github_token)

				    repo = g.get_repo(args.repository, lazy=False)

				    commits = repo.compare(head=args.commit_after_merge, base=args.commit_before_merge)

				    start_commit, end_commit = args.commits.split('..')

				    commits = repo.compare(start_commit, end_commit).commits

				    processed_prs = set()

				    # Print commit information

				    for commit in commits.commits:

				        print(commit.sha)

				        match = pr_pattern.search(commit.commit.message)

				    for commit in commits:

				        print(f'Commit sha is: {commit.sha}')

				        pr_last_line = commit.commit.message.splitlines()[-1]

				        match = pr_pattern.search(pr_last_line)

				        if match:

				            pr_number = match.group(1)

				            url = f'https://api.github.com/repos/{args.repository}/issues/{pr_number}/labels'

				            data = {

				                "labels": [f'{args.label}']

				            }

				            headers = {

				                "Authorization": f"token {github_token}",

				                "Accept": "application/vnd.github.v3+json"

				            }

				            response = requests.post(url, headers=headers, json=data)

				            if response.ok:

				                print(f"Label added successfully to {url}")

				            pr_number = int(match.group(1))

				            if pr_number in processed_prs:

				                continue

				            if target_branch:

				                pr = repo.get_pull(pr_number)

				                branch_name = target_branch[1]

				                refs_pr = re.findall(r'Parent PR: (?:#|https.*?)(\d+)', pr.body)

				                if refs_pr:

				                    print(f'branch-{target_branch.group(1)}, pr number is: {pr_number}')

				                    # 1. change the backport label of the parent PR to note that

				                    #    we've merged the corresponding backport PR

				                    # 2. close the backport PR and leave a comment on it to note

				                    #    that it has been merged with a certain git commit.

				                    ref_pr_number = refs_pr[0]

				                    mark_backport_done(repo, ref_pr_number, branch_name)

				                    comment = f'Closed via {commit.sha}'

				                    add_comment_and_close_pr(pr, comment)

				            else:

				                print(f"No label was added to {url}")

				                try:

				                    pr = repo.get_pull(pr_number)

				                    pr.add_to_labels('promoted-to-master')

				                    print(f'master branch, pr number is: {pr_number}')

				                except UnknownObjectException:

				                    print(f'{pr_number} is not a PR but an issue, no need to add label')

				            processed_prs.add(pr_number)

				if __name__ == "__main__":

									
										55

.github/workflows/add-label-when-promoted.yaml
									
										vendored
									
				@@ -4,6 +4,11 @@ on:

				  push:

				    branches:

				      - master

				      - branch-*.*

				      - enterprise

				  pull_request_target:

				    types: [labeled]

				    branches: [master, next, enterprise]

				jobs:

				  check-commit:

				@@ -12,15 +17,55 @@ jobs:

				      pull-requests: write

				      issues: write

				    steps:

				      - name: Dump GitHub context

				        env:

				          GITHUB_CONTEXT: ${{ toJson(github) }}

				        run: echo "$GITHUB_CONTEXT"

				      - name: Set Default Branch

				        id: set_branch

				        run: |

				          if [[ "${{ github.repository }}" == *enterprise* ]]; then

				            echo "DEFAULT_BRANCH=enterprise" >> $GITHUB_ENV

				          else

				            echo "DEFAULT_BRANCH=master" >> $GITHUB_ENV

				          fi

				      - name: Checkout repository

				        uses: actions/checkout@v4

				        with:

				          repository: ${{ github.repository }}

				          ref: ${{ env.DEFAULT_BRANCH }}

				          token: ${{ secrets.AUTO_BACKPORT_TOKEN }}

				          fetch-depth: 0  # Fetch all history for all tags and branches

				      - name: Set up Git identity

				        run: |

				          git config --global user.name "GitHub Action"

				          git config --global user.email "action@github.com"

				          git config --global merge.conflictstyle diff3

				      - name: Install dependencies

				        run: sudo apt-get install -y python3-github

				        run: sudo apt-get install -y python3-github python3-git

				      - name: Run python script

				        if: github.event_name == 'push'

				        env:

				          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				        run: python .github/scripts/label_promoted_commits.py --commit_before_merge ${{ github.event.before }} --commit_after_merge ${{ github.event.after }} --repository ${{ github.repository }} --label promoted-to-master

				          GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}

				        run: python .github/scripts/label_promoted_commits.py  --commits ${{ github.event.before }}..${{ github.sha }} --repository ${{ github.repository }} --ref ${{ github.ref }}

				      - name: Run auto-backport.py when promotion completed

				        if: github.event_name == 'push' && github.ref == 'refs/heads/${{ env.DEFAULT_BRANCH }}'

				        env:

				          GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}

				        run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --commits ${{ github.event.before }}..${{ github.sha }}

				      - name: Check if label starts with 'backport/' and contains digits

				        id: check_label

				        run: |

				          label_name="${{ github.event.label.name }}"

				          if [[ "$label_name" =~ ^backport/[0-9]+\.[0-9]+$ ]]; then

				            echo "Label matches backport/X.X pattern."

				            echo "backport_label=true" >> $GITHUB_OUTPUT

				          else

				            echo "Label does not match the required pattern."

				            echo "backport_label=false" >> $GITHUB_OUTPUT

				          fi

				      - name: Run auto-backport.py when label was added

				        if: github.event_name == 'pull_request_target' && steps.check_label.outputs.backport_label == 'true' && (github.event.pull_request.state == 'closed' && github.event.pull_request.merged == true)

				        env:

				          GITHUB_TOKEN: ${{ secrets.AUTO_BACKPORT_TOKEN }}

				        run: python .github/scripts/auto-backport.py --repo ${{ github.repository }} --base-branch ${{ github.ref }} --pull-request ${{ github.event.pull_request.number }} --head-commit ${{ github.event.pull_request.base.sha }}

									
										9

.github/workflows/backport-pr-fixes-validation.yaml
									
										vendored
									
				@@ -22,5 +22,12 @@ jobs:

				            const regex = new RegExp(pattern);

				            if (!regex.test(body)) {

				              core.setFailed("PR body does not contain a valid 'Fixes' reference.");

				              const error = "PR body does not contain a valid 'Fixes' reference.";

				              core.setFailed(error);

				              await github.rest.issues.createComment({

				                issue_number: context.issue.number,

				                owner: context.repo.owner,

				                repo: context.repo.repo,

				                body: `:warning: ${error}`

				              });

				            }

									
										39

.github/workflows/build-scylla.yaml
									
										vendored
									
										Normal file
									
				@@ -0,0 +1,39 @@

				name: Build Scylla

				on:

				  workflow_call:

				    inputs:

				      build_mode:

				        description: 'the build mode'

				        type: string

				        required: true

				    outputs:

				      md5sum:

				        description: 'the md5sum for scylla executable'

				        value: ${{ jobs.build.outputs.md5sum }}

				jobs:

				  read-toolchain:

				    uses: ./.github/workflows/read-toolchain.yaml

				  build:

				    if: github.repository == 'scylladb/scylladb'

				    needs:

				      - read-toolchain

				    runs-on: ubuntu-latest

				    container: ${{ needs.read-toolchain.outputs.image }}

				    outputs:

				      md5sum: ${{ steps.checksum.outputs.md5sum }}

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          submodules: recursive

				      - name: Generate the building system

				        run: |

				          git config --global --add safe.directory $GITHUB_WORKSPACE

				          ./configure.py --mode ${{ inputs.build_mode }} --with scylla

				      - run: |

				          ninja build/${{ inputs.build_mode }}/scylla

				      - id: checksum

				        run: |

				          checksum=$(md5sum build/${{ inputs.build_mode }}/scylla | cut -c -32)

				          echo "md5sum=$checksum" >> $GITHUB_OUTPUT
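
The `checksum` step above exposes the MD5 digest of the built executable as a workflow output, presumably so that a caller can compare two independent builds. A minimal Python sketch of the same idea follows; the function names `md5_hex` and `is_reproducible` are hypothetical and the comparison logic is an assumption, since the consuming job is not shown here.

```python
import hashlib


def md5_hex(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file: the same 32 characters that
    `md5sum FILE | cut -c -32` prints."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # Read in chunks so a large executable is never held in memory whole.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(path_a, path_b):
    """Assumed check: two builds count as reproducible if their digests match."""
    return md5_hex(path_a) == md5_hex(path_b)
```

MD5 is used here only as a cheap equality fingerprint between two trusted builds, not as a security measure.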

									
										66

.github/workflows/clang-nightly.yaml
									
										vendored
									
										Normal file
									
				@@ -0,0 +1,66 @@

				name: clang-nightly

				on:

				  schedule:

				    # only at 5AM Saturday

				    - cron: '0 5 * * SAT'

				env:

				  # use the development branch explicitly

				  CLANG_VERSION: 20

				  BUILD_DIR: build

				permissions: {}

				# cancel the in-progress run upon a repush

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: true

				jobs:

				  clang-dev:

				    name: Build with clang nightly

				    if: github.repository == 'scylladb/scylladb'

				    runs-on: ubuntu-latest

				    container: fedora:40

				    strategy:

				      matrix:

				        build_type:

				          - Debug

				          - RelWithDebInfo

				          - Dev

				    steps:

				      - run: |

				          sudo dnf -y install git

				      - uses: actions/checkout@v4

				        with:

				          submodules: true

				      - name: Install build dependencies

				        run: |

				          # use the copr repo for llvm snapshot builds, see

				          # https://copr.fedorainfracloud.org/coprs/g/fedora-llvm-team/llvm-snapshots/

				          sudo dnf -y install 'dnf-command(copr)'

				          sudo dnf copr enable -y @fedora-llvm-team/llvm-snapshots

				          # do not install the java dependencies, which are not used here

				          sed -i.orig \

				            -e '/tools\/.*\/install-dependencies.sh/d' \

				            -e 's/(minio_download_jobs)/(true)/' \

				            ./install-dependencies.sh

				          sudo ./install-dependencies.sh

				          sudo dnf -y install lld

				      - name: Generate the building system

				        run: |

				          cmake                                         \

				            -DCMAKE_BUILD_TYPE=${{ matrix.build_type }} \

				            -DCMAKE_C_COMPILER=clang-$CLANG_VERSION     \

				            -DCMAKE_CXX_COMPILER=clang++-$CLANG_VERSION \

				            -G Ninja                                    \

				            -B $BUILD_DIR                               \

				            -S .

				      # see https://github.com/actions/toolkit/blob/main/docs/problem-matchers.md

				      - run: |

				          echo "::add-matcher::.github/clang-matcher.json"

				      - run: |

				          cmake --build $BUILD_DIR --target scylla

				      - run: |

				          echo "::remove-matcher owner=clang::"

									
										37

.github/workflows/clang-tidy.yaml
									
										vendored
									
				@@ -10,13 +10,11 @@ on:

				      - 'docs/**'

				      - '.github/**'

				  workflow_dispatch:

				  schedule:

				    # only at 5AM Saturday

				    - cron: '0 5 * * SAT'

				  issue_comment:

				    types:

				      - created

				env:

				  # use the stable branch

				  CLANG_VERSION: 18

				  BUILD_TYPE: RelWithDebInfo

				  BUILD_DIR: build

				  CLANG_TIDY_CHECKS: '-*,bugprone-use-after-move'

				@@ -29,35 +27,42 @@ concurrency:

				  cancel-in-progress: true

				jobs:

				  read-toolchain:

				    if: github.event_name == 'pull_request' || (github.event.issue.pull_request && startsWith(github.event.comment.body, '/clang-tidy'))

				    uses: ./.github/workflows/read-toolchain.yaml

				  clang-tidy:

				    name: Run clang-tidy

				    needs:

				      - read-toolchain

				    runs-on: ubuntu-latest

				    container: ${{ needs.read-toolchain.outputs.image }}

				    steps:

				      - env:

				          IMAGE: ${{ needs.read-toolchain.outputs.image }}

				        run: |

				          echo "$IMAGE"

				      - uses: actions/checkout@v4

				        with:

				          submodules: true

				      - uses: ./.github/actions/setup-build

				        with:

				          install_clang_tidy: true

				      - run: |

				          sudo dnf -y install clang-tools-extra

				      - name: Generate the building system

				        run: |

				          cmake                                         \

				            -DCMAKE_BUILD_TYPE=$BUILD_TYPE              \

				            -DCMAKE_C_COMPILER=clang-$CLANG_VERSION     \

				            -DScylla_USE_LINKER=ld.lld-$CLANG_VERSION   \

				            -DCMAKE_CXX_COMPILER=clang++-$CLANG_VERSION \

				            -DCMAKE_C_COMPILER=clang                    \

				            -DScylla_USE_LINKER=ld.lld                  \

				            -DCMAKE_CXX_COMPILER=clang++                \

				            -DCMAKE_EXPORT_COMPILE_COMMANDS=ON          \

				            -DCMAKE_CXX_CLANG_TIDY="clang-tidy-$CLANG_VERSION;--checks=$CLANG_TIDY_CHECKS" \

				            -DCMAKE_CXX_FLAGS=-DFMT_HEADER_ONLY         \

				            -DCMAKE_PREFIX_PATH=$PWD/cooking            \

				            -DCMAKE_CXX_CLANG_TIDY="clang-tidy;--checks=$CLANG_TIDY_CHECKS" \

				            -G Ninja                                    \

				            -B $BUILD_DIR                               \

				            -S .

				      # see https://github.com/actions/toolkit/blob/main/docs/problem-matchers.md

				      - run: |

				          echo "::add-matcher::.github/clang-tidy-matcher.json"

				          echo "::add-matcher::.github/clang-matcher.json"

				      - name: Build with clang-tidy enabled

				        run: |

				          cmake --build $BUILD_DIR --target scylla

				      - run: |

				          echo "::remove-matcher owner=clang-tidy::"

				          echo "::remove-matcher owner=clang::"

									
										2

.github/workflows/codespell.yaml
									
										vendored
									
				@@ -14,4 +14,4 @@ jobs:

				        with:

				          only_warn: 1

				          ignore_words_list: "ans,datas,fo,ser,ue,crate,nd,reenable,strat,stap,te,raison"

				          skip: "./.git,./build,./tools,*.js,*.thrift,*.lock,./test,./licenses,./redis/lolwut.cc,*.svg"

				          skip: "./.git,./build,./tools,*.js,*.lock,./test,./licenses,./redis/lolwut.cc,*.svg"

									
										45

.github/workflows/conflict_reminder.yaml
									
										vendored
									
										Normal file
									
				@@ -0,0 +1,45 @@

				name: Notify PR Authors of Conflicts

				on:

				  schedule:

				    - cron: '0 10 * * 1,4'  # Runs every Monday and Thursday at 10:00 UTC

				  workflow_dispatch:      # Manual trigger for testing

				jobs:

				  notify_conflict_prs:

				    runs-on: ubuntu-latest

				    steps:

				      - name: Notify PR Authors of Conflicts

				        uses: actions/github-script@v7

				        with:

				          script: |

				            const prs = await github.paginate(github.rest.pulls.list, {

				              owner: context.repo.owner,

				              repo: context.repo.repo,

				              state: 'open',

				              per_page: 100

				            });

				            const branchPrefix = 'branch-';

				            const threeDaysAgo = new Date();

				            const conflictLabel = 'conflicts';          

				            threeDaysAgo.setDate(threeDaysAgo.getDate() - 3);

				            for (const pr of prs) {

				              if (!pr.base.ref.startsWith(branchPrefix)) continue;

				              const hasConflictLabel = pr.labels.some(label => label.name === conflictLabel);

				              if (!hasConflictLabel) continue;

				              const updatedDate = new Date(pr.updated_at);

				              if (updatedDate >= threeDaysAgo) continue;

				              if (pr.assignee === null) continue;

				              const assignee = pr.assignee;

				              if (assignee) {

				                await github.rest.issues.createComment({

				                  owner: context.repo.owner,

				                  repo: context.repo.repo,

				                  issue_number: pr.number,

				                  body: `@${assignee.login}, this PR has been open with conflicts. Please resolve the conflicts so we can merge it.`,

				                });

				                console.log(`Notified @${assignee.login} for PR #${pr.number}`);

				              } 

				            }

				            console.log(`Total PRs checked: ${prs.length}`);
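
The filter chain in the script above (base branch prefix, `conflicts` label, no update for three days, an assignee present) can be expressed as a single predicate. The sketch below is illustrative only: the function name `needs_reminder` is hypothetical, and `pr` is assumed to be a dict carrying the same fields the script reads from the GitHub REST API response.

```python
from datetime import datetime, timedelta, timezone


def needs_reminder(pr, now, branch_prefix='branch-', conflict_label='conflicts', stale_days=3):
    """Return True if the PR would be notified by the rules in the workflow.

    Hypothetical helper; `pr` mimics a GitHub pull request object.
    """
    # Only backport PRs, i.e. those targeting a release branch.
    if not pr['base']['ref'].startswith(branch_prefix):
        return False
    # Only PRs already labelled as conflicting.
    if not any(label['name'] == conflict_label for label in pr['labels']):
        return False
    # Skip PRs updated within the last `stale_days` days.
    updated = datetime.fromisoformat(pr['updated_at'].replace('Z', '+00:00'))
    if updated >= now - timedelta(days=stale_days):
        return False
    # Nobody to notify without an assignee.
    return pr.get('assignee') is not None


if __name__ == '__main__':
    example = {
        'base': {'ref': 'branch-6.1'},
        'labels': [{'name': 'conflicts'}],
        'updated_at': '2024-11-01T00:00:00Z',
        'assignee': {'login': 'someone'},
    }
    print(needs_reminder(example, datetime(2024, 11, 14, tzinfo=timezone.utc)))
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the predicate deterministic and easy to test.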

									
										3

.github/workflows/docs-pr.yaml
									
										vendored
									
				@@ -12,7 +12,8 @@ on:

				      - enterprise

				    paths:

				      - "docs/**"

				      - "db/config.hh"

				      - "db/config.cc"

				jobs:

				  build:

				    runs-on: ubuntu-latest

									
										82

.github/workflows/iwyu.yaml
									
										vendored
									
										Normal file
									
				@@ -0,0 +1,82 @@

				name: iwyu

				on:

				  pull_request:

				    branches:

				      - master

				env:

				  BUILD_TYPE: RelWithDebInfo

				  BUILD_DIR: build

				  CLEANER_OUTPUT_PATH: build/clang-include-cleaner.log

				  # the "idl" subdirectory does not contain C++ source code. the .hh files in it are

				  # supposed to be processed by idl-compiler.py, so we don't check them using the cleaner

				  CLEANER_DIRS: test/unit exceptions alternator api auth cdc compaction db dht gms index lang

				permissions: {}

				# cancel the in-progress run upon a repush

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: true

				jobs:

				  read-toolchain:

				    uses: ./.github/workflows/read-toolchain.yaml

				  clang-include-cleaner:

				    name: "Analyze #includes in source files"

				    needs:

				      - read-toolchain

				    runs-on: ubuntu-latest

				    container: ${{ needs.read-toolchain.outputs.image }}

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          submodules: true

				      - run: |

				          sudo dnf -y install clang-tools-extra

				      - name: Generate compilation database

				        run: |

				          cmake                                         \

				            -DCMAKE_BUILD_TYPE=$BUILD_TYPE              \

				            -DCMAKE_C_COMPILER=clang                    \

				            -DCMAKE_CXX_COMPILER=clang++                \

				            -DCMAKE_EXPORT_COMPILE_COMMANDS=ON          \

				            -G Ninja                                    \

				            -B $BUILD_DIR                               \

				            -S .

				      - name: Build headers

				        run: |

				          swagger_targets=''

				          for f in api/api-doc/*.json; do

				            if test "${f#*.}" = json; then

				              name=$(basename "$f" .json)

				              if test $name != swagger20_header; then

				                swagger_targets+=" scylla_swagger_gen_$name"

				              fi

				            fi

				          done

				          cmake                                         \

				            --build build                               \

				             --target seastar_http_request_parser       \

				             --target idl-sources                       \

				             --target $swagger_targets

				      - run: |

				          echo "::add-matcher::.github/clang-include-cleaner.json"

				      - name: clang-include-cleaner

				        run: |

				          for d in $CLEANER_DIRS; do

				            find $d \( -name '*.cc' -o -name '*.hh' \)    \

				              -exec echo {} \;                            \

				              -exec clang-include-cleaner                 \

				                --ignore-headers=seastarx.hh              \

				                --print=changes                           \

				                -p $BUILD_DIR                             \

				                {} \; | tee --append $CLEANER_OUTPUT_PATH

				          done

				      - run: |

				          echo "::remove-matcher owner=clang-include-cleaner::"

				      - uses: actions/upload-artifact@v4

				        with:

				          name: Logs (clang-include-cleaner)

				          path: "./${{ env.CLEANER_OUTPUT_PATH }}"

									
										23

.github/workflows/read-toolchain.yaml
									
												
				@@ -0,0 +1,23 @@

				name: Read Toolchain

				on:

				  workflow_call:

				    outputs:

				      image:

				        description: "the toolchain docker image"

				        value: ${{ jobs.read-toolchain.outputs.image }}

				jobs:

				  read-toolchain:

				    runs-on: ubuntu-latest

				    outputs:

				      image: ${{ steps.read.outputs.image }}

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          sparse-checkout: tools/toolchain/image

				          sparse-checkout-cone-mode: false

				      - id: read

				        run: |

				          image=$(cat tools/toolchain/image)

				          echo "image=$image" >> $GITHUB_OUTPUT

									
										35

.github/workflows/reproducible-build.yaml
									
												
				@@ -0,0 +1,35 @@

				name: Check Reproducible Build

				on:

				  schedule:

				    # 5AM every friday

				    - cron: '0 5 * * FRI'

				permissions: {}

				env:

				  BUILD_MODE: release

				jobs:

				  build-a:

				    uses: ./.github/workflows/build-scylla.yaml

				    with:

				      build_mode: release

				  build-b:

				    uses: ./.github/workflows/build-scylla.yaml

				    with:

				      build_mode: release

				  compare-checksum:

				    if: github.repository == 'scylladb/scylladb'

				    runs-on: ubuntu-latest

				    needs:

				      - build-a

				      - build-b

				    steps:

				      - env:

				          CHECKSUM_A: ${{needs.build-a.outputs.md5sum}}

				          CHECKSUM_B: ${{needs.build-b.outputs.md5sum}}

				        run: |

				          if [ $CHECKSUM_A != $CHECKSUM_B ]; then                             \

				            echo "::error::mismatched checksums: $CHECKSUM_A != $CHECKSUM_B"; \

				            exit 1;                                                           \

				          fi

									
										50

.github/workflows/seastar.yaml
									
												
				@@ -0,0 +1,50 @@

				name: Build with the latest Seastar

				on:

				  schedule:

				    # 5AM everyday

				    - cron: '0 5 * * *'

				permissions: {}

				concurrency:

				  group: ${{ github.workflow }}-${{ github.ref }}

				  cancel-in-progress: true

				env:

				  BUILD_DIR: build

				jobs:

				  build-with-the-latest-seastar:

				    runs-on: ubuntu-latest

				    # be consistent with tools/toolchain/image

				    container: scylladb/scylla-toolchain:fedora-40-20240621

				    strategy:

				      matrix:

				        build_type:

				          - Debug

				          - RelWithDebInfo

				          - Dev

				    steps:

				      - uses: actions/checkout@v4

				        with:

				          submodules: true

				      - run: |

				          rm -rf seastar

				      - uses: actions/checkout@v4

				        with:

				          repository: scylladb/seastar

				          submodules: true

				          path: seastar

				      - name: Generate the building system

				        run: |

				          git config --global --add safe.directory $GITHUB_WORKSPACE

				          cmake                                         \

				            -DCMAKE_BUILD_TYPE=${{ matrix.build_type }} \

				            -DCMAKE_C_COMPILER=clang                    \

				            -DCMAKE_CXX_COMPILER=clang++                \

				            -G Ninja                                    \

				            -B $BUILD_DIR                               \

				            -S .

				      - run: |

				          cmake --build $BUILD_DIR --target scylla

									
										6

.github/workflows/sync-labels.yaml
									
												
				@@ -16,6 +16,10 @@ jobs:

				      pull-requests: write

				      issues: write

				    steps:

				      - name: Dump GitHub context

				        env:

				          GITHUB_CONTEXT: ${{ toJson(github) }}

				        run: echo "$GITHUB_CONTEXT"

				      - name: Checkout repository

				        uses: actions/checkout@v4

				        with:

				@@ -33,7 +37,7 @@ jobs:

				        run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.number }} --action ${{ github.event.action }}

				      - name: Pull request labeled or unlabeled event

				        if: github.event_name == 'pull_request' && startsWith(github.event.label.name, 'backport/')

				        if: github.event_name == 'pull_request_target' && startsWith(github.event.label.name, 'backport/')

				        env:

				          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}

				        run: python .github/scripts/sync_labels.py --repo ${{ github.repository }} --number ${{ github.event.number }} --action ${{ github.event.action }} --label ${{ github.event.label.name }}

6

.gitignore

@@ -3,6 +3,8 @@
 .settings
 build
 build.ninja
 cmake-build-*
 build.ninja.new
 cscope.*
 /debian/
 dist/ami/files/*.rpm
@@ -12,13 +14,14 @@ dist/ami/scylla_deploy.sh
 Cql.tokens
 .kdev4
 *.kdev4
 .idea
 CMakeLists.txt.user
 .cache
 .tox
 *.egg-info
 __pycache__CMakeLists.txt.user
 .gdbinit
 resources
 /resources
 .pytest_cache
 /expressions.tokens
 tags
@@ -31,3 +34,4 @@ compile_commands.json
 .mypy_cache
 .envrc
 clang_build
 .idea/

3

.gitmodules

@@ -9,9 +9,6 @@
 [submodule "abseil"]
 	path = abseil
 	url = ../abseil-cpp
 [submodule "scylla-jmx"]
 	path = tools/jmx
 	url = ../scylla-jmx
 [submodule "scylla-tools"]
 	path = tools/java
 	url = ../scylla-tools-java

									
										111

CMakeLists.txt
									
												
				@@ -2,8 +2,6 @@ cmake_minimum_required(VERSION 3.27)

				project(scylla)

				include(CTest)

				list(APPEND CMAKE_MODULE_PATH

				  ${CMAKE_CURRENT_SOURCE_DIR}/cmake

				  ${CMAKE_CURRENT_SOURCE_DIR}/seastar/cmake)

				@@ -25,7 +23,8 @@ if(DEFINED CMAKE_BUILD_TYPE)

				endif(DEFINED CMAKE_BUILD_TYPE)

				include(mode.common)

				if(CMAKE_CONFIGURATION_TYPES)

				get_property(is_multi_config GLOBAL PROPERTY GENERATOR_IS_MULTI_CONFIG)

				if(is_multi_config)

				    foreach(config ${CMAKE_CONFIGURATION_TYPES})

				        include(mode.${config})

				        list(APPEND scylla_build_modes ${scylla_build_mode_${config}})

				@@ -44,29 +43,75 @@ endif()

				include(limit_jobs)

				# Configure Seastar compile options to align with Scylla

				set(CMAKE_CXX_STANDARD "20" CACHE INTERNAL "")

				set(CMAKE_CXX_STANDARD "23" CACHE INTERNAL "")

				set(CMAKE_CXX_EXTENSIONS ON CACHE INTERNAL "")

				set(CMAKE_CXX_SCAN_FOR_MODULES OFF CACHE INTERNAL "")

				set(CMAKE_CXX_VISIBILITY_PRESET hidden)

				set(Seastar_TESTING ON CACHE BOOL "" FORCE)

				set(Seastar_API_LEVEL 7 CACHE STRING "" FORCE)

				set(Seastar_APPS ON CACHE BOOL "" FORCE)

				set(Seastar_EXCLUDE_APPS_FROM_ALL ON CACHE BOOL "" FORCE)

				set(Seastar_EXCLUDE_TESTS_FROM_ALL ON CACHE BOOL "" FORCE)

				set(Seastar_UNUSED_RESULT_ERROR ON CACHE BOOL "" FORCE)

				add_subdirectory(seastar)

				if(is_multi_config)

				    find_package(Seastar)

				    # this is atypical compared to standard ExternalProject usage:

				    # - Seastar's build system should already be configured at this point.

				    # - We maintain separate project variants for each configuration type.

				    #

				    # Benefits of this approach:

				    # - Allows the parent project to consume the compile options exposed by

				    #   .pc file. as the compile options vary from one config to another.

				    # - Allows application of config-specific settings

				    # - Enables building Seastar within the parent project's build system

				    # - Facilitates linking of artifacts with the external project target,

				    #   establishing proper dependencies between them

				    include(ExternalProject)

				    ExternalProject_Add(Seastar

				        SOURCE_DIR "${PROJECT_SOURCE_DIR}/seastar"

				        BINARY_DIR "${CMAKE_BINARY_DIR}/$<CONFIG>/seastar"

				        CONFIGURE_COMMAND ""

				        BUILD_COMMAND ${CMAKE_COMMAND} --build <BINARY_DIR>

				          --target seastar

				          --target seastar_testing

				          --target seastar_perf_testing

				          --target app_iotune

				        BUILD_ALWAYS ON

				        BUILD_BYPRODUCTS

				          <BINARY_DIR>/libseastar.$<IF:$<CONFIG:Debug,Dev>,so,a>

				          <BINARY_DIR>/libseastar_testing.$<IF:$<CONFIG:Debug,Dev>,so,a>

				          <BINARY_DIR>/libseastar_perf_testing.$<IF:$<CONFIG:Debug,Dev>,so,a>

				          <BINARY_DIR>/apps/iotune/iotune

				          <BINARY_DIR>/gen/include/seastar/http/chunk_parsers.hh

				          <BINARY_DIR>/gen/include/seastar/http/request_parser.hh

				          <BINARY_DIR>/gen/include/seastar/http/response_parser.hh

				        INSTALL_COMMAND "")

				    add_dependencies(Seastar::seastar Seastar)

				    add_dependencies(Seastar::seastar_testing Seastar)

				else()

				    set(Seastar_TESTING ON CACHE BOOL "" FORCE)

				    set(Seastar_API_LEVEL 7 CACHE STRING "" FORCE)

				    set(Seastar_DEPRECATED_OSTREAM_FORMATTERS OFF CACHE BOOL "" FORCE)

				    set(Seastar_APPS ON CACHE BOOL "" FORCE)

				    set(Seastar_EXCLUDE_APPS_FROM_ALL ON CACHE BOOL "" FORCE)

				    set(Seastar_EXCLUDE_TESTS_FROM_ALL ON CACHE BOOL "" FORCE)

				    set(Seastar_IO_URING OFF CACHE BOOL "" FORCE)

				    set(Seastar_SCHEDULING_GROUPS_COUNT 16 CACHE STRING "" FORCE)

				    set(Seastar_UNUSED_RESULT_ERROR ON CACHE BOOL "" FORCE)

				    add_subdirectory(seastar)

				    target_compile_definitions (seastar

				      PRIVATE

				        SEASTAR_NO_EXCEPTION_HACK)

				endif()

				set(ABSL_PROPAGATE_CXX_STD ON CACHE BOOL "" FORCE)

				find_package(Sanitizers QUIET)

				set(sanitizer_cxx_flags

				    $<$<IN_LIST:$<CONFIG>,Debug;Sanitize>:$<TARGET_PROPERTY:Sanitizers::address,INTERFACE_COMPILE_OPTIONS>;$<TARGET_PROPERTY:Sanitizers::undefined_behavior,INTERFACE_COMPILE_OPTIONS>>)

				    $<$<CONFIG:Debug,Sanitize>:$<TARGET_PROPERTY:Sanitizers::address,INTERFACE_COMPILE_OPTIONS>;$<TARGET_PROPERTY:Sanitizers::undefined_behavior,INTERFACE_COMPILE_OPTIONS>>)

				if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")

				    set(ABSL_GCC_FLAGS ${sanitizer_cxx_flags})

				elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")

				    set(ABSL_LLVM_FLAGS ${sanitizer_cxx_flags})

				endif()

				set(ABSL_DEFAULT_LINKOPTS

				    $<$<IN_LIST:$<CONFIG>,Debug;Sanitize>:$<TARGET_PROPERTY:Sanitizers::address,INTERFACE_LINK_LIBRARIES>;$<TARGET_PROPERTY:Sanitizers::undefined_behavior,INTERFACE_LINK_LIBRARIES>>)

				    $<$<CONFIG:Debug,Sanitize>:$<TARGET_PROPERTY:Sanitizers::address,INTERFACE_LINK_LIBRARIES>;$<TARGET_PROPERTY:Sanitizers::undefined_behavior,INTERFACE_LINK_LIBRARIES>>)

				add_subdirectory(abseil)

				add_library(absl-headers INTERFACE)

				target_include_directories(absl-headers SYSTEM INTERFACE

				@@ -93,13 +138,14 @@ target_link_libraries(Boost::regex

				find_package(Lua REQUIRED)

				find_package(ZLIB REQUIRED)

				find_package(ICU COMPONENTS uc i18n REQUIRED)

				find_package(fmt 9.0.0 REQUIRED)

				find_package(fmt 10.0.0 REQUIRED)

				find_package(libdeflate REQUIRED)

				find_package(libxcrypt REQUIRED)

				find_package(Snappy REQUIRED)

				find_package(RapidJSON REQUIRED)

				find_package(Thrift REQUIRED)

				find_package(xxHash REQUIRED)

				find_package(yaml-cpp REQUIRED)

				find_package(zstd REQUIRED)

				set(scylla_gen_build_dir "${CMAKE_BINARY_DIR}/gen")

				file(MAKE_DIRECTORY "${scylla_gen_build_dir}")

				@@ -107,6 +153,14 @@ file(MAKE_DIRECTORY "${scylla_gen_build_dir}")

				include(add_version_library)

				generate_scylla_version()

				add_library(scylla-zstd STATIC

				    zstd.cc)

				target_link_libraries(scylla-zstd

				  PRIVATE

				    db

				    Seastar::seastar

				    zstd::libzstd)

				add_library(scylla-main STATIC)

				target_sources(scylla-main

				  PRIVATE

				@@ -128,6 +182,7 @@ target_sources(scylla-main

				    keys.cc

				    multishard_mutation_query.cc

				    mutation_query.cc

				    node_ops/task_manager_module.cc

				    partition_slice_builder.cc

				    querier.cc

				    query.cc

				@@ -141,14 +196,15 @@ target_sources(scylla-main

				    serializer.cc

				    sstables_loader.cc

				    table_helper.cc

				    tasks/task_handler.cc

				    tasks/task_manager.cc

				    timeout_config.cc

				    unimplemented.cc

				    validation.cc

				    vint-serialization.cc

				    zstd.cc)

				    vint-serialization.cc)

				target_link_libraries(scylla-main

				  PRIVATE

				    "$<LINK_LIBRARY:WHOLE_ARCHIVE,scylla-zstd>"

				    db

				    absl::headers

				    absl::btree

				@@ -184,6 +240,12 @@ include(check_headers)

				check_headers(check-headers scylla-main

				  GLOB ${CMAKE_CURRENT_SOURCE_DIR}/*.hh)

				option(Scylla_DIST

				  "Build dist targets"

				  ON)

				add_custom_target(compiler-training)

				add_subdirectory(api)

				add_subdirectory(alternator)

				add_subdirectory(db)

				@@ -196,7 +258,6 @@ add_subdirectory(dht)

				add_subdirectory(gms)

				add_subdirectory(idl)

				add_subdirectory(index)

				add_subdirectory(interface)

				add_subdirectory(lang)

				add_subdirectory(locator)

				add_subdirectory(message)

				@@ -214,7 +275,6 @@ add_subdirectory(service)

				add_subdirectory(sstables)

				add_subdirectory(streaming)

				add_subdirectory(test)

				add_subdirectory(thrift)

				add_subdirectory(tools)

				add_subdirectory(tracing)

				add_subdirectory(transport)

				@@ -255,7 +315,6 @@ target_link_libraries(scylla PRIVATE

				    sstables

				    streaming

				    test-perf

				    thrift

				    tools

				    tracing

				    transport

				@@ -263,12 +322,20 @@ target_link_libraries(scylla PRIVATE

				    utils)

				target_link_libraries(scylla PRIVATE

				    seastar

				    Seastar::seastar

				    absl::headers

				    yaml-cpp::yaml-cpp

				    Boost::program_options)

				target_include_directories(scylla PRIVATE

				    "${CMAKE_CURRENT_SOURCE_DIR}"

				    "${scylla_gen_build_dir}")

				add_subdirectory(dist)

				add_custom_target(maybe-scylla

				  DEPENDS $<$<CONFIG:Dev>:$<TARGET_FILE:scylla>>)

				add_dependencies(compiler-training

				  maybe-scylla)

				if(Scylla_DIST)

				  add_subdirectory(dist)

				endif()

									
										21

HACKING.md
									
												
				@@ -19,18 +19,18 @@ $ git submodule update --init --recursive

				### Dependencies

				Scylla is fairly fussy about its build environment, requiring a very recent

				version of the C++20 compiler and numerous tools and libraries to build.

				version of the C++23 compiler and numerous tools and libraries to build.

				Run `./install-dependencies.sh` (as root) to use your Linux distribution's

				package manager to install the appropriate packages on your build machine.

				However, this will only work on very recent distributions. For example,

				currently Fedora users must upgrade to Fedora 32 otherwise the C++ compiler

				will be too old, and not support the new C++20 standard that Scylla uses.

				will be too old, and not support the new C++23 standard that Scylla uses.

				Alternatively, to avoid having to upgrade your build machine or install

				various packages on it, we provide another option - the **frozen toolchain**.

				This is a script, `./tools/toolchain/dbuild`, that can execute build or run

				commands inside a Docker image that contains exactly the right build tools and

				commands inside a container that contains exactly the right build tools and

				libraries. The `dbuild` technique is useful for beginners, but is also the way

				in which ScyllaDB produces official releases, so it is highly recommended.

				@@ -43,6 +43,12 @@ $ ./tools/toolchain/dbuild ninja build/release/scylla

				$ ./tools/toolchain/dbuild ./build/release/scylla --developer-mode 1

				```

				Note: do not mix environments - either perform all your work with dbuild, or natively on the host.

				Note2: you can get to an interactive shell within dbuild by running it without any parameters:

				```bash

				$ ./tools/toolchain/dbuild

				```

				### Build system

				**Note**: Compiling Scylla requires, conservatively, 2 GB of memory per native

				@@ -116,6 +122,13 @@ Run all tests through the test execution wrapper with

				$ ./test.py --mode={debug,release}

				```

				or, if you are using `dbuild`, you need to build the code and the tests and then you can run them at will:

				```bash

				$ ./tools/toolchain/dbuild ninja {debug,release,dev}-build

				$ ./tools/toolchain/dbuild ./test.py --mode {debug,release,dev}

				```

				The `--name` argument can be specified to run a particular test.

				Alternatively, you can execute the test executable directly. For example,

				@@ -199,7 +212,7 @@ The `scylla.yaml` file in the repository by default writes all database data to

				Scylla has a number of requirements for the file-system and operating system to operate ideally and at peak performance. However, during development, these requirements can be relaxed with the `--developer-mode` flag.

				Additionally, when running on under-powered platforms like portable laptops, the `--overprovisined` flag is useful.

				Additionally, when running on under-powered platforms like portable laptops, the `--overprovisioned` flag is useful.

				On a development machine, one might run Scylla as

									
										16

README.md
									
												
				@@ -15,7 +15,7 @@ For more information, please see the [ScyllaDB web site].

				## Build Prerequisites

				Scylla is fairly fussy about its build environment, requiring very recent

				versions of the C++20 compiler and of many libraries to build. The document

				versions of the C++23 compiler and of many libraries to build. The document

				[HACKING.md](HACKING.md) includes detailed information on building and

				developing Scylla, but to get Scylla building quickly on (almost) any build

				machine, Scylla offers a [frozen toolchain](tools/toolchain/README.md),

				@@ -65,11 +65,13 @@ $ ./tools/toolchain/dbuild ./build/release/scylla --help

				## Testing

				[![Build with the latest Seastar](https://github.com/scylladb/scylladb/actions/workflows/seastar.yaml/badge.svg)](https://github.com/scylladb/scylladb/actions/workflows/seastar.yaml) [![Check Reproducible Build](https://github.com/scylladb/scylladb/actions/workflows/reproducible-build.yaml/badge.svg)](https://github.com/scylladb/scylladb/actions/workflows/reproducible-build.yaml) [![clang-nightly](https://github.com/scylladb/scylladb/actions/workflows/clang-nightly.yaml/badge.svg)](https://github.com/scylladb/scylladb/actions/workflows/clang-nightly.yaml)

				See [test.py manual](docs/dev/testing.md).

				## Scylla APIs and compatibility

				By default, Scylla is compatible with Apache Cassandra and its APIs - CQL and

				Thrift. There is also support for the API of Amazon DynamoDB™,

				By default, Scylla is compatible with Apache Cassandra and its API - CQL.

				There is also support for the API of Amazon DynamoDB™,

				which needs to be enabled and configured in order to be used. For more

				information on how to enable the DynamoDB™ API in Scylla,

				and the current compatibility of this feature as well as Scylla-specific extensions, see

				@@ -82,11 +84,11 @@ Documentation can be found [here](docs/dev/README.md).

				Seastar documentation can be found [here](http://docs.seastar.io/master/index.html).

				User documentation can be found [here](https://docs.scylladb.com/).

				## Training 

				## Training

				Training material and online courses can be found at [Scylla University](https://university.scylladb.com/). 

				The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling, 

				administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions, 

				Training material and online courses can be found at [Scylla University](https://university.scylladb.com/).

				The courses are free, self-paced and include hands-on examples. They cover a variety of topics including Scylla data modeling,

				administration, architecture, basic NoSQL concepts, using drivers for application development, Scylla setup, failover, compactions,

				multi-datacenters and how Scylla integrates with third-party applications.

				## Contributing to Scylla

4

SCYLLA-VERSION-GEN


@@ -78,7 +78,7 @@ fi
 # Default scylla product/version tags
 PRODUCT=scylla
 VERSION=5.5.0-dev
 VERSION=6.3.0-dev
 if test -f version
 then
@@ -104,7 +104,7 @@ else
 fi
 if [ -f "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" ]; then
 	GIT_COMMIT_FILE=$(cat "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" |cut -d . -f 3)
 	GIT_COMMIT_FILE=$(cat "$OUTPUT_DIR/SCYLLA-RELEASE-FILE" | rev | cut -d . -f 1 | rev)
 	if [ "$GIT_COMMIT" = "$GIT_COMMIT_FILE" ]; then
 		exit 0
 	fi
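
The second hunk above changes how the commit hash is read from `SCYLLA-RELEASE-FILE`: `cut -d . -f 3` selects the third dot-separated field, whereas `rev | cut -d . -f 1 | rev` selects the last one, so the two disagree as soon as the release string has more than three fields. The following is a minimal C++ sketch of that difference only. The helpers `nth_field` and `last_field` and the sample release strings are hypothetical and are not taken from the script or from a real release file.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper mirroring `cut -d . -f N` (N is 1-based, N >= 1).
// Like cut, a line without any delimiter is returned whole, and a field
// index past the last field yields an empty result.
constexpr std::string_view nth_field(std::string_view s, std::size_t n) {
    if (s.find('.') == std::string_view::npos) {
        return s;
    }
    std::size_t start = 0;
    for (std::size_t i = 1; i < n; ++i) {
        const std::size_t dot = s.find('.', start);
        if (dot == std::string_view::npos) {
            return {};
        }
        start = dot + 1;
    }
    // substr clamps the length, so a missing trailing dot is handled too.
    return s.substr(start, s.find('.', start) - start);
}

// Hypothetical helper mirroring `rev | cut -d . -f 1 | rev`:
// everything after the last dot, or the whole string if there is no dot.
constexpr std::string_view last_field(std::string_view s) {
    const std::size_t dot = s.rfind('.');
    return dot == std::string_view::npos ? s : s.substr(dot + 1);
}

// Made-up release strings: with exactly three fields both approaches agree...
static_assert(nth_field("0.20241114.abc123", 3) == std::string_view{"abc123"});
static_assert(last_field("0.20241114.abc123") == std::string_view{"abc123"});
// ...but with an extra field only the "last field" variant still finds the hash.
static_assert(nth_field("0.rc1.20241114.abc123", 3) == std::string_view{"20241114"});
static_assert(last_field("0.rc1.20241114.abc123") == std::string_view{"abc123"});
```

Taking the last field keeps the comparison against `$GIT_COMMIT` stable regardless of how many dotted components precede the hash.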

									
										25

alternator/auth.cc
									
												
				@@ -8,7 +8,7 @@

				#include "alternator/error.hh"

				#include "auth/common.hh"

				#include "log.hh"

				#include "utils/log.hh"

				#include <string>

				#include <string_view>

				#include "bytes.hh"

				@@ -19,6 +19,7 @@

				#include "alternator/executor.hh"

				#include "cql3/selection/selection.hh"

				#include "cql3/result_set.hh"

				#include "types/types.hh"

				#include <seastar/core/coroutine.hh>

				namespace alternator {

				@@ -31,11 +32,12 @@ future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::serv

				    dht::partition_range_vector partition_ranges{dht::partition_range(dht::decorate_key(*schema, pk))};

				    std::vector<query::clustering_range> bounds{query::clustering_range::make_open_ended_both_sides()};

				    const column_definition* salted_hash_col = schema->get_column_definition(bytes("salted_hash"));

				    if (!salted_hash_col) {

				        co_await coroutine::return_exception(api_error::unrecognized_client(format("Credentials cannot be fetched for: {}", username)));

				    const column_definition* can_login_col = schema->get_column_definition(bytes("can_login"));

				    if (!salted_hash_col || !can_login_col) {

				        co_await coroutine::return_exception(api_error::unrecognized_client(fmt::format("Credentials cannot be fetched for: {}", username)));

				    }

				    auto selection = cql3::selection::selection::for_columns(schema, {salted_hash_col});

				    auto partition_slice = query::partition_slice(std::move(bounds), {}, query::column_id_vector{salted_hash_col->id}, selection->get_query_options());

				    auto selection = cql3::selection::selection::for_columns(schema, {salted_hash_col, can_login_col});

				    auto partition_slice = query::partition_slice(std::move(bounds), {}, query::column_id_vector{salted_hash_col->id, can_login_col->id}, selection->get_query_options());

				    auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice,

				            proxy.get_max_result_size(partition_slice), query::tombstone_limit(proxy.get_tombstone_limit()));

				    auto cl = auth::password_authenticator::consistency_for_user(username);

				@@ -49,11 +51,18 @@ future<std::string> get_key_from_roles(service::storage_proxy& proxy, auth::serv

				    auto result_set = builder.build();

				    if (result_set->empty()) {

				        co_await coroutine::return_exception(api_error::unrecognized_client(format("User not found: {}", username)));

				        co_await coroutine::return_exception(api_error::unrecognized_client(fmt::format("User not found: {}", username)));

				    }

				    const managed_bytes_opt& salted_hash = result_set->rows().front().front(); // We only asked for 1 row and 1 column

				    const auto& result = result_set->rows().front();

				    bool can_login = result[1] && value_cast<bool>(boolean_type->deserialize(*result[1]));

				    if (!can_login) {

				        // This is a valid role name, but has "login=False" so should not be

				        // usable for authentication (see #19735).

				        co_await coroutine::return_exception(api_error::unrecognized_client(fmt::format("Role {} has login=false so cannot be used for login", username)));

				    }

				    const managed_bytes_opt& salted_hash = result.front();

				    if (!salted_hash) {

				        co_await coroutine::return_exception(api_error::unrecognized_client(format("No password found for user: {}", username)));

				        co_await coroutine::return_exception(api_error::unrecognized_client(fmt::format("No password found for user: {}", username)));

				    }

				    co_return value_cast<sstring>(utf8_type->deserialize(*salted_hash));

				}
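
The new `can_login` check in the hunk above treats the column as a nullable cell: `result[1] && ...` short-circuits, so a NULL cell denies login exactly like an explicit `false`. Below is a reduced sketch of that rule, with `std::optional<bool>` standing in for the deserialized cell. The alias `bool_cell` and the function `may_log_in` are hypothetical names used for illustration and are not part of the patch.

```cpp
#include <optional>

// Hypothetical stand-in for a nullable boolean cell. The patch itself holds a
// managed_bytes_opt and deserializes it with boolean_type.
using bool_cell = std::optional<bool>;

// Login is allowed only when the cell is present AND holds true.
// The left operand is checked first, so an empty cell is never dereferenced.
constexpr bool may_log_in(const bool_cell& can_login_cell) {
    return can_login_cell && *can_login_cell;
}

static_assert(may_log_in(bool_cell{true}));
static_assert(!may_log_in(bool_cell{false}));
static_assert(!may_log_in(std::nullopt));  // NULL cell: treated as login=false
```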

									
										14

alternator/conditions.cc
									
												
				@@ -15,8 +15,6 @@

				#include "utils/base64.hh"

				#include "utils/rjson.hh"

				#include <stdexcept>

				#include <boost/algorithm/cxx11/all_of.hpp>

				#include <boost/algorithm/cxx11/any_of.hpp>

				#include "utils/overloaded_functor.hh"

				#include "expressions.hh"

				@@ -42,12 +40,12 @@ comparison_operator_type get_comparison_operator(const rjson::value& comparison_

				            {"NOT_CONTAINS", comparison_operator_type::NOT_CONTAINS},

				    };

				    if (!comparison_operator.IsString()) {

				        throw api_error::validation(format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));

				        throw api_error::validation(fmt::format("Invalid comparison operator definition {}", rjson::print(comparison_operator)));

				    }

				    std::string op = comparison_operator.GetString();

				    auto it = ops.find(op);

				    if (it == ops.end()) {

				        throw api_error::validation(format("Unsupported comparison operator {}", op));

				        throw api_error::validation(fmt::format("Unsupported comparison operator {}", op));

				    }

				    return it->second;

				}

				@@ -429,7 +427,7 @@ static bool check_BETWEEN(const T& v, const T& lb, const T& ub, bool bounds_from

				    if (cmp_lt()(ub, lb)) {

				        if (bounds_from_query) {

				            throw api_error::validation(

				                format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));

				                fmt::format("BETWEEN operator requires lower_bound <= upper_bound, but {} > {}", lb, ub));

				        } else {

				            return false;

				        }

				@@ -613,7 +611,7 @@ conditional_operator_type get_conditional_operator(const rjson::value& req) {

				        return conditional_operator_type::OR;

				    } else {

				        throw api_error::validation(

				                format("'ConditionalOperator' parameter must be AND, OR or missing. Found {}.", s));

				                fmt::format("'ConditionalOperator' parameter must be AND, OR or missing. Found {}.", s));

				    }

				}

				@@ -743,9 +741,9 @@ bool verify_condition_expression(

				            };

				            switch (list.op) {

				            case '&':

				                return boost::algorithm::all_of(list.conditions, verify_condition);

				                return std::ranges::all_of(list.conditions, verify_condition);

				            case '|':

				                return boost::algorithm::any_of(list.conditions, verify_condition);

				                return std::ranges::any_of(list.conditions, verify_condition);

				            default:

				                // Shouldn't happen unless we have a bug in the parser

				                throw std::logic_error("bad operator in condition_list");

									
										20

alternator/controller.cc
									
												
				@@ -32,8 +32,10 @@ controller::controller(

				        sharded<service::memory_limiter>& memory_limiter,

				        sharded<auth::service>& auth_service,

				        sharded<qos::service_level_controller>& sl_controller,

				        const db::config& config)

				    : _gossiper(gossiper)

				        const db::config& config,

				        seastar::scheduling_group sg)

				    : protocol_server(sg)

				    , _gossiper(gossiper)

				    , _proxy(proxy)

				    , _mm(mm)

				    , _sys_dist_ks(sys_dist_ks)

				@@ -62,7 +64,9 @@ std::vector<socket_address> controller::listen_addresses() const {

				}

				future<> controller::start_server() {

				    return seastar::async([this] {

				    seastar::thread_attributes attr;

				    attr.sched_group = _sched_group;

				    return seastar::async(std::move(attr), [this] {

				        _listen_addresses.clear();

				        auto preferred = _config.listen_interface_prefer_ipv6() ? std::make_optional(net::inet_address::family::INET6) : std::nullopt;

				@@ -126,10 +130,10 @@ future<> controller::start_server() {

				                std::throw_with_nested(std::runtime_error("Failed to set up Alternator TLS credentials"));

				            }

				        }

				        bool alternator_enforce_authorization = _config.alternator_enforce_authorization();

				        _server.invoke_on_all(

				                [this, addr, alternator_port, alternator_https_port, creds = std::move(creds), alternator_enforce_authorization] (server& server) mutable {

				            return server.init(addr, alternator_port, alternator_https_port, creds, alternator_enforce_authorization,

				                [this, addr, alternator_port, alternator_https_port, creds = std::move(creds)] (server& server) mutable {

				            return server.init(addr, alternator_port, alternator_https_port, creds,

				                    _config.alternator_enforce_authorization,

				                    &_memory_limiter.local().get_semaphore(),

				                    _config.max_concurrent_requests_per_shard);

				        }).handle_exception([this, addr, alternator_port, alternator_https_port] (std::exception_ptr ep) {

				@@ -156,7 +160,9 @@ future<> controller::stop_server() {

				}

				future<> controller::request_stop_server() {

				    return stop_server();

				    return with_scheduling_group(_sched_group, [this] {

				        return stop_server();

				    });

				}

				}

									
										3

alternator/controller.hh
									
												
				@@ -80,7 +80,8 @@ public:

				        sharded<service::memory_limiter>& memory_limiter,

				        sharded<auth::service>& auth_service,

				        sharded<qos::service_level_controller>& sl_controller,

				        const db::config& config);

				        const db::config& config,

				        seastar::scheduling_group sg);

				    virtual sstring name() const override;

				    virtual sstring protocol() const override;

504

alternator/executor.cc

File diff suppressed because it is too large

									
										16

alternator/executor.hh
									
												
				@@ -9,7 +9,6 @@

				#pragma once

				#include <seastar/core/future.hh>

				#include <seastar/http/httpd.hh>

				#include "seastarx.hh"

				#include <seastar/json/json_elements.hh>

				#include <seastar/core/sharded.hh>

				@@ -24,6 +23,8 @@

				#include "utils/rjson.hh"

				#include "utils/updateable_value.hh"

				#include "tracing/trace_state.hh"

				namespace db {

				    class system_distributed_keyspace;

				}

				@@ -51,6 +52,8 @@ class gossiper;

				}

				class schema_builder;

				namespace alternator {

				class rmw_operation;

				@@ -159,6 +162,7 @@ class executor : public peering_sharded_service<executor> {

				    service::migration_manager& _mm;

				    db::system_distributed_keyspace& _sdks;

				    cdc::metadata& _cdc_metadata;

				    utils::updateable_value<bool> _enforce_authorization;

				    // An smp_service_group to be used for limiting the concurrency when

				    // forwarding Alternator request between shards - if necessary for LWT.

				    smp_service_group _ssg;

				@@ -177,10 +181,7 @@ public:

				             db::system_distributed_keyspace& sdks,

				             cdc::metadata& cdc_metadata,

				             smp_service_group ssg,

				             utils::updateable_value<uint32_t> default_timeout_in_ms)

				        : _gossiper(gossiper), _proxy(proxy), _mm(mm), _sdks(sdks), _cdc_metadata(cdc_metadata), _ssg(ssg) {

				        s_default_timeout_in_ms = std::move(default_timeout_in_ms);

				    }

				             utils::updateable_value<uint32_t> default_timeout_in_ms);

				    future<request_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				    future<request_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

				@@ -263,4 +264,9 @@ public:

				// add more than a couple of levels in its own output construction.

				bool is_big(const rjson::value& val, int big_size = 100'000);

				// Check CQL's Role-Based Access Control (RBAC) permission (MODIFY,

				// SELECT, DROP, etc.) on the given table. When permission is denied an

				// appropriate user-readable api_error::access_denied is thrown.

				future<> verify_permission(bool enforce_authorization, const service::client_state&, const schema_ptr&, auth::permission);

				}

									
										23

alternator/expressions.cc
									
												
				@@ -20,15 +20,12 @@

				#include <seastar/core/print.hh>

				#include <seastar/util/log.hh>

				#include <boost/algorithm/cxx11/any_of.hpp>

				#include <boost/algorithm/cxx11/all_of.hpp>

				#include <functional>

				#include <unordered_map>

				namespace alternator {

				template <typename Func, typename Result = std::result_of_t<Func(expressionsParser&)>>

				template <typename Func, typename Result = std::invoke_result_t<Func, expressionsParser&>>

				static Result do_with_parser(std::string_view input, Func&& f) {

				    expressionsLexer::InputStreamType input_stream{

				        reinterpret_cast<const ANTLR_UINT8*>(input.data()),

				@@ -43,7 +40,7 @@ static Result do_with_parser(std::string_view input, Func&& f) {

				    return result;

				}

				template <typename Func, typename Result = std::result_of_t<Func(expressionsParser&)>>

				template <typename Func, typename Result = std::invoke_result_t<Func, expressionsParser&>>

				static Result parse(const char* input_name, std::string_view input, Func&& f) {

				    if (input.length() > 4096) {

				        throw expressions_syntax_error(format("{} expression size {} exceeds allowed maximum 4096.",

				@@ -57,10 +54,10 @@ static Result parse(const char* input_name, std::string_view input, Func&& f) {

				        // TODO: displayRecognitionError could set a position inside the

				        // expressions_syntax_error in throws, and we could use it here to

				        // mark the broken position in 'input'.

				        throw expressions_syntax_error(format("Failed parsing {} '{}': {}",

				        throw expressions_syntax_error(fmt::format("Failed parsing {} '{}': {}",

				            input_name, input, e.what()));

				    } catch (...) {

				        throw expressions_syntax_error(format("Failed parsing {} '{}': {}",

				        throw expressions_syntax_error(fmt::format("Failed parsing {} '{}': {}",

				            input_name, input, std::current_exception()));

				    }

				}

				@@ -160,12 +157,12 @@ static std::optional<std::string> resolve_path_component(const std::string& colu

				    if (column_name.size() > 0 && column_name.front() == '#') {

				        if (!expression_attribute_names) {

				            throw api_error::validation(

				                    format("ExpressionAttributeNames missing, entry '{}' required by expression", column_name));

				                    fmt::format("ExpressionAttributeNames missing, entry '{}' required by expression", column_name));

				        }

				        const rjson::value* value = rjson::find(*expression_attribute_names, column_name);

				        if (!value || !value->IsString()) {

				            throw api_error::validation(

				                    format("ExpressionAttributeNames missing entry '{}' required by expression", column_name));

				                    fmt::format("ExpressionAttributeNames missing entry '{}' required by expression", column_name));

				        }

				        used_attribute_names.emplace(column_name);

				        return std::string(rjson::to_string_view(*value));

				@@ -202,16 +199,16 @@ static void resolve_constant(parsed::constant& c,

				        [&] (const std::string& valref) {

				            if (!expression_attribute_values) {

				                throw api_error::validation(

				                        format("ExpressionAttributeValues missing, entry '{}' required by expression", valref));

				                        fmt::format("ExpressionAttributeValues missing, entry '{}' required by expression", valref));

				            }

				            const rjson::value* value = rjson::find(*expression_attribute_values, valref);

				            if (!value) {

				                throw api_error::validation(

				                        format("ExpressionAttributeValues missing entry '{}' required by expression", valref));

				                        fmt::format("ExpressionAttributeValues missing entry '{}' required by expression", valref));

				            }

				            if (value->IsNull()) {

				                throw api_error::validation(

				                        format("ExpressionAttributeValues null value for entry '{}' required by expression", valref));

				                        fmt::format("ExpressionAttributeValues null value for entry '{}' required by expression", valref));

				            }

				            validate_value(*value, "ExpressionAttributeValues");

				            used_attribute_values.emplace(valref);

				@@ -708,7 +705,7 @@ rjson::value calculate_value(const parsed::value& v,

				            auto function_it = function_handlers.find(std::string_view(f._function_name));

				            if (function_it == function_handlers.end()) {

				                throw api_error::validation(

				                        format("{}: unknown function '{}' called.", caller, f._function_name));

				                        fmt::format("{}: unknown function '{}' called.", caller, f._function_name));

				            }

				            return function_it->second(caller, previous_item, f);

				        },
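The hunks above replace `std::result_of_t<Func(expressionsParser&)>` with `std::invoke_result_t<Func, expressionsParser&>`. `std::result_of` was deprecated in C++17 and removed in C++20, and `std::invoke_result` takes the callable and its argument types as separate template arguments. A minimal sketch of the same pattern follows; `toy_parser`, `do_with_toy_parser` and `input_length` are hypothetical stand-ins for illustration and are not part of the Alternator code.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <type_traits>

// Hypothetical stand-in for the ANTLR-generated expressionsParser.
struct toy_parser {
    std::string_view input;
};

// Result defaults to whatever f returns when invoked with a toy_parser&.
// std::invoke_result_t<F, Args...> lists the callable and its argument
// types separately, unlike the older std::result_of_t<F(Args...)>.
template <typename Func, typename Result = std::invoke_result_t<Func, toy_parser&>>
Result do_with_toy_parser(std::string_view input, Func&& f) {
    toy_parser p{input};
    return f(p);
}

// Example use: the deduced Result here is std::size_t.
inline std::size_t input_length(std::string_view s) {
    return do_with_toy_parser(s, [] (toy_parser& p) { return p.input.size(); });
}
```

Because `Result` is a defaulted template parameter, callers never spell it out; it follows the return type of the lambda they pass.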

									
										1

alternator/expressions_types.hh
									
												
				@@ -66,7 +66,6 @@ public:

				    std::vector<std::variant<std::string, unsigned>>& operators() {

				        return _operators;

				    }

				    friend std::ostream& operator<<(std::ostream&, const path&);

				};

				// When an expression is first parsed, all constants are references, like

									
										2

alternator/rmw_operation.hh
									
												
				@@ -12,6 +12,8 @@

				#include "service/paxos/cas_request.hh"

				#include "utils/rjson.hh"

				#include "executor.hh"

				#include "tracing/trace_state.hh"

				#include "keys.hh"

				namespace alternator {

									
										37

alternator/serialization.cc
									
												
				@@ -8,7 +8,7 @@

				#include "utils/base64.hh"

				#include "utils/rjson.hh"

				#include "log.hh"

				#include "utils/log.hh"

				#include "serialization.hh"

				#include "error.hh"

				#include "concrete_types.hh"

				@@ -143,17 +143,17 @@ static big_decimal parse_and_validate_number(std::string_view s) {

				        big_decimal ret(s);

				        auto [magnitude, precision] = internal::get_magnitude_and_precision(s);

				        if (magnitude > 125) {

				            throw api_error::validation(format("Number overflow: {}. Attempting to store a number with magnitude larger than supported range.", s));

				            throw api_error::validation(fmt::format("Number overflow: {}. Attempting to store a number with magnitude larger than supported range.", s));

				        }

				        if (magnitude < -130) {

				            throw api_error::validation(format("Number underflow: {}. Attempting to store a number with magnitude lower than supported range.", s));

				            throw api_error::validation(fmt::format("Number underflow: {}. Attempting to store a number with magnitude lower than supported range.", s));

				        }

				        if (precision > 38) {

				            throw api_error::validation(format("Number too precise: {}. Attempting to store a number with more significant digits than supported.", s));

				            throw api_error::validation(fmt::format("Number too precise: {}. Attempting to store a number with more significant digits than supported.", s));

				        }

				        return ret;

				    } catch (const marshal_exception& e) {

				        throw api_error::validation(format("The parameter cannot be converted to a numeric value: {}", s));

				        throw api_error::validation(fmt::format("The parameter cannot be converted to a numeric value: {}", s));

				    }

				}

				@@ -265,7 +265,7 @@ bytes get_key_column_value(const rjson::value& item, const column_definition& co

				    std::string column_name = column.name_as_text();

				    const rjson::value* key_typed_value = rjson::find(item, column_name);

				    if (!key_typed_value) {

				        throw api_error::validation(format("Key column {} not found", column_name));

				        throw api_error::validation(fmt::format("Key column {} not found", column_name));

				    }

				    return get_key_from_typed_value(*key_typed_value, column);

				}

				@@ -277,19 +277,26 @@ bytes get_key_column_value(const rjson::value& item, const column_definition& co

				// mentioned in the exception message).

				// If the type does match, a reference to the encoded value is returned.

				static const rjson::value& get_typed_value(const rjson::value& key_typed_value, std::string_view type_str, std::string_view name, std::string_view value_name) {

				    if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1 ||

				            !key_typed_value.MemberBegin()->value.IsString()) {

				    if (!key_typed_value.IsObject() || key_typed_value.MemberCount() != 1) {

				        throw api_error::validation(

				                format("Malformed value object for {} {}: {}",

				                fmt::format("Malformed value object for {} {}: {}",

				                        value_name, name, key_typed_value));

				    }

				    auto it = key_typed_value.MemberBegin();

				    if (rjson::to_string_view(it->name) != type_str) {

				        throw api_error::validation(

				                format("Type mismatch: expected type {} for {} {}, got type {}",

				                fmt::format("Type mismatch: expected type {} for {} {}, got type {}",

				                        type_str, value_name, name, it->name));

				    }

				    // We assume this function is called just for key types (S, B, N), and

				    // all of those always have a string value in the JSON.

				    if (!it->value.IsString()) {

				        throw api_error::validation(

				            fmt::format("Malformed value object for {} {}: {}",

				                    value_name, name, key_typed_value));

				    }

				    return it->value;

				}

				@@ -395,16 +402,16 @@ position_in_partition pos_from_json(const rjson::value& item, schema_ptr schema)

				big_decimal unwrap_number(const rjson::value& v, std::string_view diagnostic) {

				    if (!v.IsObject() || v.MemberCount() != 1) {

				        throw api_error::validation(format("{}: invalid number object", diagnostic));

				        throw api_error::validation(fmt::format("{}: invalid number object", diagnostic));

				    }

				    auto it = v.MemberBegin();

				    if (it->name != "N") {

				        throw api_error::validation(format("{}: expected number, found type '{}'", diagnostic, it->name));

				        throw api_error::validation(fmt::format("{}: expected number, found type '{}'", diagnostic, it->name));

				    }

				    if (!it->value.IsString()) {

				        // We shouldn't reach here. Callers normally validate their input

				        // earlier with validate_value().

				        throw api_error::validation(format("{}: improperly formatted number constant", diagnostic));

				        throw api_error::validation(fmt::format("{}: improperly formatted number constant", diagnostic));

				    }

				    big_decimal ret = parse_and_validate_number(rjson::to_string_view(it->value));

				    return ret;

				@@ -485,7 +492,7 @@ rjson::value set_sum(const rjson::value& v1, const rjson::value& v2) {

				    auto [set1_type, set1] = unwrap_set(v1);

				    auto [set2_type, set2] = unwrap_set(v2);

				    if (set1_type != set2_type) {

				        throw api_error::validation(format("Mismatched set types: {} and {}", set1_type, set2_type));

				        throw api_error::validation(fmt::format("Mismatched set types: {} and {}", set1_type, set2_type));

				    }

				    if (!set1 || !set2) {

				        throw api_error::validation("UpdateExpression: ADD operation for sets must be given sets as arguments");

				@@ -513,7 +520,7 @@ std::optional<rjson::value> set_diff(const rjson::value& v1, const rjson::value&

				    auto [set1_type, set1] = unwrap_set(v1);

				    auto [set2_type, set2] = unwrap_set(v2);

				    if (set1_type != set2_type) {

				        throw api_error::validation(format("Set DELETE type mismatch: {} and {}", set1_type, set2_type));

				        throw api_error::validation(fmt::format("Set DELETE type mismatch: {} and {}", set1_type, set2_type));

				    }

				    if (!set1 || !set2) {

				        throw api_error::validation("UpdateExpression: DELETE operation can only be performed on a set");

									
										77

alternator/server.cc
									
												
				@@ -7,7 +7,8 @@

				 */

				#include "alternator/server.hh"

				#include "log.hh"

				#include "gms/application_state.hh"

				#include "utils/log.hh"

				#include <fmt/ranges.h>

				#include <seastar/http/function_handlers.hh>

				#include <seastar/http/short_streams.hh>

				@@ -17,10 +18,15 @@

				#include <seastar/util/short_streams.hh>

				#include "seastarx.hh"

				#include "error.hh"

				#include "service/client_state.hh"

				#include "service/qos/service_level_controller.hh"

				#include "utils/assert.hh"

				#include "timeout_config.hh"

				#include "utils/rjson.hh"

				#include "auth.hh"

				#include <cctype>

				#include <string_view>

				#include <utility>

				#include "service/storage_proxy.hh"

				#include "gms/gossiper.hh"

				#include "utils/overloaded_functor.hh"

				@@ -34,8 +40,6 @@ using reply = http::reply;

				namespace alternator {

				static constexpr auto TARGET = "X-Amz-Target";

				inline std::vector<std::string_view> split(std::string_view text, char separator) {

				    std::vector<std::string_view> tokens;

				    if (text == "") {

				@@ -208,11 +212,35 @@ protected:

				        // using _gossiper().get_live_members(). But getting

				        // just the list of live nodes in this DC needs more elaborate code:

				        auto& topology = _proxy.get_token_metadata_ptr()->get_topology();

				        sstring local_dc = topology.get_datacenter();

				        std::unordered_set<gms::inet_address> local_dc_nodes = topology.get_datacenter_endpoints().at(local_dc);

				        // /localnodes lists nodes in a single DC. By default the DC of this

				        // server is used, but it can be overridden by a "dc" query option.

				        // If the DC does not exist, we return an empty list - not an error.

				        sstring query_dc = req->get_query_param("dc");

				        sstring local_dc = query_dc.empty() ? topology.get_datacenter() : query_dc;

				        std::unordered_set<gms::inet_address> local_dc_nodes;

				        const auto& endpoints = topology.get_datacenter_endpoints();

				        auto dc_it = endpoints.find(local_dc);

				        if (dc_it != endpoints.end()) {

				            local_dc_nodes = dc_it->second;

				        }

				        // By default, /localnodes lists the nodes of all racks in the given

				        // DC, unless a single rack is selected by the "rack" query option.

				        // If the rack does not exist, we return an empty list - not an error.

				        sstring query_rack = req->get_query_param("rack");

				        for (auto& ip : local_dc_nodes) {

				            if (_gossiper.is_alive(ip)) {

				                rjson::push_back(results, rjson::from_string(fmt::to_string(ip)));

				            if (!query_rack.empty()) {

				                auto rack = _gossiper.get_application_state_value(ip, gms::application_state::RACK);

				                if (rack != query_rack) {

				                    continue;

				                }

				            }

				            // Note that it's not enough for the node to be is_alive() - a

				            // node joining the cluster is also "alive" but not responsive to

				            // requests. We need the node to be in normal state. See #19694.

				            if (_gossiper.is_normal(ip)) {

				                // Use the gossiped broadcast_rpc_address if available instead

				                // of the internal IP address "ip". See discussion in #18711.

				                rjson::push_back(results, rjson::from_string(_gossiper.get_rpc_address(ip)));

				            }

				        }

				        rep->set_status(reply::status_type::ok);

				@@ -255,7 +283,7 @@ future<std::string> server::verify_signature(const request& req, const chunked_c

				    std::string_view authorization_header = authorization_it->second;

				    auto pos = authorization_header.find_first_of(' ');

				    if (pos == std::string_view::npos || authorization_header.substr(0, pos) != "AWS4-HMAC-SHA256") {

				        throw api_error::invalid_signature(format("Authorization header must use AWS4-HMAC-SHA256 algorithm: {}", authorization_header));

				        throw api_error::invalid_signature(fmt::format("Authorization header must use AWS4-HMAC-SHA256 algorithm: {}", authorization_header));

				    }

				    authorization_header.remove_prefix(pos+1);

				    std::string credential;

				@@ -290,7 +318,7 @@ future<std::string> server::verify_signature(const request& req, const chunked_c

				    std::vector<std::string_view> credential_split = split(credential, '/');

				    if (credential_split.size() != 5) {

				        throw api_error::validation(format("Incorrect credential information format: {}", credential));

				        throw api_error::validation(fmt::format("Incorrect credential information format: {}", credential));

				    }

				    std::string user(credential_split[0]);

				    std::string datestamp(credential_split[1]);

				@@ -375,7 +403,7 @@ static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_

				        std::string buf;

				        tracing::add_session_param(trace_state, "alternator_op", op);

				        tracing::add_query(trace_state, truncated_content_view(query, buf));

				        tracing::begin(trace_state, format("Alternator {}", op), client_state.get_client_address());

				        tracing::begin(trace_state, seastar::format("Alternator {}", op), client_state.get_client_address());

				        if (!username.empty()) {

				            tracing::set_username(trace_state, auth::authenticated_user(username));

				        }

				@@ -385,10 +413,10 @@ static tracing::trace_state_ptr maybe_trace_query(service::client_state& client_

				future<executor::request_return_type> server::handle_api_request(std::unique_ptr<request> req) {

				    _executor._stats.total_operations++;

				    sstring target = req->get_header(TARGET);

				    std::vector<std::string_view> split_target = split(target, '.');

				    //NOTICE(sarna): Target consists of Dynamo API version followed by a dot '.' and operation type (e.g. CreateTable)

				    std::string op = split_target.empty() ? std::string() : std::string(split_target.back());

				    sstring target = req->get_header("X-Amz-Target");

				    // target is DynamoDB API version followed by a dot '.' and operation type (e.g. CreateTable)

				    auto dot = target.find('.');

				    std::string_view op = (dot == sstring::npos) ? std::string_view() : std::string_view(target).substr(dot+1);

				    // JSON parsing can allocate up to roughly 2x the size of the raw

				    // document, + a couple of bytes for maintenance.

				    // TODO: consider the case where req->content_length is missing. Maybe

				@@ -400,7 +428,7 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr

				        ++_executor._stats.requests_blocked_memory;

				    }

				    auto units = co_await std::move(units_fut);

				    assert(req->content_stream);

				    SCYLLA_ASSERT(req->content_stream);

				    chunked_content content = co_await util::read_entire_stream(*req->content_stream);

				    auto username = co_await verify_signature(*req, content);

				@@ -411,7 +439,7 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr

				    auto callback_it = _callbacks.find(op);

				    if (callback_it == _callbacks.end()) {

				        _executor._stats.unsupported_operations++;

				        co_return api_error::unknown_operation(format("Unsupported operation {}", op));

				        co_return api_error::unknown_operation(fmt::format("Unsupported operation {}", op));

				    }

				    if (_pending_requests.get_count() >= _max_concurrent_requests) {

				        _executor._stats.requests_shed++;

				@@ -419,11 +447,11 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr

				    }

				    _pending_requests.enter();

				    auto leave = defer([this] () noexcept { _pending_requests.leave(); });

				    //FIXME: Client state can provide more context, e.g. client's endpoint address

				    // We use unique_ptr because client_state cannot be moved or copied

				    executor::client_state client_state = username.empty()

				        ? service::client_state{service::client_state::internal_tag()}

				        : service::client_state{service::client_state::internal_tag(), _auth_service, _sl_controller, username};

				    executor::client_state client_state(service::client_state::external_tag(),

				        _auth_service, &_sl_controller, _timeout_config.current_values(), req->get_client_address());

				    if (!username.empty()) {

				        client_state.set_login(auth::authenticated_user(username));

				    }

				    co_await client_state.maybe_update_per_service_level_params();

				    tracing::trace_state_ptr trace_state = maybe_trace_query(client_state, username, op, content);

				@@ -470,6 +498,7 @@ server::server(executor& exec, service::storage_proxy& proxy, gms::gossiper& gos

				        , _enforce_authorization(false)

				        , _enabled_servers{}

				        , _pending_requests{}

				        , _timeout_config(_proxy.data_dictionary().get_config())

				      , _callbacks{

				        {"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

				            return e.create_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

				@@ -547,9 +576,9 @@ server::server(executor& exec, service::storage_proxy& proxy, gms::gossiper& gos

				}

				future<> server::init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,

				        bool enforce_authorization, semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests) {

				        utils::updateable_value<bool> enforce_authorization, semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests) {

				    _memory_limiter = memory_limiter;

				    _enforce_authorization = enforce_authorization;

				    _enforce_authorization = std::move(enforce_authorization);

				    _max_concurrent_requests = std::move(max_concurrent_requests);

				    if (!port && !https_port) {

				        return make_exception_future<>(std::runtime_error("Either regular port or TLS port"

				@@ -634,7 +663,7 @@ future<> server::json_parser::stop() {

				const char* api_error::what() const noexcept {

				    if (_what_string.empty()) {

				        _what_string = format("{} {}: {}", static_cast<int>(_http_code), _type, _msg);

				        _what_string = fmt::format("{} {}: {}", std::to_underlying(_http_code), _type, _msg);

				    }

				    return _what_string.c_str();

				}
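In the `handle_api_request` hunk above, the operation name is now taken as a `std::string_view` of everything after the first `.` in the `X-Amz-Target` header, instead of splitting the header into a vector of tokens and taking the last one. A minimal standalone sketch of that extraction is shown below; the helper name `operation_from_target` is an assumption made for illustration and does not appear in the diff.

```cpp
#include <string>
#include <string_view>

// Returns the operation name from an X-Amz-Target value such as
// "DynamoDB_20120810.CreateTable": everything after the first '.'.
// If there is no '.', an empty view is returned, as in the hunk above.
// The result points into `target`, so `target` must outlive it.
inline std::string_view operation_from_target(std::string_view target) {
    auto dot = target.find('.');
    if (dot == std::string_view::npos) {
        return {};
    }
    return target.substr(dot + 1);
}
```

For a header value with more than one dot, this returns the whole remainder after the first dot, whereas a split-and-take-last approach would return only the final token. The view avoids an extra allocation but must not be used after the header string is destroyed.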

									
										9

alternator/server.hh
									
												
				@@ -39,9 +39,14 @@ class server {

				    qos::service_level_controller& _sl_controller;

				    key_cache _key_cache;

				    bool _enforce_authorization;

				    utils::updateable_value<bool> _enforce_authorization;

				    utils::small_vector<std::reference_wrapper<seastar::httpd::http_server>, 2> _enabled_servers;

				    gate _pending_requests;

				    // In some places we will need a CQL updateable_timeout_config object even

				    // though it isn't really relevant for Alternator which defines its own

				    // timeouts separately. We can create this object only once.

				    updateable_timeout_config _timeout_config;

				    alternator_callbacks_map _callbacks;

				    semaphore* _memory_limiter;

				@@ -71,7 +76,7 @@ public:

				    server(executor& executor, service::storage_proxy& proxy, gms::gossiper& gossiper, auth::service& service, qos::service_level_controller& sl_controller);

				    future<> init(net::inet_address addr, std::optional<uint16_t> port, std::optional<uint16_t> https_port, std::optional<tls::credentials_builder> creds,

				            bool enforce_authorization, semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests);

				            utils::updateable_value<bool> enforce_authorization, semaphore* memory_limiter, utils::updateable_value<uint32_t> max_concurrent_requests);

				    future<> stop();

				private:

				    void set_routes(seastar::httpd::routes& r);

									
										6

alternator/stats.cc
									
												
				@@ -67,6 +67,8 @@ stats::stats() : api_operations{} {

				            OPERATION_LATENCY(get_item_latency, "GetItem")

				            OPERATION_LATENCY(delete_item_latency, "DeleteItem")

				            OPERATION_LATENCY(update_item_latency, "UpdateItem")

				            OPERATION_LATENCY(batch_write_item_latency, "BatchWriteItem")

				            OPERATION_LATENCY(batch_get_item_latency, "BatchGetItem")

				            OPERATION(list_streams, "ListStreams")

				            OPERATION(describe_stream, "DescribeStream")

				            OPERATION(get_shard_iterator, "GetShardIterator")

				@@ -94,6 +96,10 @@ stats::stats() : api_operations{} {

				                    seastar::metrics::description("number of rows read and matched during filtering operations")),

				            seastar::metrics::make_total_operations("filtered_rows_dropped_total", [this] { return cql_stats.filtered_rows_read_total - cql_stats.filtered_rows_matched_total; },

				                    seastar::metrics::description("number of rows read and dropped during filtering operations")),

				                    seastar::metrics::make_counter("batch_item_count", seastar::metrics::description("The total number of items processed across all batches"),{op("BatchWriteItem")},

				                            api_operations.batch_write_item_batch_total).set_skip_when_empty(),

				                    seastar::metrics::make_counter("batch_item_count", seastar::metrics::description("The total number of items processed across all batches"),{op("BatchGetItem")},

				                            api_operations.batch_get_item_batch_total).set_skip_when_empty(),

				    });

				}

									
										5

alternator/stats.hh
									
												
				@@ -11,7 +11,6 @@

				#include <cstdint>

				#include <seastar/core/metrics_registration.hh>

				#include "utils/estimated_histogram.hh"

				#include "utils/histogram.hh"

				#include "cql3/stats.hh"

				@@ -27,6 +26,8 @@ public:

				    struct {

				        uint64_t batch_get_item = 0;

				        uint64_t batch_write_item = 0;

				        uint64_t batch_get_item_batch_total = 0;

				        uint64_t batch_write_item_batch_total = 0;

				        uint64_t create_backup = 0;

				        uint64_t create_global_table = 0;

				        uint64_t create_table = 0;

				@@ -70,6 +71,8 @@ public:

				        utils::timed_rate_moving_average_summary_and_histogram get_item_latency;

				        utils::timed_rate_moving_average_summary_and_histogram delete_item_latency;

				        utils::timed_rate_moving_average_summary_and_histogram update_item_latency;

				        utils::timed_rate_moving_average_summary_and_histogram batch_write_item_latency;

				        utils::timed_rate_moving_average_summary_and_histogram batch_get_item_latency;

				        utils::timed_rate_moving_average_summary_and_histogram get_records_latency;

				    } api_operations;

				    // Miscellaneous event counters

									
										47

alternator/streams.cc
									
												
				@@ -13,6 +13,7 @@

				#include <seastar/json/formatter.hh>

				#include "auth/permission.hh"

				#include "db/config.hh"

				#include "cdc/log.hh"

				@@ -233,11 +234,8 @@ struct shard_id {

				    // dynamo specifies shardid as max 65 chars. 

				    friend std::ostream& operator<<(std::ostream& os, const shard_id& id) {

				        boost::io::ios_flags_saver fs(os);

				        return os << marker << std::hex  

				            << id.time.time_since_epoch().count()

				            << ':' << id.id.to_bytes()

				            ;

				        fmt::print(os, "{} {:x}:{}", marker, id.time.time_since_epoch().count(), id.id.to_bytes());

				        return os;

				    }

				};

				@@ -779,7 +777,7 @@ struct event_id {

				    cdc::stream_id stream;

				    utils::UUID timestamp;

				    static const auto marker = 'E';

				    static constexpr auto marker = 'E';

				    event_id(cdc::stream_id s, utils::UUID ts)

				        : stream(s)

				@@ -787,10 +785,8 @@ struct event_id {

				    {}

				    friend std::ostream& operator<<(std::ostream& os, const event_id& id) {

-				        boost::io::ios_flags_saver fs(os);
-				        return os << marker << std::hex << id.stream.to_bytes()
-				            << ':' << id.timestamp
-				            ;
+				        fmt::print(os, "{}{}:{}", marker, id.stream.to_bytes(), id.timestamp);
+				        return os;

				    }

				};

				}

				@@ -823,11 +819,13 @@ future<executor::request_return_type> executor::get_records(client_state& client

				    }

				    if (!schema || !base || !is_alternator_keyspace(schema->ks_name())) {

-				        throw api_error::resource_not_found(fmt::to_string(iter.table));
+				        co_return api_error::resource_not_found(fmt::to_string(iter.table));

				    }

				    tracing::add_table_name(trace_state, schema->ks_name(), schema->cf_name());

+				    co_await verify_permission(_enforce_authorization, client_state, schema, auth::permission::SELECT);

				    db::consistency_level cl = db::consistency_level::LOCAL_QUORUM;

				    partition_key pk = iter.shard.id.to_partition_key(*schema);

				@@ -846,19 +844,21 @@ future<executor::request_return_type> executor::get_records(client_state& client

				    static const bytes op_column_name = cdc::log_meta_column_name_bytes("operation");

				    static const bytes eor_column_name = cdc::log_meta_column_name_bytes("end_of_batch");

-				    std::optional<attrs_to_get> key_names = boost::copy_range<attrs_to_get>(
-				        boost::range::join(std::move(base->partition_key_columns()), std::move(base->clustering_key_columns()))
-				        | boost::adaptors::transformed([&] (const column_definition& cdef) {
+				    std::optional<attrs_to_get> key_names =
+				        base->primary_key_columns()
+				        | std::views::transform([&] (const column_definition& cdef) {
				            return std::make_pair<std::string, attrs_to_get_node>(cdef.name_as_text(), {}); })
-				    );
+				        | std::ranges::to<attrs_to_get>()
+				    ;
				    // Include all base table columns as values (in case pre or post is enabled).
				    // This will include attributes not stored in the frozen map column
-				    std::optional<attrs_to_get> attr_names = boost::copy_range<attrs_to_get>(base->regular_columns()
+				    std::optional<attrs_to_get> attr_names = base->regular_columns()
				        // this will include the :attrs column, which we will also force evaluating.
				        // But not having this set empty forces out any cdc columns from actual result
-				        | boost::adaptors::transformed([] (const column_definition& cdef) {
+				        | std::views::transform([] (const column_definition& cdef) {
				            return std::make_pair<std::string, attrs_to_get_node>(cdef.name_as_text(), {}); })
-				    );
+				        | std::ranges::to<attrs_to_get>()
+				    ;

				    std::vector<const column_definition*> columns;

				    columns.reserve(schema->all_columns().size());

				@@ -869,10 +869,11 @@ future<executor::request_return_type> executor::get_records(client_state& client

				    std::transform(pks.begin(), pks.end(), std::back_inserter(columns), [](auto& c) { return &c; });

				    std::transform(cks.begin(), cks.end(), std::back_inserter(columns), [](auto& c) { return &c; });

-				    auto regular_columns = boost::copy_range<query::column_id_vector>(schema->regular_columns()
-				        | boost::adaptors::filtered([](const column_definition& cdef) { return cdef.name() == op_column_name || cdef.name() == eor_column_name || !cdc::is_cdc_metacolumn_name(cdef.name_as_text()); })
-				        | boost::adaptors::transformed([&] (const column_definition& cdef) { columns.emplace_back(&cdef); return cdef.id; })
-				    );
+				    auto regular_columns = schema->regular_columns()
+				        | std::views::filter([](const column_definition& cdef) { return cdef.name() == op_column_name || cdef.name() == eor_column_name || !cdc::is_cdc_metacolumn_name(cdef.name_as_text()); })
+				        | std::views::transform([&] (const column_definition& cdef) { columns.emplace_back(&cdef); return cdef.id; })
+				        | std::ranges::to<query::column_id_vector>()
+				    ;

				    stream_view_type type = cdc_options_to_steam_view_type(base->cdc_options());

				@@ -892,7 +893,7 @@ future<executor::request_return_type> executor::get_records(client_state& client

				    auto command = ::make_lw_shared<query::read_command>(schema->id(), schema->version(), partition_slice, _proxy.get_max_result_size(partition_slice),

				            query::tombstone_limit(_proxy.get_tombstone_limit()), query::row_limit(limit * mul));

-				    return _proxy.query(schema, std::move(command), std::move(partition_ranges), cl, service::storage_proxy::coordinator_query_options(default_timeout(), std::move(permit), client_state)).then(
+				    co_return co_await _proxy.query(schema, std::move(command), std::move(partition_ranges), cl, service::storage_proxy::coordinator_query_options(default_timeout(), std::move(permit), client_state)).then(

				            [this, schema, partition_slice = std::move(partition_slice), selection = std::move(selection), start_time = std::move(start_time), limit, key_names = std::move(key_names), attr_names = std::move(attr_names), type, iter, high_ts] (service::storage_proxy::coordinator_query_result qr) mutable {       

				        cql3::selection::result_set_builder builder(*selection, gc_clock::now());

				        query::result_view::consume(*qr.query_result, partition_slice, cql3::selection::result_set_builder::visitor(builder, *schema, *selection));

									
										114

alternator/ttl.cc
									
				@@ -23,9 +23,10 @@

				#include "gms/inet_address.hh"

				#include "inet_address_vectors.hh"

				#include "locator/abstract_replication_strategy.hh"

-				#include "log.hh"
+				#include "utils/log.hh"

				#include "gc_clock.hh"

				#include "replica/database.hh"

+				#include "service/client_state.hh"

				#include "service_permit.hh"

				#include "timestamp.hh"

				#include "service/storage_proxy.hh"

				@@ -35,6 +36,7 @@

				#include "mutation/mutation.hh"

				#include "types/types.hh"

				#include "types/map.hh"

+				#include "utils/assert.hh"

				#include "utils/rjson.hh"

				#include "utils/big_decimal.hh"

				#include "cql3/selection/selection.hh"

				@@ -97,6 +99,7 @@ future<executor::request_return_type> executor::update_time_to_live(client_state

				    }

				    sstring attribute_name(v->GetString(), v->GetStringLength());

+				    co_await verify_permission(_enforce_authorization, client_state, schema, auth::permission::ALTER);

				    co_await db::modify_tags(_mm, schema->ks_name(), schema->cf_name(), [&](std::map<sstring, sstring>& tags_map) {

				        if (enabled) {

				            if (tags_map.contains(TTL_TAG_KEY)) {

				@@ -312,7 +315,7 @@ static size_t random_offset(size_t min, size_t max) {

				// this range's primary node is down. For this we need to return not just

				// a list of this node's secondary ranges - but also the primary owner of

				// each of those ranges.

-				static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary_ranges(
+				static future<std::vector<std::pair<dht::token_range, gms::inet_address>>> get_secondary_ranges(

				        const locator::effective_replication_map_ptr& erm,

				        gms::inet_address ep) {

				    const auto& tm = *erm->get_token_metadata_ptr();

				@@ -323,6 +326,7 @@ static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary

				    }

				    auto prev_tok = sorted_tokens.back();

				    for (const auto& tok : sorted_tokens) {

+				        co_await coroutine::maybe_yield();

				        inet_address_vector_replica_set eps = erm->get_natural_endpoints(tok);

				        if (eps.size() <= 1 || eps[1] != ep) {

				            prev_tok = tok;

				@@ -350,7 +354,7 @@ static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary

				        }

				        prev_tok = tok;

				    }

-				    return ret;
+				    co_return ret;

				}

				@@ -386,63 +390,63 @@ static std::vector<std::pair<dht::token_range, gms::inet_address>> get_secondary

				//

				// FIXME: Check if this algorithm is safe with tablet migration.

				// https://github.com/scylladb/scylladb/issues/16567

-				enum primary_or_secondary_t {primary, secondary};
-				template<primary_or_secondary_t primary_or_secondary>
-				class token_ranges_owned_by_this_shard {
-				    // ranges_holder_primary holds just the primary ranges themselves
-				    class ranges_holder_primary {
-				        const dht::token_range_vector _token_ranges;
-				     public:
-				        ranges_holder_primary(const locator::vnode_effective_replication_map_ptr& erm, gms::gossiper& g, gms::inet_address ep)
-				            : _token_ranges(erm->get_primary_ranges(ep)) {}
-				        std::size_t size() const { return _token_ranges.size(); }
-				        const dht::token_range& operator[](std::size_t i) const {
-				            return _token_ranges[i];
-				        }
-				        bool should_skip(std::size_t i) const {
-				            return false;
-				        }
-				    };
-				    // ranges_holder<secondary> holds the secondary token ranges plus each
-				    // range's primary owner, needed to implement should_skip().
-				    class ranges_holder_secondary {
-				        std::vector<std::pair<dht::token_range, gms::inet_address>> _token_ranges;
-				        gms::gossiper& _gossiper;
-				     public:
-				        ranges_holder_secondary(const locator::effective_replication_map_ptr& erm, gms::gossiper& g, gms::inet_address ep)
-				            : _token_ranges(get_secondary_ranges(erm, ep))
-				            , _gossiper(g) {}
-				        std::size_t size() const { return _token_ranges.size(); }
-				        const dht::token_range& operator[](std::size_t i) const {
-				            return _token_ranges[i].first;
-				        }
-				        // range i should be skipped if its primary owner is alive.
-				        bool should_skip(std::size_t i) const {
-				            return _gossiper.is_alive(_token_ranges[i].second);
-				        }
-				    };
+				// ranges_holder_primary holds just the primary ranges themselves
+				class ranges_holder_primary {
+				    dht::token_range_vector _token_ranges;
+				public:
+				    explicit ranges_holder_primary(dht::token_range_vector token_ranges) : _token_ranges(std::move(token_ranges)) {}
+				    static future<ranges_holder_primary> make(const locator::vnode_effective_replication_map_ptr& erm, gms::inet_address ep) {
+				        co_return ranges_holder_primary(co_await erm->get_primary_ranges(ep));
+				    }
+				    std::size_t size() const { return _token_ranges.size(); }
+				    const dht::token_range& operator[](std::size_t i) const {
+				        return _token_ranges[i];
+				    }
+				    bool should_skip(std::size_t i) const {
+				        return false;
+				    }
+				};
+				// ranges_holder<secondary> holds the secondary token ranges plus each
+				// range's primary owner, needed to implement should_skip().
+				class ranges_holder_secondary {
+				    std::vector<std::pair<dht::token_range, gms::inet_address>> _token_ranges;
+				    const gms::gossiper& _gossiper;
+				public:
+				    explicit ranges_holder_secondary(std::vector<std::pair<dht::token_range, gms::inet_address>> token_ranges, const gms::gossiper& g)
+				        : _token_ranges(std::move(token_ranges))
+				        , _gossiper(g) {}
+				    static future<ranges_holder_secondary> make(const locator::effective_replication_map_ptr& erm, gms::inet_address ep, const gms::gossiper& g) {
+				        co_return ranges_holder_secondary(co_await get_secondary_ranges(erm, ep), g);
+				    }
+				    std::size_t size() const { return _token_ranges.size(); }
+				    const dht::token_range& operator[](std::size_t i) const {
+				        return _token_ranges[i].first;
+				    }
+				    // range i should be skipped if its primary owner is alive.
+				    bool should_skip(std::size_t i) const {
+				        return _gossiper.is_alive(_token_ranges[i].second);
+				    }
+				};
+				template<class primary_or_secondary_t>
+				class token_ranges_owned_by_this_shard {

				    schema_ptr _s;

				    locator::effective_replication_map_ptr _erm;

				    // _token_ranges will contain a list of token ranges owned by this node.

				    // We'll further need to split each such range to the pieces owned by

				    // the current shard, using _intersecter.

-				    using ranges_holder = std::conditional_t<
-				            primary_or_secondary == primary_or_secondary_t::primary,
-				            ranges_holder_primary,
-				            ranges_holder_secondary>;
-				    const ranges_holder _token_ranges;
+				    const primary_or_secondary_t _token_ranges;

				    // NOTICE: _range_idx is used modulo _token_ranges size when accessing

				    // the data to ensure that it doesn't go out of bounds

				    size_t _range_idx;

				    size_t _end_idx;

				    std::optional<dht::selective_token_range_sharder> _intersecter;

				public:

-				    token_ranges_owned_by_this_shard(replica::database& db, gms::gossiper& g, schema_ptr s)
+				    token_ranges_owned_by_this_shard(schema_ptr s, primary_or_secondary_t token_ranges)

				        :  _s(s)

				        , _erm(s->table().get_effective_replication_map())

-				        , _token_ranges(db.find_keyspace(s->ks_name()).get_vnode_effective_replication_map(),
-				                g, _erm->get_topology().my_address())
+				        , _token_ranges(std::move(token_ranges))

				        , _range_idx(random_offset(0, _token_ranges.size() - 1))

				        , _end_idx(_range_idx + _token_ranges.size())

				    {
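
The NOTICE comment in this hunk says `_range_idx` is taken modulo the number of token ranges, and the constructor starts it at a random offset with `_end_idx` one full lap later. A minimal sketch of that wrap-around walk follows. `visit_order` is a hypothetical helper, and the start offset is passed in explicitly here rather than drawn at random, so the result is deterministic.

```cpp
#include <cstddef>
#include <vector>

// Returns the order in which `size` slots are visited when iteration
// starts at `start` and runs for exactly one lap. The running index is
// allowed to exceed `size`; it is reduced modulo `size` only when a
// slot is accessed, so it never goes out of bounds.
inline std::vector<std::size_t> visit_order(std::size_t size, std::size_t start) {
    std::vector<std::size_t> order;
    if (size == 0) {
        return order;  // nothing to visit, and avoids a modulo by zero
    }
    const std::size_t end = start + size;  // plays the role of _end_idx
    for (std::size_t idx = start; idx < end; ++idx) {
        order.push_back(idx % size);
    }
    return order;
}
```

With four ranges and a start offset of 2, the slots are visited as 2, 3, 0, 1, so every range is covered once regardless of where the walk begins.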

				@@ -498,6 +502,7 @@ struct scan_ranges_context {

				    bytes column_name;

				    std::optional<std::string> member;

+				    service::client_state internal_client_state;

				    ::shared_ptr<cql3::selection::selection> selection;

				    std::unique_ptr<service::query_state> query_state_ptr;

				    std::unique_ptr<cql3::query_options> query_options;

				@@ -507,6 +512,7 @@ struct scan_ranges_context {

				        : s(s)

				        , column_name(column_name)

				        , member(member)

+				        , internal_client_state(service::client_state::internal_tag())

				    {

				        // FIXME: don't read the entire items - read only parts of it.

				        // We must read the key columns (to be able to delete) and also

				@@ -515,8 +521,9 @@ struct scan_ranges_context {

				        // be good if we can read only the single item of the map - it

				        // should be possible (and a must for issue #7751!).

				        lw_shared_ptr<service::pager::paging_state> paging_state = nullptr;

-				        auto regular_columns = boost::copy_range<query::column_id_vector>(
-				            s->regular_columns() | boost::adaptors::transformed([] (const column_definition& cdef) { return cdef.id; }));
+				        auto regular_columns =
+				            s->regular_columns() | std::views::transform([] (const column_definition& cdef) { return cdef.id; })
+				            | std::ranges::to<query::column_id_vector>();

				        selection = cql3::selection::selection::wildcard(s);

				        query::partition_slice::option_set opts = selection->get_query_options();

				        opts.set<query::partition_slice::option::allow_short_read>();

				@@ -525,10 +532,9 @@ struct scan_ranges_context {

				        std::vector<query::clustering_range> ck_bounds{query::clustering_range::make_open_ended_both_sides()};

				        auto partition_slice = query::partition_slice(std::move(ck_bounds), {}, std::move(regular_columns), opts);

				        command = ::make_lw_shared<query::read_command>(s->id(), s->version(), partition_slice, proxy.get_max_result_size(partition_slice), query::tombstone_limit(proxy.get_tombstone_limit()));

-				        executor::client_state client_state{executor::client_state::internal_tag()};

				        tracing::trace_state_ptr trace_state;

				        // NOTICE: empty_service_permit is used because the TTL service has fixed parallelism

-				        query_state_ptr = std::make_unique<service::query_state>(client_state, trace_state, empty_service_permit());
+				        query_state_ptr = std::make_unique<service::query_state>(internal_client_state, trace_state, empty_service_permit());

				        // FIXME: What should we do on multi-DC? Will we run the expiration on the same ranges on all

				        // DCs or only once for each range? If the latter, we need to change the CLs in the

				        // scanner and deleter.

				@@ -551,7 +557,7 @@ static future<> scan_table_ranges(

				        expiration_service::stats& expiration_stats)

				{

				    const schema_ptr& s = scan_ctx.s;

-				    assert (partition_ranges.size() == 1); // otherwise issue #9167 will cause incorrect results.
+				    SCYLLA_ASSERT (partition_ranges.size() == 1); // otherwise issue #9167 will cause incorrect results.

				    auto p = service::pager::query_pagers::pager(proxy, s, scan_ctx.selection, *scan_ctx.query_state_ptr,

				            *scan_ctx.query_options, scan_ctx.command, std::move(partition_ranges), nullptr);

				    while (!p->is_exhausted()) {

				@@ -724,7 +730,9 @@ static future<bool> scan_table(

				    expiration_stats.scan_table++;

				    // FIXME: need to pace the scan, not do it all at once.

				    scan_ranges_context scan_ctx{s, proxy, std::move(column_name), std::move(member)};

-				    token_ranges_owned_by_this_shard<primary> my_ranges(db.real_database(), gossiper, s);
+				    auto erm = db.real_database().find_keyspace(s->ks_name()).get_vnode_effective_replication_map();
+				    auto my_address = erm->get_topology().my_address();
+				    token_ranges_owned_by_this_shard my_ranges(s, co_await ranges_holder_primary::make(erm, my_address));

				    while (std::optional<dht::partition_range> range = my_ranges.next_partition_range()) {

				        // Note that because of issue #9167 we need to run a separate

				        // query on each partition range, and can't pass several of

				@@ -744,7 +752,7 @@ static future<bool> scan_table(

				    // by tasking another node to take over scanning of the dead node's primary

				    // ranges. What we do here is that this node will also check expiration

				    // on its *secondary* ranges - but only those whose primary owner is down.

-				    token_ranges_owned_by_this_shard<secondary> my_secondary_ranges(db.real_database(), gossiper, s);
+				    token_ranges_owned_by_this_shard my_secondary_ranges(s, co_await ranges_holder_secondary::make(erm, my_address, gossiper));

				    while (std::optional<dht::partition_range> range = my_secondary_ranges.next_partition_range()) {

				        expiration_stats.secondary_ranges_scanned++;

				        dht::partition_range_vector partition_ranges;
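
The comments in this file describe the takeover rule: a node also scans its secondary ranges, but only those whose primary owner is down (`should_skip()` returns true when the primary is alive). A minimal sketch of that filter is shown below under simplified assumptions: `range`, `secondary_range`, `ranges_to_scan` and the `is_alive` callback are hypothetical stand-ins for `dht::token_range`, the `(range, primary owner)` pairs and `gms::gossiper::is_alive`.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a token range: [first, second).
using range = std::pair<int, int>;

// A secondary range together with the node that owns it as primary.
struct secondary_range {
    range r;
    std::string primary_owner;
};

// Keeps only the secondary ranges whose primary owner is reported down.
// Ranges with a live primary are skipped because that node scans them itself.
inline std::vector<range> ranges_to_scan(
        const std::vector<secondary_range>& secondary,
        const std::function<bool(const std::string&)>& is_alive) {
    std::vector<range> out;
    for (const auto& s : secondary) {
        if (is_alive(s.primary_owner)) {
            continue;  // same decision as should_skip() returning true
        }
        out.push_back(s.r);
    }
    return out;
}
```

Because liveness is queried per range at scan time, a range is picked up by its secondary owner only while the primary is actually unreachable.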

									
										31

api/CMakeLists.txt
									
				@@ -1,4 +1,29 @@

+				# Generate C++ sources from Swagger definitions
+				function(generate_swagger)
+				  set(one_value_args TARGET VAR IN_FILE OUT_DIR)
+				  cmake_parse_arguments(args "" "${one_value_args}" "" ${ARGN})
+				  get_filename_component(in_file_name ${args_IN_FILE} NAME)
+				  set(generator ${PROJECT_SOURCE_DIR}/seastar/scripts/seastar-json2code.py)
+				  set(header_out ${args_OUT_DIR}/${in_file_name}.hh)
+				  set(source_out ${args_OUT_DIR}/${in_file_name}.cc)
+				  add_custom_command(
+				    DEPENDS
+				      ${args_IN_FILE}
+				      ${generator}
+				    OUTPUT ${header_out} ${source_out}
+				    COMMAND ${CMAKE_COMMAND} -E make_directory ${args_OUT_DIR}
+				    COMMAND ${generator} --create-cc -f ${args_IN_FILE} -o ${header_out})
+				  add_custom_target(${args_TARGET}
+				    DEPENDS
+				      ${header_out}
+				      ${source_out})
+				  set(${args_VAR} ${header_out} ${source_out} PARENT_SCOPE)
+				endfunction()

				set(swagger_files

				  api-doc/authorization_cache.json

				  api-doc/cache_service.json

				@@ -7,6 +32,7 @@ set(swagger_files

				  api-doc/commitlog.json

				  api-doc/compaction_manager.json

				  api-doc/config.json

+				  api-doc/cql_server_test.json

				  api-doc/endpoint_snitch_info.json

				  api-doc/error_injection.json

				  api-doc/failure_detector.json

				@@ -28,7 +54,7 @@ set(swagger_files

				foreach(f ${swagger_files})

				  get_filename_component(fname "${f}" NAME_WE)

				  get_filename_component(dir "${f}" DIRECTORY)

-				  seastar_generate_swagger(
+				  generate_swagger(

				    TARGET scylla_swagger_gen_${fname}

				    VAR scylla_swagger_gen_${fname}_files

				    IN_FILE "${CMAKE_CURRENT_SOURCE_DIR}/${f}"

				@@ -36,7 +62,7 @@ foreach(f ${swagger_files})

				  list(APPEND swagger_gen_files "${scylla_swagger_gen_${fname}_files}")

				endforeach()

-				add_library(api)
+				add_library(api STATIC)

				target_sources(api

				  PRIVATE

				    api.cc

				@@ -46,6 +72,7 @@ target_sources(api

				    commitlog.cc

				    compaction_manager.cc

				    config.cc

+				    cql_server_test.cc

				    endpoint_snitch.cc

				    error_injection.cc

				    authorization_cache.cc

									
										4

api/api-doc/collectd.json
									
				@@ -67,7 +67,7 @@

				               "parameters":[

				                  {

				                     "name":"pluginid",

-				                     "description":"The plugin ID, describe the component the metric belongs to. Examples are cache, thrift, etc'. Regex are supported.The plugin ID, describe the component the metric belong to. Examples are: cache, thrift etc'. regex are supported",
+				                     "description":"The plugin ID, describe the component the metric belongs to. Examples are cache and alternator, etc'. Regex are supported.",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				@@ -199,4 +199,4 @@

				         }

				      }

				   }

-				}
+				}

									
										8

api/api-doc/column_family.json
									
				@@ -92,6 +92,14 @@

				                     "type":"boolean",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"consider_only_existing_data",

				                     "description":"Set to \"true\" to flush all memtables and force tombstone garbage collection to check only the sstables being compacted (false by default). The memtable, commitlog and other uncompacted sstables will not be checked during tombstone garbage collection.",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"boolean",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"split_output",

				                     "description":"true if the output of the major compaction should be split in several sstables",

									
										26

api/api-doc/cql_server_test.json
									
										Normal file
									
				@@ -0,0 +1,26 @@

				{

				    "apiVersion":"0.0.1",

				    "swaggerVersion":"1.2",

				    "basePath":"{{Protocol}}://{{Host}}",

				    "resourcePath":"/cql_server_test",

				    "produces":[

				        "application/json"

				    ],

				    "apis":[

				        {

				            "path":"/cql_server_test/connections_params",

				            "operations":[

				                {

				                    "method":"GET",

				                    "summary":"Get service level params of each CQL connection",

				                    "type":"connections_service_level_params",

				                    "nickname":"connections_params",

				                    "produces":[

				                        "application/json"

				                    ],

				                    "parameters":[]

				                }

				            ]

				        }

				    ]

				}

									
										56

api/api-doc/error_injection.json
									
				@@ -63,6 +63,28 @@

				                     "paramType":"path"

				                  }

				               ]

				            },

				            {

				               "method":"GET",

				               "summary":"Read the state of an injection from all shards",

				               "type":"array",

				               "items":{

				                  "type":"error_injection_info"

				               },

				               "nickname":"read_injection",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"injection",

				                     "description":"injection name",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                  }

				               ]

				            }

				         ]

				      },

				@@ -152,5 +174,39 @@

				            }

				         }

				      }

				   },

				   "models":{

				      "mapper":{

				         "id":"mapper",

				         "description":"A key value mapping",

				         "properties":{

				            "key":{

				               "type":"string",

				               "description":"The key"

				            },

				            "value":{

				               "type":"string",

				               "description":"The value"

				            }

				         }

				      },

				       "error_injection_info":{

				         "id":"error_injection_info",

				         "description":"Information about an error injection",

				         "properties":{

				            "enabled":{

				               "type":"boolean",

				               "description":"Is the error injection enabled"

				            },

				            "parameters":{

				               "type":"array",

				               "items":{

				                  "type":"mapper"

				               },

				               "description":"The parameter values"

				            }

				         },

				         "required":["enabled"]

				      }

				   }

				}

									
										64

api/api-doc/raft.json
									
				@@ -62,6 +62,70 @@

				               ]

				            }

				         ]

				      },

				      {

				         "path": "/raft/read_barrier",

				         "operations": [

				            {

				               "method": "POST",

				               "summary": "Triggers read barrier for the given Raft group to wait for previously committed commands in this group to be applied locally. For example, can be used on group 0 to wait for the node to obtain latest schema changes.",

				               "type": "string",

				               "nickname": "read_barrier",

				               "produces": [

				                  "application/json"

				               ],

				               "parameters": [

				                  {

				                     "name": "group_id",

				                     "description": "The ID of the group. When absent, group0 is used.",

				                     "required": false,

				                     "allowMultiple": false,

				                     "type": "string",

				                     "paramType": "query"

				                  },

				                  {

				                     "name": "timeout",

				                     "description": "Timeout in seconds after which the endpoint returns a failure. If not provided, 60s is used.",

				                     "required": false,

				                     "allowMultiple": false,

				                     "type": "long",

				                     "paramType": "query"

				                  }

				               ]

				            }

				         ]

				      },

				      {

				         "path":"/raft/trigger_stepdown/",

				         "operations":[

				            {

				               "method":"POST",

				               "summary":"Triggers stepdown of a leader for given Raft group or group0 if not provided (returns an error if the node is not a leader)",

				               "type":"string",

				               "nickname":"trigger_stepdown",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				                  {

				                     "name":"group_id",

				                     "description":"The ID of the group which leader should stepdown",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"timeout",

				                     "description":"Timeout in seconds after which the endpoint returns a failure. If not provided, 60s is used.",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"long",

				                     "paramType":"query"

				                  }

				               ]

				            }

				         ]

				      }

				   ]

				}
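
The `/raft/read_barrier` definition added above takes two optional query parameters, `group_id` (group0 when absent) and `timeout` in seconds (60s when absent), and is invoked with POST. A minimal sketch of how a client might assemble the request target follows. `read_barrier_target` is a hypothetical helper, it assumes the group ID needs no URL escaping, and it only builds the string; sending the request is left to whatever HTTP client is in use.

```cpp
#include <optional>
#include <string>

// Builds the request target for POST /raft/read_barrier.
// Parameters that are not provided are left out entirely, so the
// server-side defaults described in the API definition apply.
inline std::string read_barrier_target(const std::optional<std::string>& group_id,
                                       std::optional<long> timeout_seconds) {
    std::string target = "/raft/read_barrier";
    char separator = '?';
    if (group_id) {
        target += separator;
        target += "group_id=" + *group_id;  // assumed to be URL-safe already
        separator = '&';
    }
    if (timeout_seconds) {
        target += separator;
        target += "timeout=" + std::to_string(*timeout_seconds);
    }
    return target;
}
```

Omitting both arguments yields the bare path, which per the definition waits on group 0 with the default timeout.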

									
										182

api/api-doc/storage_service.json
									
				@@ -741,11 +741,151 @@

				                     "allowMultiple":false,

				                     "type":"boolean",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"consider_only_existing_data",

				                     "description":"Set to \"true\" to flush all memtables and force tombstone garbage collection to check only the sstables being compacted (false by default). The memtable, commitlog and other uncompacted sstables will not be checked during tombstone garbage collection.",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"boolean",

				                     "paramType":"query"

				                  }

				               ]

				            }

				         ]

				      },

				      {

				          "path":"/storage_service/backup",

				          "operations":[

				              {

				                  "method":"POST",

				                  "summary":"Starts copying SSTables from a specified keyspace to a designated bucket in object storage",

				                  "type":"string",

				                  "nickname":"start_backup",

				                  "produces":[

				                      "application/json"

				                  ],

				                  "parameters":[

				                      {

				                          "name":"endpoint",

				                          "description":"ID of the configured object storage endpoint to copy sstables to",

				                          "required":true,

				                          "allowMultiple":false,

				                          "type":"string",

				                          "paramType":"query"

				                      },

				                      {

				                          "name":"bucket",

				                          "description":"Name of the bucket to backup sstables to",

				                          "required":true,

				                          "allowMultiple":false,

				                          "type":"string",

				                          "paramType":"query"

				                      },

				                      {

				                           "name":"prefix",

				                           "description":"The prefix of the objects for the backuped sstables",

				                           "required":true,

				                           "allowMultiple":false,

				                           "type":"string",

				                           "paramType":"query"

				                      },

				                      {

				                          "name":"keyspace",

				                          "description":"Name of a keyspace to copy sstables from",

				                          "required":true,

				                          "allowMultiple":false,

				                          "type":"string",

				                          "paramType":"query"

				                      },

				                      {

				                          "name":"table",

				                          "description":"Name of a table to copy sstables from",

				                          "required":true,

				                          "allowMultiple":false,

				                          "type":"string",

				                          "paramType":"query"

				                      },

				                      {

				                          "name":"snapshot",

				                          "description":"Name of a snapshot to copy sstables from",

				                          "required":false,

				                          "allowMultiple":false,

				                          "type":"string",

				                          "paramType":"query"

				                      }

				                  ]

				              }

				          ]

				      },

				      {

				          "path":"/storage_service/restore",

				          "operations":[

				              {

				                  "method":"POST",

				                  "summary":"Starts copying SSTables from a designated bucket in object storage to a specified keyspace",

				                  "type":"string",

				                  "nickname":"start_restore",

				                  "produces":[

				                      "application/json"

				                  ],

				                  "parameters":[

				                      {

				                          "name":"endpoint",

				                          "description":"ID of the configured object storage endpoint to copy SSTables from",

				                          "required":true,

				                          "allowMultiple":false,

				                          "type":"string",

				                          "paramType":"query"

				                      },

				                      {

				                          "name":"bucket",

				                          "description":"Name of the bucket to read SSTables from",

				                          "required":true,

				                          "allowMultiple":false,

				                          "type":"string",

				                          "paramType":"query"

				                      },

				                      {

				                          "name":"prefix",

				                          "description":"The prefix of the object keys for the backuped SSTables",

				                          "required":true,

				                          "allowMultiple":false,

				                          "type":"string",

				                          "paramType":"query"

				                      },

				                      {

				                          "in": "body",

				                          "name": "sstables",

				                          "description": "The list of the object keys of the TOC component of the SSTables to be restored",

				                          "required":true,

				                          "schema" :{

				                              "type": "array",

				                              "items": {

				                                  "type": "string"

				                              }

				                          }

				                      },

				                      {

				                          "name":"keyspace",

				                          "description":"Name of a keyspace to copy SSTables to",

				                          "required":true,

				                          "allowMultiple":false,

				                          "type":"string",

				                          "paramType":"query"

				                      },

				                      {

				                          "name":"table",

				                          "description":"Name of a table to copy SSTables to",

				                          "required":true,

				                          "allowMultiple":false,

				                          "type":"string",

				                          "paramType":"query"

				                      }

				                  ]

				              }

				          ]

				      },

				      {

				         "path":"/storage_service/keyspace_compaction/{keyspace}",

				         "operations":[

				@@ -781,6 +921,14 @@

				                     "allowMultiple":false,

				                     "type":"boolean",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"consider_only_existing_data",

				                     "description":"Set to \"true\" to flush all memtables and force tombstone garbage collection to check only the sstables being compacted (false by default). The memtable, commitlog and other uncompacted sstables will not be checked during tombstone garbage collection.",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"boolean",

				                     "paramType":"query"

				                  }

				               ]

				            }

				@@ -1689,33 +1837,11 @@

				      {

				         "path":"/storage_service/rpc_server",

				         "operations":[

				            {

				               "method":"DELETE",

				               "summary":"Allows a user to disable thrift",

				               "type":"void",

				               "nickname":"stop_rpc_server",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				               ]

				            },

				            {

				               "method":"POST",

				               "summary":"allows a user to re-enable thrift",

				               "type":"void",

				               "nickname":"start_rpc_server",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				               ]

				            },

				            {

				               "method":"GET",

				               "summary":"Determine if thrift is running",

				               "type":"boolean",

				               "nickname":"is_rpc_server_running",

				               "nickname":"is_thrift_server_running",

				               "produces":[

				                  "application/json"

				               ],

				@@ -1913,6 +2039,14 @@

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"query"

				                  },

				                  {

				                     "name":"force",

				                     "description":"Enforce the source_dc option, even if it unsafe to use for rebuild",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"boolean",

				                     "paramType":"query"

				                  }

				               ]

				            }

				@@ -2070,7 +2204,7 @@

				         "operations":[

				            {

				               "method":"POST",

				               "summary":"Enables/Disables tracing for the whole system. Only thrift requests can start tracing currently",

				               "summary":"Enables/Disables tracing for the whole system.",

				               "type":"void",

				               "nickname":"set_trace_probability",

				               "produces":[

									
										15

api/api-doc/system.json
									
												
				@@ -194,6 +194,21 @@

				               "parameters":[]

				            }

				         ]

				      },

				      {

				         "path":"/system/highest_supported_sstable_version",

				         "operations":[

				            {

				               "method":"GET",

				               "summary":"Get highest supported sstable version",

				               "type":"string",

				               "nickname":"get_highest_supported_sstable_version",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[]

				            }

				         ]

				      }

				   ]

				}
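For orientation: the `start_backup` operation declared for `/storage_service/backup` earlier in this diff takes every argument as a query parameter (`endpoint`, `bucket`, `prefix`, `keyspace`, `table`, and the optional `snapshot`). The following is a minimal client-side sketch of assembling such a request target, not code from the repository. The helper names `percent_encode` and `backup_request_target` are hypothetical, and any parameter values passed in are made-up examples.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: percent-encode every byte that is not an
// RFC 3986 "unreserved" character (letters, digits, '-', '_', '.', '~').
std::string percent_encode(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        const bool unreserved =
            (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
            (c >= '0' && c <= '9') ||
            c == '-' || c == '_' || c == '.' || c == '~';
        if (unreserved) {
            out += static_cast<char>(c);
        } else {
            char buf[4];  // '%' + two hex digits + terminating NUL
            std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
            out += buf;
        }
    }
    return out;
}

// Hypothetical helper: build the request target for POST /storage_service/backup
// from (name, value) pairs, keeping the order in which they were given.
std::string backup_request_target(
        const std::vector<std::pair<std::string, std::string>>& params) {
    std::string target = "/storage_service/backup";
    char sep = '?';
    for (const auto& [name, value] : params) {
        assert(!name.empty());  // parameter names must not be empty
        target += sep;
        target += name + "=" + percent_encode(value);
        sep = '&';
    }
    return target;
}
```

The resulting string would be sent with the POST method to the node's REST API address. The `restore` operation differs in that its `sstables` list travels in the request body rather than in the query string.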

									
										55

api/api-doc/task_manager.json
									
												
				@@ -115,7 +115,7 @@

				               "parameters":[

				                  {

				                     "name":"task_id",

				                     "description":"The uuid of a task to abort",

				                     "description":"The uuid of a task to abort; if the task is not abortable, 403 status code is returned",

				                     "required":true,

				                     "allowMultiple":false,

				                     "type":"string",

				@@ -144,6 +144,14 @@

				                     "allowMultiple":false,

				                     "type":"string",

				                     "paramType":"path"

				                  },

				                  {

				                     "name":"timeout",

				                     "description":"Timeout for waiting; if times out, 408 status code is returned",

				                     "required":false,

				                     "allowMultiple":false,

				                     "type":"long",

				                     "paramType":"query"

				                  }

				               ]

				            }

				@@ -197,11 +205,36 @@

				                     "paramType":"query"

				                  }

				               ]

				            },

				            {

				               "method":"GET",

				               "summary":"Get current ttl value",

				               "type":"long",

				               "nickname":"get_ttl",

				               "produces":[

				                  "application/json"

				               ],

				               "parameters":[

				               ]

				            }

				         ]

				      }

				   ],

				   "models":{

				      "task_identity":{

				         "id": "task_identity",

				         "description":"Id and node of a task",

				         "properties":{

				            "task_id":{

				               "type":"string",

				               "description":"The uuid of a task"

				            },

				            "node":{

				               "type":"string",

				               "description":"Address of a server on which a task is created"

				            }

				         }

				      },

				      "task_stats" :{

				         "id": "task_stats",

				         "description":"A task statistics object",

				@@ -224,6 +257,14 @@

				               "type":"string",

				               "description":"The description of the task"

				            },

				            "kind":{

				               "type":"string",

				               "enum":[

				                  "node",

				                  "cluster"

				               ],

				               "description":"The kind of a task"

				            },

				            "scope":{

				               "type":"string",

				               "description":"The scope of the task"

				@@ -258,6 +299,14 @@

				               "type":"string",

				               "description":"The description of the task"

				            },

				            "kind":{

				               "type":"string",

				               "enum":[

				                  "node",

				                  "cluster"

				               ],

				               "description":"The kind of a task"

				            },

				            "scope":{

				               "type":"string",

				               "description":"The scope of the task"

				@@ -327,9 +376,9 @@

				            "children_ids":{

				               "type":"array",

				               "items":{

				                  "type":"string"

				                  "type":"task_identity"

				               },

				               "description":"Task IDs of children of this task"

				               "description":"Task identities of children of this task"

				            }

				         }

				      }
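The updated parameter descriptions in `task_manager.json` name two status codes: 403 when a task that is not abortable is asked to abort, and 408 when waiting for a task exceeds the `timeout` parameter. A minimal sketch of how a client might classify these responses follows. The enum and function names are hypothetical, and treating any 2xx status as success is an assumption rather than something this diff states.

```cpp
// Hypothetical client-side classification of task manager responses.
// Only the meanings of 403 and 408 come from the API descriptions.
enum class task_call_outcome {
    ok,              // assumption: any 2xx status counts as success
    not_abortable,   // 403: the task cannot be aborted
    wait_timed_out,  // 408: waiting for the task exceeded the timeout
    other_error      // anything else is left to the caller
};

task_call_outcome classify_task_status(int http_status) {
    if (http_status >= 200 && http_status < 300) {
        return task_call_outcome::ok;
    }
    switch (http_status) {
    case 403:
        return task_call_outcome::not_abortable;
    case 408:
        return task_call_outcome::wait_timed_out;
    default:
        return task_call_outcome::other_error;
    }
}
```

A caller that receives `wait_timed_out` can simply repeat the wait request, since the timeout concerns the waiting call rather than the task itself.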

									
										2

api/api-doc/utils.json
									
												
				@@ -75,7 +75,7 @@

				               "items":{

				                  "type":"double"

				               },

				               "description":"One, five and fifteen mintues rates"

				               "description":"One, five and fifteen minutes rates"

				            },

				            "mean_rate": {

				               "type":"double",

									
										100

api/api.cc
									
												
				@@ -10,6 +10,7 @@

				#include <seastar/http/file_handler.hh>

				#include <seastar/http/transformers.hh>

				#include <seastar/http/api_docs.hh>

				#include "cql_server_test.hh"

				#include "storage_service.hh"

				#include "token_metadata.hh"

				#include "commitlog.hh"

				@@ -71,6 +72,10 @@ future<> set_server_init(http_context& ctx) {

				        rb->register_function(r, "error_injection",

				            "The error injection API");

				        set_error_injection(ctx, r);

				        rb->register_function(r, "storage_proxy",

				                "The storage proxy API");

				        rb->register_function(r, "storage_service",

				                "The storage service API");

				    });

				}

				@@ -81,6 +86,10 @@ future<> set_server_config(http_context& ctx, const db::config& cfg) {

				    });

				}

				future<> unset_server_config(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_config(ctx, r); });

				}

				static future<> register_api(http_context& ctx, const sstring& api_name,

				        const sstring api_desc,

				        std::function<void(http_context& ctx, routes& r)> f) {

				@@ -100,16 +109,16 @@ future<> unset_transport_controller(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_transport_controller(ctx, r); });

				}

				future<> set_rpc_controller(http_context& ctx, thrift_controller& ctl) {

				    return ctx.http_server.set_routes([&ctx, &ctl] (routes& r) { set_rpc_controller(ctx, r, ctl); });

				future<> set_thrift_controller(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { set_thrift_controller(ctx, r); });

				}

				future<> unset_rpc_controller(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_rpc_controller(ctx, r); });

				future<> unset_thrift_controller(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_thrift_controller(ctx, r); });

				}

				future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, service::raft_group0_client& group0_client) {

				    return register_api(ctx, "storage_service", "The storage service API", [&ss, &group0_client] (http_context& ctx, routes& r) {

				    return ctx.http_server.set_routes([&ctx, &ss, &group0_client] (routes& r) {

				            set_storage_service(ctx, r, ss, group0_client);

				        });

				}

				@@ -118,6 +127,22 @@ future<> unset_server_storage_service(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_storage_service(ctx, r); });

				}

				future<> set_load_meter(http_context& ctx, service::load_meter& lm) {

				    return ctx.http_server.set_routes([&ctx, &lm] (routes& r) { set_load_meter(ctx, r, lm); });

				}

				future<> unset_load_meter(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_load_meter(ctx, r); });

				}

				future<> set_format_selector(http_context& ctx, db::sstables_format_selector& sel) {

				    return ctx.http_server.set_routes([&ctx, &sel] (routes& r) { set_format_selector(ctx, r, sel); });

				}

				future<> unset_format_selector(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_format_selector(ctx, r); });

				}

				future<> set_server_sstables_loader(http_context& ctx, sharded<sstables_loader>& sst_loader) {

				    return ctx.http_server.set_routes([&ctx, &sst_loader] (routes& r) { set_sstables_loader(ctx, r, sst_loader); });

				}

				@@ -180,10 +205,21 @@ future<> unset_server_snitch(http_context& ctx) {

				}

				future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g) {

				    return register_api(ctx, "gossiper",

				    co_await register_api(ctx, "gossiper",

				                "The gossiper API", [&g] (http_context& ctx, routes& r) {

				                    set_gossiper(ctx, r, g.local());

				                });

				    co_await register_api(ctx, "failure_detector",

				                "The failure detector API", [&g] (http_context& ctx, routes& r) {

				                    set_failure_detector(ctx, r, g.local());

				                });

				}

				future<> unset_server_gossip(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) {

				        unset_gossiper(ctx, r);

				        unset_failure_detector(ctx, r);

				    });

				}

				future<> set_server_column_family(http_context& ctx, sharded<db::system_keyspace>& sys_ks) {

				@@ -208,10 +244,7 @@ future<> unset_server_messaging_service(http_context& ctx) {

				}

				future<> set_server_storage_proxy(http_context& ctx, sharded<service::storage_proxy>& proxy) {

				    return register_api(ctx, "storage_proxy",

				                "The storage proxy API", [&proxy] (http_context& ctx, routes& r) {

				                    set_storage_proxy(ctx, r, proxy);

				                });

				    return ctx.http_server.set_routes([&ctx, &proxy] (routes& r) { set_storage_proxy(ctx, r, proxy); });

				}

				future<> unset_server_storage_proxy(http_context& ctx) {

				@@ -234,6 +267,10 @@ future<> set_server_cache(http_context& ctx) {

				            "The cache service API", set_cache_service);

				}

				future<> unset_server_cache(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_cache_service(ctx, r); });

				}

				future<> set_hinted_handoff(http_context& ctx, sharded<service::storage_proxy>& proxy) {

				    return register_api(ctx, "hinted_handoff",

				                "The hinted handoff API", [&proxy] (http_context& ctx, routes& r) {

				@@ -245,24 +282,14 @@ future<> unset_hinted_handoff(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_hinted_handoff(ctx, r); });

				}

				future<> set_server_gossip_settle(http_context& ctx, sharded<gms::gossiper>& g) {

				    auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);

				    return ctx.http_server.set_routes([rb, &ctx, &g](routes& r) {

				        rb->register_function(r, "failure_detector",

				                "The failure detector API");

				        set_failure_detector(ctx, r, g.local());

				future<> set_server_compaction_manager(http_context& ctx, sharded<compaction_manager>& cm) {

				    return register_api(ctx, "compaction_manager", "The Compaction manager API", [&cm] (http_context& ctx, routes& r) {

				        set_compaction_manager(ctx, r, cm);

				    });

				}

				future<> set_server_compaction_manager(http_context& ctx) {

				    auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);

				    return ctx.http_server.set_routes([rb, &ctx](routes& r) {

				        rb->register_function(r, "compaction_manager",

				                "The Compaction manager API");

				        set_compaction_manager(ctx, r);

				    });

				future<> unset_server_compaction_manager(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_compaction_manager(ctx, r); });

				}

				future<> set_server_done(http_context& ctx) {

				@@ -272,15 +299,22 @@ future<> set_server_done(http_context& ctx) {

				        rb->register_function(r, "lsa", "Log-structured allocator API");

				        set_lsa(ctx, r);

				        rb->register_function(r, "commitlog",

				                "The commit log API");

				        set_commitlog(ctx,r);

				        rb->register_function(r, "collectd",

				                "The collectd API");

				        set_collectd(ctx, r);

				    });

				}

				future<> set_server_commitlog(http_context& ctx, sharded<replica::database>& db) {

				    return register_api(ctx, "commitlog", "The commit log API", [&db] (http_context& ctx, routes& r) {

				        set_commitlog(ctx, r, db);

				    });

				}

				future<> unset_server_commitlog(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_commitlog(ctx, r); });

				}

				future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg) {

				    auto rb = std::make_shared < api_registry_builder > (ctx.api_doc);

				@@ -311,6 +345,16 @@ future<> unset_server_task_manager_test(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_task_manager_test(ctx, r); });

				}

				future<> set_server_cql_server_test(http_context& ctx, cql_transport::controller& ctl) {

				    return register_api(ctx, "cql_server_test", "The CQL server test API", [&ctl] (http_context& ctx, routes& r) {

				        set_cql_server_test(ctx, r, ctl);

				    });

				}

				future<> unset_server_cql_server_test(http_context& ctx) {

				    return ctx.http_server.set_routes([&ctx] (routes& r) { unset_cql_server_test(ctx, r); });

				}

				#endif

				future<> set_server_tasks_compaction_module(http_context& ctx, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& snap_ctl) {

									
										2

api/api.hh
									
												
				@@ -246,7 +246,7 @@ public:

				                value = T{boost::lexical_cast<Base>(param)};

				            }

				        } catch (boost::bad_lexical_cast&) {

				            throw httpd::bad_param_exception(format("{} ({}): type error - should be {}", name, param, boost::units::detail::demangle(typeid(Base).name())));

				            throw httpd::bad_param_exception(fmt::format("{} ({}): type error - should be {}", name, param, boost::units::detail::demangle(typeid(Base).name())));

				        }

				    }

									
										29

api/api_init.hh
									
												
				@@ -17,6 +17,8 @@

				using request = http::request;

				using reply = http::reply;

				class compaction_manager;

				namespace service {

				class load_meter;

				@@ -46,10 +48,10 @@ class snitch_ptr;

				} // namespace locator

				namespace cql_transport { class controller; }

				class thrift_controller;

				namespace db {

				class snapshot_ctl;

				class config;

				class sstables_format_selector;

				namespace view {

				class view_builder;

				}

				@@ -77,17 +79,16 @@ struct http_context {

				    sstring api_doc;

				    httpd::http_server_control http_server;

				    distributed<replica::database>& db;

				    service::load_meter& lmeter;

				    http_context(distributed<replica::database>& _db,

				            service::load_meter& _lm)

				            : db(_db), lmeter(_lm)

				    http_context(distributed<replica::database>& _db)

				            : db(_db)

				    {

				    }

				};

				future<> set_server_init(http_context& ctx);

				future<> set_server_config(http_context& ctx, const db::config& cfg);

				future<> unset_server_config(http_context& ctx);

				future<> set_server_snitch(http_context& ctx, sharded<locator::snitch_ptr>& snitch);

				future<> unset_server_snitch(http_context& ctx);

				future<> set_server_storage_service(http_context& ctx, sharded<service::storage_service>& ss, service::raft_group0_client&);

				@@ -100,8 +101,8 @@ future<> set_server_repair(http_context& ctx, sharded<repair_service>& repair);

				future<> unset_server_repair(http_context& ctx);

				future<> set_transport_controller(http_context& ctx, cql_transport::controller& ctl);

				future<> unset_transport_controller(http_context& ctx);

				future<> set_rpc_controller(http_context& ctx, thrift_controller& ctl);

				future<> unset_rpc_controller(http_context& ctx);

				future<> set_thrift_controller(http_context& ctx);

				future<> unset_thrift_controller(http_context& ctx);

				future<> set_server_authorization_cache(http_context& ctx, sharded<auth::service> &auth_service);

				future<> unset_server_authorization_cache(http_context& ctx);

				future<> set_server_snapshot(http_context& ctx, sharded<db::snapshot_ctl>& snap_ctl);

				@@ -109,6 +110,7 @@ future<> unset_server_snapshot(http_context& ctx);

				future<> set_server_token_metadata(http_context& ctx, sharded<locator::shared_token_metadata>& tm);

				future<> unset_server_token_metadata(http_context& ctx);

				future<> set_server_gossip(http_context& ctx, sharded<gms::gossiper>& g);

				future<> unset_server_gossip(http_context& ctx);

				future<> set_server_column_family(http_context& ctx, sharded<db::system_keyspace>& sys_ks);

				future<> unset_server_column_family(http_context& ctx);

				future<> set_server_messaging_service(http_context& ctx, sharded<netw::messaging_service>& ms);

				@@ -119,9 +121,10 @@ future<> set_server_stream_manager(http_context& ctx, sharded<streaming::stream_

				future<> unset_server_stream_manager(http_context& ctx);

				future<> set_hinted_handoff(http_context& ctx, sharded<service::storage_proxy>& p);

				future<> unset_hinted_handoff(http_context& ctx);

				future<> set_server_gossip_settle(http_context& ctx, sharded<gms::gossiper>& g);

				future<> set_server_cache(http_context& ctx);

				future<> set_server_compaction_manager(http_context& ctx);

				future<> unset_server_cache(http_context& ctx);

				future<> set_server_compaction_manager(http_context& ctx, sharded<compaction_manager>& cm);

				future<> unset_server_compaction_manager(http_context& ctx);

				future<> set_server_done(http_context& ctx);

				future<> set_server_task_manager(http_context& ctx, sharded<tasks::task_manager>& tm, lw_shared_ptr<db::config> cfg);

				future<> unset_server_task_manager(http_context& ctx);

				@@ -131,5 +134,13 @@ future<> set_server_tasks_compaction_module(http_context& ctx, sharded<service::

				future<> unset_server_tasks_compaction_module(http_context& ctx);

				future<> set_server_raft(http_context&, sharded<service::raft_group_registry>&);

				future<> unset_server_raft(http_context&);

				future<> set_load_meter(http_context& ctx, service::load_meter& lm);

				future<> unset_load_meter(http_context& ctx);

				future<> set_format_selector(http_context& ctx, db::sstables_format_selector& sel);

				future<> unset_format_selector(http_context& ctx);

				future<> set_server_cql_server_test(http_context& ctx, cql_transport::controller& ctl);

				future<> unset_server_cql_server_test(http_context& ctx);

				future<> set_server_commitlog(http_context& ctx, sharded<replica::database>&);

				future<> unset_server_commitlog(http_context& ctx);

				}
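Several declarations above pair a `set_*` function with an `unset_*` counterpart (for example `set_server_commitlog` and `unset_server_commitlog`), so that routes registered for a service can be removed again. The pattern, reduced to standard C++, might look like the sketch below. `route_table`, `set_example_api`, `unset_example_api`, and the paths are all hypothetical; this is not Seastar's `routes` class and not the project's code.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for an HTTP route table.
class route_table {
    std::map<std::string, std::function<std::string()>> _handlers;
public:
    void set(const std::string& path, std::function<std::string()> handler) {
        _handlers[path] = std::move(handler);
    }
    void unset(const std::string& path) {
        _handlers.erase(path);
    }
    bool has(const std::string& path) const {
        return _handlers.count(path) != 0;
    }
};

// Registers every handler that belongs to one (made-up) API group.
void set_example_api(route_table& r) {
    r.set("/example/status", [] { return std::string("ok"); });
    r.set("/example/version", [] { return std::string("1"); });
}

// Removes exactly the handlers that set_example_api() registered,
// so no handler can outlive the service it would call into.
void unset_example_api(route_table& r) {
    r.unset("/example/status");
    r.unset("/example/version");
}
```

Keeping the two functions side by side makes it easier to check that every path registered by the first is also removed by the second.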

									
										15

api/authorization_cache.hh
									
												
				@@ -8,11 +8,20 @@

				#pragma once

				#include "api.hh"

				#include <seastar/core/sharded.hh>

				namespace seastar::httpd {

				class routes;

				}

				namespace auth {

				class service;

				}

				namespace api {

				void set_authorization_cache(http_context& ctx, httpd::routes& r, sharded<auth::service> &auth_service);

				void unset_authorization_cache(http_context& ctx, httpd::routes& r);

				struct http_context;

				void set_authorization_cache(http_context& ctx, seastar::httpd::routes& r, seastar::sharded<auth::service> &auth_service);

				void unset_authorization_cache(http_context& ctx, seastar::httpd::routes& r);

				}

									
										52

api/cache_service.cc
									
												
				@@ -7,6 +7,7 @@

				 */

				#include "cache_service.hh"

				#include "api/api.hh"

				#include "api/api-doc/cache_service.json.hh"

				#include "column_family.hh"

				@@ -195,9 +196,9 @@ void set_cache_service(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cs::get_row_capacity.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return ctx.db.map_reduce0([](replica::database& db) -> uint64_t {

				            return memory::stats().total_memory();

				    cs::get_row_capacity.set(r, [] (std::unique_ptr<http::request> req) {

				        return seastar::map_reduce(smp::all_cpus(), [] (int cpu) {

				            return make_ready_future<uint64_t>(memory::stats().total_memory());

				        }, uint64_t(0), std::plus<uint64_t>()).then([](const int64_t& res) {

				            return make_ready_future<json::json_return_type>(res);

				        });

				@@ -319,5 +320,50 @@ void set_cache_service(http_context& ctx, routes& r) {

				    });

				}

				void unset_cache_service(http_context& ctx, routes& r) {

				    cs::get_row_cache_save_period_in_seconds.unset(r);

				    cs::set_row_cache_save_period_in_seconds.unset(r);

				    cs::get_key_cache_save_period_in_seconds.unset(r);

				    cs::set_key_cache_save_period_in_seconds.unset(r);

				    cs::get_counter_cache_save_period_in_seconds.unset(r);

				    cs::set_counter_cache_save_period_in_seconds.unset(r);

				    cs::get_row_cache_keys_to_save.unset(r);

				    cs::set_row_cache_keys_to_save.unset(r);

				    cs::get_key_cache_keys_to_save.unset(r);

				    cs::set_key_cache_keys_to_save.unset(r);

				    cs::get_counter_cache_keys_to_save.unset(r);

				    cs::set_counter_cache_keys_to_save.unset(r);

				    cs::invalidate_key_cache.unset(r);

				    cs::invalidate_counter_cache.unset(r);

				    cs::set_row_cache_capacity_in_mb.unset(r);

				    cs::set_key_cache_capacity_in_mb.unset(r);

				    cs::set_counter_cache_capacity_in_mb.unset(r);

				    cs::save_caches.unset(r);

				    cs::get_key_capacity.unset(r);

				    cs::get_key_hits.unset(r);

				    cs::get_key_requests.unset(r);

				    cs::get_key_hit_rate.unset(r);

				    cs::get_key_hits_moving_avrage.unset(r);

				    cs::get_key_requests_moving_avrage.unset(r);

				    cs::get_key_size.unset(r);

				    cs::get_key_entries.unset(r);

				    cs::get_row_capacity.unset(r);

				    cs::get_row_hits.unset(r);

				    cs::get_row_requests.unset(r);

				    cs::get_row_hit_rate.unset(r);

				    cs::get_row_hits_moving_avrage.unset(r);

				    cs::get_row_requests_moving_avrage.unset(r);

				    cs::get_row_size.unset(r);

				    cs::get_row_entries.unset(r);

				    cs::get_counter_capacity.unset(r);

				    cs::get_counter_hits.unset(r);

				    cs::get_counter_requests.unset(r);

				    cs::get_counter_hit_rate.unset(r);

				    cs::get_counter_hits_moving_avrage.unset(r);

				    cs::get_counter_requests_moving_avrage.unset(r);

				    cs::get_counter_size.unset(r);

				    cs::get_counter_entries.unset(r);

				}

				}

									
										8

api/cache_service.hh
									
												
				@@ -8,10 +8,14 @@

				#pragma once

				#include "api.hh"

				namespace seastar::httpd {

				class routes;

				}

				namespace api {

				void set_cache_service(http_context& ctx, httpd::routes& r);

				struct http_context;

				void set_cache_service(http_context& ctx, seastar::httpd::routes& r);

				void unset_cache_service(http_context& ctx, seastar::httpd::routes& r);

				}

									
										3

api/collectd.cc
									
												
				@@ -11,6 +11,7 @@

				#include <seastar/core/scollectd.hh>

				#include <seastar/core/scollectd_api.hh>

				#include <boost/range/irange.hpp>

				#include <ranges>

				#include <regex>

				#include "api/api_init.hh"

				@@ -61,7 +62,7 @@ void set_collectd(http_context& ctx, routes& r) {

				        return do_with(std::vector<cd::collectd_value>(), [id] (auto& vec) {

				            vec.resize(smp::count);

				            return parallel_for_each(boost::irange(0u, smp::count), [&vec, id] (auto cpu) {

				            return parallel_for_each(std::views::iota(0u, smp::count), [&vec, id] (auto cpu) {

				                return smp::submit_to(cpu, [id = *id] {

				                    return scollectd::get_collectd_value(id);

				                }).then([&vec, cpu] (auto res) {

									
										77

api/column_family.cc
									
												
				@@ -15,6 +15,7 @@

				#include <seastar/http/exception.hh>

				#include "sstables/sstables.hh"

				#include "sstables/metadata_collector.hh"

				#include "utils/assert.hh"

				#include "utils/estimated_histogram.hh"

				#include <algorithm>

				#include "db/system_keyspace.hh"

				@@ -23,6 +24,8 @@

				#include "compaction/compaction_manager.hh"

				#include "unimplemented.hh"

				#include <boost/range/algorithm/copy.hpp>

				extern logging::logger apilog;

				namespace api {

				@@ -60,14 +63,6 @@ table_id get_uuid(const sstring& name, const replica::database& db) {

				    return get_uuid(ks, cf, db);

				}

				future<> foreach_column_family(http_context& ctx, const sstring& name, std::function<void(replica::column_family&)> f) {

				    auto uuid = get_uuid(name, ctx.db.local());

				    return ctx.db.invoke_on_all([f, uuid](replica::database& db) {

				        f(db.find_column_family(uuid));

				    });

				}

				future<json::json_return_type>  get_cf_stats(http_context& ctx, const sstring& name,

				        int64_t replica::column_family_stats::*f) {

				    return map_reduce_cf(ctx, name, int64_t(0), [f](const replica::column_family& cf) {

				@@ -82,7 +77,7 @@ future<json::json_return_type>  get_cf_stats(http_context& ctx,

				    }, std::plus<int64_t>());

				}

				static future<json::json_return_type> set_tables(http_context& ctx, const sstring& keyspace, std::vector<sstring> tables, std::function<future<>(replica::table&)> set) {

				static future<json::json_return_type> for_tables_on_all_shards(http_context& ctx, const sstring& keyspace, std::vector<sstring> tables, std::function<future<>(replica::table&)> set) {

				    if (tables.empty()) {

				        tables = map_keys(ctx.db.local().find_keyspace(keyspace).metadata().get()->cf_meta_data());

				    }

				@@ -103,7 +98,7 @@ class autocompaction_toggle_guard {

				    replica::database& _db;

				public:

				    autocompaction_toggle_guard(replica::database& db) : _db(db) {

				        assert(this_shard_id() == 0);

				        SCYLLA_ASSERT(this_shard_id() == 0);

				        if (!_db._enable_autocompaction_toggle) {

				            throw std::runtime_error("Autocompaction toggle is busy");

				        }

				@@ -112,7 +107,7 @@ public:

				    autocompaction_toggle_guard(const autocompaction_toggle_guard&) = delete;

				    autocompaction_toggle_guard(autocompaction_toggle_guard&&) = default;

				    ~autocompaction_toggle_guard() {

				        assert(this_shard_id() == 0);

				        SCYLLA_ASSERT(this_shard_id() == 0);

				        _db._enable_autocompaction_toggle = true;

				    }

				};

				@@ -122,7 +117,7 @@ static future<json::json_return_type> set_tables_autocompaction(http_context& ct

				    return ctx.db.invoke_on(0, [&ctx, keyspace, tables = std::move(tables), enabled] (replica::database& db) {

				        auto g = autocompaction_toggle_guard(db);

				        return set_tables(ctx, keyspace, tables, [enabled] (replica::table& cf) {

				        return for_tables_on_all_shards(ctx, keyspace, tables, [enabled] (replica::table& cf) {

				            if (enabled) {

				                cf.enable_auto_compaction();

				            } else {

				@@ -135,7 +130,7 @@ static future<json::json_return_type> set_tables_autocompaction(http_context& ct

				static future<json::json_return_type> set_tables_tombstone_gc(http_context& ctx, const sstring &keyspace, std::vector<sstring> tables, bool enabled) {

				    apilog.info("set_tables_tombstone_gc: enabled={} keyspace={} tables={}", enabled, keyspace, tables);

				    return set_tables(ctx, keyspace, std::move(tables), [enabled] (replica::table& t) {

				    return for_tables_on_all_shards(ctx, keyspace, std::move(tables), [enabled] (replica::table& t) {

				        t.set_tombstone_gc_enabled(enabled);

				        return make_ready_future<>();

				    });

				@@ -366,6 +361,14 @@ ratio_holder filter_recent_false_positive_as_ratio_holder(const sstables::shared

				    return ratio_holder(f + sst->filter_get_recent_true_positive(), f);

				}

				uint64_t accumulate_on_active_memtables(replica::table& t, noncopyable_function<uint64_t(replica::memtable& mt)> action) {

				    uint64_t ret = 0;

				    t.for_each_active_memtable([&] (replica::memtable& mt) {

				        ret += action(mt);

				    });

				    return ret;

				}

				void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace>& sys_ks) {

				    cf::get_column_family_name.set(r, [&ctx] (const_req req){

				        std::vector<sstring> res;

				@@ -401,13 +404,13 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    cf::get_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), uint64_t{0}, [](replica::column_family& cf) {

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed(std::mem_fn(&replica::memtable::partition_count)), uint64_t(0));

				            return accumulate_on_active_memtables(cf, std::mem_fn(&replica::memtable::partition_count));

				        }, std::plus<>());

				    });

				    cf::get_all_memtable_columns_count.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, uint64_t{0}, [](replica::column_family& cf) {

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed(std::mem_fn(&replica::memtable::partition_count)), uint64_t(0));

				            return accumulate_on_active_memtables(cf, std::mem_fn(&replica::memtable::partition_count));

				        }, std::plus<>());

				    });

				@@ -421,33 +424,33 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    cf::get_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().total_space();

				            }), uint64_t(0));

				            return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {

				                return active_memtable.region().occupancy().total_space();

				            });

				        }, std::plus<int64_t>());

				    });

				    cf::get_all_memtable_off_heap_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().total_space();

				            }), uint64_t(0));

				            return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {

				                return active_memtable.region().occupancy().total_space();

				            });

				        }, std::plus<int64_t>());

				    });

				    cf::get_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, req->get_path_param("name"), int64_t(0), [](replica::column_family& cf) {

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().used_space();

				            }), uint64_t(0));

				            return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {

				                return active_memtable.region().occupancy().used_space();

				            });

				        }, std::plus<int64_t>());

				    });

				    cf::get_all_memtable_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().used_space();

				            }), uint64_t(0));

				            return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {

				                return active_memtable.region().occupancy().used_space();

				            });

				        }, std::plus<int64_t>());

				    });

				@@ -485,9 +488,9 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    cf::get_all_cf_all_memtables_live_data_size.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        warn(unimplemented::cause::INDEXES);

				        return map_reduce_cf(ctx, int64_t(0), [](replica::column_family& cf) {

				            return boost::accumulate(cf.active_memtables() | boost::adaptors::transformed([] (replica::memtable* active_memtable) {

				                return active_memtable->region().occupancy().used_space();

				            }), uint64_t(0));

				            return accumulate_on_active_memtables(cf, [] (replica::memtable& active_memtable) {

				                return active_memtable.region().occupancy().used_space();

				            });

				        }, std::plus<int64_t>());

				    });

				@@ -1047,12 +1050,12 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				    });

				    cf::set_compaction_strategy_class.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));

				        sstring strategy = req->get_query_param("class_name");

				        apilog.info("column_family/set_compaction_strategy_class: name={} strategy={}", req->get_path_param("name"), strategy);

				        return foreach_column_family(ctx, req->get_path_param("name"), [strategy](replica::column_family& cf) {

				        return for_tables_on_all_shards(ctx, ks, {std::move(cf)}, [strategy] (replica::table& cf) {

				            cf.set_compaction_strategy(sstables::compaction_strategy::type(strategy));

				        }).then([] {

				                return make_ready_future<json::json_return_type>(json_void());

				            return make_ready_future<>();

				        });

				    });

				@@ -1117,6 +1120,7 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				        auto params = req_params({

				            std::pair("name", mandatory::yes),

				            std::pair("flush_memtables", mandatory::no),

				            std::pair("consider_only_existing_data", mandatory::no),

				            std::pair("split_output", mandatory::no),

				        });

				        params.process(*req);

				@@ -1125,7 +1129,8 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				        }

				        auto [ks, cf] = parse_fully_qualified_cf_name(*params.get("name"));

				        auto flush = params.get_as<bool>("flush_memtables").value_or(true);

				        apilog.info("column_family/force_major_compaction: name={} flush={}", req->get_path_param("name"), flush);

				        auto consider_only_existing_data = params.get_as<bool>("consider_only_existing_data").value_or(false);

				        apilog.info("column_family/force_major_compaction: name={} flush={} consider_only_existing_data={}", req->get_path_param("name"), flush, consider_only_existing_data);

				        auto keyspace = validate_keyspace(ctx, ks);

				        std::vector<table_info> table_infos = {table_info{

				@@ -1135,10 +1140,10 @@ void set_column_family(http_context& ctx, routes& r, sharded<db::system_keyspace

				        auto& compaction_module = ctx.db.local().get_compaction_manager().get_task_manager_module();

				        std::optional<flush_mode> fmopt;

				        if (!flush) {

				        if (!flush && !consider_only_existing_data) {

				            fmopt = flush_mode::skip;

				        }

				        auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), ctx.db, std::move(table_infos), fmopt);

				        auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), ctx.db, std::move(table_infos), fmopt, consider_only_existing_data);

				        co_await task->done();

				        co_return json_void();

				    });
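
The memtable handlers in this file now go through `accumulate_on_active_memtables`, which sums a per-memtable value via a visitor instead of piping `active_memtables()` through Boost adaptors. A rough standalone sketch of that pattern follows; `memtable` and `table` here are simplified hypothetical stand-ins, and `std::function` is used in place of Seastar's `noncopyable_function`:

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-ins for replica::memtable and replica::table.
struct memtable {
    uint64_t partitions;
    uint64_t partition_count() const { return partitions; }
};

struct table {
    std::vector<memtable> active;
    // Visitor-style access: callers never touch the container directly.
    void for_each_active_memtable(const std::function<void(memtable&)>& visit) {
        for (auto& mt : active) {
            visit(mt);
        }
    }
};

// Sum the result of `action` over every active memtable.
uint64_t accumulate_on_active_memtables(table& t, std::function<uint64_t(memtable&)> action) {
    uint64_t ret = 0;
    t.for_each_active_memtable([&](memtable& mt) {
        ret += action(mt);
    });
    return ret;
}
```

Because the callback takes a reference, `std::mem_fn(&memtable::partition_count)` can be passed directly, which mirrors how the handlers above call the helper.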

									
										3

api/column_family.hh
									
												
				@@ -9,7 +9,6 @@

				#pragma once

				#include "replica/database.hh"

				#include <seastar/core/future-util.hh>

				#include <seastar/json/json_elements.hh>

				#include <any>

				#include "api/api_init.hh"

				@@ -24,8 +23,6 @@ void set_column_family(http_context& ctx, httpd::routes& r, sharded<db::system_k

				void unset_column_family(http_context& ctx, httpd::routes& r);

				table_id get_uuid(const sstring& name, const replica::database& db);

				future<> foreach_column_family(http_context& ctx, const sstring& name, std::function<void(replica::column_family&)> f);

				template<class Mapper, class I, class Reducer>

				future<I> map_reduce_cf_raw(http_context& ctx, const sstring& name, I init,

									
										43

api/commitlog.cc
									
												
				@@ -9,18 +9,20 @@

				#include "commitlog.hh"

				#include "db/commitlog/commitlog.hh"

				#include "api/api-doc/commitlog.json.hh"

				#include "api/api-doc/storage_service.json.hh"

				#include "api/api_init.hh"

				#include "replica/database.hh"

				#include <vector>

				namespace api {

				using namespace seastar::httpd;

				namespace ss = httpd::storage_service_json;

				template<typename T>

				static auto acquire_cl_metric(http_context& ctx, std::function<T (const db::commitlog*)> func) {

				static auto acquire_cl_metric(sharded<replica::database>& db, std::function<T (const db::commitlog*)> func) {

				    typedef T ret_type;

				    return ctx.db.map_reduce0([func = std::move(func)](replica::database& db) {

				    return db.map_reduce0([func = std::move(func)](replica::database& db) {

				        if (db.commitlog() == nullptr) {

				            return make_ready_future<ret_type>();

				        }

				@@ -30,11 +32,11 @@ static auto acquire_cl_metric(http_context& ctx, std::function<T (const db::comm

				    });

				}

				void set_commitlog(http_context& ctx, routes& r) {

				void set_commitlog(http_context& ctx, routes& r, sharded<replica::database>& db) {

				    httpd::commitlog_json::get_active_segment_names.set(r,

				            [&ctx](std::unique_ptr<request> req) {

				            [&db](std::unique_ptr<request> req) {

				        auto res = make_shared<std::vector<sstring>>();

				        return ctx.db.map_reduce([res](std::vector<sstring> names) {

				        return db.map_reduce([res](std::vector<sstring> names) {

				            res->insert(res->end(), names.begin(), names.end());

				        }, [](replica::database& db) {

				            if (db.commitlog() == nullptr) {

				@@ -52,20 +54,35 @@ void set_commitlog(http_context& ctx, routes& r) {

				        return res;

				    });

				    httpd::commitlog_json::get_completed_tasks.set(r, [&ctx](std::unique_ptr<request> req) {

				        return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_completed_tasks, std::placeholders::_1));

				    httpd::commitlog_json::get_completed_tasks.set(r, [&db](std::unique_ptr<request> req) {

				        return acquire_cl_metric<uint64_t>(db, std::bind(&db::commitlog::get_completed_tasks, std::placeholders::_1));

				    });

				    httpd::commitlog_json::get_pending_tasks.set(r, [&ctx](std::unique_ptr<request> req) {

				        return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_pending_tasks, std::placeholders::_1));

				    httpd::commitlog_json::get_pending_tasks.set(r, [&db](std::unique_ptr<request> req) {

				        return acquire_cl_metric<uint64_t>(db, std::bind(&db::commitlog::get_pending_tasks, std::placeholders::_1));

				    });

				    httpd::commitlog_json::get_total_commit_log_size.set(r, [&ctx](std::unique_ptr<request> req) {

				        return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::get_total_size, std::placeholders::_1));

				    httpd::commitlog_json::get_total_commit_log_size.set(r, [&db](std::unique_ptr<request> req) {

				        return acquire_cl_metric<uint64_t>(db, std::bind(&db::commitlog::get_total_size, std::placeholders::_1));

				    });

				    httpd::commitlog_json::get_max_disk_size.set(r, [&ctx](std::unique_ptr<request> req) {

				        return acquire_cl_metric<uint64_t>(ctx, std::bind(&db::commitlog::disk_limit, std::placeholders::_1));

				    httpd::commitlog_json::get_max_disk_size.set(r, [&db](std::unique_ptr<request> req) {

				        return acquire_cl_metric<uint64_t>(db, std::bind(&db::commitlog::disk_limit, std::placeholders::_1));

				    });

				    ss::get_commitlog.set(r, [&db](const_req req) {

				        return db.local().commitlog()->active_config().commit_log_location;

				    });

				}

				void unset_commitlog(http_context& ctx, routes& r) {

				    httpd::commitlog_json::get_active_segment_names.unset(r);

				    httpd::commitlog_json::get_archiving_segment_names.unset(r);

				    httpd::commitlog_json::get_completed_tasks.unset(r);

				    httpd::commitlog_json::get_pending_tasks.unset(r);

				    httpd::commitlog_json::get_total_commit_log_size.unset(r);

				    httpd::commitlog_json::get_max_disk_size.unset(r);

				    ss::get_commitlog.unset(r);

				}

				}
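
`acquire_cl_metric` now takes the sharded database directly and returns a default value for shards that have no commitlog. The sketch below models that shape synchronously. It assumes the per-shard values are summed (the reducer is outside the hunks shown), and `commitlog`, `shard_db` and the `std::vector` of shards are hypothetical stand-ins for `db::commitlog`, `replica::database` and `sharded<replica::database>`:

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <numeric>
#include <vector>

// Hypothetical stand-ins for db::commitlog and a per-shard replica::database.
struct commitlog {
    uint64_t pending;
    uint64_t get_pending_tasks() const { return pending; }
};

struct shard_db {
    std::unique_ptr<commitlog> cl; // may be null when no commitlog is configured
    const commitlog* get_commitlog() const { return cl.get(); }
};

// Apply `func` on every shard and sum the results (summation is an assumption);
// a shard without a commitlog contributes a value-initialized T.
template <typename T>
T acquire_cl_metric(const std::vector<shard_db>& shards, std::function<T(const commitlog*)> func) {
    return std::accumulate(shards.begin(), shards.end(), T{}, [&](T acc, const shard_db& db) {
        const commitlog* cl = db.get_commitlog();
        return acc + (cl ? func(cl) : T{});
    });
}
```

Passing only the dependency the helper needs, rather than the whole `http_context`, is the same narrowing the patch applies to `set_commitlog`.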

									
										7

api/commitlog.hh
									
												
				@@ -8,12 +8,17 @@

				#pragma once

				#include <seastar/core/sharded.hh>

				namespace seastar::httpd {

				class routes;

				}

				namespace replica { class database; }

				namespace api {

				struct http_context;

				void set_commitlog(http_context& ctx, seastar::httpd::routes& r);

				void set_commitlog(http_context& ctx, seastar::httpd::routes& r, seastar::sharded<replica::database>&);

				void unset_commitlog(http_context& ctx, seastar::httpd::routes& r);

				}

									
										93

api/compaction_manager.cc
									
												
				@@ -7,11 +7,13 @@

				 */

				#include <seastar/core/coroutine.hh>

				#include <seastar/coroutine/exception.hh>

				#include "compaction_manager.hh"

				#include "compaction/compaction_manager.hh"

				#include "api/api.hh"

				#include "api/api-doc/compaction_manager.json.hh"

				#include "api/api-doc/storage_service.json.hh"

				#include "db/system_keyspace.hh"

				#include "column_family.hh"

				#include "unimplemented.hh"

				@@ -22,13 +24,14 @@

				namespace api {

				namespace cm = httpd::compaction_manager_json;

				namespace ss = httpd::storage_service_json;

				using namespace json;

				using namespace seastar::httpd;

				static future<json::json_return_type> get_cm_stats(http_context& ctx,

				static future<json::json_return_type> get_cm_stats(sharded<compaction_manager>& cm,

				        int64_t compaction_manager::stats::*f) {

				    return ctx.db.map_reduce0([f](replica::database& db) {

				        return db.get_compaction_manager().get_stats().*f;

				    return cm.map_reduce0([f](compaction_manager& cm) {

				        return cm.get_stats().*f;

				    }, int64_t(0), std::plus<int64_t>()).then([](const int64_t& res) {

				        return make_ready_future<json::json_return_type>(res);

				    });

				@@ -43,11 +46,10 @@ static std::unordered_map<std::pair<sstring, sstring>, uint64_t, utils::tuple_ha

				    return std::move(a);

				}

				void set_compaction_manager(http_context& ctx, routes& r) {

				    cm::get_compactions.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return ctx.db.map_reduce0([](replica::database& db) {

				void set_compaction_manager(http_context& ctx, routes& r, sharded<compaction_manager>& cm) {

				    cm::get_compactions.set(r, [&cm] (std::unique_ptr<http::request> req) {

				        return cm.map_reduce0([](compaction_manager& cm) {

				            std::vector<cm::summary> summaries;

				            const compaction_manager& cm = db.get_compaction_manager();

				            for (const auto& c : cm.get_compactions()) {

				                cm::summary s;

				@@ -99,10 +101,9 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    cm::stop_compaction.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				    cm::stop_compaction.set(r, [&cm] (std::unique_ptr<http::request> req) {

				        auto type = req->get_query_param("type");

				        return ctx.db.invoke_on_all([type] (replica::database& db) {

				            auto& cm = db.get_compaction_manager();

				        return cm.invoke_on_all([type] (compaction_manager& cm) {

				            return cm.stop_compaction(type);

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				@@ -134,8 +135,8 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				        }, std::plus<int64_t>());

				    });

				    cm::get_completed_tasks.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return get_cm_stats(ctx, &compaction_manager::stats::completed_tasks);

				    cm::get_completed_tasks.set(r, [&cm] (std::unique_ptr<http::request> req) {

				        return get_cm_stats(cm, &compaction_manager::stats::completed_tasks);

				    });

				    cm::get_total_compactions_completed.set(r, [] (std::unique_ptr<http::request> req) {

				@@ -152,11 +153,14 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(0);

				    });

				    cm::get_compaction_history.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        std::function<future<>(output_stream<char>&&)> f = [&ctx](output_stream<char>&& s) {

				            return do_with(output_stream<char>(std::move(s)), true, [&ctx] (output_stream<char>& s, bool& first){

				                return s.write("[").then([&ctx, &s, &first] {

				                    return ctx.db.local().get_compaction_manager().get_compaction_history([&s, &first](const db::compaction_history_entry& entry) mutable {

				    cm::get_compaction_history.set(r, [&cm] (std::unique_ptr<http::request> req) {

				        std::function<future<>(output_stream<char>&&)> f = [&cm] (output_stream<char>&& out) -> future<> {

				            auto s = std::move(out);

				            bool first = true;

				            std::exception_ptr ex;

				            try {

				                co_await s.write("[");

				                co_await cm.local().get_compaction_history([&s, &first](const db::compaction_history_entry& entry) mutable -> future<> {

				                        cm::history h;

				                        h.id = fmt::to_string(entry.id);

				                        h.ks = std::move(entry.ks);

				@@ -164,24 +168,29 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				                        h.compacted_at = entry.compacted_at;

				                        h.bytes_in = entry.bytes_in;

				                        h.bytes_out =  entry.bytes_out;

				                        for (auto it : entry.rows_merged) {

				                        std::map<int32_t, int64_t> items(entry.rows_merged.begin(), entry.rows_merged.end());

				                        for (auto it : items) {

				                            httpd::compaction_manager_json::row_merged e;

				                            e.key = it.first;

				                            e.value = it.second;

				                            h.rows_merged.push(std::move(e));

				                        }

				                        auto fut = first ? make_ready_future<>() : s.write(", ");

				                        if (!first) {

				                            co_await s.write(", ");

				                        }

				                        first = false;

				                        return fut.then([&s, h = std::move(h)] {

				                            return formatter::write(s, h);

				                        });

				                    }).then([&s] {

				                        return s.write("]").then([&s] {

				                            return s.close();

				                        });

				                        co_await formatter::write(s, h);

				                    });

				                });

				            });

				                co_await s.write("]");

				                co_await s.flush();

				            } catch (...) {

				                ex = std::current_exception();

				            }

				            co_await s.close();

				            if (ex) {

				                co_await coroutine::return_exception_ptr(std::move(ex));

				            }

				        };

				        return make_ready_future<json::json_return_type>(std::move(f));

				    });

				@@ -194,6 +203,34 @@ void set_compaction_manager(http_context& ctx, routes& r) {

				        return make_ready_future<json::json_return_type>(res);

				    });

				    ss::get_compaction_throughput_mb_per_sec.set(r, [&cm](std::unique_ptr<http::request> req) {

				        int value = cm.local().throughput_mbs();

				        return make_ready_future<json::json_return_type>(value);

				    });

				    ss::set_compaction_throughput_mb_per_sec.set(r, [](std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        auto value = req->get_query_param("value");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				}

				void unset_compaction_manager(http_context& ctx, routes& r) {

				    cm::get_compactions.unset(r);

				    cm::get_pending_tasks_by_table.unset(r);

				    cm::force_user_defined_compaction.unset(r);

				    cm::stop_compaction.unset(r);

				    cm::stop_keyspace_compaction.unset(r);

				    cm::get_pending_tasks.unset(r);

				    cm::get_completed_tasks.unset(r);

				    cm::get_total_compactions_completed.unset(r);

				    cm::get_bytes_compacted.unset(r);

				    cm::get_compaction_history.unset(r);

				    cm::get_compaction_info.unset(r);

				    ss::get_compaction_throughput_mb_per_sec.unset(r);

				    ss::set_compaction_throughput_mb_per_sec.unset(r);

				}

				}
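
The rewritten `get_compaction_history` handler stores any exception, closes the stream unconditionally, and only then propagates the error. C++ does not allow `co_await` inside a `catch` handler, which is presumably why the exception is stashed in a `std::exception_ptr` first. A synchronous sketch of the same control flow, using a hypothetical `fake_stream` in place of `seastar::output_stream<char>`:

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical synchronous stand-in for seastar::output_stream<char>.
struct fake_stream {
    std::string data;
    bool closed = false;
    bool fail_on_write = false;
    void write(const std::string& s) {
        if (fail_on_write) {
            throw std::runtime_error("write failed");
        }
        data += s;
    }
    void close() { closed = true; }
};

// Write a JSON-style array, closing the stream before any error propagates.
void write_array(fake_stream& s, const std::vector<std::string>& items) {
    std::exception_ptr ex;
    try {
        s.write("[");
        bool first = true;
        for (const auto& item : items) {
            if (!first) {
                s.write(", "); // separator goes before every element but the first
            }
            first = false;
            s.write(item);
        }
        s.write("]");
    } catch (...) {
        ex = std::current_exception(); // remember the failure, do not rethrow yet
    }
    s.close(); // runs on both the success path and the failure path
    if (ex) {
        std::rethrow_exception(ex);
    }
}
```

The caller still sees the original exception, but the stream is never left open when a write fails.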

									
										6

api/compaction_manager.hh
									
												
				@@ -7,13 +7,17 @@

				 */

				#pragma once

				#include <seastar/core/sharded.hh>

				namespace seastar::httpd {

				class routes;

				}

				class compaction_manager;

				namespace api {

				struct http_context;

				void set_compaction_manager(http_context& ctx, seastar::httpd::routes& r);

				void set_compaction_manager(http_context& ctx, seastar::httpd::routes& r, seastar::sharded<compaction_manager>& cm);

				void unset_compaction_manager(http_context& ctx, seastar::httpd::routes& r);

				}

									
										112

api/config.cc
									
												
				@@ -6,8 +6,12 @@

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#include "api/api.hh"

				#include "api/config.hh"

				#include "api/api-doc/config.json.hh"

				#include "api/api-doc/storage_proxy.json.hh"

				#include "api/api-doc/storage_service.json.hh"

				#include "replica/database.hh"

				#include "db/config.hh"

				#include <sstream>

				#include <boost/algorithm/string/replace.hpp>

				@@ -15,6 +19,8 @@

				namespace api {

				using namespace seastar::httpd;

				namespace sp = httpd::storage_proxy_json;

				namespace ss = httpd::storage_service_json;

				template<class T>

				json::json_return_type get_json_return_type(const T& val) {

				@@ -101,6 +107,112 @@ void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx

				        }

				        throw bad_param_exception(sstring("No such config entry: ") + id);

				    });

				    sp::get_rpc_timeout.set(r, [&cfg](const_req req)  {

				        return cfg.request_timeout_in_ms()/1000.0;

				    });

				    sp::set_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(seastar::json::json_void());

				    });

				    sp::get_read_rpc_timeout.set(r, [&cfg](const_req req)  {

				        return cfg.read_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_read_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(seastar::json::json_void());

				    });

				    sp::get_write_rpc_timeout.set(r, [&cfg](const_req req)  {

				        return cfg.write_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_write_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(seastar::json::json_void());

				    });

				    sp::get_counter_write_rpc_timeout.set(r, [&cfg](const_req req)  {

				        return cfg.counter_write_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_counter_write_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(seastar::json::json_void());

				    });

				    sp::get_cas_contention_timeout.set(r, [&cfg](const_req req)  {

				        return cfg.cas_contention_timeout_in_ms()/1000.0;

				    });

				    sp::set_cas_contention_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(seastar::json::json_void());

				    });

				    sp::get_range_rpc_timeout.set(r, [&cfg](const_req req)  {

				        return cfg.range_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_range_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(seastar::json::json_void());

				    });

				    sp::get_truncate_rpc_timeout.set(r, [&cfg](const_req req)  {

				        return cfg.truncate_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_truncate_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(seastar::json::json_void());

				    });

				    ss::get_all_data_file_locations.set(r, [&cfg](const_req req) {

				        return container_to_vec(cfg.data_file_directories());

				    });

				    ss::get_saved_caches_location.set(r, [&cfg](const_req req) {

				        return cfg.saved_caches_directory();

				    });

				}

				void unset_config(http_context& ctx, routes& r) {

				    cs::find_config_id.unset(r);

				    sp::get_rpc_timeout.unset(r);

				    sp::set_rpc_timeout.unset(r);

				    sp::get_read_rpc_timeout.unset(r);

				    sp::set_read_rpc_timeout.unset(r);

				    sp::get_write_rpc_timeout.unset(r);

				    sp::set_write_rpc_timeout.unset(r);

				    sp::get_counter_write_rpc_timeout.unset(r);

				    sp::set_counter_write_rpc_timeout.unset(r);

				    sp::get_cas_contention_timeout.unset(r);

				    sp::set_cas_contention_timeout.unset(r);

				    sp::get_range_rpc_timeout.unset(r);

				    sp::set_range_rpc_timeout.unset(r);

				    sp::get_truncate_rpc_timeout.unset(r);

				    sp::set_truncate_rpc_timeout.unset(r);

				    ss::get_all_data_file_locations.unset(r);

				    ss::get_saved_caches_location.unset(r);

				}

				}

									
										1

api/config.hh
									
												
				@@ -14,4 +14,5 @@

				namespace api {

				void set_config(std::shared_ptr<httpd::api_registry_builder20> rb, http_context& ctx, httpd::routes& r, const db::config& cfg, bool first = false);

				void unset_config(http_context& ctx, httpd::routes& r);

				}

									
										69

api/cql_server_test.cc
									
									
												
				@@ -0,0 +1,69 @@

				/*

				 * Copyright (C) 2024-present ScyllaDB

				 */

				/*

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#ifndef SCYLLA_BUILD_MODE_RELEASE

				#include <seastar/core/coroutine.hh>

				#include "api/api-doc/cql_server_test.json.hh"

				#include "cql_server_test.hh"

				#include "transport/controller.hh"

				#include "transport/server.hh"

				#include "service/qos/qos_common.hh"

				namespace api {

				namespace cst = httpd::cql_server_test_json;

				using namespace json;

				using namespace seastar::httpd;

				struct connection_sl_params : public json::json_base {

				    json::json_element<sstring> _role_name;

				    json::json_element<sstring> _workload_type;

				    json::json_element<sstring> _timeout;

				    connection_sl_params(const sstring& role_name, const sstring& workload_type, const sstring& timeout) {

				        _role_name = role_name;

				        _workload_type = workload_type;

				        _timeout = timeout;

				        register_params();

				    }

				    connection_sl_params(const connection_sl_params& params)

				        : connection_sl_params(params._role_name(), params._workload_type(), params._timeout()) {}

				    void register_params() {

				        add(&_role_name, "role_name");

				        add(&_workload_type, "workload_type");

				        add(&_timeout, "timeout");

				    }    

				};

				void set_cql_server_test(http_context& ctx, seastar::httpd::routes& r, cql_transport::controller& ctl) {

				    cst::connections_params.set(r, [&ctl] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto sl_params = co_await ctl.get_connections_service_level_params();

				        std::vector<connection_sl_params> result;

				        std::ranges::transform(std::move(sl_params), std::back_inserter(result), [] (const cql_transport::connection_service_level_params& params) {

				            auto nanos = std::chrono::duration_cast<std::chrono::nanoseconds>(params.timeout_config.read_timeout).count();

				            return connection_sl_params(

				                    std::move(params.role_name), 

				                    sstring(qos::service_level_options::to_string(params.workload_type)), 

				                    to_string(cql_duration(months_counter{0}, days_counter{0}, nanoseconds_counter{nanos})));

				        });

				        co_return result;

				    });

				}

				void unset_cql_server_test(http_context& ctx, seastar::httpd::routes& r) {

				    cst::connections_params.unset(r);

				}

				}

				#endif

									
										29

api/cql_server_test.hh
									
									
												
				@@ -0,0 +1,29 @@

				/*

				 * Copyright (C) 2024-present ScyllaDB

				 */

				/*

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#ifndef SCYLLA_BUILD_MODE_RELEASE

				#pragma once

				namespace cql_transport {

				class controller;

				}

				namespace seastar::httpd {

				class routes;

				}

				namespace api {

				struct http_context;

				void set_cql_server_test(http_context& ctx, seastar::httpd::routes& r, cql_transport::controller& ctl);

				void unset_cql_server_test(http_context& ctx, seastar::httpd::routes& r);

				}

				#endif

									
										30

api/error_injection.cc
									
												
				@@ -7,10 +7,8 @@

				 */

				#include "api/api-doc/error_injection.json.hh"

				#include "api/api.hh"

				#include "api/api_init.hh"

				#include <seastar/http/exception.hh>

				#include "log.hh"

				#include "utils/error_injection.hh"

				#include "utils/rjson.hh"

				#include <seastar/core/future-util.hh>

				@@ -64,6 +62,32 @@ void set_error_injection(http_context& ctx, routes& r) {

				        });

				    });

				    hf::read_injection.set(r, [](std::unique_ptr<request> req) -> future<json::json_return_type> {

				        const sstring injection = req->get_path_param("injection");

				        std::vector<error_injection_json::error_injection_info> error_injection_infos(smp::count, error_injection_json::error_injection_info{});

				        co_await smp::invoke_on_all([&] {

				            auto& info = error_injection_infos[this_shard_id()];

				            auto& errinj = utils::get_local_injector();

				            const auto enabled = errinj.is_enabled(injection);

				            info.enabled = enabled;

				            if (!enabled) {

				                return;

				            }

				            std::vector<error_injection_json::mapper> parameters;

				            for (const auto& p : errinj.get_injection_parameters(injection)) {

				                error_injection_json::mapper param;

				                param.key = p.first;

				                param.value = p.second;

				                parameters.push_back(std::move(param));

				            }

				            info.parameters = std::move(parameters);

				        });

				        co_return json::json_return_type(error_injection_infos);

				    });

				    hf::disable_on_all.set(r, [](std::unique_ptr<request> req) {

				        auto& errinj = utils::get_local_injector();

				        return errinj.disable_on_all().then([] {

									
										11

api/failure_detector.cc
									
												
				@@ -99,5 +99,16 @@ void set_failure_detector(http_context& ctx, routes& r, gms::gossiper& g) {

				    });

				}

				void unset_failure_detector(http_context& ctx, routes& r) {

				    fd::get_all_endpoint_states.unset(r);

				    fd::get_up_endpoint_count.unset(r);

				    fd::get_down_endpoint_count.unset(r);

				    fd::get_phi_convict_threshold.unset(r);

				    fd::get_simple_states.unset(r);

				    fd::set_phi_convict_threshold.unset(r);

				    fd::get_endpoint_state.unset(r);

				    fd::get_endpoint_phi_values.unset(r);

				}

				}

									
										1

api/failure_detector.hh
									
												
				@@ -19,5 +19,6 @@ class gossiper;

				namespace api {

				void set_failure_detector(http_context& ctx, httpd::routes& r, gms::gossiper& g);

				void unset_failure_detector(http_context& ctx, httpd::routes& r);

				}

									
										10

api/gossiper.cc
									
												
				@@ -71,4 +71,14 @@ void set_gossiper(http_context& ctx, routes& r, gms::gossiper& g) {

				    });

				}

				void unset_gossiper(http_context& ctx, routes& r) {

				    httpd::gossiper_json::get_down_endpoint.unset(r);

				    httpd::gossiper_json::get_live_endpoint.unset(r);

				    httpd::gossiper_json::get_endpoint_downtime.unset(r);

				    httpd::gossiper_json::get_current_generation_number.unset(r);

				    httpd::gossiper_json::get_current_heart_beat_version.unset(r);

				    httpd::gossiper_json::assassinate_endpoint.unset(r);

				    httpd::gossiper_json::force_remove_endpoint.unset(r);

				}

				}

									
										1

api/gossiper.hh
									
												
				@@ -19,5 +19,6 @@ class gossiper;

				namespace api {

				void set_gossiper(http_context& ctx, httpd::routes& r, gms::gossiper& g);

				void unset_gossiper(http_context& ctx, httpd::routes& r);

				}

									
										2

api/lsa.cc
									
												
				@@ -11,7 +11,7 @@

				#include <seastar/http/exception.hh>

				#include "utils/logalloc.hh"

				#include "log.hh"

				#include "utils/log.hh"

				namespace api {

				using namespace seastar::httpd;

									
										148

api/raft.cc
									
												
				@@ -8,10 +8,11 @@

				#include <seastar/core/coroutine.hh>

				#include "api/api.hh"

				#include "api/api-doc/raft.json.hh"

				#include "service/raft/raft_group_registry.hh"

				#include "service/raft/raft_address_map.hh"

				#include "utils/log.hh"

				using namespace seastar::httpd;

				@@ -19,23 +20,128 @@ extern logging::logger apilog;

				namespace api {

				struct http_context;

				namespace r = httpd::raft_json;

				using namespace json;

				namespace {

				::service::raft_timeout get_request_timeout(const http::request& req) {

				    return std::invoke([timeout_str = req.get_query_param("timeout")] {

				        if (timeout_str.empty()) {

				            return ::service::raft_timeout{};

				        }

				        auto dur = std::stoll(timeout_str);

				        if (dur <= 0) {

				            throw bad_param_exception{"Timeout must be a positive number."};

				        }

				        return ::service::raft_timeout{.value = lowres_clock::now() + std::chrono::seconds{dur}};

				    });

				}

				}  // namespace

				void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_registry>& raft_gr) {

				    r::trigger_snapshot.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {

				        raft::group_id gid{utils::UUID{req->get_path_param("group_id")}};

				        auto timeout_dur = std::invoke([timeout_str = req->get_query_param("timeout")] {

				            if (timeout_str.empty()) {

				                return std::chrono::seconds{60};

				        auto timeout = get_request_timeout(*req);

				        std::atomic<bool> found_srv{false};

				        co_await raft_gr.invoke_on_all([gid, timeout, &found_srv] (service::raft_group_registry& raft_gr) -> future<> {

				            if (!raft_gr.find_server(gid)) {

				                co_return;

				            }

				            auto dur = std::stoll(timeout_str);

				            if (dur <= 0) {

				                throw std::runtime_error{"Timeout must be a positive number."};

				            found_srv = true;

				            apilog.info("Triggering Raft group {} snapshot", gid);

				            auto srv = raft_gr.get_server_with_timeouts(gid);

				            auto result = co_await srv.trigger_snapshot(nullptr, timeout);

				            if (result) {

				                apilog.info("New snapshot for Raft group {} created", gid);

				            } else {

				                apilog.info("Could not create new snapshot for Raft group {}, no new entries applied", gid);

				            }

				            return std::chrono::seconds{dur};

				        });

				        if (!found_srv) {

				            throw bad_param_exception{fmt::format("Server for group ID {} not found", gid)};

				        }

				        co_return json_void{};

				    });

				    r::get_leader_host.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {

				        if (!req->query_parameters.contains("group_id")) {

				            const auto leader_id = co_await raft_gr.invoke_on(0, [] (service::raft_group_registry& raft_gr) {

				                auto& srv = raft_gr.group0();

				                return srv.current_leader();

				            });

				            co_return json_return_type{leader_id.to_sstring()};

				        }

				        const raft::group_id gid{utils::UUID{req->get_query_param("group_id")}};

				        std::atomic<bool> found_srv{false};

				        std::atomic<raft::server_id> leader_id = raft::server_id::create_null_id();

				        co_await raft_gr.invoke_on_all([gid, &found_srv, &leader_id] (service::raft_group_registry& raft_gr) {

				            if (raft_gr.find_server(gid)) {

				                found_srv = true;

				                leader_id = raft_gr.get_server(gid).current_leader();

				            }

				            return make_ready_future<>();

				        });

				        if (!found_srv) {

				            throw bad_param_exception{fmt::format("Server for group ID {} not found", gid)};

				        }

				        co_return json_return_type(leader_id.load().to_sstring());

				    });

				    r::read_barrier.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {

				        auto timeout = get_request_timeout(*req);

				        if (!req->query_parameters.contains("group_id")) {

				            // Read barrier on group 0 by default

				            co_await raft_gr.invoke_on(0, [timeout] (service::raft_group_registry& raft_gr) -> future<> {

				                co_await raft_gr.group0_with_timeouts().read_barrier(nullptr, timeout);

				            });

				            co_return json_void{};

				        }

				        raft::group_id gid{utils::UUID{req->get_query_param("group_id")}};

				        std::atomic<bool> found_srv{false};

				        co_await raft_gr.invoke_on_all([gid, timeout, &found_srv] (service::raft_group_registry& raft_gr) -> future<> {

				            if (!raft_gr.find_server(gid)) {

				                co_return;

				            }

				            found_srv = true;

				            co_await raft_gr.get_server_with_timeouts(gid).read_barrier(nullptr, timeout);

				        });

				        if (!found_srv) {

				            throw bad_param_exception{fmt::format("Server for group ID {} not found", gid)};

				        }

				        co_return json_void{};

				    });

				    r::trigger_stepdown.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {

				        auto timeout = get_request_timeout(*req);

				        auto dur = timeout.value ? *timeout.value - lowres_clock::now() : std::chrono::seconds(60);

				        const auto stepdown_timeout_ticks = dur / service::raft_tick_interval;

				        auto timeout_dur = raft::logical_clock::duration(stepdown_timeout_ticks);

				        if (!req->query_parameters.contains("group_id")) {

				            // Stepdown on group 0 by default

				            co_await raft_gr.invoke_on(0, [timeout_dur] (service::raft_group_registry& raft_gr) {

				                apilog.info("Triggering stepdown for group0");

				                return raft_gr.group0().stepdown(timeout_dur);

				            });

				            co_return json_void{};

				        }

				        raft::group_id gid{utils::UUID{req->get_path_param("group_id")}};

				        std::atomic<bool> found_srv{false};

				        co_await raft_gr.invoke_on_all([gid, timeout_dur, &found_srv] (service::raft_group_registry& raft_gr) -> future<> {

				            auto* srv = raft_gr.find_server(gid);

				@@ -44,14 +150,8 @@ void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_regis

				            }

				            found_srv = true;

				            abort_on_expiry aoe(lowres_clock::now() + timeout_dur);

				            apilog.info("Triggering Raft group {} snapshot", gid);

				            auto result = co_await srv->trigger_snapshot(&aoe.abort_source());

				            if (result) {

				                apilog.info("New snapshot for Raft group {} created", gid);

				            } else {

				                apilog.info("Could not create new snapshot for Raft group {}, no new entries applied", gid);

				            }

				            apilog.info("Triggering stepdown for group {}", gid);

				            co_await srv->stepdown(timeout_dur);

				        });

				        if (!found_srv) {

				@@ -60,25 +160,13 @@ void set_raft(http_context&, httpd::routes& r, sharded<service::raft_group_regis

				        co_return json_void{};

				    });

				    r::get_leader_host.set(r, [&raft_gr] (std::unique_ptr<http::request> req) -> future<json_return_type> {

				        return smp::submit_to(0, [&] {

				            auto& srv = std::invoke([&] () -> raft::server& {

				                if (req->query_parameters.contains("group_id")) {

				                    raft::group_id id{utils::UUID{req->get_query_param("group_id")}};

				                    return raft_gr.local().get_server(id);

				                } else {

				                    return raft_gr.local().group0();

				                }

				            });

				            return json_return_type(srv.current_leader().to_sstring());

				        });

				    });

				}

				void unset_raft(http_context&, httpd::routes& r) {

				    r::trigger_snapshot.unset(r);

				    r::get_leader_host.unset(r);

				    r::read_barrier.unset(r);

				    r::trigger_stepdown.unset(r);

				}

				}

									
										2

api/scrub_status.hh
									
												
				@@ -6,6 +6,8 @@

				 * SPDX-License-Identifier: AGPL-3.0-or-later

				 */

				#pragma once

				namespace api {

				enum class scrub_status {

									
										92

api/storage_proxy.cc
									
												
				@@ -13,7 +13,6 @@

				#include "api/api-doc/utils.json.hh"

				#include "db/config.hh"

				#include "utils/histogram.hh"

				#include "replica/database.hh"

				#include <seastar/core/scheduling_specific.hh>

				namespace api {

				@@ -259,83 +258,6 @@ void set_storage_proxy(http_context& ctx, routes& r, sharded<service::storage_pr

				        return make_ready_future<json::json_return_type>(0);

				    });

				    sp::get_rpc_timeout.set(r, [&ctx](const_req req)  {

				        return ctx.db.local().get_config().request_timeout_in_ms()/1000.0;

				    });

				    sp::set_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    sp::get_read_rpc_timeout.set(r, [&ctx](const_req req)  {

				        return ctx.db.local().get_config().read_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_read_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    sp::get_write_rpc_timeout.set(r, [&ctx](const_req req)  {

				        return ctx.db.local().get_config().write_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_write_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    sp::get_counter_write_rpc_timeout.set(r, [&ctx](const_req req)  {

				        return ctx.db.local().get_config().counter_write_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_counter_write_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    sp::get_cas_contention_timeout.set(r, [&ctx](const_req req)  {

				        return ctx.db.local().get_config().cas_contention_timeout_in_ms()/1000.0;

				    });

				    sp::set_cas_contention_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    sp::get_range_rpc_timeout.set(r, [&ctx](const_req req)  {

				        return ctx.db.local().get_config().range_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_range_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    sp::get_truncate_rpc_timeout.set(r, [&ctx](const_req req)  {

				        return ctx.db.local().get_config().truncate_request_timeout_in_ms()/1000.0;

				    });

				    sp::set_truncate_rpc_timeout.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				        auto enable = req->get_query_param("timeout");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    sp::reload_trigger_classes.set(r, [](std::unique_ptr<http::request> req)  {

				        //TBD

				        unimplemented();

				@@ -516,20 +438,6 @@ void unset_storage_proxy(http_context& ctx, routes& r) {

				    sp::get_max_hints_in_progress.unset(r);

				    sp::set_max_hints_in_progress.unset(r);

				    sp::get_hints_in_progress.unset(r);

				    sp::get_rpc_timeout.unset(r);

				    sp::set_rpc_timeout.unset(r);

				    sp::get_read_rpc_timeout.unset(r);

				    sp::set_read_rpc_timeout.unset(r);

				    sp::get_write_rpc_timeout.unset(r);

				    sp::set_write_rpc_timeout.unset(r);

				    sp::get_counter_write_rpc_timeout.unset(r);

				    sp::set_counter_write_rpc_timeout.unset(r);

				    sp::get_cas_contention_timeout.unset(r);

				    sp::set_cas_contention_timeout.unset(r);

				    sp::get_range_rpc_timeout.unset(r);

				    sp::set_range_rpc_timeout.unset(r);

				    sp::get_truncate_rpc_timeout.unset(r);

				    sp::set_truncate_rpc_timeout.unset(r);

				    sp::reload_trigger_classes.unset(r);

				    sp::get_read_repair_attempted.unset(r);

				    sp::get_read_repair_repaired_blocking.unset(r);

									
										247

api/storage_service.cc
									
												
				@@ -30,16 +30,16 @@

				#include "service/raft/raft_group0_client.hh"

				#include "service/storage_service.hh"

				#include "service/load_meter.hh"

				#include "db/commitlog/commitlog.hh"

				#include "gms/gossiper.hh"

				#include "db/system_keyspace.hh"

				#include <seastar/http/exception.hh>

				#include <seastar/core/coroutine.hh>

				#include <seastar/coroutine/parallel_for_each.hh>

				#include <seastar/coroutine/exception.hh>

				#include "repair/row_level.hh"

				#include "locator/snitch_base.hh"

				#include "column_family.hh"

				#include "log.hh"

				#include "utils/log.hh"

				#include "release.hh"

				#include "compaction/compaction_manager.hh"

				#include "compaction/task_manager_module.hh"

				@@ -48,12 +48,13 @@

				#include "db/extensions.hh"

				#include "db/snapshot-ctl.hh"

				#include "transport/controller.hh"

				#include "thrift/controller.hh"

				#include "locator/token_metadata.hh"

				#include "cdc/generation_service.hh"

				#include "locator/abstract_replication_strategy.hh"

				#include "sstables_loader.hh"

				#include "db/view/view_builder.hh"

				#include "utils/rjson.hh"

				#include "utils/user_provided_param.hh"

				using namespace seastar::httpd;

				using namespace std::chrono_literals;

				@@ -306,21 +307,17 @@ future<scrub_info> parse_scrub_options(const http_context& ctx, sharded<db::snap

				}

				void set_transport_controller(http_context& ctx, routes& r, cql_transport::controller& ctl) {

				    ss::start_native_transport.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {

				    ss::start_native_transport.set(r, [&ctl](std::unique_ptr<http::request> req) {

				        return smp::submit_to(0, [&] {

				            return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {

				                return ctl.start_server();

				            });

				            return ctl.start_server();

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::stop_native_transport.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {

				    ss::stop_native_transport.set(r, [&ctl](std::unique_ptr<http::request> req) {

				        return smp::submit_to(0, [&] {

				            return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {

				                return ctl.request_stop_server();

				            });

				            return ctl.request_stop_server();

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				@@ -341,40 +338,17 @@ void unset_transport_controller(http_context& ctx, routes& r) {

				    ss::is_native_transport_running.unset(r);

				}

				void set_rpc_controller(http_context& ctx, routes& r, thrift_controller& ctl) {

				    ss::stop_rpc_server.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {

				        return smp::submit_to(0, [&] {

				            return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {

				                return ctl.request_stop_server();

				            });

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::start_rpc_server.set(r, [&ctx, &ctl](std::unique_ptr<http::request> req) {

				        return smp::submit_to(0, [&] {

				            return with_scheduling_group(ctx.db.local().get_statement_scheduling_group(), [&ctl] {

				                return ctl.start_server();

				            });

				        }).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::is_rpc_server_running.set(r, [&ctl] (std::unique_ptr<http::request> req) {

				        return smp::submit_to(0, [&] {

				            return !ctl.listen_addresses().empty();

				        }).then([] (bool running) {

				            return make_ready_future<json::json_return_type>(running);

				// NOTE: preserved only for backward compatibility

				void set_thrift_controller(http_context& ctx, routes& r) {

				    ss::is_thrift_server_running.set(r, [] (std::unique_ptr<http::request> req) {

				        return smp::submit_to(0, [] {

				            return make_ready_future<json::json_return_type>(false);

				        });

				    });

				}

				void unset_rpc_controller(http_context& ctx, routes& r) {

				    ss::stop_rpc_server.unset(r);

				    ss::start_rpc_server.unset(r);

				    ss::is_rpc_server_running.unset(r);

				void unset_thrift_controller(http_context& ctx, routes& r) {

				    ss::is_thrift_server_running.unset(r);

				}

				void set_repair(http_context& ctx, routes& r, sharded<repair_service>& repair) {

				@@ -516,10 +490,32 @@ void set_sstables_loader(http_context& ctx, routes& r, sharded<sstables_loader>&

				            return make_ready_future<json::json_return_type>(json_void());

				        });

				    });

				    ss::start_restore.set(r, [&sst_loader] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto endpoint = req->get_query_param("endpoint");

				        auto keyspace = req->get_query_param("keyspace");

				        auto table = req->get_query_param("table");

				        auto bucket = req->get_query_param("bucket");

				        auto prefix = req->get_query_param("prefix");

				        // TODO: the http_server backing the API does not use content streaming

				        // should use it for better performance

				        rjson::value parsed = rjson::parse(req->content);

				        if (!parsed.IsArray()) {

				            throw httpd::bad_param_exception("malformatted sstables in body");

				        }

				        auto sstables = parsed.GetArray() |

				            std::views::transform([] (const auto& s) { return sstring(rjson::to_string_view(s)); }) |

				            std::ranges::to<std::vector>();

				        auto task_id = co_await sst_loader.local().download_new_sstables(keyspace, table, prefix, std::move(sstables), endpoint, bucket);

				        co_return json::json_return_type(fmt::to_string(task_id));

				    });

				}

				void unset_sstables_loader(http_context& ctx, routes& r) {

				    ss::load_new_ss_tables.unset(r);

				    ss::start_restore.unset(r);

				}

				void set_view_builder(http_context& ctx, routes& r, sharded<db::view::view_builder>& vb) {

				@@ -547,10 +543,6 @@ static future<json::json_return_type> describe_ring_as_json_for_table(const shar

				}

				void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_service>& ss, service::raft_group0_client& group0_client) {

				    ss::get_commitlog.set(r, [&ctx](const_req req) {

				        return ctx.db.local().commitlog()->active_config().commit_log_location;

				    });

				    ss::get_token_endpoint.set(r, [&ctx, &ss] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        const auto keyspace_name = req->get_query_param("keyspace");

				        const auto table_name = req->get_query_param("cf");

				@@ -637,14 +629,6 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				        return ss.local().get_schema_version();

				    });

				    ss::get_all_data_file_locations.set(r, [&ctx](const_req req) {

				        return container_to_vec(ctx.db.local().get_config().data_file_directories());

				    });

				    ss::get_saved_caches_location.set(r, [&ctx](const_req req) {

				        return ctx.db.local().get_config().saved_caches_directory();

				    });

				    ss::get_range_to_endpoint_map.set(r, [&ctx, &ss](std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto keyspace = validate_keyspace(ctx, req);

				        auto table = req->get_query_param("cf");

				@@ -708,19 +692,6 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				        return get_cf_stats(ctx, &replica::column_family_stats::live_disk_space_used);

				    });

				    ss::get_load_map.set(r, [&ctx] (std::unique_ptr<http::request> req) {

				        return ctx.lmeter.get_load_map().then([] (auto&& load_map) {

				            std::vector<ss::map_string_double> res;

				            for (auto i : load_map) {

				                ss::map_string_double val;

				                val.key = i.first;

				                val.value = i.second;

				                res.push_back(val);

				            }

				            return make_ready_future<json::json_return_type>(res);

				        });

				    });

				    ss::get_current_generation_number.set(r, [&ss](std::unique_ptr<http::request> req) {

				        auto ep = ss.local().get_token_metadata().get_topology().my_address();

				        return ss.local().gossiper().get_current_generation_number(ep).then([](gms::generation_type res) {

				@@ -746,17 +717,19 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				        auto& db = ctx.db;

				        auto params = req_params({

				            std::pair("flush_memtables", mandatory::no),

				            std::pair("consider_only_existing_data", mandatory::no),

				        });

				        params.process(*req);

				        auto flush = params.get_as<bool>("flush_memtables").value_or(true);

				        apilog.info("force_compaction: flush={}", flush);

				        auto consider_only_existing_data = params.get_as<bool>("consider_only_existing_data").value_or(false);

				        apilog.info("force_compaction: flush={} consider_only_existing_data={}", flush, consider_only_existing_data);

				        auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();

				        std::optional<flush_mode> fmopt;

				        if (!flush) {

				        if (!flush && !consider_only_existing_data) {

				            fmopt = flush_mode::skip;

				        }

				        auto task = co_await compaction_module.make_and_start_task<global_major_compaction_task_impl>({}, db, fmopt);

				        auto task = co_await compaction_module.make_and_start_task<global_major_compaction_task_impl>({}, db, fmopt, consider_only_existing_data);

				        try {

				            co_await task->done();

				        } catch (...) {

				@@ -773,19 +746,21 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				            std::pair("keyspace", mandatory::yes),

				            std::pair("cf", mandatory::no),

				            std::pair("flush_memtables", mandatory::no),

				            std::pair("consider_only_existing_data", mandatory::no),

				        });

				        params.process(*req);

				        auto keyspace = validate_keyspace(ctx, *params.get("keyspace"));

				        auto table_infos = parse_table_infos(keyspace, ctx, params.get("cf").value_or(""));

				        auto flush = params.get_as<bool>("flush_memtables").value_or(true);

				        apilog.debug("force_keyspace_compaction: keyspace={} tables={}, flush={}", keyspace, table_infos, flush);

				        auto consider_only_existing_data = params.get_as<bool>("consider_only_existing_data").value_or(false);

				        apilog.info("force_keyspace_compaction: keyspace={} tables={}, flush={} consider_only_existing_data={}", keyspace, table_infos, flush, consider_only_existing_data);

				        auto& compaction_module = db.local().get_compaction_manager().get_task_manager_module();

				        std::optional<flush_mode> fmopt;

				        if (!flush) {

				        if (!flush && !consider_only_existing_data) {

				            fmopt = flush_mode::skip;

				        }

				        auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), db, table_infos, fmopt);

				        auto task = co_await compaction_module.make_and_start_task<major_keyspace_compaction_task_impl>({}, std::move(keyspace), tasks::task_id::create_null_id(), db, table_infos, fmopt, consider_only_existing_data);

				        try {

				            co_await task->done();

				        } catch (...) {

				@@ -924,7 +899,8 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				        auto host_id = validate_host_id(req->get_query_param("host_id"));

				        std::vector<sstring> ignore_nodes_strs = utils::split_comma_separated_list(req->get_query_param("ignore_nodes"));

				        apilog.info("remove_node: host_id={} ignore_nodes={}", host_id, ignore_nodes_strs);

				        auto ignore_nodes = std::list<locator::host_id_or_endpoint>();

				        locator::host_id_or_endpoint_list ignore_nodes;

				        ignore_nodes.reserve(ignore_nodes_strs.size());

				        for (const sstring& n : ignore_nodes_strs) {

				            try {

				                auto hoep = locator::host_id_or_endpoint(n);

				@@ -933,7 +909,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				                }

				                ignore_nodes.push_back(std::move(hoep));

				            } catch (...) {

				                throw std::runtime_error(format("Failed to parse ignore_nodes parameter: ignore_nodes={}, node={}: {}", ignore_nodes_strs, n, std::current_exception()));

				                throw std::runtime_error(fmt::format("Failed to parse ignore_nodes parameter: ignore_nodes={}, node={}: {}", ignore_nodes_strs, n, std::current_exception()));

				            }

				        }

				        return ss.local().removenode(host_id, std::move(ignore_nodes)).then([] {

				@@ -1087,18 +1063,6 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				        return make_ready_future<json::json_return_type>(0);

				    });

				    ss::get_compaction_throughput_mb_per_sec.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        int value = ctx.db.local().get_config().compaction_throughput_mb_per_sec();

				        return make_ready_future<json::json_return_type>(value);

				    });

				    ss::set_compaction_throughput_mb_per_sec.set(r, [](std::unique_ptr<http::request> req) {

				        //TBD

				        unimplemented();

				        auto value = req->get_query_param("value");

				        return make_ready_future<json::json_return_type>(json_void());

				    });

				    ss::is_incremental_backups_enabled.set(r, [&ctx](std::unique_ptr<http::request> req) {

				        // If this is issued in parallel with an ongoing change, we may see values not agreeing.

				        // Reissuing is asking for trouble, so we will just return true upon seeing any true value.

				@@ -1136,7 +1100,16 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				    });

				    ss::rebuild.set(r, [&ss](std::unique_ptr<http::request> req) {

				        auto source_dc = req->get_query_param("source_dc");

				        utils::optional_param source_dc;

				        if (auto source_dc_str = req->get_query_param("source_dc"); !source_dc_str.empty()) {

				            source_dc.emplace(std::move(source_dc_str)).set_user_provided();

				        }

				        if (auto force_str = req->get_query_param("force"); !force_str.empty() && service::loosen_constraints(validate_bool(force_str))) {

				            if (!source_dc) {

				                throw bad_param_exception("The `source_dc` option must be provided for using the `force` option");

				            }

				            source_dc.set_force();

				        }

				        apilog.info("rebuild: source_dc={}", source_dc);

				        return ss.local().rebuild(std::move(source_dc)).then([] {

				            return make_ready_future<json::json_return_type>(json_void());

				@@ -1479,12 +1452,7 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				    ss::reload_raft_topology_state.set(r,

				            [&ss, &group0_client] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        co_await ss.invoke_on(0, [&group0_client] (service::storage_service& ss) -> future<> {

				            apilog.info("Waiting for group 0 read/apply mutex before reloading Raft topology state...");

				            auto holder = co_await group0_client.hold_read_apply_mutex();

				            apilog.info("Reloading Raft topology state");

				            // Using topology_transition() instead of topology_state_load(), because the former notifies listeners

				            co_await ss.topology_transition();

				            apilog.info("Reloaded Raft topology state");

				            return ss.reload_raft_topology_state(group0_client);

				        });

				        co_return json_void();

				    });

				@@ -1593,19 +1561,15 @@ void set_storage_service(http_context& ctx, routes& r, sharded<service::storage_

				}

				void unset_storage_service(http_context& ctx, routes& r) {

				    ss::get_commitlog.unset(r);

				    ss::get_token_endpoint.unset(r);

				    ss::toppartitions_generic.unset(r);

				    ss::get_release_version.unset(r);

				    ss::get_scylla_release_version.unset(r);

				    ss::get_schema_version.unset(r);

				    ss::get_all_data_file_locations.unset(r);

				    ss::get_saved_caches_location.unset(r);

				    ss::get_range_to_endpoint_map.unset(r);

				    ss::get_pending_range_to_endpoint_map.unset(r);

				    ss::describe_ring.unset(r);

				    ss::get_load.unset(r);

				    ss::get_load_map.unset(r);

				    ss::get_current_generation_number.unset(r);

				    ss::get_natural_endpoints.unset(r);

				    ss::cdc_streams_check_and_repair.unset(r);

				@@ -1639,8 +1603,6 @@ void unset_storage_service(http_context& ctx, routes& r) {

				    ss::is_joined.unset(r);

				    ss::set_stream_throughput_mb_per_sec.unset(r);

				    ss::get_stream_throughput_mb_per_sec.unset(r);

				    ss::get_compaction_throughput_mb_per_sec.unset(r);

				    ss::set_compaction_throughput_mb_per_sec.unset(r);

				    ss::is_incremental_backups_enabled.unset(r);

				    ss::set_incremental_backups_enabled.unset(r);

				    ss::rebuild.unset(r);

				@@ -1681,36 +1643,63 @@ void unset_storage_service(http_context& ctx, routes& r) {

				    sp::get_schema_versions.unset(r);

				}

				void set_load_meter(http_context& ctx, routes& r, service::load_meter& lm) {

				    ss::get_load_map.set(r, [&lm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto load_map = co_await lm.get_load_map();

				        std::vector<ss::map_string_double> res;

				        for (auto i : load_map) {

				            ss::map_string_double val;

				            val.key = i.first;

				            val.value = i.second;

				            res.push_back(val);

				        }

				        co_return res;

				    });

				}

				void unset_load_meter(http_context& ctx, routes& r) {

				    ss::get_load_map.unset(r);

				}

				void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_ctl) {

				    ss::get_snapshot_details.set(r, [&snap_ctl](std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto result = co_await snap_ctl.local().get_snapshot_details();

				        co_return std::function([res = std::move(result)] (output_stream<char>&& o) -> future<> {

				            auto result = std::move(res);

				            std::exception_ptr ex;

				            output_stream<char> out = std::move(o);

				            bool first = true;

				            try {

				                auto result = std::move(res);

				                bool first = true;

				            co_await out.write("[");

				            for (auto& [name, details] : result) {

				                if (!first) {

				                    co_await out.write(", ");

				                co_await out.write("[");

				                for (auto& [name, details] : result) {

				                    if (!first) {

				                        co_await out.write(", ");

				                    }

				                    std::vector<ss::snapshot> snapshot;

				                    for (auto& cf : details) {

				                        ss::snapshot snp;

				                        snp.ks = cf.ks;

				                        snp.cf = cf.cf;

				                        snp.live = cf.details.live;

				                        snp.total = cf.details.total;

				                        snapshot.push_back(std::move(snp));

				                    }

				                    ss::snapshots all_snapshots;

				                    all_snapshots.key = name;

				                    all_snapshots.value = std::move(snapshot);

				                    co_await all_snapshots.write(out);

				                    first = false;

				                }

				                std::vector<ss::snapshot> snapshot;

				                for (auto& cf : details) {

				                    ss::snapshot snp;

				                    snp.ks = cf.ks;

				                    snp.cf = cf.cf;

				                    snp.live = cf.details.live;

				                    snp.total = cf.details.total;

				                    snapshot.push_back(std::move(snp));

				                }

				                ss::snapshots all_snapshots;

				                all_snapshots.key = name;

				                all_snapshots.value = std::move(snapshot);

				                co_await all_snapshots.write(out);

				                first = false;

				                co_await out.write("]");

				                co_await out.flush();

				            } catch (...) {

				              ex = std::current_exception();

				            }

				            co_await out.write("]");

				            co_await out.close();

				            if (ex) {

				                co_await coroutine::return_exception_ptr(std::move(ex));

				            }

				        });

				    });

				@@ -1790,6 +1779,23 @@ void set_snapshot(http_context& ctx, routes& r, sharded<db::snapshot_ctl>& snap_

				        co_return json::json_return_type(static_cast<int>(scrub_status::successful));

				    });

				    ss::start_backup.set(r, [&snap_ctl] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto endpoint = req->get_query_param("endpoint");

				        auto keyspace = req->get_query_param("keyspace");

				        auto table = req->get_query_param("table");

				        auto bucket = req->get_query_param("bucket");

				        auto prefix = req->get_query_param("prefix");

				        auto snapshot_name = req->get_query_param("snapshot");

				        if (snapshot_name.empty()) {

				            // TODO: If missing, snapshot should be taken by scylla, then removed

				            throw httpd::bad_param_exception("The snapshot name must be specified");

				        }

				        auto& ctl = snap_ctl.local();

				        auto task_id = co_await ctl.start_backup(std::move(endpoint), std::move(bucket), std::move(prefix), std::move(keyspace), std::move(table), std::move(snapshot_name));

				        co_return json::json_return_type(fmt::to_string(task_id));

				    });

				    cf::get_true_snapshots_size.set(r, [&snap_ctl] (std::unique_ptr<http::request> req) {

				        auto [ks, cf] = parse_fully_qualified_cf_name(req->get_path_param("name"));

				        return snap_ctl.local().true_snapshots_size(std::move(ks), std::move(cf)).then([] (int64_t res) {

				@@ -1811,6 +1817,7 @@ void unset_snapshot(http_context& ctx, routes& r) {

				    ss::del_snapshot.unset(r);

				    ss::true_snapshots_size.unset(r);

				    ss::scrub.unset(r);

				    ss::start_backup.unset(r);

				    cf::get_true_snapshots_size.unset(r);

				    cf::get_all_true_snapshots_size.unset(r);

				}

									
										8

api/storage_service.hh
									
												
				@@ -12,9 +12,9 @@

				#include <seastar/json/json_elements.hh>

				#include "api/api_init.hh"

				#include "db/data_listeners.hh"

				#include "compaction/compaction_descriptor.hh"

				namespace cql_transport { class controller; }

				class thrift_controller;

				namespace db {

				class snapshot_ctl;

				namespace view {

				@@ -80,10 +80,12 @@ void set_repair(http_context& ctx, httpd::routes& r, sharded<repair_service>& re

				void unset_repair(http_context& ctx, httpd::routes& r);

				void set_transport_controller(http_context& ctx, httpd::routes& r, cql_transport::controller& ctl);

				void unset_transport_controller(http_context& ctx, httpd::routes& r);

				void set_rpc_controller(http_context& ctx, httpd::routes& r, thrift_controller& ctl);

				void unset_rpc_controller(http_context& ctx, httpd::routes& r);

				void set_thrift_controller(http_context& ctx, httpd::routes& r);

				void unset_thrift_controller(http_context& ctx, httpd::routes& r);

				void set_snapshot(http_context& ctx, httpd::routes& r, sharded<db::snapshot_ctl>& snap_ctl);

				void unset_snapshot(http_context& ctx, httpd::routes& r);

				void set_load_meter(http_context& ctx, httpd::routes& r, service::load_meter& lm);

				void unset_load_meter(http_context& ctx, httpd::routes& r);

				seastar::future<json::json_return_type> run_toppartitions_query(db::toppartitions_query& q, http_context &ctx, bool legacy_request = false);

				} // namespace api

									
										15

api/system.cc
									
												
				@@ -10,6 +10,7 @@

				#include "api/api-doc/system.json.hh"

				#include "api/api-doc/metrics.json.hh"

				#include "replica/database.hh"

				#include "db/sstables-format-selector.hh"

				#include <rapidjson/document.h>

				#include <seastar/core/reactor.hh>

				@@ -19,7 +20,7 @@

				#include <seastar/util/short_streams.hh>

				#include <seastar/http/short_streams.hh>

				#include "log.hh"

				#include "utils/log.hh"

				extern logging::logger apilog;

				@@ -184,4 +185,16 @@ void set_system(http_context& ctx, routes& r) {

				    }) ;

				}

				void set_format_selector(http_context& ctx, routes& r, db::sstables_format_selector& sel) {

				    hs::get_highest_supported_sstable_version.set(r, [&sel] (std::unique_ptr<request> req) {

				        return smp::submit_to(0, [&sel] {

				            return make_ready_future<json::json_return_type>(seastar::to_sstring(sel.selected_format()));

				        });

				    });

				}

				void unset_format_selector(http_context& ctx, routes& r) {

				    hs::get_highest_supported_sstable_version.unset(r);

				}

				}

									
										12

api/system.hh
									
												
				@@ -8,10 +8,18 @@

				#pragma once

				#include "api.hh"

				namespace seastar::httpd {

				class routes;

				}

				namespace db { class sstables_format_selector; }

				namespace api {

				void set_system(http_context& ctx, httpd::routes& r);

				struct http_context;

				void set_system(http_context& ctx, seastar::httpd::routes& r);

				void set_format_selector(http_context& ctx, seastar::httpd::routes& r, db::sstables_format_selector& sel);

				void unset_format_selector(http_context& ctx, seastar::httpd::routes& r);

				}

									
										265

api/task_manager.cc
									
												
				@@ -7,15 +7,17 @@

				 */

				#include <seastar/core/coroutine.hh>

				#include <seastar/coroutine/exception.hh>

				#include <seastar/http/exception.hh>

				#include "task_manager.hh"

				#include "api/api.hh"

				#include "api/api-doc/task_manager.json.hh"

				#include "db/system_keyspace.hh"

				#include "tasks/task_handler.hh"

				#include "utils/overloaded_functor.hh"

				#include <utility>

				#include <boost/range/adaptors.hpp>

				namespace api {

				@@ -23,233 +25,182 @@ namespace tm = httpd::task_manager_json;

				using namespace json;

				using namespace seastar::httpd;

				inline bool filter_tasks(tasks::task_manager::task_ptr task, std::unordered_map<sstring, sstring>& query_params) {

				    return (!query_params.contains("keyspace") || query_params["keyspace"] == task->get_status().keyspace) &&

				        (!query_params.contains("table") || query_params["table"] == task->get_status().table);

				}

				struct full_task_status {

				    tasks::task_manager::task::status task_status;

				    std::string type;

				    tasks::task_manager::task::progress progress;

				    std::string module;

				    tasks::task_id parent_id;

				    tasks::is_abortable abortable;

				    std::vector<std::string> children_ids;

				};

				struct task_stats {

				    task_stats(tasks::task_manager::task_ptr task)

				        : task_id(task->id().to_sstring())

				        , state(task->get_status().state)

				        , type(task->type())

				        , scope(task->get_status().scope)

				        , keyspace(task->get_status().keyspace)

				        , table(task->get_status().table)

				        , entity(task->get_status().entity)

				        , sequence_number(task->get_status().sequence_number)

				    { }

				    sstring task_id;

				    tasks::task_manager::task_state state;

				    std::string type;

				    std::string scope;

				    std::string keyspace;

				    std::string table;

				    std::string entity;

				    uint64_t sequence_number;

				};

				tm::task_status make_status(full_task_status status) {

				    auto start_time = db_clock::to_time_t(status.task_status.start_time);

				    auto end_time = db_clock::to_time_t(status.task_status.end_time);

				tm::task_status make_status(tasks::task_status status) {

				    auto start_time = db_clock::to_time_t(status.start_time);

				    auto end_time = db_clock::to_time_t(status.end_time);

				    ::tm st, et;

				    ::gmtime_r(&end_time, &et);

				    ::gmtime_r(&start_time, &st);

				    std::vector<tm::task_identity> tis{status.children.size()};

				    std::ranges::transform(status.children, tis.begin(), [] (const auto& child) {

				        tm::task_identity ident;

				        ident.task_id = child.task_id.to_sstring();

				        ident.node = fmt::format("{}", child.node);

				        return ident;

				    });

				    tm::task_status res{};

				    res.id = status.task_status.id.to_sstring();

				    res.id = status.task_id.to_sstring();

				    res.type = status.type;

				    res.scope = status.task_status.scope;

				    res.state = status.task_status.state;

				    res.is_abortable = bool(status.abortable);

				    res.kind = status.kind;

				    res.scope = status.scope;

				    res.state = status.state;

				    res.is_abortable = bool(status.is_abortable);

				    res.start_time = st;

				    res.end_time = et;

				    res.error = status.task_status.error;

				    res.parent_id = status.parent_id.to_sstring();

				    res.sequence_number = status.task_status.sequence_number;

				    res.shard = status.task_status.shard;

				    res.keyspace = status.task_status.keyspace;

				    res.table = status.task_status.table;

				    res.entity = status.task_status.entity;

				    res.progress_units = status.task_status.progress_units;

				    res.error = status.error;

				    res.parent_id = status.parent_id ? status.parent_id.to_sstring() : "none";

				    res.sequence_number = status.sequence_number;

				    res.shard = status.shard;

				    res.keyspace = status.keyspace;

				    res.table = status.table;

				    res.entity = status.entity;

				    res.progress_units = status.progress_units;

				    res.progress_total = status.progress.total;

				    res.progress_completed = status.progress.completed;

				    res.children_ids = std::move(status.children_ids);

				    res.children_ids = std::move(tis);

				    return res;

				}

				future<full_task_status> retrieve_status(const tasks::task_manager::foreign_task_ptr& task) {

				    if (task.get() == nullptr) {

				        co_return coroutine::return_exception(httpd::bad_param_exception("Task not found"));

				    }

				    auto progress = co_await task->get_progress();

				    full_task_status s;

				    s.task_status = task->get_status();

				    s.type = task->type();

				    s.parent_id = task->get_parent_id();

				    s.abortable = task->is_abortable();

				    s.module = task->get_module_name();

				    s.progress.completed = progress.completed;

				    s.progress.total = progress.total;

				    std::vector<std::string> ct{task->get_children().size()};

				    boost::transform(task->get_children(), ct.begin(), [] (const auto& child) {

				        return child->id().to_sstring();

				    });

				    s.children_ids = std::move(ct);

				    co_return s;

				tm::task_stats make_stats(tasks::task_stats stats) {

				    tm::task_stats res{};

				    res.task_id = stats.task_id.to_sstring();

				    res.type = stats.type;

				    res.kind = stats.kind;

				    res.scope = stats.scope;

				    res.state = stats.state;

				    res.sequence_number = stats.sequence_number;

				    res.keyspace = stats.keyspace;

				    res.table = stats.table;

				    res.entity = stats.entity;

				    return res;

				}

				void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>& tm, db::config& cfg) {

				    tm::get_modules.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        std::vector<std::string> v = boost::copy_range<std::vector<std::string>>(tm.local().get_modules() | boost::adaptors::map_keys);

				        std::vector<std::string> v = tm.local().get_modules() | std::views::keys | std::ranges::to<std::vector>();

				        co_return v;

				    });

				    tm::get_tasks.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        using chunked_stats = utils::chunked_vector<task_stats>;

				        using chunked_stats = utils::chunked_vector<tasks::task_stats>;

				        auto internal = tasks::is_internal{req_param<bool>(*req, "internal", false)};

				        std::vector<chunked_stats> res = co_await tm.map([&req, internal] (tasks::task_manager& tm) {

				            chunked_stats local_res;

				            tasks::task_manager::module_ptr module;

				            std::optional<std::string> keyspace = std::nullopt;

				            std::optional<std::string> table = std::nullopt;

				            try {

				                module = tm.find_module(req->get_path_param("module"));

				            } catch (...) {

				                throw bad_param_exception(fmt::format("{}", std::current_exception()));

				            }

				            const auto& filtered_tasks = module->get_tasks() | boost::adaptors::filtered([&params = req->query_parameters, internal] (const auto& task) {

				                return (internal || !task.second->is_internal()) && filter_tasks(task.second, params);

				            });

				            for (auto& [task_id, task] : filtered_tasks) {

				                local_res.push_back(task_stats{task});

				            if (auto it = req->query_parameters.find("keyspace"); it != req->query_parameters.end()) {

				                keyspace = it->second;

				            }

				            return local_res;

				            if (auto it = req->query_parameters.find("table"); it != req->query_parameters.end()) {

				                table = it->second;

				            }

				            return module->get_stats(internal, [keyspace = std::move(keyspace), table = std::move(table)] (std::string& ks, std::string& t) {

				                return (!keyspace || keyspace == ks) && (!table || table == t);

				            });

				        });

				        std::function<future<>(output_stream<char>&&)> f = [r = std::move(res)] (output_stream<char>&& os) -> future<> {

				            auto s = std::move(os);

				            auto res = std::move(r);

				            co_await s.write("[");

				            std::string delim = "";

				            for (auto& v: res) {

				                for (auto& stats: v) {

				                    co_await s.write(std::exchange(delim, ", "));

				                    tm::task_stats ts;

				                    ts = stats;

				                    co_await formatter::write(s, ts);

				            std::exception_ptr ex;

				            try {

				                auto res = std::move(r);

				                co_await s.write("[");

				                std::string delim = "";

				                for (auto& v: res) {

				                    for (auto& stats: v) {

				                        co_await s.write(std::exchange(delim, ", "));

				                        tm::task_stats ts = make_stats(stats);

				                        co_await formatter::write(s, ts);

				                    }

				                }

				                co_await s.write("]");

				                co_await s.flush();

				            } catch (...) {

				                ex = std::current_exception();

				            }

				            co_await s.write("]");

				            co_await s.close();

				            if (ex) {

				                co_await coroutine::return_exception_ptr(std::move(ex));

				            }

				        };

				        co_return std::move(f);

				    });

				    tm::get_task_status.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};

				        tasks::task_manager::foreign_task_ptr task;

				        tasks::task_status status;

				        try {

				            task = co_await tasks::task_manager::invoke_on_task(tm, id, std::function([] (tasks::task_manager::task_ptr task) -> future<tasks::task_manager::foreign_task_ptr> {

				                if (task->is_complete()) {

				                    task->unregister_task();

				                }

				                co_return std::move(task);

				            }));

				            auto task = tasks::task_handler{tm.local(), id};

				            status = co_await task.get_status();

				        } catch (tasks::task_manager::task_not_found& e) {

				            throw bad_param_exception(e.what());

				        }

				        auto s = co_await retrieve_status(task);

				        co_return make_status(s);

				        co_return make_status(status);

				    });

				    tm::abort_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};

				        try {

				            co_await tasks::task_manager::invoke_on_task(tm, id, [] (tasks::task_manager::task_ptr task) -> future<> {

				                if (!task->is_abortable()) {

				                    co_await coroutine::return_exception(std::runtime_error("Requested task cannot be aborted"));

				                }

				                co_await task->abort();

				            });

				            auto task = tasks::task_handler{tm.local(), id};

				            co_await task.abort();

				        } catch (tasks::task_manager::task_not_found& e) {

				            throw bad_param_exception(e.what());

				        } catch (tasks::task_not_abortable& e) {

				            throw httpd::base_exception{e.what(), http::reply::status_type::forbidden};

				        }

				        co_return json_void();

				    });

				    tm::wait_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};

				        tasks::task_manager::foreign_task_ptr task;

				        tasks::task_status status;

				        std::optional<std::chrono::seconds> timeout = std::nullopt;

				        if (auto it = req->query_parameters.find("timeout"); it != req->query_parameters.end()) {

				            timeout = std::chrono::seconds(boost::lexical_cast<uint32_t>(it->second));

				        }

				        try {

				            task = co_await tasks::task_manager::invoke_on_task(tm, id, std::function([] (tasks::task_manager::task_ptr task) {

				                return task->done().then_wrapped([task] (auto f) {

				                    task->unregister_task();

				                    // done() is called only because we want the task to be complete before getting its status.

				                    // The future should be ignored here as the result does not matter.

				                    f.ignore_ready_future();

				                    return make_foreign(task);

				                });

				            }));

				            auto task = tasks::task_handler{tm.local(), id};

				            status = co_await task.wait_for_task(timeout);

				        } catch (tasks::task_manager::task_not_found& e) {

				            throw bad_param_exception(e.what());

				        } catch (timed_out_error& e) {

				            throw httpd::base_exception{e.what(), http::reply::status_type::request_timeout};

				        }

				        auto s = co_await retrieve_status(task);

				        co_return make_status(s);

				        co_return make_status(status);

				    });

				    tm::get_task_status_recursively.set(r, [&_tm = tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto& tm = _tm;

				        auto id = tasks::task_id{utils::UUID{req->get_path_param("task_id")}};

				        std::queue<tasks::task_manager::foreign_task_ptr> q;

				        utils::chunked_vector<full_task_status> res;

				        tasks::task_manager::foreign_task_ptr task;

				        try {

				            // Get requested task.

				            task = co_await tasks::task_manager::invoke_on_task(tm, id, std::function([] (tasks::task_manager::task_ptr task) -> future<tasks::task_manager::foreign_task_ptr> {

				                if (task->is_complete()) {

				                    task->unregister_task();

				            auto task = tasks::task_handler{tm.local(), id};

				            auto res = co_await task.get_status_recursively(true);

				            std::function<future<>(output_stream<char>&&)> f = [r = std::move(res)] (output_stream<char>&& os) -> future<> {

				                auto s = std::move(os);

				                auto res = std::move(r);

				                co_await s.write("[");

				                std::string delim = "";

				                for (auto& status: res) {

				                    co_await s.write(std::exchange(delim, ", "));

				                    co_await formatter::write(s, make_status(status));

				                }

				                co_return task;

				            }));

				                co_await s.write("]");

				                co_await s.close();

				            };

				            co_return f;

				        } catch (tasks::task_manager::task_not_found& e) {

				            throw bad_param_exception(e.what());

				        }

				        // Push children's statuses in BFS order.

				        q.push(co_await task.copy());   // Task cannot be moved since we need it to be alive during whole loop execution.

				        while (!q.empty()) {

				            auto& current = q.front();

				            res.push_back(co_await retrieve_status(current));

				            for (auto& child: current->get_children()) {

				                q.push(co_await child.copy());

				            }

				            q.pop();

				        }

				        std::function<future<>(output_stream<char>&&)> f = [r = std::move(res)] (output_stream<char>&& os) -> future<> {

				            auto s = std::move(os);

				            auto res = std::move(r);

				            co_await s.write("[");

				            std::string delim = "";

				            for (auto& status: res) {

				                co_await s.write(std::exchange(delim, ", "));

				                co_await formatter::write(s, make_status(status));

				            }

				            co_await s.write("]");

				            co_await s.close();

				        };

				        co_return f;

				    });

				    tm::get_and_update_ttl.set(r, [&cfg] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				@@ -261,6 +212,11 @@ void set_task_manager(http_context& ctx, routes& r, sharded<tasks::task_manager>

				        }

				        co_return json::json_return_type(ttl);

				    });

				    tm::get_ttl.set(r, [&cfg] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        uint32_t ttl = cfg.task_ttl_seconds();

				        co_return json::json_return_type(ttl);

				    });

				}

				void unset_task_manager(http_context& ctx, routes& r) {

				@@ -271,6 +227,7 @@ void unset_task_manager(http_context& ctx, routes& r) {

				    tm::wait_task.unset(r);

				    tm::get_task_status_recursively.unset(r);

				    tm::get_and_update_ttl.unset(r);

				    tm::get_ttl.unset(r);

				}

				}
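The status-streaming lambdas in this file write a JSON array by emitting a delimiter before each element and swapping it with `std::exchange`. A minimal standalone sketch of that idiom follows; it assumes plain `std::ostringstream` output in place of Seastar's `output_stream<char>`, and `write_array` is a hypothetical helper name, not a function from the diff.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the diff): joins already-serialized
// items into a JSON-style array using the std::exchange delimiter idiom.
std::string write_array(const std::vector<std::string>& items) {
    std::ostringstream os;
    os << "[";
    std::string delim = "";
    for (const auto& item : items) {
        // std::exchange returns the old delimiter and stores ", " for the
        // next round, so the first element is written without a separator.
        os << std::exchange(delim, ", ");
        os << item;
    }
    os << "]";
    return os.str();
}
```

The idiom avoids a separate "is this the first element" flag and never leaves a trailing separator, which is presumably why the handler uses it when streaming statuses one at a time.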

									
40 api/task_manager_test.cc
				@@ -13,6 +13,7 @@

				#include "task_manager_test.hh"

				#include "api/api-doc/task_manager_test.json.hh"

				#include "tasks/test_module.hh"

				#include "utils/overloaded_functor.hh"

				namespace api {

				@@ -61,8 +62,8 @@ void set_task_manager_test(http_context& ctx, routes& r, sharded<tasks::task_man

				        auto module = tms.local().find_module("test");

				        id = co_await module->make_task<tasks::test_task_impl>(shard, id, keyspace, table, entity, data);

				        co_await tms.invoke_on(shard, [id] (tasks::task_manager& tm) {

				            auto it = tm.get_all_tasks().find(id);

				            if (it != tm.get_all_tasks().end()) {

				            auto it = tm.get_local_tasks().find(id);

				            if (it != tm.get_local_tasks().end()) {

				                it->second->start();

				            }

				        });

				@@ -72,9 +73,16 @@ void set_task_manager_test(http_context& ctx, routes& r, sharded<tasks::task_man

				    tmt::unregister_test_task.set(r, [&tm] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {

				        auto id = tasks::task_id{utils::UUID{req->query_parameters["task_id"]}};

				        try {

				            co_await tasks::task_manager::invoke_on_task(tm, id, [] (tasks::task_manager::task_ptr task) -> future<> {

				                tasks::test_task test_task{task};

				                co_await test_task.unregister_task();

				            co_await tasks::task_manager::invoke_on_task(tm, id, [] (tasks::task_manager::task_variant task_v) -> future<> {

				                return std::visit(overloaded_functor{

				                    [] (tasks::task_manager::task_ptr task) -> future<> {

				                        tasks::test_task test_task{task};

				                        co_await test_task.unregister_task();

				                    },

				                    [] (tasks::task_manager::virtual_task_ptr task) {

				                        return make_ready_future();

				                    }

				                }, task_v);

				            });

				        } catch (tasks::task_manager::task_not_found& e) {

				            throw bad_param_exception(e.what());

				@@ -89,14 +97,20 @@ void set_task_manager_test(http_context& ctx, routes& r, sharded<tasks::task_man

				        std::string error = fail ? it->second : "";

				        try {

				            co_await tasks::task_manager::invoke_on_task(tm, id, [fail, error = std::move(error)] (tasks::task_manager::task_ptr task) {

				                tasks::test_task test_task{task};

				                if (fail) {

				                    test_task.finish_failed(std::make_exception_ptr(std::runtime_error(error)));

				                } else {

				                    test_task.finish();

				                }

				                return make_ready_future<>();

				            co_await tasks::task_manager::invoke_on_task(tm, id, [fail, error = std::move(error)] (tasks::task_manager::task_variant task_v) -> future<> {

				                return std::visit(overloaded_functor{

				                    [fail, error = std::move(error)] (tasks::task_manager::task_ptr task) -> future<> {

				                        tasks::test_task test_task{task};

				                        if (fail) {

				                            co_await test_task.finish_failed(std::make_exception_ptr(std::runtime_error(error)));

				                        } else {

				                            co_await test_task.finish();

				                        }

				                    },

				                    [] (tasks::task_manager::virtual_task_ptr task) {

				                        return make_ready_future();

				                    }

				                }, task_v);

				            });

				        } catch (tasks::task_manager::task_not_found& e) {

				            throw bad_param_exception(e.what());
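The test handlers above now receive a `task_variant` and dispatch on it with `std::visit` and `overloaded_functor`, doing real work for regular tasks and nothing for virtual ones. The sketch below shows the same dispatch shape in standalone C++17. The `regular_task` and `virtual_task` structs, the `overloaded` helper, and `finish_if_regular` are illustrative stand-ins, not the types from `tasks/task_manager.hh` or `utils/overloaded_functor.hh`, and the code is synchronous where the original returns futures.

```cpp
#include <memory>
#include <string>
#include <variant>

// Stand-ins for the two task handle types (assumed shapes, for illustration only).
struct regular_task { std::string name; };
struct virtual_task { std::string name; };

using task_ptr = std::shared_ptr<regular_task>;
using virtual_task_ptr = std::shared_ptr<virtual_task>;
using task_variant = std::variant<task_ptr, virtual_task_ptr>;

// Minimal "overloaded" helper: inherits operator() from every lambda passed in.
template <typename... Fs>
struct overloaded : Fs... {
    using Fs::operator()...;
};
template <typename... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

// Returns true if the regular-task branch ran, false for the no-op virtual branch.
bool finish_if_regular(const task_variant& tv, std::string& log) {
    return std::visit(overloaded{
        [&log] (const task_ptr& t) {
            log += "finished " + t->name;
            return true;
        },
        [] (const virtual_task_ptr&) {
            // Virtual tasks are left untouched, mirroring the make_ready_future() branch.
            return false;
        }
    }, tv);
}
```

Because `std::visit` requires a branch for every alternative, adding a third task kind to the variant would fail to compile until each call site handles it.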

									
11 api/task_manager_test.hh
				@@ -11,16 +11,19 @@

				#pragma once

				#include <seastar/core/sharded.hh>

				#include "api.hh"

				namespace tasks {

				class task_manager;

				}

				namespace api {

				namespace seastar::httpd {

				class routes;

				}

				void set_task_manager_test(http_context& ctx, httpd::routes& r, sharded<tasks::task_manager>& tm);

				void unset_task_manager_test(http_context& ctx, httpd::routes& r);

				namespace api {

				struct http_context;

				void set_task_manager_test(http_context& ctx, seastar::httpd::routes& r, seastar::sharded<tasks::task_manager>& tm);

				void unset_task_manager_test(http_context& ctx, seastar::httpd::routes& r);

				}

									
1 api/tasks.cc
				@@ -16,6 +16,7 @@

				#include "compaction/task_manager_module.hh"

				#include "service/storage_service.hh"

				#include "tasks/task_manager.hh"

				#include "replica/database.hh"

				using namespace seastar::httpd;

									
13 api/tasks.hh
				@@ -8,11 +8,20 @@

				#pragma once

				#include "api.hh"

				#include "db/config.hh"

				#include <seastar/core/sharded.hh>

				#include "db/snapshot-ctl.hh"

				namespace seastar::httpd {

				class routes;

				}

				namespace service {

				class storage_service;

				}

				namespace api {

				struct http_context;

				void set_tasks_compaction_module(http_context& ctx, httpd::routes& r, sharded<service::storage_service>& ss, sharded<db::snapshot_ctl>& snap_ctl);

				void unset_tasks_compaction_module(http_context& ctx, httpd::routes& r);

									
5 api/token_metadata.cc
				@@ -21,6 +21,9 @@ using namespace json;

				void set_token_metadata(http_context& ctx, routes& r, sharded<locator::shared_token_metadata>& tm) {

				    ss::local_hostid.set(r, [&tm](std::unique_ptr<http::request> req) {

				        auto id = tm.local().get()->get_my_id();

				        if (!bool(id)) {

				            throw not_found_exception("local host ID is not yet set");

				        }

				        return make_ready_future<json::json_return_type>(id.to_sstring());

				    });

				@@ -68,7 +71,7 @@ void set_token_metadata(http_context& ctx, routes& r, sharded<locator::shared_to

				    ss::get_host_id_map.set(r, [&tm](const_req req) {

				        std::vector<ss::mapper> res;

				        return map_to_key_value(tm.local().get()->get_endpoint_to_host_id_map_for_reading(), res);

				        return map_to_key_value(tm.local().get()->get_endpoint_to_host_id_map(), res);

				    });

				    static auto host_or_broadcast = [&tm](const_req req) {

									
11 api/token_metadata.hh
				@@ -9,13 +9,16 @@

				#pragma once

				#include <seastar/core/sharded.hh>

				#include "api/api_init.hh"

				namespace seastar::httpd {

				class routes;

				}

				namespace locator { class shared_token_metadata; }

				namespace api {

				void set_token_metadata(http_context& ctx, httpd::routes& r, sharded<locator::shared_token_metadata>& tm);

				void unset_token_metadata(http_context& ctx, httpd::routes& r);

				struct http_context;

				void set_token_metadata(http_context& ctx, seastar::httpd::routes& r, seastar::sharded<locator::shared_token_metadata>& tm);

				void unset_token_metadata(http_context& ctx, seastar::httpd::routes& r);

				}

									
6 auth/allow_all_authenticator.hh
				@@ -59,15 +59,15 @@ public:

				        return make_ready_future<authenticated_user>(anonymous_user());

				    }

				    virtual future<> create(std::string_view, const authentication_options& options) override {

				    virtual future<> create(std::string_view, const authentication_options& options, ::service::group0_batch&) override {

				        return make_ready_future();

				    }

				    virtual future<> alter(std::string_view, const authentication_options& options) override {

				    virtual future<> alter(std::string_view, const authentication_options& options, ::service::group0_batch&) override {

				        return make_ready_future();

				    }

				    virtual future<> drop(std::string_view) override {

				    virtual future<> drop(std::string_view, ::service::group0_batch&) override {

				        return make_ready_future();

				    }

									
15 auth/allow_all_authorizer.hh
				@@ -9,6 +9,7 @@

				#pragma once

				#include "auth/authorizer.hh"

				#include <seastar/core/future.hh>

				namespace cql3 {

				class query_processor;

				@@ -44,12 +45,12 @@ public:

				        return make_ready_future<permission_set>(permissions::ALL);

				    }

				    virtual future<> grant(std::string_view, permission_set, const resource&) override {

				    virtual future<> grant(std::string_view, permission_set, const resource&, ::service::group0_batch&) override {

				        return make_exception_future<>(

				                unsupported_authorization_operation("GRANT operation is not supported by AllowAllAuthorizer"));

				    }

				    virtual future<> revoke(std::string_view, permission_set, const resource&) override {

				    virtual future<> revoke(std::string_view, permission_set, const resource&, ::service::group0_batch&) override {

				        return make_exception_future<>(

				                unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));

				    }

				@@ -60,14 +61,12 @@ public:

				                        "LIST PERMISSIONS operation is not supported by AllowAllAuthorizer"));

				    }

				    virtual future<> revoke_all(std::string_view) override {

				        return make_exception_future(

				                unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));

				    virtual future<> revoke_all(std::string_view, ::service::group0_batch&) override {

				        return make_ready_future();

				    }

				    virtual future<> revoke_all(const resource&) override {

				        return make_exception_future(

				                unsupported_authorization_operation("REVOKE operation is not supported by AllowAllAuthorizer"));

				    virtual future<> revoke_all(const resource&, ::service::group0_batch&) override {

				        return make_ready_future();

				    }

				    virtual const resource_set& protected_resources() const override {

									
57 auth/authentication_options.hh
				@@ -12,6 +12,7 @@

				#include <stdexcept>

				#include <unordered_map>

				#include <unordered_set>

				#include <variant>

				#include <seastar/core/print.hh>

				#include <seastar/core/sstring.hh>

				@@ -22,29 +23,10 @@ namespace auth {

				enum class authentication_option {

				    password,

				    hashed_password,

				    options

				};

				using authentication_option_set = std::unordered_set<authentication_option>;

				using custom_options = std::unordered_map<sstring, sstring>;

				struct authentication_options final {

				    std::optional<sstring> password;

				    std::optional<custom_options> options;

				};

				inline bool any_authentication_options(const authentication_options& aos) noexcept {

				    return aos.password || aos.options;

				}

				class unsupported_authentication_option : public std::invalid_argument {

				public:

				    explicit unsupported_authentication_option(authentication_option k)

				            : std::invalid_argument(format("The {} option is not supported.", k)) {

				    }

				};

				}

				template <>

				@@ -55,9 +37,44 @@ struct fmt::formatter<auth::authentication_option> : fmt::formatter<string_view>

				        switch (a) {

				        case password:

				            return formatter<string_view>::format("PASSWORD", ctx);

				        case hashed_password:

				            return formatter<string_view>::format("HASHED PASSWORD", ctx);

				        case options:

				            return formatter<string_view>::format("OPTIONS", ctx);

				        }

				        std::abort();

				    }

				};

				namespace auth {

				using authentication_option_set = std::unordered_set<authentication_option>;

				using custom_options = std::unordered_map<sstring, sstring>;

				struct password_option {

				    sstring password;

				};

				/// Used exclusively for restoring roles.

				struct hashed_password_option {

				    sstring hashed_password;

				};

				struct authentication_options final {

				    std::optional<std::variant<password_option, hashed_password_option>> credentials;

				    std::optional<custom_options> options;

				};

				inline bool any_authentication_options(const authentication_options& aos) noexcept {

				    return aos.options || aos.credentials;

				}

				class unsupported_authentication_option : public std::invalid_argument {

				public:

				    explicit unsupported_authentication_option(authentication_option k)

				            : std::invalid_argument(format("The {} option is not supported.", k)) {

				    }

				};

				}
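The reworked `authentication_options` replaces the single optional password with an optional variant holding either a plaintext or a pre-hashed password. The sketch below reproduces the new structs with `std::string` substituted for `sstring` so it builds without Seastar, and adds `describe_credentials`, a hypothetical helper that is not in the diff, to show how a caller might tell the two alternatives apart.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <variant>

// std::string stands in for seastar's sstring so the sketch builds standalone.
using custom_options = std::unordered_map<std::string, std::string>;

struct password_option {
    std::string password;
};

// Per the diff's comment, used exclusively for restoring roles.
struct hashed_password_option {
    std::string hashed_password;
};

struct authentication_options final {
    std::optional<std::variant<password_option, hashed_password_option>> credentials;
    std::optional<custom_options> options;
};

inline bool any_authentication_options(const authentication_options& aos) noexcept {
    return aos.options || aos.credentials;
}

// Hypothetical helper: reports which kind of credential was supplied.
std::string describe_credentials(const authentication_options& aos) {
    if (!aos.credentials) {
        return "none";
    }
    if (std::holds_alternative<password_option>(*aos.credentials)) {
        return "plaintext password";
    }
    return "hashed password";
}
```

Modelling the two forms as one variant makes it impossible to supply both a password and a hash in the same options object, which two separate optional fields would have allowed.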

									
28 auth/authenticator.hh
				@@ -16,15 +16,15 @@

				#include <optional>

				#include <functional>

				#include <seastar/core/enum.hh>

				#include <seastar/core/future.hh>

				#include <seastar/core/sstring.hh>

				#include <seastar/core/shared_ptr.hh>

				#include "auth/authentication_options.hh"

				#include "auth/resource.hh"

				#include "auth/sasl_challenge.hh"

				#include "service/raft/raft_group0_client.hh"

				namespace db {

				    class config;

				}

				@@ -43,6 +43,11 @@ struct certificate_info {

				using session_dn_func = std::function<future<std::optional<certificate_info>>()>;

				class unsupported_authentication_operation : public std::invalid_argument {

				public:

				    using std::invalid_argument::invalid_argument;

				};

				///

				/// Abstract client for authenticating role identity.

				///

				@@ -106,7 +111,7 @@ public:

				    ///

				    /// The options provided must be a subset of `supported_options()`.

				    ///

				    virtual future<> create(std::string_view role_name, const authentication_options& options) = 0;

				    virtual future<> create(std::string_view role_name, const authentication_options& options, ::service::group0_batch& mc) = 0;

				    ///

				    /// Alter the authentication record of an existing user.

				@@ -115,12 +120,12 @@ public:

				    ///

				    /// Callers must ensure that the specification of `alterable_options()` is adhered to.

				    ///

				    virtual future<> alter(std::string_view role_name, const authentication_options& options) = 0;

				    virtual future<> alter(std::string_view role_name, const authentication_options& options, ::service::group0_batch& mc) = 0;

				    ///

				    /// Delete the authentication record for a user. This will disallow the user from logging in.

				    ///

				    virtual future<> drop(std::string_view role_name) = 0;

				    virtual future<> drop(std::string_view role_name, ::service::group0_batch&) = 0;

				    ///

				    /// Query for custom options (those corresponding to \ref authentication_options::options).

				@@ -129,6 +134,19 @@ public:

				    ///

				    virtual future<custom_options> query_custom_options(std::string_view role_name) const = 0;

				    virtual bool uses_password_hashes() const {

				        return false;

				    }

				    ///

				    /// Query the password hash corresponding to a given role.

				    ///

				    /// If the authenticator doesn't use password hashes, throws an `unsupported_authentication_operation` exception.

				    ///

				    virtual future<std::optional<sstring>> get_password_hash(std::string_view role_name) const {

				        return make_exception_future<std::optional<sstring>>(unsupported_authentication_operation("get_password_hash is not implemented"));

				    }

				    ///

				    /// System resources used internally as part of the implementation. These are made inaccessible to users.

				    ///
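The new `uses_password_hashes()` and `get_password_hash()` members form a capability-check pair: the base class reports `false` and fails with `unsupported_authentication_operation`, and password-based authenticators are expected to override both. The sketch below is a synchronous approximation of that contract under stated assumptions: it returns plain values instead of Seastar futures, and `toy_password_authenticator` and `hash_or_nothing` are hypothetical names introduced only for illustration.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>

class unsupported_authentication_operation : public std::invalid_argument {
public:
    using std::invalid_argument::invalid_argument;
};

// Synchronous stand-in for the interface; the real methods return seastar futures.
class authenticator {
public:
    virtual ~authenticator() = default;
    virtual bool uses_password_hashes() const {
        return false;
    }
    virtual std::optional<std::string> get_password_hash(std::string_view) const {
        throw unsupported_authentication_operation("get_password_hash is not implemented");
    }
};

// Hypothetical password-based implementation with a single hard-coded role.
class toy_password_authenticator final : public authenticator {
public:
    bool uses_password_hashes() const override {
        return true;
    }
    std::optional<std::string> get_password_hash(std::string_view role) const override {
        if (role == "alice") {
            return std::string("hash-of-alice");
        }
        return std::nullopt;
    }
};

// Caller-side pattern: check the capability first instead of relying on the exception.
std::optional<std::string> hash_or_nothing(const authenticator& a, std::string_view role) {
    if (!a.uses_password_hashes()) {
        return std::nullopt;
    }
    return a.get_password_hash(role);
}
```

Note that an empty result has two meanings in this sketch (no hash support, or no hash stored for the role); a caller that needs to distinguish them should consult `uses_password_hashes()` directly.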

									
10 auth/authorizer.hh
				@@ -16,10 +16,10 @@

				#include <vector>

				#include <seastar/core/future.hh>

				#include <seastar/core/shared_ptr.hh>

				#include "auth/permission.hh"

				#include "auth/resource.hh"

				#include "service/raft/raft_group0_client.hh"

				#include "seastarx.hh"

				namespace auth {

				@@ -81,14 +81,14 @@ public:

				    ///

				    /// \throws \ref unsupported_authorization_operation if granting permissions is not supported.

				    ///

				    virtual future<> grant(std::string_view role_name, permission_set, const resource&) = 0;

				    virtual future<> grant(std::string_view role_name, permission_set, const resource&, ::service::group0_batch&) = 0;

				    ///

				    /// Revoke a set of permissions from a role for a particular \ref resource.

				    ///

				    /// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.

				    ///

				    virtual future<> revoke(std::string_view role_name, permission_set, const resource&) = 0;

				    virtual future<> revoke(std::string_view role_name, permission_set, const resource&, ::service::group0_batch&) = 0;

				    ///

				    /// Query for all directly granted permissions.

				@@ -102,14 +102,14 @@ public:

				    ///

				    /// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.

				    ///

				    virtual future<> revoke_all(std::string_view role_name) = 0;

				    virtual future<> revoke_all(std::string_view role_name, ::service::group0_batch&) = 0;

				    ///

				    /// Revoke all permissions granted to any role for a particular resource.

				    ///

				    /// \throws \ref unsupported_authorization_operation if revoking permissions is not supported.

				    ///

				    virtual future<> revoke_all(const resource&) = 0;

				    virtual future<> revoke_all(const resource&, ::service::group0_batch&) = 0;

				    ///

				    /// System resources used internally as part of the implementation. These are made inaccessible to users.

									
12 auth/certificate_authenticator.cc
				@@ -9,7 +9,7 @@

				#include "auth/certificate_authenticator.hh"

				#include <regex>

				#include <boost/regex.hpp>

				#include <fmt/ranges.h>

				#include "utils/class_registrator.hh"

				@@ -76,7 +76,7 @@ auth::certificate_authenticator::certificate_authenticator(cql3::query_processor

				                    continue;

				                } catch (std::out_of_range&) {

				                    // just fallthrough

				                } catch (std::regex_error&) {

				                } catch (boost::regex_error&) {

				                    std::throw_with_nested(std::invalid_argument(fmt::format("Invalid query expression: {}", map.at(cfg_query_attr))));

				                }

				            }

				@@ -149,7 +149,7 @@ future<std::optional<auth::authenticated_user>> auth::certificate_authenticator:

				            co_return username;

				        }

				    }

				    throw exceptions::authentication_exception(format("Subject '{}'/'{}' does not match any query expression", subject, altname));

				    throw exceptions::authentication_exception(seastar::format("Subject '{}'/'{}' does not match any query expression", subject, altname));

				}

				@@ -157,16 +157,16 @@ future<auth::authenticated_user> auth::certificate_authenticator::authenticate(c

				    throw exceptions::authentication_exception("Cannot authenticate using attribute map");

				}

				future<> auth::certificate_authenticator::create(std::string_view role_name, const authentication_options& options) {

				future<> auth::certificate_authenticator::create(std::string_view role_name, const authentication_options& options, ::service::group0_batch& mc) {

				    // TODO: should we keep track of roles/enforce existence? Role manager should deal with this...

				    co_return;

				}

				future<> auth::certificate_authenticator::alter(std::string_view role_name, const authentication_options& options) {

				future<> auth::certificate_authenticator::alter(std::string_view role_name, const authentication_options& options, ::service::group0_batch& mc) {

				    co_return;

				}

				future<> auth::certificate_authenticator::drop(std::string_view role_name) {

				future<> auth::certificate_authenticator::drop(std::string_view role_name, ::service::group0_batch&) {

				    co_return;

				}

Compare commits

2590 Commits next-6.0 ... dani-tweig

209 .clang-format Normal file
1 .gitattributes vendored
31 .github/CODEOWNERS vendored
15 .github/ISSUE_TEMPLATE.md vendored
86 .github/ISSUE_TEMPLATE/bug_report.yml vendored Normal file
84 .github/actions/setup-build/action.yaml vendored
20 .github/clang-include-cleaner.json vendored Normal file
2 .github/clang-tidy-matcher.json → .github/clang-matcher.json vendored
9 .github/dependabot.yml vendored Normal file
83 .github/mergify.yml vendored
181 .github/scripts/auto-backport.py vendored Executable file
85 .github/scripts/label_promoted_commits.py vendored
55 .github/workflows/add-label-when-promoted.yaml vendored
9 .github/workflows/backport-pr-fixes-validation.yaml vendored
39 .github/workflows/build-scylla.yaml vendored Normal file
66 .github/workflows/clang-nightly.yaml vendored Normal file
37 .github/workflows/clang-tidy.yaml vendored
2 .github/workflows/codespell.yaml vendored
45 .github/workflows/conflict_reminder.yaml vendored Normal file
3 .github/workflows/docs-pr.yaml vendored
82 .github/workflows/iwyu.yaml vendored Normal file
23 .github/workflows/read-toolchain.yaml vendored Normal file
35 .github/workflows/reproducible-build.yaml vendored Normal file
50 .github/workflows/seastar.yaml vendored Normal file
6 .github/workflows/sync-labels.yaml vendored
6 .gitignore vendored
3 .gitmodules vendored
111 CMakeLists.txt
21 HACKING.md
16 README.md
4 SCYLLA-VERSION-GEN
25 alternator/auth.cc
14 alternator/conditions.cc
20 alternator/controller.cc
3 alternator/controller.hh
504 alternator/executor.cc
16 alternator/executor.hh
23 alternator/expressions.cc
1 alternator/expressions_types.hh
2 alternator/rmw_operation.hh
37 alternator/serialization.cc
77 alternator/server.cc
9 alternator/server.hh
6 alternator/stats.cc
5 alternator/stats.hh
47 alternator/streams.cc
114 alternator/ttl.cc
31 api/CMakeLists.txt
4 api/api-doc/collectd.json
8 api/api-doc/column_family.json
26 api/api-doc/cql_server_test.json Normal file
56 api/api-doc/error_injection.json
64 api/api-doc/raft.json
182 api/api-doc/storage_service.json
15 api/api-doc/system.json
55 api/api-doc/task_manager.json
2 api/api-doc/utils.json
100 api/api.cc
2 api/api.hh
29 api/api_init.hh
15 api/authorization_cache.hh
52 api/cache_service.cc
8 api/cache_service.hh
3 api/collectd.cc
77 api/column_family.cc
3 api/column_family.hh
43 api/commitlog.cc
7 api/commitlog.hh
93 api/compaction_manager.cc
6 api/compaction_manager.hh
112 api/config.cc
1 api/config.hh
69 api/cql_server_test.cc Normal file
29 api/cql_server_test.hh Normal file
30 api/error_injection.cc
11 api/failure_detector.cc
1 api/failure_detector.hh
10 api/gossiper.cc


1 api/gossiper.hh