scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 19:21:01 +00:00

Author	SHA1	Message	Date
Asias He	6f04de3efd	streaming: Fail stream plan on stream_mutation_fragments handler in case of error The following is observed in pytest: 1) node1, stream master, tried to pull data from node3 2) node3, stream follower, found node1 restarted 3) node3 killed the rpc stream 4) node1 did not get the stream session failure message from node3. This failure message was supposed to kill the stream plan on node1. That's the reason node1 failed the stream session much later at "2024-08-19 21:07:45,539". Note, node3 failed the stream on its side, so it should have sent the stream session failure message. ``` $ cat node1.log | grep f890bea0-5e68-11ef-99ae-e5bca04385fc INFO 2024-08-19 20:24:01,162 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Executing streaming plan for Tablet migration-ks-index-0 with peers={127.0.34.3}, master ERROR 2024-08-19 20:24:01,190 [shard 1:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=ks, cf=cf, peer=127.0.34.3: seastar::nested_exception: seastar::rpc::stream_closed (rpc stream was closed by peer) (while cleaning up after seastar::rpc::stream_closed (rpc stream was closed by peer)) WARN 2024-08-19 21:07:45,539 [shard 0:main] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Streaming plan for Tablet migration-ks-index-0 failed, peers={127.0.34.3}, tx=0 KiB, 0.00 KiB/s, rx=484 KiB, 0.18 KiB/s $ cat node3.log | grep f890bea0-5e68-11ef-99ae-e5bca04385fc INFO 2024-08-19 20:24:01,163 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Executing streaming plan for Tablet migration-ks-index-0 with peers=127.0.34.1, slave INFO 2024-08-19 20:24:01,164 [shard 1:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Start sending ks=ks, cf=cf, estimated_partitions=2560, with new rpc streaming WARN 2024-08-19 20:24:01,187 [shard 0: gms] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Streaming plan for Tablet migration-ks-index-0 failed, peers={127.0.34.1}, tx=633 KiB, 26506.81 KiB/s, rx=0 KiB, 0.00 KiB/s WARN 2024-08-19 20:24:01,188 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] stream_transfer_task: Fail to send to 127.0.34.1:0: seastar::rpc::stream_closed (rpc stream was closed by peer) WARN 2024-08-19 20:24:01,189 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Failed to send: seastar::rpc::stream_closed (rpc stream was closed by peer) WARN 2024-08-19 20:24:01,189 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Streaming error occurred, peer=127.0.34.1 ``` To be safe in case the stream fail message is not received, node1 could fail the stream plan as soon as the rpc stream is aborted in the stream_mutation_fragments handler. Fixes #20227 Closes scylladb/scylladb#21960	2025-02-10 16:32:12 +01:00
Botond Dénes	51a273401c	Merge 'test: tablets_test: Create proper schema in load balancer tests' from Tomasz Grabiec This PR converts boost load balancer tests in preparation for load balancer changes which add per-table tablet hints. After those changes, load balancer consults with the replication strategy in the database, so we need to create proper schema in the database. To do that, we need proper topology for replication strategies which use RF > 1, otherwise keyspace creation will fail. Topology is created in tests via group0 commands, which is abstracted by the new `topology_builder` class. Tests cannot modify token_metadata only in memory now as it needs to be consistent with the schema and on-disk metadata. That's why modifications to tablet metadata are now made under group0 guard and save back metadata to disk. Closes scylladb/scylladb#22648 * github.com:scylladb/scylladb: test: tablets: Drop keyspace after do_test_load_balancing_merge_colocation() scenario tests: tablets: Set initial tablets to 1 to exit growing mode test: tablets_test: Create proper schema in load balancer tests test: lib: Introduce topology_builder test: cql_test_env: Expose topology_state_machine topology_state_machine: Introduce lock transition	2025-02-10 16:08:41 +02:00
Nikita Kurashkin	025bb379a4	cql: remove expansion of "SELECT *" in DESC MATERIALIZED VIEW This patch removes expansion of "SELECT *" in DESC MATERIALIZED VIEW. Instead of explicitly printing each column, the DESC command will now just use SELECT * if the view was created with it. Also adds a corresponding test. Fixes #21154 Closes scylladb/scylladb#21962	2025-02-10 15:01:23 +02:00
Michael Litvak	c098e9a327	test/test_view_build_status: fix flaky asserts In a few test cases of test_view_build_status we create a view, wait for it, and then query the view_build_status table and expect it to have all rows for each node and view. But this may fail because the wait_for_view query and the following queries can be executed on different nodes, and some of the nodes may not have applied all the table updates yet, so they have missing rows. To fix it, we change the assert to work in the eventual consistency sense, retrying until the number of rows is as expected. Fixes scylladb/scylladb#22644 Closes scylladb/scylladb#22654	2025-02-10 12:41:42 +01:00
Nadav Har'El	a492e239e3	Merge 'test.py: Add the possibility to run boost and unit tests with pytest ' from Andrei Chekun Add the possibility to run boost and unit tests with pytest. test.py should follow the next paradigm: the ability to run all test cases sequentially by ONE pytest command. With this paradigm, to get better performance, we can split this 1 command into 2,3,4,5,100,200... whatever we want. It's a new functionality that does not touch the test.py way of executing the boost and unit tests. It supports the main features of the test.py way of execution: automatic discovery of modes, repeats. There is an additional requirement to execute tests in parallel: pytest-xdist. To install it, execute `pip install pytest-xdist` To run tests with pytest, execute `pytest test/boost`. To execute only one file, provide the path to the file, `pytest test/boost/aggregate_fcts_test.cc`; since it's a normal path, autocompletion will work on the terminal. To provide a specific mode, use the parameter `--mode dev`; if the parameter is not provided, pytest will try to use `ninja mode_list` to find out the compiled modes. Parallel execution is controlled by pytest-xdist and the parameter `-n 12`. The useful command to discover the tests in the file or directory is `pytest --collect-only -q --mode dev test/boost/aggregate_fcts_test.cc`. That will return all test functions in the file. To execute only one function from the test, you can use the output of the previous command, but the suffix for the mode should be skipped. For example, the output will be `test/boost/aggregate_fcts_test.cc::test_aggregate_avg.dev`, so to execute this specific test function, please use the command `pytest --mode dev test/boost/aggregate_fcts_test.cc::test_aggregate_avg` There is a parameter `--repeat` that is used to repeat the test case several times in the same way as test.py did. It's not possible to run both boost and unit tests directories with one command, so we need to provide explicitly which directory should be executed. Like this `pytest --mode dev test/unit` or `pytest --mode dev test/boost` Fixes: https://github.com/scylladb/qa-tasks/issues/1775 Closes scylladb/scylladb#21108 * github.com:scylladb/scylladb: test.py: Add possibility to run ldap tests from pytest test.py: Add the possibility to run unit tests from pytest test.py: Add the possibility to run boost test from pytest test.py: Add discovery for C++ tests for pytest test.py: Modify s3 server mock test.py: Add method to get environment variables from MinIO wrapper test.py: Move get configured modes to common lib	2025-02-09 11:56:24 +01:00
Avi Kivity	9712390336	Merge 'Add per-table tablet options in schema' from Benny Halevy This series extends the table schema with per-table tablet options. The options are used as hints for initial tablet allocation on table creation and later for resize (split or merge) decisions, when the table size changes. * New feature, no backport required Closes scylladb/scylladb#22090 * github.com:scylladb/scylladb: tablets: resize_decision: get rid of initial_decision tablet_allocator: consider tablet options for resize decision tablet_allocator: load_balancer: table_size_desc: keep target_tablet_size as member network_topology_strategy: allocate_tablets_for_new_table: consider tablet options network_topology_strategy: calculate_initial_tablets_from_topology: precalculate shards per dc using for_each_token_owner network_topology_strategy: calculate_initial_tablets_from_topology: set default rf to 0 cql3: data_dictionary: format keyspace_metadata: print "enabled":true when initial_tablets=0 cql3/create_keyspace_statement: add deprecation warning for initial tablets test: cqlpy: test_tablets: add tests for per-table tablet options schema: add per-table tablet options feature_service: add TABLET_OPTIONS cluster schema feature	2025-02-08 20:32:19 +02:00
Avi Kivity	9db9b0963f	Merge ' reader_concurrency_semaphore: set_notify_handler(): disable timeout ' from Botond Dénes `set_notify_handler()` is called after a querier was inserted into the querier cache. It has two purposes: set a callback for eviction and set a TTL for the cache entry. The latter was not disabling the pre-existing timeout of the permit (if any) and this would lead to premature eviction of the cache entry if the timeout was shorter than the TTL (which is typical). Disable the timeout before setting the TTL to prevent premature eviction. Fixes: https://github.com/scylladb/scylladb/issues/22629 Backport required to all active releases, they are all affected. Closes scylladb/scylladb#22701 * github.com:scylladb/scylladb: reader_concurrency_semaphore: set_notify_handler(): disable timeout reader_permit: mark check_abort() as const	2025-02-08 20:05:03 +02:00
Andrei Chekun	043534acc6	test.py: Add possibility to run ldap tests from pytest Add possibility to run ldap tests with pytest. An LDAP server will be created for each worker if xdist is used. For one thread, one LDAP server will be used for all tests.	2025-02-07 21:40:28 +01:00
Andrei Chekun	36ad813b94	test.py: Add the possibility to run unit tests from pytest Add the possibility to run unit tests from pytest	2025-02-07 21:40:28 +01:00
Andrei Chekun	8ef840a1c5	test.py: Add the possibility to run boost test from pytest Add the possibility to run boost test from pytest. Boost facade based on code from https://github.com/pytest-dev/pytest-cpp, but enhanced and rewritten to suit current needs better.	2025-02-07 21:40:25 +01:00
Andrei Chekun	4addc039e5	test.py: Add discovery for C++ tests for pytest Code based on https://github.com/pytest-dev/pytest-cpp. Updated, customized, enhanced to suit current needs. Modify generate report to not modify the names, since it will break xdist way of working. Instead modification will be done in post collect but before executing the tests.	2025-02-07 19:44:06 +01:00
Andrei Chekun	fb4722443d	test.py: Modify s3 server mock Add the possibility to return the environment as a dict, to use it later in subprocesses created by xdist without starting another s3 mock server for each thread.	2025-02-07 19:38:53 +01:00
Andrei Chekun	7948c4561d	test.py: Add method to get environment variables from MinIO wrapper Add method to retrieve MinIO server wrapper environment variables for later processing. This change will allow sharing connection information with other processes and reusing the server across multiple tests.	2025-02-07 19:38:53 +01:00
Andrei Chekun	108ef5856f	test.py: Move get configured modes to common lib This will allow using this method inside the test module for pytest launching the boost and unit tests	2025-02-07 19:38:53 +01:00
Tomasz Grabiec	1854ea2165	test: tablets: Drop keyspace after do_test_load_balancing_merge_colocation() scenario This scenario is invoked in a loop in the test_load_balancing_merge_colocation_with_random_load test case, which will cause accumulation of tablet maps, making each reload slower in subsequent iterations. It wasn't a problem before because we overwrote tablet_metadata in each iteration to contain only tablets for the current table, but now we need to keep it consistent with the schema, so we no longer do that.	2025-02-07 17:13:52 +01:00
Tomasz Grabiec	58460a8863	tests: tablets: Set initial tablets to 1 to exit growing mode After tablet hints, there is no notion of leaving growing mode and tablet count is sustained continuously by initial tablet option, so we need to lower it for merge to happen.	2025-02-07 17:13:52 +01:00
Tomasz Grabiec	ca6159fbe2	test: tablets_test: Create proper schema in load balancer tests This is in preparation for load balancer changes needed to respect per-table tablet hints and respecting per-shard tablet count goal. After those changes, load balancer consults with the replication strategy in the database, so we need to create proper schema in the database. To do that, we need proper topology for replication strategies which use RF > 1, otherwise keyspace creation will fail.	2025-02-07 17:13:52 +01:00
Tomasz Grabiec	0d259bb175	test: lib: Introduce topology_builder Will be used by load balancer tests which need more than a single-node topology, and which want to create proper schema in the database which depends on that topology, in particular creating keyspaces with replication factor > 1. We need to do that because load balancer will use replication strategy from the database as part of plan making.	2025-02-07 16:48:33 +01:00
Tomasz Grabiec	3bb9d2fbdb	test: cql_test_env: Expose topology_state_machine	2025-02-07 16:09:21 +01:00
Alexey Novikov	cc35905531	Allow to use memtable_flush_period_in_ms schema option for system tables For system tables, only the 'memtable_flush_period_in_ms' option can be modified, and only on its own, not together with any other options. Refs #20999 Fixes #21223 Closes scylladb/scylladb#22536	2025-02-07 10:33:05 +02:00
Botond Dénes	9174f27cc8	reader_concurrency_semaphore: set_notify_handler(): disable timeout set_notify_handler() is called after a querier was inserted into the querier cache. It has two purposes: set a callback for eviction and set a TTL for the cache entry. The latter was not disabling the pre-existing timeout of the permit (if any) and this would lead to premature eviction of the cache entry if the timeout was shorter than the TTL (which is typical). Disable the timeout before setting the TTL to prevent premature eviction. Fixes: #scylladb/scylladb#22629	2025-02-07 02:31:01 -05:00
Pavel Emelyanov	f331d3b876	Merge 'auth: ensure default superuser password is set before serving CQL' from Andrzej Jackowski Before this change, it was ensured that a default superuser is created before serving CQL. However, the mechanism didn't wait for default password initialization, so effectively, for a short period, a customer couldn't authenticate as the superuser properly. The purpose of this change is to improve the superuser initialization mechanism to wait for the superuser default password, just as for the superuser creation. This change: - Introduce authenticator::ensure_superuser_is_created() to allow waiting for complete initialization of super user authentication - Implement ensure_superuser_is_created in password_authenticator, so waiting for superuser password initialization is possible - Implement ensure_superuser_is_create in transitional_authenticator, so the implementation from password_authenticator is used - Implement no-op ensure_superuser_is_create for other authenticators - Extend service::ensure_superuser_is_created to wait for superuser initialization in authenticator, just as it was implemented earlier for role_manager - Add injected error (sleep) in password_authenticator::start to reproduce a case of delayed password creation - Implement test_delayed_deafult_password to verify the correctness of the fix - Ensure superuser is created in single_node_cql_env::run_in_thread to make single_node_cql more similar to scylla_main in main.cc Fixes scylladb/scylladb#20566 Backport not needed - a minor bugfix Closes scylladb/scylladb#22532 * github.com:scylladb/scylladb: test: implement test_auth_password_ensured test: implement connect_driver argument in ManagerClient::server_add auth: ensure default superuser password is set before serving CQL auth: added password_authenticator_start_pause injected error	2025-02-07 08:47:01 +03:00
Avi Kivity	861fb58e14	Merge 'vector: add support for vector type' from Dawid Pawlik This pull request is an implementation of vector data type similar to one used by Apache Cassandra. The patch contains: - implementation of vector_type_impl class - necessary functionalities similar to other data types - support for serialization and deserialization of vectors - support for Lua and JSON format - valid CQL syntax for `vector<>` type - `type_parser` support for vectors - expression adjustments such as: - add `collection_constructor::style_type::vector` - rename `collection_constructor::style_type::list` to `collection_constructor::style_type::list_or_vector` - vector type encoding (for drivers) - unit tests - cassandra compatibility tests - necessary documentation Co-authored-by: @janpiotrlakomy Fixes https://github.com/scylladb/scylladb/issues/19455 Closes scylladb/scylladb#22488 * github.com:scylladb/scylladb: docs: add vector type documentation cassandra_tests: translate tests covering the vector type type_codec: add vector type encoding boost/expr_test: add vector expression tests expression: adjust collection constructor list style expression: add vector style type test/boost: add vector type cql_env boost tests test/boost: add vector type_parser tests type_parser: support vector type cql3: add vector type syntax types: implement vector_type_impl	2025-02-06 20:36:50 +02:00
Benny Halevy	20c6ca2813	tablet_allocator: consider tablet options for resize decision Do not merge tablets if that would drop the tablet_count below the minimum provided by hints. Split tablets if the current tablet_count is less than the minimum tablet count calculated using the table's tablet options. TODO: override min_tablet_count if the tablet count per shard is greater than the maximum allowed. In this case the tables tablet counts should be scaled down proportionally. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-06 18:43:35 +02:00
Andrzej Jackowski	d5a4f3d4cd	test: implement test_auth_password_ensured Before fix of scylladb#20566, CQL was served irrespectively of default superuser password creation, which led to an incorrect product behavior and sporadic test failures. This test verifies race condition of serving CQL and creating default superuser password. Injected failure is used to ensure CQL use is attempted before default superuser password creation, however, the attempt is expected to fail because scylladb#20566 is fixed. Following that, the injected error is notified, so CQL driver can be started correctly. Finally, CREATE USER query is executed to confirm successful superuser authentication. This change: - Implement test_auth_password_ensured.py The test starts a server without expecting CQL serving, because expected_server_up_state=ServerUpState.HOST_ID_QUERIED and connect_driver=False. Error password_authenticator_start_pause is injected to block superuser password setup during server startup. Next, the test waits for a log to confirm that the code implementing injected error is reached. When the server startup procedure is unfinished, some operations might not complete on a first try, so waiting for driver connection is wrapped in repeat_if_host_unavailable.	2025-02-06 10:30:55 +01:00
Andrzej Jackowski	e70ba7e3ed	test: implement connect_driver argument in ManagerClient::server_add This commit introduces the connect_driver argument in ManagerClient::server_add. The argument allows skipping the CQL driver initialization part during server start. Starting a server without the driver is necessary to implement some test scenarios related to system initialization. After stopping a server, ManagerClient::server_start can be used to start the server again, so the connect_driver argument is also added there to allow preventing the driver from connecting after a server restart. This change: - Implement connect_driver argument in ManagerClient::server_add - Implement connect_driver argument in ManagerClient::server_start	2025-02-06 10:30:55 +01:00
Pavel Emelyanov	64baab1b95	Merge 'config: prevent SIGHUP from changing non-liveupdatable parameters' from Andrzej Jackowski Before this change, it was possible to change non-liveupdatable config parameter without process restart. This erroneous behavior not only contradicts the documentation but is potentially dangerous, as various components theoretically might not be prepared for a change of configuration parameter value without a restart. The issue came from a fact that liveupdatability verification check was skipped for default configuration parameters (those without its initial values in configuration file during process start). This change: - Introduce _initialization_completed member in config_file - Set _initialization_completed=true when config file is processed on server start - Verify config_file's initialization status during config update - if config_file was initialized, prevent from further changes of non-liveupdatable parameters - Implement ScyllaRESTAPIClient::get_config() that obtains a current value of given configuration parameter via /v2/config REST API - Implement test to confirm that only liveupdatable parameters are changed when SIGHUP is sent after configuration file change Function set_initialization_completed() is called only once in main.cc, and the effect is expected to be visible in all shards, as a side effect of cfg->broadcast_to_all_shards() that is called shortly after. The same technique was already used for enable_3_1_0_compatibility_mode() call. Fixes scylladb/scylladb#5382 No backport - minor fix. Closes scylladb/scylladb#22655 * github.com:scylladb/scylladb: test: SIGHUP doesn't change non-liveupdatable configuration test: implement ScyllaRESTAPIClient::get_config() config: prevent SIGHUP from changing non-liveupdatable parameters config: remove unused set_value_on_all_shards(const YAML::Node&)	2025-02-06 11:33:59 +03:00
Pavel Emelyanov	951625ca13	Merge 's3 client: add aws credentials providers' from Ernest Zaslavsky This update introduces four types of credential providers: 1. Environment variables 2. Configuration file 3. AWS STS 4. EC2 Metadata service The first two providers should only be used for testing and local runs. They must NEVER be used in production. The last two providers are intended for use on real EC2 instances: - AWS STS: Preferred method for obtaining temporary credentials using IAM roles. - EC2 Metadata Service: Should be used as a last resort. Additionally, a simple credentials provider chain is created. It queries each provider sequentially until valid credentials are obtained. If all providers fail, it returns an empty result. fixes: #21828 Closes scylladb/scylladb#21830 * github.com:scylladb/scylladb: docs: update the `object_storage.md` and `admin.rst` aws creds: add STS and Instance Metadata service credentials providers aws creds: add env. and file credentials providers s3 creds: move credentials out of endpoint config	2025-02-06 11:12:37 +03:00
Benny Halevy	32c2f7579f	network_topology_strategy: allocate_tablets_for_new_table: consider tablet options Use the keyspace initial_tablets for min_tablet_count, if the latter isn't set, then take the maximum of the option-based tablet counts: - min_tablet_count - and expected_data_size_in_gb / target_tablet_size - min_per_shard_tablet_count (via calculate_initial_tablets_from_topology) If none of the hints produce a positive tablet_count, fall back to calculate_initial_tablets_from_topology * initial_scale. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-06 08:59:32 +02:00
Benny Halevy	7cd29810a0	test: cqlpy: test_tablets: add tests for per-table tablet options Test specifying per-table tablet options on table creation and alter table. Also, add a negative test for attempting to use tablet options with vnodes (that should fail). And add a basic test for testing tablet options also with materialized views. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-06 08:55:51 +02:00
Benny Halevy	c5668d99c9	schema: add per-table tablet options Unlike with vnodes, each tablet is served only by a single shard, and it is associated with a memtable that, when flushed, creates sstables whose token range is confined to the tablet owning them. On one hand, this allows for far better agility and elasticity since migration of tablets between nodes or shards does not require rewriting most if not all of the sstables, as required with vnodes (at the cleanup phase). On the other hand, having too few tablets might limit performance due to not being served by all shards or due to imbalance between shards caused by quantization. The number of tablets per table has to be a power of 2 with the current design, and when divided by the number of shards, some shards will serve N tablets, while others may serve N+1, and when N is small (N+1)/N may be significantly larger than 1. For example, with N=1, some shards will serve 2 tablet replicas and some will serve only 1, causing an imbalance of 100%. Now, simply allocating a lot more tablets for each table may theoretically address this problem, but practically: a. Each tablet has memory overhead and having too many tablets in the system with many tables and many tablets for each of them may overwhelm the system and cause out-of-memory errors. b. Too-small tablets cause a proliferation of small sstables that are less efficient to access, have higher metadata overhead (due to per-sstable overhead), and might exhaust the system's open file descriptor limit. The options introduced in this change can help the user tune the system in two ways: 1. Sizing the table to prevent unnecessary tablet splits and migrations. This can be done when the table is created, or later on, using ALTER TABLE. 2. Controlling min_per_shard_tablet_count to improve tablet balancing, for hot tables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-06 08:55:51 +02:00
Tomasz Grabiec	3bb19e9ac9	locator: network_topology_startegy: Ignore leaving nodes when computing capacity for new tables For example, nodes which are being decommissioned should not be considered as available capacity for new tables. We don't allocate tablets on such nodes. Counting them would result in higher per-shard load than planned. Closes scylladb/scylladb#22657	2025-02-05 23:59:41 +02:00
Kefu Chai	9a20fb43ab	tree: replace boost::min_element() with std::ranges::min_element() in order to reduce the external header dependency, let's switch to the standardized std::ranges::min_element(). Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22572	2025-02-05 21:54:01 +02:00
Tomasz Grabiec	e22e3b21b1	locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes In that case, new_racks will be used, but when we discover no candidates, we try to pop from existing_racks. Fixes #22625 Closes scylladb/scylladb#22652	2025-02-05 20:13:05 +02:00
Nadav Har'El	bfdd805f15	test/alternator: fix running against installation blocking CQL One of the design goals of the Alternator test suite (test/alternator) is that developers should be able to run the tests against some already running installation by running `cd test/alternator; pytest [--url ...]`. Some of our presentations and documents recommend running Alternator via docker as: docker run --name scylla -d -p 8000:8000 scylladb/scylla:latest --alternator-port=8000 --alternator-write-isolation=always This only makes port 8000 available to the host - the CQL port is blocked. We had a bug in conftest.py's get_valid_alternator_role() which caused it to fail (and fail every single test) when CQL is not available. What we really want is that when CQL is not available and we can't figure out a correct secret key to connect to Alternator, we just try a connect with a fake key - and hope that the option alternator-enforce-authorization is turned off. In fact, this is what the code comments claim was already happening - but we failed to handle the case that CQL is not available at all. After this patch, one can run Alternator with the above docker command, and then run tests against it. By the way, this provides another way for running any old release of Scylla and running Alternator tests against it. We already supported a similar feature via test/alternator/run's "--release" option, but its implementation doesn't use docker. Fixes #22591 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#22592	2025-02-05 19:01:31 +03:00
Botond Dénes	7ce932ce01	service: query_pager: fix last-position for filtering queries On short-pages, cut short because of a tombstone prefix. When page-results are filtered and the filter drops some rows, the last-position is taken from the page visitor, which does the filtering. This means that last partition and row position will be that of the last row the filter saw. This will not match the last position of the replica, when the replica cut the page due to tombstones. When fetching the next page, this means that all the tombstone suffix of the last page, will be re-fetched. Worse still: the last position of the next page will not match that of the saved reader left on the replica, so the saved reader will be dropped and a new one created from scratch. This wasted work will show up as elevated tail latencies. Fix by always taking the last position from raw query results. Fixes: #22620 Closes scylladb/scylladb#22622	2025-02-05 17:23:30 +02:00
Raphael S. Carvalho	ce65164315	test: Use linux-aio backend again on seastar-based tests Since mid December, tests started failing with ENOMEM while submitting I/O requests. Logs of failed tests show IO uring was used as backend, but we never deliberately switched to IO uring. Investigation pointed to it happening accidentally in commit `1bac6b75dc`, which turned on IO uring for allowing native tool in production, and picked linux-aio backend explicitly when initializing Scylla. But it missed that seastar-based tests would pick the default backend, which is io_uring once enabled. There's a reason we never made io_uring the default, which is that it's not stable enough, and it turns out we made the right choice back then, as it apparently continues to be unstable, causing flakiness in the tests. Let's undo that accidental change in tests by explicitly picking the linux-aio backend for seastar-based tests. This should hopefully bring back stability. Refs #21968. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#22695	2025-02-05 15:19:24 +02:00
Ernest Zaslavsky	dee4fc7150	aws creds: add STS and Instance Metadata service credentials providers This commit introduces two new credentials providers: STS and Instance Metadata Service. The S3 client's provider chain has been updated to incorporate these new providers. Additionally, unit tests have been added to ensure coverage of the new functionality.	2025-02-05 14:57:19 +02:00
Ernest Zaslavsky	d534051bea	aws creds: add env. and file credentials providers This commit entirely removes credentials from the endpoint configuration. It also eliminates all instances of manually retrieving environment credentials. Instead, the construction of file and environment credentials has been moved to their respective providers. Additionally, a new aws_credentials_provider_chain class has been introduced to support chaining of multiple credential providers.	2025-02-05 14:57:19 +02:00
Andrzej Jackowski	6f5ba3dd89	test: SIGHUP doesn't change non-liveupdatable configuration This change: - Implement test to confirm that only liveupdatable parameters are changed when SIGHUP is sent after configuration file change	2025-02-05 09:37:37 +01:00
Andrzej Jackowski	a001b20938	test: implement ScyllaRESTAPIClient::get_config() This change: - Implement ScyllaRESTAPIClient::get_config() that obtains a current value of given configuration parameter via /v2/config REST API	2025-02-05 09:37:37 +01:00
Pavel Emelyanov	83f3821f99	Merge 'cql: clean the code validating replication strategy options' from Piotr Smaron Clean the code validating if a replication strategy can be used. This PR consists of a bunch of unmerged https://github.com/scylladb/scylladb/pull/20088 commits - the solution to the problem that the linked PR tried to solve has been accomplished in another PR, leaving the refactor commits unmerged. The commits introduced in this PR have already been reviewed in the old PR. No need to backport, it's just a refactor. Closes scylladb/scylladb#22516 * github.com:scylladb/scylladb: cql: restore validating replication strategies options cql: change validating NetworkTopologyStrategy tags to internal_error cql: inline abstract_replication_strategy::validate_replication_strategy cql: clean redundant code validating replication strategy options	2025-02-05 11:18:50 +03:00
Botond Dénes	f2d5819645	reader_concurrency_semaphore: with_permit(): proper clean-up after queue overload with_permit() creates a permit, with a self-reference, to avoid attaching a continuation to the permit's run function. This self-reference is used to keep the permit alive, until the execution loop processes it. This self reference has to be carefully cleared on error-paths, otherwise the permit will become a zombie, effectively leaking memory. Instead of trying to handle all loose ends, get rid of this self-reference altogether: ask caller to provide a place to save the permit, where it will survive until the end of the call. This makes the call-site a little bit less nice, but it gets rid of a whole class of possible bugs. Fixes: #22588 Closes scylladb/scylladb#22624	2025-02-04 21:27:16 +02:00
Ernest Zaslavsky	c911fc4f34	s3 creds: move credentials out of endpoint config This commit refactors the way AWS credentials are managed in Scylla. Previously, credentials were included in the endpoint configuration. However, since credentials and endpoint configurations serve different purposes and may have different lifetimes, it’s more logical to manage them separately. Moving forward, credentials will be completely removed from the endpoint_config to ensure clear separation of concerns.	2025-02-04 16:45:23 +02:00
Piotr Smaron	2953d3ebe0	cql: restore validating replication strategies options `validate_options` needs to be extended with `topology` parameter, because NetworkTopologyStrategy needs to validate that every explicitly listed DC really exists. I did cut a corner a bit and trimmed the message thrown when it's not the case, just to avoid passing an extra parameter (ks name) to the `validate_options` function, as I find the longer message to be a bit redundant (the driver will receive info which KS modification failed). The tests that have been commented out in the previous commit have been restored.	2025-02-04 12:27:33 +01:00
Piotr Smaron	100e8d2856	cql: change validating NetworkTopologyStrategy tags to internal_error The check for `replication_factor` tag in `network_topology_strategy::validate_options` is redundant for 2 reasons: - before we reach this part of the code, the `replication_factor` tag is replaced with specific DC names - we actually do allow for `replication_factor` tag in NetworkTopologyStrategy for keyspaces that have tablets disabled. This code is unreachable, hence changing it to an internal error, which means this situation should never occur. The place that unrolls `replication_factor` tag checked for presence of this tag ignoring the case, which led to unexpected behaviour: - `replication_factor` tag (note the lowercase) was unrolled, as explained above, - the same tag but written in any other case resulted in throwing a vague message: "replication_factor is an option for SimpleStrategy, not NetworkTopologyStrategy". So we're changing this validation to accept and unroll only the lowercase version of this tag. We can't ignore the case here, as this tag is present inside a json, and json is case-sensitive, even though the CQL itself is case insensitive. Added a test that passes for both scylla and cassandra. Fixes: #15336	2025-02-04 12:27:29 +01:00
Aleksandra Martyniuk	683176d3db	tasks: add shard, start_time, and end_time to task_stats task_stats contains short info about a task. To get a list of task_stats in the module, one needs to request /task_manager/list_module_tasks/{module}. To make identification and navigation between tasks easier, extend task_stats to contain shard, start_time, and end_time. Closes scylladb/scylladb#22351	2025-02-04 12:11:24 +02:00
Botond Dénes	8c8db2052e	Merge 'service: add child for tablet repair virtual task' from Aleksandra Martyniuk tablet_repair_task_impl is run as a part of tablet repair. Make it a child of tablet repair virtual task. tablet_repair_task_impl started by /storage_service/repair_async API (vnode repair) does not have a parent, as it is the top-level task in that case. No backport needed; new functionality Closes scylladb/scylladb#22372 * github.com:scylladb/scylladb: test: add test to check tablet repair child service: add child for tablet repair virtual task	2025-02-04 12:08:24 +02:00
Aleksandra Martyniuk	610a761ca2	service: use read barrier in tablet_virtual_task::contains Currently, when the tablet repair is started, info regarding the operation is kept in the system.tablets. The new tablet states are reflected in memory after load_topology_state is called. Before that, the data in the table and the memory aren't consistent. To check the supported operations, tablet_virtual_task uses in-memory tablet_metadata. Hence, it may not see the operation, even though its info is already kept in system.tablets table. Run read barrier in tablet_virtual_task::contains to ensure it will see the latest data. Add a test to check it. Fixes: #21975. Closes scylladb/scylladb#21995	2025-02-04 12:07:42 +02:00
Ran Regev	edd56a2c1c	moved cache files to db As requested in #22097, moved the files and fixed other includes and build system. Fixes: #22097 Signed-off-by: Ran Regev <ran.regev@scylladb.com> Closes scylladb/scylladb#22495	2025-02-04 12:21:31 +03:00

8243 Commits