Add the possibility to run boost and unit tests with pytest
test.py should follow this paradigm: the ability to run all test cases sequentially with ONE pytest command.
With this paradigm, for better performance, we can split this one command into 2, 3, 4, 5, 100, 200... however many we want.
This is new functionality that does not touch test.py's way of executing the boost and unit tests.
It supports the main features of the test.py way of execution: automatic discovery of modes, and repeats.
There is an additional requirement for executing tests in parallel: pytest-xdist. To install it, execute `pip install pytest-xdist`.
To run tests with pytest, execute `pytest test/boost`. To execute only one file, provide its path, e.g. `pytest test/boost/aggregate_fcts_test.cc`; since it's a normal path, autocompletion will work in the terminal. To select a specific mode, use the `--mode dev` parameter; if the parameter is not provided, pytest will try `ninja mode_list` to find out the compiled modes.
Parallel execution is controlled by pytest-xdist and the parameter `-n 12`.
A useful command to discover the tests in a file or directory is `pytest --collect-only -q --mode dev test/boost/aggregate_fcts_test.cc`, which returns all test functions in the file. To execute only one function, you can use the output of the previous command, but the mode suffix should be stripped. For example, the output will be `test/boost/aggregate_fcts_test.cc::test_aggregate_avg.dev`, so to execute this specific test function, use `pytest --mode dev test/boost/aggregate_fcts_test.cc::test_aggregate_avg`.
There is a `--repeat` parameter that is used to repeat a test case several times, in the same way as test.py does.
It's not possible to run both the boost and unit test directories with one command, so the directory to execute must be provided explicitly, e.g. `pytest --mode dev test/unit` or `pytest --mode dev test/boost`.
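The mode-discovery fallback described above can be sketched as a small pytest hook. The option wiring, function names, and the exact output format of `ninja mode_list` below are illustrative assumptions, not the actual conftest code:

```python
# Hypothetical sketch of resolving --mode with a `ninja mode_list`
# fallback; names and the assumed `ninja mode_list` output format
# are illustrative only.
import subprocess

def pytest_addoption(parser):
    parser.addoption("--mode", action="append", default=[],
                     help="build mode to run tests for (dev, release, ...)")

def parse_mode_list(output: str) -> list[str]:
    # Assume the compiled modes are printed as whitespace-separated
    # words, possibly after a "mode_list:" prefix.
    return [w for w in output.replace("mode_list:", " ").split() if w]

def discover_modes(config) -> list[str]:
    modes = config.getoption("--mode")
    if modes:
        return modes
    # No --mode given: ask the build system which modes were compiled.
    out = subprocess.run(["ninja", "mode_list"],
                         capture_output=True, text=True, check=True).stdout
    return parse_mode_list(out)
```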
Fixes: https://github.com/scylladb/qa-tasks/issues/1775
Closes scylladb/scylladb#21108
* github.com:scylladb/scylladb:
test.py: Add possibility to run ldap tests from pytest
test.py: Add the possibility to run unit tests from pytest
test.py: Add the possibility to run boost test from pytest
test.py: Add discovery for C++ tests for pytest
test.py: Modify s3 server mock
test.py: Add method to get environment variables from MinIO wrapper
test.py: Move get configured modes to common lib
This series extends the table schema with per-table tablet options.
The options are used as hints for initial tablet allocation on table creation and later for resize (split or merge) decisions,
when the table size changes.
* New feature, no backport required
Closes scylladb/scylladb#22090
* github.com:scylladb/scylladb:
tablets: resize_decision: get rid of initial_decision
tablet_allocator: consider tablet options for resize decision
tablet_allocator: load_balancer: table_size_desc: keep target_tablet_size as member
network_topology_strategy: allocate_tablets_for_new_table: consider tablet options
network_topology_strategy: calculate_initial_tablets_from_topology: precalculate shards per dc using for_each_token_owner
network_topology_strategy: calculate_initial_tablets_from_topology: set default rf to 0
cql3: data_dictionary: format keyspace_metadata: print "enabled":true when initial_tablets=0
cql3/create_keyspace_statement: add deprecation warning for initial tablets
test: cqlpy: test_tablets: add tests for per-table tablet options
schema: add per-table tablet options
feature_service: add TABLET_OPTIONS cluster schema feature
`set_notify_handler()` is called after a querier was inserted into the querier cache. It has two purposes: set a callback for eviction and set a TTL for the cache entry. The latter was not disabling the pre-existing timeout of the permit (if any), which would lead to premature eviction of the cache entry if the timeout was shorter than the TTL (which is typical).
Disable the timeout before setting the TTL to prevent premature eviction.
Fixes: https://github.com/scylladb/scylladb/issues/22629
Backport required to all active releases, they are all affected.
Closes scylladb/scylladb#22701
* github.com:scylladb/scylladb:
reader_concurrency_semaphore: set_notify_handler(): disable timeout
reader_permit: mark check_abort() as const
Add the possibility to run ldap tests with pytest.
An LDAP server will be created for each worker if xdist is used.
With one thread, a single LDAP server is used for all tests.
Add the possibility to run boost test from pytest.
The Boost facade is based on code from https://github.com/pytest-dev/pytest-cpp, but enhanced and rewritten to suit our needs better.
Code based on https://github.com/pytest-dev/pytest-cpp; updated, customized, and enhanced to suit current needs.
Modify report generation so that it does not modify the names, since
that would break the xdist way of working. Instead, the modification is
done after collection but before executing the tests.
Add the possibility to return the environment as a dict, to use it later in subprocesses created by xdist, without starting another s3 mock server for each thread.
Add method to retrieve MinIO server wrapper environment variables for
later processing.
This change allows sharing connection information with other
processes and reusing the server across multiple tests.
It's possible to modify the 'memtable_flush_period_in_ms' option only
on its own, not together with any other options.
Refs #20999
Fixes #21223
Closes scylladb/scylladb#22536
set_notify_handler() is called after a querier was inserted into the
querier cache. It has two purposes: set a callback for eviction and set
a TTL for the cache entry. The latter was not disabling the
pre-existing timeout of the permit (if any), which would lead to
premature eviction of the cache entry if the timeout was shorter than
the TTL (which is typical).
Disable the timeout before setting the TTL to prevent premature
eviction.
Fixes: scylladb/scylladb#22629
Before this change, it was ensured that a default superuser is created
before serving CQL. However, the mechanism didn't wait for default
password initialization, so effectively, for a short period, customers
couldn't authenticate as the superuser properly. The purpose of this
change is to improve the superuser initialization mechanism to wait for
the superuser default password, just as for the superuser creation.
This change:
- Introduce authenticator::ensure_superuser_is_created() to allow
waiting for complete initialization of super user authentication
- Implement ensure_superuser_is_created in password_authenticator, so
waiting for superuser password initialization is possible
- Implement ensure_superuser_is_created in transitional_authenticator,
so the implementation from password_authenticator is used
- Implement a no-op ensure_superuser_is_created for other authenticators
- Extend service::ensure_superuser_is_created to wait for superuser
initialization in authenticator, just as it was implemented earlier
for role_manager
- Add injected error (sleep) in password_authenticator::start to
reproduce a case of delayed password creation
- Implement test_delayed_default_password to verify the correctness of the fix
- Ensure superuser is created in single_node_cql_env::run_in_thread to
make single_node_cql more similar to scylla_main in main.cc
Fixes scylladb/scylladb#20566
Backport not needed - a minor bugfix
Closes scylladb/scylladb#22532
* github.com:scylladb/scylladb:
test: implement test_auth_password_ensured
test: implement connect_driver argument in ManagerClient::server_add
auth: ensure default superuser password is set before serving CQL
auth: added password_authenticator_start_pause injected error
This pull request is an implementation of a vector data type similar to the one used by Apache Cassandra.
The patch contains:
- implementation of vector_type_impl class
- necessary functionalities similar to other data types
- support for serialization and deserialization of vectors
- support for Lua and JSON format
- valid CQL syntax for `vector<>` type
- `type_parser` support for vectors
- expression adjustments such as:
- add `collection_constructor::style_type::vector`
- rename `collection_constructor::style_type::list` to `collection_constructor::style_type::list_or_vector`
- vector type encoding (for drivers)
- unit tests
- cassandra compatibility tests
- necessary documentation
Co-authored-by: @janpiotrlakomy
Fixes https://github.com/scylladb/scylladb/issues/19455
Closes scylladb/scylladb#22488
* github.com:scylladb/scylladb:
docs: add vector type documentation
cassandra_tests: translate tests covering the vector type
type_codec: add vector type encoding
boost/expr_test: add vector expression tests
expression: adjust collection constructor list style
expression: add vector style type
test/boost: add vector type cql_env boost tests
test/boost: add vector type_parser tests
type_parser: support vector type
cql3: add vector type syntax
types: implement vector_type_impl
Do not merge tablets if that would drop the tablet_count
below the minimum provided by hints.
Split tablets if the current tablet_count is less than
the minimum tablet count calculated using the table's tablet options.
TODO: override min_tablet_count if the tablet count per shard
is greater than the maximum allowed. In this case
the tables' tablet counts should be scaled down proportionally.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Before the fix of scylladb#20566, CQL was served irrespective of default
superuser password creation, which led to incorrect product behavior
and sporadic test failures. This test verifies the race condition
between serving CQL and creating the default superuser password. An
injected failure is used to ensure a CQL use is attempted before default
superuser password creation; however, the attempt is expected to fail
because scylladb#20566 is fixed. Following that, the injected error is
notified, so the CQL driver can be started correctly. Finally, a CREATE
USER query is executed to confirm successful superuser authentication.
This change:
- Implement test_auth_password_ensured.py
The test starts a server without expecting CQL serving, because
expected_server_up_state=ServerUpState.HOST_ID_QUERIED and
connect_driver=False. Error password_authenticator_start_pause is
injected to block superuser password setup during server startup.
Next, the test waits for a log to confirm that the code implementing
injected error is reached. When the server startup procedure is
unfinished, some operations might not complete on the first try, so
waiting for the driver connection is wrapped in repeat_if_host_unavailable.
This commit introduces a connect_driver argument in
ManagerClient::server_add. The argument allows skipping the CQL driver
initialization part during server start. Starting a server without
the driver is necessary to implement some test scenarios related
to system initialization.
After stopping a server, ManagerClient::server_start can be used to
start the server again, so the connect_driver argument is also added
there to allow preventing the driver from connecting after a server
restart.
This change:
- Implement connect_driver argument in ManagerClient::server_add
- Implement connect_driver argument in ManagerClient::server_start
Before this change, it was possible to change a non-liveupdatable config
parameter without a process restart. This erroneous behavior not only
contradicts the documentation but is potentially dangerous, as various
components theoretically might not be prepared for a change of a
configuration parameter value without a restart. The issue came from
the fact that the liveupdatability verification check was skipped for
default configuration parameters (those without initial values in the
configuration file during process start).
This change:
- Introduce _initialization_completed member in config_file
- Set _initialization_completed=true when config file is processed on
server start
- Verify config_file's initialization status during config update - if
config_file was initialized, prevent further changes of
non-liveupdatable parameters
- Implement ScyllaRESTAPIClient::get_config() that obtains a current
value of given configuration parameter via /v2/config REST API
- Implement test to confirm that only liveupdatable parameters are
changed when SIGHUP is sent after configuration file change
Function set_initialization_completed() is called only once in main.cc,
and the effect is expected to be visible in all shards, as a side effect
of cfg->broadcast_to_all_shards() that is called shortly after. The same
technique was already used for enable_3_1_0_compatibility_mode() call.
Fixes scylladb/scylladb#5382
No backport - minor fix.
Closes scylladb/scylladb#22655
* github.com:scylladb/scylladb:
test: SIGHUP doesn't change non-liveupdatable configuration
test: implement ScyllaRESTAPIClient::get_config()
config: prevent SIGHUP from changing non-liveupdatable parameters
config: remove unused set_value_on_all_shards(const YAML::Node&)
This update introduces four types of credential providers:
1. Environment variables
2. Configuration file
3. AWS STS
4. EC2 Metadata service
The first two providers should only be used for testing and local runs. **They must NEVER be used in production.**
The last two providers are intended for use on real EC2 instances:
- **AWS STS**: Preferred method for obtaining temporary credentials using IAM roles.
- **EC2 Metadata Service**: Should be used as a last resort.
Additionally, a simple credentials provider chain is created. It queries each provider sequentially until valid credentials are obtained. If all providers fail, it returns an empty result.
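The provider-chain behavior described above (query each provider in order, return the first valid result, empty result if all fail) can be sketched in Python; the class and function names are illustrative, not Scylla's actual C++ API:

```python
# Illustrative sketch of a sequential credentials provider chain.
# Each provider returns a credentials dict or None; the chain returns
# the first non-empty result, or an empty dict if all providers fail.
import os
from typing import Callable, Optional

Credentials = dict  # e.g. {"access_key": ..., "secret_key": ...}

def env_provider() -> Optional[Credentials]:
    # Environment-variable provider (testing/local runs only).
    key = os.environ.get("AWS_ACCESS_KEY_ID")
    secret = os.environ.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"access_key": key, "secret_key": secret}
    return None

class ProviderChain:
    def __init__(self, providers: list):
        self.providers = providers  # queried in order

    def get_credentials(self) -> Credentials:
        for provider in self.providers:
            creds = provider()
            if creds:
                return creds
        return {}  # all providers failed: empty result
```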
Fixes: #21828
Closes scylladb/scylladb#21830
* github.com:scylladb/scylladb:
docs: update the `object_storage.md` and `admin.rst`
aws creds: add STS and Instance Metadata service credentials providers
aws creds: add env. and file credentials providers
s3 creds: move credentials out of endpoint config
Use the keyspace initial_tablets for min_tablet_count; if the latter
isn't set, then take the maximum of the option-based tablet counts:
- min_tablet_count
- expected_data_size_in_gb / target_tablet_size
- min_per_shard_tablet_count (via
calculate_initial_tablets_from_topology)
If none of the hints produce a positive tablet_count,
fall back to calculate_initial_tablets_from_topology * initial_scale.
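The selection logic above can be sketched as a small function; the parameter names follow the commit message, while the exact arithmetic and the shard-count scaling are assumptions for illustration, not the actual C++ code:

```python
# Rough sketch of the initial tablet-count selection described above.
# All names/defaults here are illustrative assumptions.
def initial_tablet_count(min_tablet_count: int,
                         expected_data_size_in_gb: int,
                         target_tablet_size_in_gb: int,
                         min_per_shard_tablet_count: int,
                         shard_count: int,
                         topology_tablets: int,
                         initial_scale: int) -> int:
    # Take the maximum of the option-based hints.
    hints = max(
        min_tablet_count,
        expected_data_size_in_gb // target_tablet_size_in_gb,
        min_per_shard_tablet_count * shard_count,
    )
    if hints > 0:
        return hints
    # No hint produced a positive count: fall back to the
    # topology-based count scaled by initial_scale.
    return topology_tablets * initial_scale
```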
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Test specifying of per-table tablet options on table creation
and alter table.
Also, add a negative test for attempting to use tablet options
with vnodes (which should fail).
Also add a basic test covering tablet options with
materialized views.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Unlike with vnodes, each tablet is served only by a single
shard, and it is associated with a memtable that, when
flushed, creates sstables whose token range is confined
to the tablet owning them.
On one hand, this allows for far better agility and elasticity
since migration of tablets between nodes or shards does not
require rewriting most if not all of the sstables, as required
with vnodes (at the cleanup phase).
On the other hand, having too few tablets might limit performance,
due to the table not being served by all shards, or by imbalance
between shards caused by quantization. The number of tablets per table
has to be a power of 2 with the current design, and when divided by the
number of shards, some shards will serve N tablets, while others
may serve N+1; when N is small, (N+1)/N may be significantly
larger than 1. For example, with N=1, some shards will serve
2 tablet replicas and some will serve only 1, causing an imbalance
of 100%.
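The quantization arithmetic above can be made concrete with a tiny helper (illustrative only):

```python
# Tiny numeric illustration of the shard imbalance described above:
# with T tablet replicas over S shards, some shards hold floor(T/S)
# tablets (N) and others N+1; the worst relative imbalance is (N+1)/N - 1.
def worst_imbalance(tablets: int, shards: int) -> float:
    n = tablets // shards
    if n == 0:
        # Some shards hold no tablets at all: unbounded imbalance.
        return float("inf")
    if tablets % shards == 0:
        return 0.0  # perfectly even split
    return (n + 1) / n - 1.0
```

For example, 3 tablet replicas over 2 shards gives N=1, so one shard serves 2 replicas and the other 1, an imbalance of 100%, matching the text.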
Now, simply allocating a lot more tablets for each table may
theoretically address this problem, but practically:
a. Each tablet has memory overhead, and having too many tablets
in a system with many tables and many tablets for each of them
may overwhelm the system's memory and cause out-of-memory errors.
b. Too-small tablets cause a proliferation of small sstables
that are less efficient to access, have higher metadata overhead
(due to per-sstable overhead), and might exhaust the system's
open file-descriptor limits.
The options introduced in this change can help the user tune
the system in two ways:
1. Sizing the table to prevent unnecessary tablet splits
and migrations. This can be done when the table is created,
or later on, using ALTER TABLE.
2. Controlling min_per_shard_tablet_count to improve
tablet balancing, for hot tables.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
For example, nodes which are being decommissioned should not be
considered as available capacity for new tables. We don't allocate
tablets on such nodes; doing so
would result in higher per-shard load than planned.
Closes scylladb/scylladb#22657
In order to reduce the external header dependency, let's switch to
the standardized std::ranges::min_element().
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#22572
One of the design goals of the Alternator test suite (test/alternator)
is that developers should be able to run the tests against some already
running installation by running `cd test/alternator; pytest [--url ...]`.
Some of our presentations and documents recommend running Alternator
via docker as:
docker run --name scylla -d -p 8000:8000 scylladb/scylla:latest
--alternator-port=8000 --alternator-write-isolation=always
This only makes port 8000 available to the host - the CQL port is
blocked. We had a bug in conftest.py's get_valid_alternator_role()
which caused it to fail (and fail every single test) when CQL is
not available. What we really want is that when CQL is not available
and we can't figure out a correct secret key to connect to Alternator,
we just try a connect with a fake key - and hope that the option
alternator-enforce-authorization is turned off. In fact, this is what
the code comments claim was already happening - but we failed to
handle the case that CQL is not available at all.
After this patch, one can run Alternator with the above docker
command, and then run tests against it.
By the way, this provides another way for running any old release of
Scylla and running Alternator tests against it. We already supported
a similar feature via test/alternator/run's "--release" option, but
its implementation doesn't use docker.
Fixes #22591
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#22592
On short-pages, cut short because of a tombstone prefix.
When page-results are filtered and the filter drops some rows, the
last-position is taken from the page visitor, which does the filtering.
This means that the last partition and row position will be that of the
last row the filter saw. This will not match the last position of the
replica, when the replica cut the page due to tombstones.
When fetching the next page, this means that the entire tombstone suffix
of the last page will be re-fetched. Worse still: the last position of the
next page will not match that of the saved reader left on the replica, so
the saved reader will be dropped and a new one created from scratch.
This wasted work will show up as elevated tail latencies.
Fix by always taking the last position from raw query results.
Fixes: #22620
Closes scylladb/scylladb#22622
Since mid December, tests started failing with ENOMEM while
submitting I/O requests.
Logs of failed tests show IO uring was used as backend, but we
never deliberately switched to IO uring. Investigation pointed
to it happening accidentally in commit 1bac6b75dc,
which turned on IO uring to allow the native tool in production,
and picked linux-aio backend explicitly when initializing Scylla.
But it missed that seastar-based tests would pick the default
backend, which is io_uring once enabled.
There's a reason we never made io_uring the default, which is
that it's not stable enough. It turns out we made the right
choice back then: it apparently continues to be unstable,
causing flakiness in the tests.
Let's undo that accidental change in tests by explicitly
picking the linux-aio backend for seastar-based tests.
This should hopefully bring back stability.
Refs #21968.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#22695
This commit introduces two new credentials providers: STS and Instance Metadata Service. The S3 client's provider chain has been updated to incorporate these new providers. Additionally, unit tests have been added to ensure coverage of the new functionality.
This commit entirely removes credentials from the endpoint configuration. It also eliminates all instances of manually retrieving environment credentials. Instead, the construction of file and environment credentials has been moved to their respective providers. Additionally, a new aws_credentials_provider_chain class has been introduced to support chaining of multiple credential providers.
Clean the code validating if a replication strategy can be used.
This PR consists of a bunch of unmerged https://github.com/scylladb/scylladb/pull/20088 commits - the solution to the problem that the linked PR tried to solve has been accomplished in another PR, leaving the refactor commits unmerged. The commits introduced in this PR have already been reviewed in the old PR.
No need to backport, it's just a refactor.
Closes scylladb/scylladb#22516
* github.com:scylladb/scylladb:
cql: restore validating replication strategies options
cql: change validating NetworkTopologyStrategy tags to internal_error
cql: inline abstract_replication_strategy::validate_replication_strategy
cql: clean redundant code validating replication strategy options
with_permit() creates a permit, with a self-reference, to avoid
attaching a continuation to the permit's run function. This
self-reference is used to keep the permit alive, until the execution
loop processes it. This self reference has to be carefully cleared on
error-paths, otherwise the permit will become a zombie, effectively
leaking memory.
Instead of trying to handle all loose ends, get rid of this
self-reference altogether: ask the caller to provide a place to save the
permit, where it will survive until the end of the call. This makes the
call-site a little bit less nice, but it gets rid of a whole class of
possible bugs.
Fixes: #22588
Closes scylladb/scylladb#22624
This commit refactors the way AWS credentials are managed in Scylla. Previously, credentials were included in the endpoint configuration. However, since credentials and endpoint configurations serve different purposes and may have different lifetimes, it’s more logical to manage them separately. Moving forward, credentials will be completely removed from the endpoint_config to ensure clear separation of concerns.
`validate_options` needs to be extended with a
`topology` parameter, because NetworkTopologyStrategy needs to validate
that every explicitly listed DC really exists. I did cut a corner a bit
and trimmed the message thrown when it's not the case, just to avoid
passing an extra parameter (the ks name) to the `validate_options`
function, as I find the longer message to be a bit redundant (the driver
will receive info about which KS modification failed).
The tests that have been commented out in the previous commit have been
restored.
The check for `replication_factor` tag in
`network_topology_strategy::validate_options` is redundant for 2 reasons:
- before we reach this part of the code, the `replication_factor` tag
is replaced with specific DC names
- we actually do allow for `replication_factor` tag in
NetworkTopologyStrategy for keyspaces that have tablets disabled.
This code is unreachable, hence changing it to an internal error, which
means this situation should never occur.
The place that unrolls the `replication_factor` tag checked for the
presence of this tag ignoring the case, which led to unexpected behaviour:
- the `replication_factor` tag (note the lowercase) was unrolled, as
explained above,
- the same tag written in any other case resulted in throwing a vague
message: "replication_factor is an option for SimpleStrategy, not
NetworkTopologyStrategy".
So we're changing this validation to accept and unroll only the
lowercase version of this tag. We can't ignore the case here, as this
tag is present inside JSON, and JSON is case-sensitive, even though
CQL itself is case-insensitive.
Added a test that passes for both scylla and cassandra.
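The stricter unrolling rule above can be sketched as follows; the function name, the options-dict shape, and leaving other-cased tags untouched (to be rejected by later validation) are illustrative assumptions:

```python
# Sketch of case-sensitive replication_factor unrolling: only the exact
# lowercase tag is accepted and expanded to per-DC options; any other
# casing is left as-is (and is assumed to fail later validation as an
# unknown DC name). Illustrative only.
def unroll_replication_factor(options: dict, dc_names: list) -> dict:
    if "replication_factor" in options:
        rf = options.pop("replication_factor")
        for dc in dc_names:
            # Explicitly listed DCs keep their own value.
            options.setdefault(dc, rf)
    return options
```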
Fixes: #15336
task_stats contains short info about a task. To get a list of task_stats
in the module, one needs to request /task_manager/list_module_tasks/{module}.
To make identification and navigation between tasks easier, extend
task_stats to contain shard, start_time, and end_time.
Closes scylladb/scylladb#22351
tablet_repair_task_impl is run as a part of tablet repair. Make it
a child of tablet repair virtual task.
tablet_repair_task_impl started by /storage_service/repair_async API
(vnode repair) does not have a parent, as it is the top-level task
in that case.
No backport needed; new functionality
Closes scylladb/scylladb#22372
* github.com:scylladb/scylladb:
test: add test to check tablet repair child
service: add child for tablet repair virtual task
Currently, when the tablet repair is started, info regarding
the operation is kept in the system.tablets. The new tablet states
are reflected in memory after load_topology_state is called.
Before that, the data in the table and the memory aren't consistent.
To check the supported operations, tablet_virtual_task uses in-memory
tablet_metadata. Hence, it may not see the operation, even though
its info is already kept in system.tablets table.
Run read barrier in tablet_virtual_task::contains to ensure it will
see the latest data. Add a test to check it.
Fixes: #21975.
Closes scylladb/scylladb#21995
The test expects and asserts that after wait_for_view is completed we
read the view_build_status table and get a row for each node and view.
But this is wrong because wait_for_view may have read the table on one
node, and then we query the table on a different node that didn't insert
all the rows yet, so the assert could fail.
To fix it we change the test to retry and check that eventually all
expected rows are found and then eventually removed on the same host.
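The retry pattern the fix applies (poll the same host until the expected rows eventually appear, rather than asserting once) can be sketched generically; the helper name and signature are illustrative, not the actual test code:

```python
# Generic "eventually" helper: repeatedly run a check until it passes
# or a deadline expires. Illustrative sketch of the retry pattern.
import time

def eventually(check, timeout=60.0, period=0.5) -> bool:
    deadline = time.monotonic() + timeout
    last_exc = None
    while time.monotonic() < deadline:
        try:
            if check():
                return True
        except AssertionError as e:
            # Remember the failure; the state may still converge.
            last_exc = e
        time.sleep(period)
    if last_exc:
        raise last_exc
    return False
```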
Fixes scylladb/scylladb#22547
Closes scylladb/scylladb#22585
The view builder builds a view by going over the entire token ring,
consuming the base table partitions, and generating view updates for
each partition.
A view is considered as built when we complete a full cycle of the
token ring. Suppose we start to build a view at a token F. We will
consume all partitions with tokens starting at F until the maximum
token, then go back to the minimum token and consume all partitions
until F, and then we detect that we pass F and complete building the
view. This happens in the view builder consumer in
`check_for_built_views`.
The problem is that we check if we pass the first token F with the
condition `_step.current_token() >= it->first_token` whenever we consume
a new partition or the current_token goes back to the minimum token.
But suppose that we don't have any partitions with a token greater than
or equal to the first token (this could happen if the partition with
token F was moved to another node for example), then this condition will never be
satisfied, and we don't detect correctly when we pass F. Instead, we
go back to the minimum token, building the same token ranges again,
in a possibly infinite loop.
To fix this we add another step when reaching the end of the reader's
stream. When this happens it means we don't have any more fragments to
consume until the end of the range, so we advance the current_token to
the end of the range, simulating a partition, and check for built views
in that range.
Fixes scylladb/scylladb#21829
Closes scylladb/scylladb#22493
Add two cqlpy tests that reproduce a bug where a secondary index query
returns more rows than the specified limit. This occurs when the indexed
column is a partition key column or the first clustering key column,
the query result spans multiple partitions, and the last partition
causes the limit to be exceeded.
`test/cqlpy/run --release ...` shows that the tests fail for Scylla
versions all the way back to 4.4.0. Older Scylla versions fail with a
syntax error in CQL query which suggests some incompatibility in the
CQL protocol. That said, this bug is not a regression.
The tests pass in Cassandra 5.0.2.
Refs #22158.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closes scylladb/scylladb#22513
This commit addresses issue #21825, where invalid PERCENTILE values for
the `speculative_retry` setting were not properly handled, causing potential
server crashes. The valid range for PERCENTILE is between 0 and 100, as defined
in the documentation for speculative retry options, where values above 100 or
below 0 are invalid and should be rejected.
The added validation ensures that such invalid values are rejected with a clear
error message, improving system stability and user experience.
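The validation rule above is simple enough to sketch directly; the function name and error text are illustrative, not the actual C++ implementation:

```python
# Sketch of the speculative_retry PERCENTILE validation described above:
# values outside the documented 0..100 range are rejected with a clear
# error instead of being accepted (and potentially crashing the server).
def validate_speculative_retry_percentile(value: float) -> float:
    if not (0 <= value <= 100):
        raise ValueError(
            f"Invalid PERCENTILE value {value} for speculative_retry: "
            "must be between 0 and 100")
    return value
```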
Fixes #21825
Closes scylladb/scylladb#21879
This patch adds an Alternator test for the case of UpdateItem attempting
to insert an invalid B (bytes) value into an item. Values of type B
use base64 encoding, and an attempt to insert a value which isn't
valid base64 should be rejected, and this is what this test verifies.
The new tests reproduce issue #17539, which claimed we have a bug in
this area. However, test/alternator/run with the "--release" option
shows that while this bug existed in Scylla 5.2, it was fixed long ago,
in 5.3, and doesn't exist in master. But we never had a regression test
for this issue, so now we do.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#22029
Enabled with the tablets_rack_aware_view_pairing cluster feature,
rack-aware pairing pairs base replicas to view replicas that are in the
same dc and rack, using their ordinality in the replica map.
We distinguish between 2 cases:
- Simple rack-aware pairing: when the replication factor in the dc
is a multiple of the number of racks and the minimum number of nodes
per rack in the dc is greater than or equal to rf / nr_racks.
In this case (that includes the single rack case), all racks would
have the same number of replicas, so we first filter all replicas
by dc and rack, retaining their ordinality in the process, and
finally, we pair between the base replicas and view replicas,
that are in the same rack, using their original order in the
tablet-map replica set.
For example, nr_racks=2, rf=4:
base_replicas = { N00, N01, N10, N11 }
view_replicas = { N11, N12, N01, N02 }
pairing would be: { N00, N01 }, { N01, N02 }, { N10, N11 }, { N11, N12 }
Note that we don't optimize for self-pairing if it breaks pairing ordinality.
- Complex rack-aware pairing: when the replication factor is not
a multiple of nr_racks. In this case, we attempt best-match
pairing in all racks, using the minimum number of base or view replicas
in each rack (given their global ordinality), while pairing all the other
replicas, across racks, sorted by their ordinality.
For example, nr_racks=4, rf=3:
base_replicas = { N00, N10, N20 }
view_replicas = { N11, N21, N31 }
pairing would be: { N00, N31 }\*, { N10, N11 }, { N20, N21 }
\* cross-rack pair
If we'd simply stable-sort both base and view replicas by rack,
we might end up with much worse pairing across racks:
{ N00, N11 }\*, { N10, N21 }\*, { N20, N31 }\*
\* cross-rack pair
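The simple rack-aware pairing case above can be sketched in Python: filter both replica lists by rack while preserving their original order, then pair by ordinality within each rack. The function name, the node-naming convention (N&lt;rack&gt;&lt;idx&gt;, matching the examples), and the rack-extraction callback are illustrative assumptions:

```python
# Illustrative sketch of simple rack-aware pairing: every rack has the
# same number of base and view replicas, so replicas are grouped by
# rack (keeping their order in the tablet-map replica set) and paired
# by ordinality within the rack.
from collections import defaultdict

def simple_rack_aware_pairing(base, view, rack_of):
    by_rack_base, by_rack_view = defaultdict(list), defaultdict(list)
    for r in base:
        by_rack_base[rack_of(r)].append(r)
    for r in view:
        by_rack_view[rack_of(r)].append(r)
    pairs = []
    for rack, base_rs in by_rack_base.items():
        # Equal per-rack replica counts are a precondition of the
        # simple case, so a plain zip pairs by ordinality.
        pairs.extend(zip(base_rs, by_rack_view[rack]))
    return pairs
```

Running it on the nr_racks=2, rf=4 example from the text reproduces the stated pairing.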
Fixes scylladb/scylladb#17147
* This is an improvement so no backport is required
Closes scylladb/scylladb#21453
* github.com:scylladb/scylladb:
network_topology_strategy_test: add tablets rack_aware_view_pairing tests
view: get_view_natural_endpoint: implement rack-aware pairing for tablets
view: get_view_natural_endpoint: handle case when there are too few view replicas
view: get_view_natural_endpoint: track replica locator::nodes
locator: topology: consult local_dc_rack if node not found by host_id
locator: node: add dc and rack getters
feature_service: add tablet_rack_aware_view_pairing feature
view: get_view_natural_endpoint: refactor predicate function
view: get_view_natural_endpoint: clarify documentation
view: mutate_MV: optimize remote_endpoints filtering check
view: mutate_MV: lookup base and view erms synchronously
view: mutate_MV: calculate keyspace-dependent flags once
Currently, when the status of a task is queried and the task is already finished,
it gets unregistered. Getting the status shouldn't be a one-time operation.
Stop removing the task after its status is queried. Adjust tests not to rely
on this behavior. Add task_manager/drain API and nodetool tasks drain
command to remove finished tasks in the module.
Fixes: https://github.com/scylladb/scylladb/issues/21388.
It's a fix to the task_manager API; it should be backported to all branches
Closes scylladb/scylladb#22310
* github.com:scylladb/scylladb:
api: task_manager: do not unregister tasks on get_status
api: task_manager: add /task_manager/drain