scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 11:30:36 +00:00

Author	SHA1	Message	Date
Botond Dénes	cccf726b54	Merge '[Backport 2025.2] test: introduce upgrade tests to test.py, add a SSTable dict compression upgrade test' from Michał Chojnowski This PR adds an upgrade test for SSTable compression with shared dictionaries, and adds some bits to pylib and test.py to support that. In the series, we: 1. Mount $XDG_CACHE_DIR into dbuild. 2. Add a pylib function which downloads and installs a released ScyllaDB package into a subdirectory of $XDG_CACHE_DIR/scylladb/test.py, and returns the path to bin/scylla. 3. Add new methods and params to the cluster manager, which let the test start nodes with historical Scylla executables, and switch executables during the test. 4. Add a test which uses the above to run an upgrade test between the released package and the current build. 5. Add --run-internet-dependent-tests to test.py which lets the user of test.py skip this test (and potentially other internet-dependent tests in the future). (The patch modifying wait_for_cql_and_get_hosts is a part of the new test — the new test needs it to test how particular nodes in a mixed-version cluster react to some CQL queries.) This is a follow-up to https://github.com/scylladb/scylladb/pull/23025, split into a separate PR because the potential addition of upgrade tests to test.py deserved a separate thread. Needs backport to 2025.2, because that's where the tested feature is introduced. 
Fixes https://github.com/scylladb/scylladb/issues/24110 - (cherry picked from commit `63218bb094`) - (cherry picked from commit `cc7432888e`) - (cherry picked from commit `34098fbd1f`) - (cherry picked from commit `2ef0db0a6b`) - (cherry picked from commit `1ff7e09edc`) - (cherry picked from commit `5da19ff6a6`) - (cherry picked from commit `d3cb873532`) - (cherry picked from commit `dd878505ca`) Parent PR: https://github.com/scylladb/scylladb/pull/23538 Closes scylladb/scylladb#25158 * github.com:scylladb/scylladb: test: add test_sstable_compression_dictionaries_upgrade.py test.py: add --run-internet-dependent-tests pylib/manager_client: add server_switch_executable test/pylib: in add_server, give a way to specify the executable and version-specific config pylib: pass scylla_env environment variables to the topology suite test/pylib: add get_scylla_2025_1_executable() pylib/scylla_cluster: give a way to pass executable-specific options to nodes dbuild: mount "$XDG_CACHE_HOME/scylladb"	2025-08-07 06:26:25 +03:00
Dawid Mędrek	8e96968fb7	test: Enable RF-rack-valid keyspaces in all Python suites We're enabling the configuration option `rf_rack_valid_keyspaces` in all Python test suites. All relevant tests have been adjusted to work with it enabled. That encompasses the following suites: * alternator, * broadcast_tables, * cluster (already enabled in scylladb/scylladb@ee96f8dcfc), * cql, * cqlpy (already enabled in scylladb/scylladb@be0877ce69), * nodetool, * rest_api. Two remaining suites that use tests written in Python, redis and scylla_gdb, are not affected, at least not directly. The redis suite requires creating an instance of Scylla manually, and the tests don't do anything that could violate the restriction. The scylla_gdb suite focuses on testing the capabilities of scylla-gdb.py, but even then it reuses the `run` file from the cqlpy suite. Fixes scylladb/scylladb#25126 Closes scylladb/scylladb#24617 (cherry picked from commit `b41151ff1a`) Closes scylladb/scylladb#25230	2025-08-06 09:35:34 +03:00
Michał Chojnowski	1446b4e0ef	test.py: add --run-internet-dependent-tests Later, we will add upgrade tests, which need to download the previous release of Scylla from the internet. Internet access is a major dependency, so we want to make those tests opt-in for now. (cherry picked from commit `d3cb873532`)	2025-07-23 19:28:35 +02:00
Botond Dénes	6749954b2a	Merge '[Backport 2025.2] test.py: Fix start 3rd party services' from Scylladb[bot] Move the startup of 3rd party services under a `try` clause to avoid the situation where the main process collapses without stopping the services. Without this, if something goes wrong during startup, it will not trigger execution of the exit artifacts, so the process will stay around forever. This functionality is in 2025.2 and can potentially affect jobs, so a backport is needed. Fixes: #24773 - (cherry picked from commit `0ca539e162`) - (cherry picked from commit `c6c3e9f492`) Parent PR: #24734 Closes scylladb/scylladb#24774 * github.com:scylladb/scylladb: test.py: use unique hostname for Minio test.py: Catch possible exceptions during 3rd party services start	2025-07-15 13:23:12 +03:00
Andrei Chekun	4bc33c027d	test.py: use unique hostname for Minio To avoid the situation where the port is already occupied on localhost, use a unique hostname for Minio. (cherry picked from commit `c6c3e9f492`)	2025-07-02 11:12:52 +02:00
Marcin Maliszkiewicz	6568065141	test: pylib: add ability to specify default authenticator during server_start Sometimes we may not want to use the default cassandra role for the control connection, especially when we test dropping the default role. (cherry picked from commit 08bf7237f066cead133bf0cac9bba215f238070a)	2025-06-30 20:50:15 +02:00
Michał Chojnowski	06d6718f3b	pylib/manager_client: add server_switch_executable Add a util for switching the Scylla executable during the test. Will be used for upgrade tests. (cherry picked from commit `5da19ff6a6`)	2025-06-18 13:50:38 +00:00
Michał Chojnowski	b5591422c6	test/pylib: in add_server, give a way to specify the executable and version-specific config This will be used for upgrade tests. The cluster will be started with an older executable and without configs specific to newer versions. (cherry picked from commit `1ff7e09edc`)	2025-06-18 13:50:38 +00:00
Michał Chojnowski	043eaf099a	pylib: pass scylla_env environment variables to the topology suite I want to add an upgrade test under the topology suite. To work, it will have to know the path to the tested Scylla executable, so that it can switch the nodes to it. The path could be passed by various means and I'm not sure which method is appropriate. In some other places (e.g. the cql suite) we pass the path via the `SCYLLA` environment variable and this patch follows that example. `PythonTestSuite` (parent class of `TopologySuite`) already has that variable set in `self.scylla_env`, and passes it around. However, `TopologySuite` uses its own `run()`, and so it implicitly overrides the decision to pass `self.scylla_env` down. This patch changes that, and after the patch we apply `self.scylla_env` to the environment for topology tests. This might have some unforeseen side effects for coverage measurement, because AFAICS the (only) other variable in `self.scylla_env` is `LLVM_PROFILE_FILE`. But topology tests don't run Scylla executables themselves (they only send commands to the cluster manager started externally), so I figure there should be no change. (cherry picked from commit `2ef0db0a6b`)	2025-06-18 13:50:38 +00:00
Michał Chojnowski	5e2b3be754	test/pylib: add get_scylla_2025_1_executable() Adds a function which downloads and installs (in `~/.cache`) the Scylla 2025.1, for upgrade tests. Note: this introduces an internet dependency into pylib, AFAIK the first one. We already have some other code for downloading existing Scylla releases, written for different purposes, in `cqlpy/fetch_scylla.py`. I made zero effort to reuse that in any way. Note: hardcoding the package version might be uncool, but if we want "better" version selection (e.g. the newest patch version in the given branch), we should have a separate library (or web service) for that, and share it with CCM/SCT. If we add a separate automatic version selection mechanism here, we are going to end up with yet another half-broken Scylla version selector, with yet different syntax and semantics than the other ones. We never clear the downloaded and unpacked files. This could become a problem in the future. (At which point we can add some mechanism that deletes cached archives downloaded more than a week ago.) (cherry picked from commit `34098fbd1f`)	2025-06-18 13:50:38 +00:00
Michał Chojnowski	d141b730fc	pylib/scylla_cluster: give a way to pass executable-specific options to nodes I'm trying to adapt pylib to multi-version tests. (Where the Scylla cluster is upgraded to a newer Scylla version during the test). Before this patch, the initial config (where "config" == yaml file + CLI args) of the nodes is hardcoded in scylla_cluster.py. The problem is that this config might not apply to past versions, so we need some way to give them a different config. (For example, with the config as it is before the patch, a Scylla 2025.1 executable would not boot up because it does not know the `group0_voter_handler` logger). In this patch, we create a way to attach version-specific config to the executable passed to ScyllaServer. (cherry picked from commit `cc7432888e`)	2025-06-18 13:50:37 +00:00
Michael Litvak	d094bc6fc9	test: tablets: add get_tablet_info helper Add a helper for tests to get the tablet info from system.tablets for a tablet owning a given token. (cherry picked from commit `fb18fc0505`)	2025-06-17 13:59:10 +00:00
Robert Bindar	b62264e1d9	Add nodetool refresh --scope option This change adds the --scope option to nodetool refresh. Like in the case of nodetool restore, you can pass either of: * node - On the local node. * rack - On the local rack. * dc - In the datacenter (DC) where the local node lives. * all (default) - Everywhere across the cluster. as scope. The feature is based on the existing load_and_stream paths, so it requires passing --load-and-stream to the refresh command. Also, it is not compatible with the --primary-replica-only option. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Closes scylladb/scylladb#23861 (cherry picked from commit `c570941692`)	2025-06-04 11:59:17 +03:00
Dawid Mędrek	f5cf4a3893	test/pylib/repair.py: Assign nodes to multiple racks in create_table_insert_data_for_repair We assign the newly created nodes to multiple racks. If RF <= 3, we create as many racks as the provided RF. We disallow the case of RF > 3 to avoid trying to create an RF-rack-invalid keyspace; note that no existing test calls `create_table_insert_data_for_repair` providing a higher RF. The rationale for doing this is we want to ensure that the tests calling the function can be run with the `rf_rack_valid_keyspaces` configuration option enabled. (cherry picked from commit `5d1bb8ebc5`)	2025-05-12 13:10:12 +00:00
Botond Dénes	4df6a17d30	test/cluster: extract execute_with_tracing() into pylib/util.py To allow reuse in other tests. (cherry picked from commit `51025de755`)	2025-05-07 13:26:08 +00:00
Andrei Chekun	22ef09489d	test.py: add awareness of extra_scylla_cmdline_options test_config.yaml can have a field extra_scylla_cmdline_options that previously was not added to the command line used to start Scylla. Now any extra options will be added to the command line used to start tests.	2025-04-24 14:05:50 +02:00
Andrei Chekun	2758c4a08e	test.py: increase timeout for C++ tests in pytest The current timeouts are not enough: tests fail randomly by hitting the timeout. This change allows the tests to finish normally. As a downside, if the process hangs we will wait longer. These adjustments will be revisited once we have metrics on how long each test takes to pass in each mode.	2025-04-24 14:05:50 +02:00
Andrei Chekun	f5c88e1107	test.py: switch method of finding the root repo directory Switch to using the constant defined in the __init__ file instead of getting the root directory from pytest's config. This gives a single source of truth for the root directory of the project and avoids cases where the root directory is defined incorrectly. This change also simplifies potential changes in the future.	2025-04-24 14:05:50 +02:00
Andrei Chekun	06eca04370	test.py: move get_combined_tests to the correct facade Since get_combined_tests method is used only for boost tests and not all C++ tests, moving it into the correct place	2025-04-24 14:05:49 +02:00
Andrei Chekun	8cc9c0a53a	test.py: add common directory for reports When test.py executes a Python test, it executes it by mode and by file, so it can place the report under the mode directory. With the new approach, pytest will execute the tests for all modes itself, and we can have only one report per pytest invocation. That's why we need a common directory for reports rather than one under the mode directory. It can later be used for simplification, so that every report goes there.	2025-04-24 14:05:49 +02:00
Andrei Chekun	b791af1f16	test.py: add the possibility to provide additional env vars This allows injecting any environment variable into the test, because previously the test took only the environment variables from the process. Also inject the ASAN and UBSAN variables into the tests.	2025-04-24 14:05:49 +02:00
Andrei Chekun	3cb5838619	test.py: move setup cgroups to the generic method This change is needed for the later integration of pytest executing the C++ tests, so that it can gather resource metrics.	2025-04-24 14:05:49 +02:00
Andrei Chekun	ca615af407	test.py: refactor resource_gather.py Refactor resource_gather.py to not create the initial cgroup when the process is already in it. This avoids going deeper and creating the same cgroup again and again with each test.py execution while the terminal stays open. Add creation of our own event loop in case one does not exist. This is needed to work both with test.py, which creates a loop, and with pytest, which does not.	2025-04-24 14:05:49 +02:00
Andrei Chekun	57b66e6b2e	test.py: move the readme file for LDAP tests to the correct location The README file was created in an incorrect location; it is now moved to the directory with the source files, where it was intended to be.	2025-04-22 19:03:28 +02:00
Andrei Chekun	cf4747c151	test.py: eliminate deprecation warning for xml.etree.ElementTree.Element Testing the truth value of an Element emits a DeprecationWarning. The check is now done correctly.	2025-04-22 19:03:21 +02:00
Andrei Chekun	5c3501e4bf	test.py: fix typo in toxiproxy name parameter Fix typo in toxiproxy name parameter. No functional changes, just a cosmetic fix.	2025-04-22 19:02:12 +02:00
Andrei Chekun	2c37a793d1	test.py: add locking to the sqlite writer for resource gather SQLite blocks the DB during writes, so it's not possible to write from several threads at once. To be able to gather metrics in several threads, we need a locking mechanism for threads during writes, so that a thread will not try to write metrics while another thread is performing a write.	2025-04-22 19:01:30 +02:00
Andrei Chekun	800710dc2c	test.py: add sqlite datetime adapter for resource gather Add sqlite datetime adapter for resource gather, since the default adapters are deprecated as of Python 3.12.	2025-04-22 18:59:49 +02:00
Andrei Chekun	bf2a9e267e	test.py: change the parameter for get_modes_to_run() Change the parameter for get_modes_to_run() from session to config to narrow the scope, and to prepare it for later use in methods that do not have access to the session but do have access to the config object.	2025-04-22 18:58:33 +02:00
Andrei Chekun	441cee8d9c	test.py: fix gathering logs in case of fail Currently log files have information about run_id twice: cluster.object_store_test_backup.10.test_abort_restore_with_rpc_error.dev.10_cluster.log However, sometimes the first run_id can be incorrect: cluster.object_store_test_backup.1.test_abort_restore_with_rpc_error.dev.10_cluster.log Remove the first run_id from the name to avoid this issue, and because it's actually redundant. Remove the creation of an empty file for the scylla manager log, since it is redundant and was based on an incorrect assumption about the root cause of the failure. Add an extension to the stacktrace file, so that Jenkins opens it in a new browser tab instead of downloading it. Fixes: https://github.com/scylladb/scylladb/issues/23731 Closes scylladb/scylladb#23797	2025-04-21 13:12:35 +03:00
Nadav Har'El	fbcf77d134	raft: make group0 Raft operation timeout configurable A recent commit `370707b111` (re)introduced a timeout for every group0 Raft operation. This timeout was set to 60 seconds, which, paraphrasing Bill Gates, "ought to be enough for anybody". However, one of the things we do as a group0 operation is schema changes, and we already noticed a few years ago, see commit `0b2cf21932`, that in some extremely overloaded test machines where tests run hundreds of times (!) slower than usual, a single big schema operation - such as Alternator's DeleteTable deleting a table and multiple of its CDC or view tables - sometimes takes more than 60 seconds. The above fix changed the client's timeout to wait for 300 seconds instead of 60 seconds, but now we also need to increase our Raft timeout, or the server can time out. We've seen this happening recently making some tests flaky in CI (issue #23543). So let's make this timeout configurable, as a new configuration option group0_raft_op_timeout_in_ms. This option defaults to 60000 (i.e., 60 seconds), the same as the existing default. The test framework overrides this default with a higher 300 second timeout, matching the client-side timeout. Before this patch, this timeout was already configurable in a strange way, using injections. But this was a misstep: We already have more than a dozen timeouts configurable through the normal configuration, and this one should have been configured in the same way. There is nothing "holy" about the default of 60 seconds we chose, and who knows, maybe in the future we might need to tweak it in the field, just like we made the other timeouts tweakable. Injections cannot be used in release mode, but configuration options can. Fixes #23543 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23717	2025-04-15 10:57:39 +03:00
Andrei Chekun	8e33d7ab81	test.py: Make the testpy log files in pytest follow the same format Fix the inconsistent log file names between conftest and scylla_manager. This regression was introduced in #22960. Currently, scylla manager writes its logs to a file named with the following pattern: suite_name.path_to_the_test_file_with_subfolders.run_id.function_name.mode.run_id_cluster.log At the same time, pytest looks for this log under the following name: suite_name.file_name_without_subfolders_path.py.run_id.function_name.mode.run_id_cluster.log Because of this inconsistency, when a test fails, the scylla manager log file is not copied to the failed_test directory and the test raises an exception on teardown. Closes scylladb/scylladb#23596	2025-04-14 12:52:48 +03:00
Evgeniy Naydanov	d6b64642c5	test.py: print out path to Scylla log for Python test suites Test suites with `type: Python` use a single Scylla node created by test.py, but it's handy to also print the path to its log file in the pytest log, to make the file easier to find on failures. Closes scylladb/scylladb#23683	2025-04-14 11:15:37 +03:00
Patryk Jędrzejczak	07a7a75b98	Merge 'raft: implement the limited voters feature' from Emil Maskovsky Currently if raft is enabled all nodes are voters in group0. However it is not necessary to have all nodes to be voters - it only slows down the raft group operation (since the quorum is large) and makes deployments with asymmetrical DCs problematic (2 DCs with 5 nodes along 1 DC with 10 nodes will lose the majority if large DC is isolated). The topology coordinator will now maintain a state where there are only limited number of voters, evenly distributed across the DCs and racks. After each node addition or removal the voters are recalculated and rebalanced if necessary. That means: * When a new node is added, it might become a voter depending on the current distribution of voters - either if there are still some voter "slots" available, or if the new node is a better candidate than some existing voter (in which case the existing node voter status might be revoked). * When a voter node is removed or stopped (shut down), its voter status is revoked and another node might become a voter instead (this can also depend on other circumstances, like e.g. changing the number of DCs). * If a node addition or removal causes a change in number of data centers (DCs) or racks, the rebalance action might become wider (as there are some special rules applying to 1 vs 2 vs more DCs, also changing the number of racks might cause similar effects in the voters distribution) Special conditions for various number of DCs: * 1 DC: Can have up to the maximum allowed number of voters (5 - see below) * 2 DCs: The distribution of the voters will be asymmetric (if possible), meaning that we can tolerate a loss of the DC with the smaller number of voters (if both would have the same number of voters we'd lose majority if any of the DCs is lost). For example, if we have 2 DCs with 2 nodes each, one of them will only have 1 voter (despite the limit of 5). 
Also, if one of the 2 DCs has more racks than the other and the node count allows it, the DC with the more racks will have more voters. * 3 and more DCs: The distribution of the voters will be so that every DC has strictly less than half of the total voters (so a loss of any of the DCs cannot lead to the majority loss). Again, DCs with more racks are being preferred in the voter distribution. At the moment we will be handling the zero-token nodes in the same way as the regular nodes (i.e. the zero-token nodes will not take any priority in the voter distribution). Technically it doesn't make much sense to have a zero-token node that is not a voter (when there are regular nodes in the same DC being voters), but currently the intended purpose of zero-token nodes is to form an "arbiter DC" (in case of 2 DCs, creating a third DC with zero-token nodes only), so for that intended purpose no special handling is needed and will work out of the box. If a preference of zero token nodes will eventually be needed/requested, it will be added separately from this PR. The maximum number of voters of 5 has been chosen as the smallest "safe" value. We can lose majority when multiple nodes (possibly in different dcs and racks) die independently in a short time span. With less than 5 voters, we would lose majority if 2 voters died, which is very unlikely to happen but not entirely impossible. With 5 voters, at least 3 voters must die to lose majority, which can be safely considered impossible in the case of independent failures. Currently the limit will not be configurable (we might introduce configurable limits later if that would be needed/requested). Tests added: * boost/group0_voter_registry_test.cc: run time on CI: ~3.5s * topology_custom/test_raft_voters.py: parametrized with 1 or 3 nodes per DC, the run time on CI: 1: ~20s. 3: ~40s, approx 1 min total Fixes: scylladb/scylladb#18793 No backport: This is a new feature that will not be backported. 
Closes scylladb/scylladb#21969 * https://github.com/scylladb/scylladb: raft: distribute voters by rack inside DC raft/test: fix lint warnings in `test_raft_no_quorum` raft/test: add the upgrade test for limited voters feature raft topology: handle on_up/on_down to add/remove node from voters raft: fix the indentation after the limited voters changes raft: implement the limited voters feature raft: drop the voter removal from the decommission raft/test: disable the `stop_before_becoming_raft_voter` test raft/test: stop the server less gracefully in the voters test	2025-04-10 15:29:15 +02:00
Botond Dénes	e5afd9b5fb	test/pylib/utils: wait_for_cql_and_get_hosts(): sort hosts Such that a given index in the returned hosts list refers to the same underlying Scylla instance as the same index in the passed-in nodes list. This is what users of this method intuitively expect, but currently the returned hosts list is unordered (has random order).	2025-04-08 00:11:36 -04:00
Emil Maskovsky	1d06ea3a5a	raft: implement the limited voters feature Currently if raft is enabled all nodes are voters in group0. However it is not necessary to have all nodes to be voters - it only slows down the raft group operation (since the quorum is large) and makes deployments with asymmetrical DCs problematic (2 DCs with 5 nodes along 1 DC with 10 nodes will lose the majority if large DC is isolated). The topology coordinator will now maintain a state where there are only limited number of voters, evenly distributed across the DCs and racks. After each node addition or removal the voters are recalculated and rebalanced if necessary. That means: * When a new node is added, it might become a voter depending on the current distribution of voters - either if there are still some voter "slots" available, or if the new node is a better candidate than some existing voter (in which case the existing node voter status might be revoked). * When a voter node is removed or stopped (shut down), its voter status is revoked and another node might become a voter instead (this can also depend on other circumstances, like e.g. changing the number of DCs). * If a node addition or removal causes a change in number of datacenters (DCs) or racks, the rebalance action might become wider (as there are some special rules applying to 1 vs 2 vs more DCs, also changing the number of racks might cause similar effects in the voters distribution) Special conditions for various number of DCs: * 1 DC: Can have up to the maximum allowed number of voters (5 - see below) * 2 DCs: The distribution of the voters will be asymmetric (if possible), meaning that we can tolerate a loss of the DC with the smaller number of voters (if both would have the same number of voters we'd lose the majority if any of the DCs is lost). For example, if we have 2 DCs with 2 nodes each, one of them will only have 1 voter (despite the limit of 5). 
Also, if one of the 2 DCs has more racks than the other and the node count allows it, the DC with the more racks will have more voters. * 3 and more DCs: The distribution of the voters will be so that every DC has strictly less than half of the total voters (so a loss of any of the DCs cannot lead to the majority loss). Again, DCs with more racks are being preferred in the voter distribution. At the moment we will be handling the zero-token nodes in the same way as the regular nodes (i.e. the zero-token nodes will not take any priority in the voter distribution). Technically it doesn't make much sense to have a zero-token node that is not a voter (when there are regular nodes in the same DC being voters), but currently the intended purpose of zero-token nodes is to form an "arbiter DC" (in case of 2 DCs, creating a third DC with zero-token nodes only), so for that intended purpose no special handling is needed and will work out of the box. If a preference of zero token nodes will eventually be needed/requested, it will be added separately from this PR. Currently the voter limits will not be configurable (we might introduce configurable limits later if that would be needed/requested). The feature is enabled by the `group0_limited_voters` feature flag to avoid issues with cluster upgrade (the feature will be only enabled once all nodes in the cluster are upgraded to the version supporting the feature). Fixes: scylladb/scylladb#18793	2025-04-07 12:31:18 +02:00
Tomasz Grabiec	fe8187e594	Merge 'repair: release erm in repair_writer_impl::create_writer when possible' from Aleksandra Martyniuk Currently, repair_writer_impl::create_writer keeps erm to ensure that a sharder is valid. If we repair a tablet, erm blocks the state machine and no operation on any tablet of this table might be performed. Use auto_refreshing_sharder and topology_guard to ensure that the operation is safe and that tablet operations on the whole table aren't blocked. Fixes: #23453. Needs backport to 2025.1 that introduces the tablet repair scheduler. Closes scylladb/scylladb#23455 * github.com:scylladb/scylladb: test: add test to check concurrent migration and repair of two different tablets repair: release erm in repair_writer_impl::create_writer when possible	2025-04-03 11:15:08 +02:00
Aleksandra Martyniuk	bae6711809	test: add test to check concurrent migration and repair of two different tablets	2025-04-02 15:30:17 +02:00
Pavel Emelyanov	2ee9cec1d3	Merge 'Remove object_storage.yaml and move the endpoints to scylla.yaml' from Robert Bindar Move `object_storage.yaml` endpoints to `scylla.yaml` This change also removes the `object_storage.yaml` file altogether and adds tests for fetching the endpoints via the `v2/config/object_storage_endpoints` REST API. Also, the `object_storage_config_file` option is moved to a deprecated state as it's no longer needed. This PR depends on #22951; reviewers should review patch 393e1ac0ec066475ca94094265a5f88dbbdb1a1f Refs https://github.com/scylladb/scylladb/issues/22428 Closes scylladb/scylladb#22952 * github.com:scylladb/scylladb: Remove db::config::object_storage_config Move `object_storage.yaml` endpoints to `scylla.yaml`	2025-04-01 16:01:44 +03:00
Avi Kivity	69684e16d8	Merge 'sstables: add SSTable compression with shared dictionaries ' from Michał Chojnowski This PR extends Scylla's SSTable compression with the ability to use compression dictionaries shared across compression chunks. This involves several changes: - We refactor `compression_parameters` and friends (`compressor`, `sstables::local_compression`, `sstables::compression`) to prepare for making the construction of `compressor`s asynchronous, to enable sharing pieces of compressors (the dictionaries) across shards. - We introduce the notion of "hidden compression options" which are written to `CompressionInfo.db` and used to construct decompressors, like regular options, but don't appear in the schema. (We later stuff the SSTable's dictionary into `CompressionInfo.db` using a sequence of such options). - We add a cluster feature which guards the creation of dictionary-compressed SSTables. - We introduce a central "compressor factory" (one instance shared by all shards), which from this point onward is used to construct all `compressor` objects (one per SSTable) used to process the SSTables. When constructing a compressor for writing, it uses the "current"/"recommended" dictionary (which is passed to the factory from the actively-observed contents of the group0-managed `system.dicts`). When constructing a compressor for reading, it uses the dictionary written in the hidden compression options in CompressionInfo.db. And it keeps dictionaries deduplicated, so that each unique live dictionary blob has only one instance in memory, shared across shards. - We teach the relevant `lz4` and `zstd` compressor wrappers about the dictionaries. - We add a HTTP API call which samples pieces of the given table (i.e. the Data.db files) from across the cluster, trains a dictionary on it, and publishes it via `system.dicts` as the new current dictionary for that table. (And we add some RPC verbs to support that). 
- We add a HTTP API call which estimates the impact of various available compression configurations on the compression ratio. - We add an autotrainer fiber which periodically retrains dicts for dict-aware tables and publishes them if they seem to be a significant improvement. Known imperfections: - The factory currently keeps one dictionary instance on the entire node, but we probably want one copy per NUMA node. I didn't do that because exposing NUMA knowledge to Scylla seems to require some changes in Seastar first. New feature, no backporting involved. Closes scylladb/scylladb#23025 * github.com:scylladb/scylladb: docs: add user-facing documentation for SSTable compression with shared dicts docs/dev: add sstable-compression-dicts.md test: add test_sstable_compression_dictionaries_autotrain.py test: add test_sstable_compression_dictionaries_basic.py test/pylib/rest_client: add `keyspace_upgrade_sstables` helper main: run a sstable_dict_autotrainer api: add the estimate_compression_ratios API call dict_autotrainer: introduce sstable_dict_autotrainer db/system_keyspace: add query_dict_timestamp compress: add ZstdWithDictsCompressor and LZ4WithDictsCompressor main: clean up sstable compression dicts after table drops sstables/compress: discard hidden compression options after the decompressor is created compress: change compressor_ptr from shared_ptr to unique_ptr api: add the retrain_dict API call storage_service: add some dict-related routines main: in compression_dict_updated_callback, recognize and use SSTable compression dicts storage_service: add do_sample_sstables() messaging_service: add SAMPLE_SSTABLES and ESTIMATE_SSTABLE_VOLUME verbs db/system_keyspace: let `system.dicts` helpers be used for dicts other than the RPC compression dict raft/group0_state_machine: on `system.dicts` mutations, pass the affected partitition keys to the callback database: add sample_data_files() database: add take_sstable_set_snapshot() compress: teach `lz4_processor` about 
dictionaries compress: teach `zstd_processor` about dictionaries sstables: delegate compressor creation to the compressor factory sstables: plug an `sstable_compressor_factory` into `sstables_manager` sstables: introduce sstable_compressor_factory utils/hashers: add get_sha256() gms/feature_service: add the SSTABLE_COMPRESSION_DICTS cluster feature compress: add hidden dictionary options compress: remove `compression_parameters::get_compressor()` sstables/compress: remove get_sstable_compressor() sstables/compress: move ownership of `compressor` to `sstable::compression` compress: remove compressor::option_names() compress: clean up the constructor of zstd_processor compress: squash zstd.cc into compress.cc sstables/compress: break the dependency of `compression_parameters` on `compressor` compress.hh: switch compressor::name() from an instance member to a virtual call bytes: adapt fmt_hex to std::span<const std::byte>	2025-04-01 12:47:34 +03:00
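The merge description above says the compressor factory keeps dictionaries deduplicated, so that each unique live dictionary blob has only one instance in memory. The following is a minimal Python sketch of that idea only, not the actual Scylla (C++) implementation: the names `DictBlob` and `DictRegistry` are hypothetical, and keying the registry by a SHA-256 digest is an assumption suggested by the `utils/hashers: add get_sha256()` commit in the series.

```python
import hashlib
import weakref


class DictBlob:
    """Hypothetical wrapper for raw dictionary bytes.

    A wrapper is needed because plain `bytes` objects cannot be weakly referenced.
    """

    def __init__(self, data: bytes):
        self.data = data


class DictRegistry:
    """Hypothetical registry holding at most one live DictBlob per unique content."""

    def __init__(self):
        # Weak values: an entry disappears once no user holds the blob anymore.
        self._live = weakref.WeakValueDictionary()

    def intern(self, data: bytes) -> DictBlob:
        # Assumption: content is identified by its SHA-256 digest.
        key = hashlib.sha256(data).digest()
        blob = self._live.get(key)
        if blob is None:
            blob = DictBlob(data)
            self._live[key] = blob
        return blob

    def live_count(self) -> int:
        return len(self._live)


registry = DictRegistry()
a = registry.intern(b"sample dictionary")
b = registry.intern(b"sample dictionary")   # same content, same instance
c = registry.intern(b"another dictionary")  # different content, new instance
```

Here `a` and `b` refer to the same object, while `c` is distinct. Weak references mean the registry itself never keeps a dictionary alive, which matches the "unique live dictionary blob" wording in the description.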
Botond Dénes	0fdf2a2090	Merge 'test/pylib: servers_add: support list of property_files' from Benny Halevy So that a multi-dc/multi-rack cluster can be populated in a single call. * Enhancement, no backport required Closes scylladb/scylladb#23341 * github.com:scylladb/scylladb: test/pylib: servers_add: add auto_rack_dc parameter test/pylib: servers_add: support list of property_files	2025-04-01 09:14:20 +03:00
Michał Chojnowski	7b0eeefd79	test/pylib/rest_client: add `keyspace_upgrade_sstables` helper	2025-04-01 00:07:30 +02:00
Michał Chojnowski	a19d6d95f7	api: add the estimate_compression_ratios API call Add an API call which estimates the effectiveness of possible compression config changes. This can be used to make an informed decision about whether to change the compression method, without actually recompressing any SSTables.	2025-04-01 00:07:30 +02:00
Michał Chojnowski	58ae278d10	api: add the retrain_dict API call Add an API call which will retrain the SSTable compression dictionary for a given table. Currently, it needs all nodes to be alive to succeed. We can relax this later.	2025-04-01 00:07:29 +02:00
Robert Bindar	e3a3508960	Move `object_storage.yaml` endpoints to `scylla.yaml` This change also removes the `object_storage.yaml` file altogether and adds tests for fetching the endpoints via the `v2/config/object_storage_endpoints` REST api. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2025-03-31 13:39:39 +03:00
Benny Halevy	a4aa4d74c1	test/pylib: servers_add: add auto_rack_dc parameter To quickly populate nodes in a single dc, each node in its own rack. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-03-30 19:23:40 +03:00
Benny Halevy	c4dbb11c87	test/pylib: servers_add: support list of property_files So that a multi-dc/multi-rack cluster can be populated in a single call. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-03-30 19:12:39 +03:00
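The two `servers_add` commits above describe populating a multi-dc/multi-rack cluster in one call, and placing each node of a single dc in its own rack. As a rough illustration of the data such a call might receive, here is a small Python sketch. The helper name `rack_per_node_property_files`, the `dc`/`rack` keys, and the `rackN` naming are assumptions for illustration, not the actual `test/pylib` API.

```python
def rack_per_node_property_files(dc: str, count: int) -> list[dict[str, str]]:
    """Hypothetical helper: one property-file dict per node, each in its own rack."""
    return [{"dc": dc, "rack": f"rack{i + 1}"} for i in range(count)]


# A two-dc layout expressed as a single list, one entry per node.
property_files = (
    rack_per_node_property_files("dc1", 2)
    + rack_per_node_property_files("dc2", 1)
)
```

Under these assumptions, the resulting list has one entry per node and could be handed to a single cluster-population call instead of adding servers one by one.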
Evgeniy Naydanov	1a0c14aa50	test.py: async_cql: remove unused event_loop fixture A newer version of pytest-asyncio (0.24.0) allows controlling the scope of the async event loop per fixture, so this workaround is no longer needed.	2025-03-30 03:19:30 +00:00
Evgeniy Naydanov	9bba59631f	test.py: add xdist worker ID to log filenames When running tests in parallel, we need to ensure that log filenames are unique, so add the xdist worker ID to them.	2025-03-30 03:19:30 +00:00
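The commit above makes log filenames unique per parallel worker. A minimal sketch of the technique, assuming the worker ID is read from the `PYTEST_XDIST_WORKER` environment variable that pytest-xdist sets in worker processes (for example `gw0`); the function name `log_path_for` and the filename layout are hypothetical and not taken from test.py.

```python
import os
from pathlib import Path
from typing import Mapping


def log_path_for(test_name: str, log_dir: Path,
                 environ: Mapping[str, str] = os.environ) -> Path:
    """Hypothetical helper: build a log path that is unique per xdist worker."""
    # pytest-xdist sets PYTEST_XDIST_WORKER (e.g. "gw0") in each worker process;
    # the variable is absent when tests run without xdist.
    worker = environ.get("PYTEST_XDIST_WORKER")
    suffix = f".{worker}" if worker else ""
    return log_dir / f"{test_name}{suffix}.log"


parallel = log_path_for("test_topology", Path("logs"), {"PYTEST_XDIST_WORKER": "gw1"})
serial = log_path_for("test_topology", Path("logs"), {})
```

Passing the environment as a parameter keeps the helper easy to exercise without actually running under xdist.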
Evgeniy Naydanov	9cb0ec2b42	test.py: topology: run tests using bare pytest command Run ScyllaClusterManager using a pytest fixture if the `--manager-api` option is not provided. At this stage we're trying to stay as close to test.py as possible. test.py runs tests file-by-file, so the scopes `session`, `package`, and `module` are effectively the same. Also, test.py starts ScyllaClusterManager for every test module, which is why the fixture `manager_api_sock_path` has scope=`module`. As a result, we need to change the scope of the fixture `manager_internal` too.	2025-03-30 03:19:29 +00:00


596 Commits