Commit Graph

606 Commits

Author SHA1 Message Date
Andrzej Jackowski
7dc0c4cf4f test: close logfile/socket_dir for stopped servers in recycle_cluster
PythonTestSuite::recycle_cluster is a function that releases resources
of an old, dirty cluster to make it reusable. It closes log_file and
maintenance_socket_dir for running nodes in a dirty cluster, however it
doesn't do the same for stopped nodes. It leads to leakage of file
descriptors of stopped nodes, which in turn can lead to hitting ulimit
of open files (that is often 1024) if the leaking test is repeated with
`./test.py --repeat ...`. The problem was detected when tests from
`test/cluster/dtest/` directory were executed with high `repeat` value.

This commit extends `recycle_cluster` to close and cleanup logfile and
`socket_dir` for nodes that are stopped (because self.servers in
ScyllaCluster is ChainMap of self.running and self.stopped).

Closes scylladb/scylladb#24243
2025-05-27 08:37:43 +03:00
Andrei Chekun
537054bfad test.py: add support for coverage for boost test
This PR adds the possibility to gather coverage for the boost tests when they're executed with pytest. Since the pytest will be used as the main runner for boost tests as well, we need this before switching the runners.
2025-05-23 12:54:54 +02:00
Andrei Chekun
c5a7f3415c test.py: get the temp dir from facade
No need to get the temp dir from the options when facade has this information already.
2025-05-23 12:54:48 +02:00
Andrei Chekun
8812b14078 test.py: add missing parameter for boost tests for pytest runner
Since we are running tests with a pytest, we don't need a report at the end of the run.
2025-05-21 19:41:41 +02:00
Andrei Chekun
66b014621e test.py: add support for boost_data_test_case in combined tests
Change the parsing logic of combined tests to support a case when boost_data_test_case used that produced additional lines in the output.
2025-05-21 19:41:41 +02:00
Andrei Chekun
88d24d8ad5 test.py: clean log files after a successful run
Clean different output files from the boost and unit tests.
Move logs for boost test to the testlog directory instead of having additional directory pytest
2025-05-21 19:41:41 +02:00
Andrei Chekun
a956dd8770 test.py: attach output of the boost test to the report
Added attaching the output of the test in case of fail to the Allure report
2025-05-21 19:41:39 +02:00
Andrei Chekun
ac86cc9f6d test.py: fix metrics DB location
Fix the issue introduced with scylladb/scylladb#22960. Suite log dir was changed, and the path for metrics DB was relying on it. As a result, DB is now located in the mode directory instead of the root of the testlog.
2025-05-21 15:37:15 +02:00
Andrei Chekun
b5b69710bd test.py: move run_process to resource_gather.py
Move the run_process method to the resource gather instance, since we need to start monitor to check memory consumption in the cgroup. Since resource_gather needs test.py test object, and pytest has no clue about it, adding a simple namespace object to emulate such a test object. It needed only to gather some information regarding the test to be able to add records to the DB.
Since we have two facades that can share the same run process procedure, adding a common method to handle this to avoid code duplication.
2025-05-21 15:34:34 +02:00
Andrei Chekun
3bcd6db718 test.py: unify using constant for finding repo root directory
Instead of finding dynamically the repo root directory relatively to the temp dir, that's in most cases in the repo, will fail if a non-default temp dir parameter is used. Additionally, to have the single source of truth of finding the repo root directory switching to the constants.
2025-05-21 15:34:34 +02:00
Andrei Chekun
4e18444831 test.py: refactor run_process in facade.py
Add injecting environment variables to the process
Switch from print to propper logger
Set buffer size to 1 to avoid losing any data from the boost test if the test collapsed.
Currently, run process logs and return stdout and stderr, but boost tests are using stderr only. So stderr redirected to stdout. This helps with Jenkins as well, since we are reducing the number of files to store.
2025-05-21 15:34:34 +02:00
Andrei Chekun
38310975c5 test.py: add the possibility to create a test alike object
resource_gather.py needs test.py test object to work. It needs some information about the test to be able to write down this information to the DB with metrics. When running with pytest, there's no such test object, that's why adding make_test_object to mimic the test.py's test object.
Switching the getting the mode for constructing path to chgroup to test
instead of suite. They are the same, but this helps to have emulate less
in make_test_object method.
2025-05-21 15:34:34 +02:00
Evgeniy Naydanov
57c1035146 test.py: migrate alternator_tests.py from dtest
The test almost unmodified except remove unneeded skipif mark
and unused imports.
2025-05-19 12:27:32 +00:00
Evgeniy Naydanov
ac1551892b test.py: initial implementation of dtest/ccm shim
Use universalasync library to make test.py async code compatible
with synchronous code of dtest/ccm

Also, copied unmodified error_example_test.py from dtest as an example.

Run the test in `dev` mode only.
2025-05-19 12:27:31 +00:00
Evgeniy Naydanov
2cb640f95c test.py: manager: add server_get_returncode() method
The method return None if Scylla process is still running or returncode.
If there is no Scylla process launched then raise NoSuchProcess exception.
2025-05-19 11:50:55 +00:00
Evgeniy Naydanov
d874beb17f test.py: manager: change CLI and env options on a node start
Add parameters to server_start() method to provide ability to
change Scylla' CLI and env options on a node start.

Also, add `expected_server_up_state` parameter as we have for
server_add() method.
2025-05-19 11:50:55 +00:00
Evgeniy Naydanov
5d3b54aa9b test.py: REST API: add set_trace_probability() method 2025-05-19 11:50:55 +00:00
Evgeniy Naydanov
a16a4b6171 test.py: REST API: add get_tokens() method
Get a list of the tokens for the specified node.
Optional `endpoint` parameter can be provided.
2025-05-19 11:50:55 +00:00
Evgeniy Naydanov
f6e3fdd778 test.py: rework log_browsing for dtest migration
Rework `ScyllaLogFile.wait_for()` method to make it easier
to add required methods to ScyllaNode class of ccm-like shim.

Also, added `ScyllaLogFile.grep_for_errors()` method and
reworked `ScyllaLogFile.grep()`
2025-05-19 11:50:55 +00:00
Andrei Chekun
c33c0d62e1 test.py: change pattern for cleaning .log files in testlog directory
Currently, test.py will delete recursively all .log files under the
testlog directory instead of cleaning only on testlog directory. With
this change it will not go deeper to delete log files. We still have a
method for cleaning the log files in modes directories.
The downside of this solution, that we will need to explicitly tell all
directories that we want to clean.

Fixes: https://github.com/scylladb/scylladb/issues/24001

Closes scylladb/scylladb#24004
2025-05-13 13:58:36 +03:00
Dawid Mędrek
5d1bb8ebc5 test/pylib/repair.py: Assign nodes to multiple racks in create_table_insert_data_for_repair
We assign the newly created nodes to multiple racks. If RF <= 3,
we create as many racks as the provided RF. We disallow the case
of  RF > 3 to avoid trying to create an RF-rack-invalid keyspace;
note that no existing test calls `create_table_insert_data_for_repair`
providing a higher RF. The rationale for doing this is we want to ensure
that the tests calling the function can be run with the
`rf_rack_valid_keyspaces` configuration option enabled.
2025-05-10 16:30:37 +02:00
Tomasz Grabiec
fadfbe8459 Merge 'transport: storage_proxy: release ERM when waiting for query timeout' from Andrzej Jackowski
Before this change, if a read executor had just enough targets to
achieve query's CL, and there was a connection drop (e.g. node failure),
the read executor waited for the entire request timeout to give drivers
time to execute a speculative read in a meantime. Such behavior don't
work well when a very long query timeout (e.g. 1800s) is set, because
the unfinished request blocks topology changes.

This change implements a mechanism to thrown a new
read_failure_exception_with_timeout in the aforementioned scenario.
The exception is caught by CQL server which conducts the waiting, after
ERM is released. The new exception inherits from read_failure_exception,
because layers that don't catch the exception (such as mapreduce
service) should handle the exception just a regular read_failure.
However, when CQL server catch the exception, it returns
read_timeout_exception to the client because after additional waiting
such an error message is more appropriate (read_timeout_exception was
also returned before this change was introduced).

This change:
- Rewrite cql_server::connection::process_request_one to use
  seastar::futurize_invoke and try_catch<> instead of utils::result_try
- Add new read_failure_exception_with_timeout and throws it in storage_proxy
- Add sleep in CQL server when the new exception is caught
- Catch local exceptions in Mapreduce Service and convert them
   to std::runtime_error.
- Add get_cql_exclusive to manager_client.py
- Add test_long_query_timeout_erm

No backport needed - minor issue fix.

Closes scylladb/scylladb#23156

* github.com:scylladb/scylladb:
  test: add test_long_query_timeout_erm
  test: add get_cql_exclusive to manager_client.py
  mapreduce: catch local read_failure_exception_with_timeout
  transport: storage_proxy: release ERM when waiting for query timeout
  transport: remove redundant references in process_request_one
  transport: fix the indentation in process_request_one
  transport: add futures in CQL server exception handling
2025-05-08 12:45:49 +02:00
Pavel Emelyanov
6389099dfb Merge 'test/cluster/test_read_repair.py: improve trace logging test (again)' from Botond Dénes
The test test_read_repair_with_trace_logging wants to test read repair with trace logging. Turns out that node restart + trace-level logging + debug mode is too much and even with 1 minute timeout, the read repair     times out sometimes. Refactor the test to use injection point instead of restart. To make sure the test still tests what it supposed to test, use tracing to assert that read repair did indeed happen.

Fixes: scylladb/scylladb#23968

Needs backport to 2025.1 and 6.2, both have the flaky test

Closes scylladb/scylladb#23989

* github.com:scylladb/scylladb:
  test/cluster/test_read_repair.py: improve trace logging test (again)
  test/cluster: extract execute_with_tracing() into pylib/util.py
2025-05-07 10:32:45 +03:00
Botond Dénes
51025de755 test/cluster: extract execute_with_tracing() into pylib/util.py
To allow reuse in other tests.
2025-05-02 01:53:35 -04:00
Andrei Chekun
22ef09489d test.py: add awareness of extra_scylla_cmdline_options
test_config.yaml can have field extra_scylla_cmdline_options that
previously was not added to the commandline to start Scylla. Now any
extra options will be added to commandline to start tests
2025-04-24 14:05:50 +02:00
Andrei Chekun
2758c4a08e test.py: increase timeout for C++ tests in pytest
Current timeouts it not enough. Tests failed randomly with hitting
timeout. This will allow to test finish normally. As a downside if the
process will hang we will be waiting more. This adjustments will be
changed after we will have metrics how long it takes to test to pass in
each mode.
2025-04-24 14:05:50 +02:00
Andrei Chekun
f5c88e1107 test.py: switch method of finding the root repo directory
Switching to use constant defined in __init__ filet instead of getting
the root directory from pytest's config. This is will allow to have only
one source of truth in defining the  root directory of the project to
avoid cases when root directory defined incorrectly. This change also
simplifies potential changes in future.
2025-04-24 14:05:50 +02:00
Andrei Chekun
06eca04370 test.py: move get_combined_tests to the correct facade
Since get_combined_tests method is used only for boost tests and not all C++ tests, moving it into the correct place
2025-04-24 14:05:49 +02:00
Andrei Chekun
8cc9c0a53a test.py: add common directory for reports
When test.py executing python test it executes it by mode and by file,
so it can say where the report should with mode. With new approach
pytest will execute the tests for all modes inside himself, and we can
only have one report per pytest invocation. That's why we need common
directory for reports and not under the mode directory. It can later be
used for simplification, so any report should be there.
2025-04-24 14:05:49 +02:00
Andrei Chekun
b791af1f16 test.py: add the possibility to provide additional env vars
This will allow inject any environment variable to the test, because
previosly it was taking only the environment variables from the process.
Adding injecting ASAN and UBSAN variablet to the tests
2025-04-24 14:05:49 +02:00
Andrei Chekun
3cb5838619 test.py: move setup cgroups to the generic method
This changes needed for later integration for pytest executing the C++
tests to be able to gather resource metric.
2025-04-24 14:05:49 +02:00
Andrei Chekun
ca615af407 test.py: refactor resource_gather.py
Refactor resource_gather.py to not create the initial cgroup when the process it's already in it. This will allow not going deeper, creating again and again the same cgroup with each test.py execution when the terminal isn't closed.
Add creation of own event loop in case it's not exists. This needed to be able to work with
test.py that creates loop and with pytest that not create loop.
2025-04-24 14:05:49 +02:00
Andrzej Jackowski
1f1e4f09cd test: add get_cql_exclusive to manager_client.py
This commit adds to ManagerClient a get_cql_exclusive function that
allows creating a cql connection with WhiteListRoundRobinPolicy for
a single server. Such connection is useful in tests that kill nodes to
make sure that the live node handles the queries. Before this commit,
some tests used cluster_con from test/cluster/conftest.py, and after
this commit test can start to use a method from MangerClient.

This change:
 - Extend ManagerClient con_gen type to allow LoadBalancingPolicy arg
 - Implement get_cql_exclusive()
2025-04-23 09:29:47 +02:00
Andrei Chekun
57b66e6b2e test.py: move the readme file for LDAP tests to the correct location
README file was created in incorrect location, now it moved to the
directory with source files where it intended to be.
2025-04-22 19:03:28 +02:00
Andrei Chekun
cf4747c151 test.py: eliminate deprecation warning for xml.etree.ElementTree.Element
Testing the truth value of an Element emits DeprecationWarning. This check is done correctly
2025-04-22 19:03:21 +02:00
Andrei Chekun
5c3501e4bf test.py: fix typo in toxiproxy name parameter
Fix typo in toxiproxy name parameter. No any functional changes just
cosmetic fix.
2025-04-22 19:02:12 +02:00
Andrei Chekun
2c37a793d1 test.py: add locking to the sqlite writer for resource gather
SQLite blocking the DB during writes, so it's not possible to make writes from
several thread. To be able to gather metrics in several threads, we need a
locking mechanism for threads during writes. So thread will not try to
write metrics while another thread is performing writes.
2025-04-22 19:01:30 +02:00
Andrei Chekun
800710dc2c test.py: add sqlite datetime adapter for resource gather
Add sqlite datetime adapter for resource gather since default adapters are deprecated from 3.12
2025-04-22 18:59:49 +02:00
Andrei Chekun
bf2a9e267e test.py: change the parameter for get_modes_to_run()
Change the parameter for get_modes_to_run() from session to config to
narrow the scope, and prepare it to later use in method that do not have
access to the session, but have access to the config object
2025-04-22 18:58:33 +02:00
Andrei Chekun
441cee8d9c test.py: fix gathering logs in case of fail
Currently log files have information about run_id twice:
cluster.object_store_test_backup.10.test_abort_restore_with_rpc_error.dev.10_cluster.log
However, sometimes the first run_id can be incorrect:
cluster.object_store_test_backup.1.test_abort_restore_with_rpc_error.dev.10_cluster.log
Removing first run_id in the name to not face this issue and because
it's actually redundant.
Removing creation empty file for scylla manager log, since it redundant
and was done as incorrect assumption on the root cause of the fail.
Add extension to the stacktrace file, so it will be opened in the
browser in Jenkins in the new tab instead of downloading it.

Fixes: https://github.com/scylladb/scylladb/issues/23731

Closes scylladb/scylladb#23797
2025-04-21 13:12:35 +03:00
Nadav Har'El
fbcf77d134 raft: make group0 Raft operation timeout configurable
A recent commit 370707b111 (re)introduced
a timeout for every group0 Raft operation. This timeout was set to 60
seconds, which, paraphrasing Bill Gates, "ought to be enough for anybody".

However, one of the things we do as a group0 operation is schema
changes, and we already noticed a few years ago, see commit
0b2cf21932, that in some extremely
overloaded test machines where tests run hundreds of times (!) slower
than usual, a single big schema operation - such as Alternator's
DeleteTable deleting a table and multiple of its CDC or view tables -
sometimes takes more than 60 seconds. The above fix changed the
client's timeout to wait for 300 seconds instead of 60 seconds,
but now we also need to increase our Raft timeout, or the server can
time out. We've seen this happening recently making some tests flaky
in CI (issue #23543).

So let's make this timeout configurable, as a new configuration option
group0_raft_op_timeout_in_ms. This option defaults to 60000 (i.e,
60 seconds), the same as the existing default. The test framework
overrides this default with a a higher 300 second timeout, matching
the client-side timeout.

Before this patch, this timeout was already configurable in a strange
way, using injections. But this was a misstep: We already have more
than a dozen timeouts configurable through the normal configration,
and this one should have been configured in the same way. There is
nothing "holy" about the default of 60 seconds we chose, and who
knows maybe in the future we might need to tweek it in the field,
just like we made the other timeouts tweakable. Injections cannot
be used in release mode, but configuration options can.

Fixes #23543

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23717
2025-04-15 10:57:39 +03:00
Andrei Chekun
8e33d7ab81 test.py: Make the testpy log files in pytest follow the same format
Fix the incorrect log file names between conftest and scylla_manager.
This regression issue, was introduced in #22960.

Currently, scylla manager will output it's logs to the file with the
next pattern:
suite_name.path_to_the_test_file_with_subfolders.run_id.function_name.mode.run_id_cluster.log
On the same time pytest will try to find this log with next name:
suite_name.file_name_without_subfolders_path.py.run_id.function_name.mode.run_id_cluster.log

This inconsistency leads to the situation when the test failed, scylla
manager log file will not be copied to the failed_test directory and
test will have exception on teardown.

Closes scylladb/scylladb#23596
2025-04-14 12:52:48 +03:00
Evgeniy Naydanov
d6b64642c5 test.py: print out path to Scylla log for Python test suites
Test suites with `type: Python` are using single Scylla node
created by test.py, but it's handy to print a path to a log
file in pytest log too to make it easier to find the file
on failures.

Closes scylladb/scylladb#23683
2025-04-14 11:15:37 +03:00
Patryk Jędrzejczak
07a7a75b98 Merge 'raft: implement the limited voters feature' from Emil Maskovsky
Currently if raft is enabled all nodes are voters in group0. However it is not necessary to have all nodes to be voters - it only slows down the raft group operation (since the quorum is large) and makes deployments with asymmetrical DCs problematic (2 DCs with 5 nodes along 1 DC with 10 nodes will lose the majority if large DC is isolated).

The topology coordinator will now maintain a state where there are only limited number of voters, evenly distributed across the DCs and racks.

After each node addition or removal the voters are recalculated and rebalanced if necessary. That means:
* When a new node is added, it might become a voter depending on the current distribution of voters - either if there are still some voter "slots" available, or if the new node is a better candidate than some existing voter (in which case the existing node voter status might be revoked).
* When a voter node is removed or stopped (shut down), its voter status is revoked and another node might become a voter instead (this can also depend on other circumstances, like e.g. changing the number of DCs).
* If a node addition or removal causes a change in number of data centers (DCs) or racks, the rebalance action might become wider (as there are some special rules applying to 1 vs 2 vs more DCs, also changing the number of racks might cause similar effects in the voters distribution)

Special conditions for various number of DCs:
* 1 DC: Can have up to the maximum allowed number of voters (5 - see below)
* 2 DCs: The distribution of the voters will be asymmetric (if possible), meaning that we can tolerate a loss of the DC with the smaller number of voters (if both would have the same number of voters we'd lose majority if any of the DCs is lost). For example, if we have 2 DCs with 2 nodes each, one of them will only have 1 voter (despite the limit of 5). Also, if one of the 2 DCs has more racks than the other and the node count allows it, the DC with the more racks will have more voters.
* 3 and more DCs: The distribution of the voters will be so that every DC has strictly less than half of the total voters (so a loss of any of the DCs cannot lead to the majority loss). Again, DCs with more racks are being preferred in the voter distribution.

At the moment we will be handling the zero-token nodes in the same way as the regular nodes (i.e. the zero-token nodes will not take any priority in the voter distribution). Technically it doesn't make much sense to have a zero-token node that is not a voter (when there are regular nodes in the same DC being voters), but currently the intended purpose of zero-token nodes is to form an "arbiter DC" (in case of 2 DCs, creating a third DC with zero-token nodes only), so for that intended purpose no special handling is needed and will work out of the box. If a preference of zero token nodes will eventually be needed/requested, it will be added separately from this PR.

The maximum number of voters of 5 has been chosen as the smallest "safe" value. We can lose majority when multiple nodes (possibly in different dcs and racks) die independently in a short time span. With less than 5 voters, we would lose majority if 2 voters died, which is very unlikely to happen but not entirely impossible. With 5 voters, at least 3 voters must die to lose majority, which can be safely considered impossible in the case of independent failures.

Currently the limit will not be configurable (we might introduce configurable limits later if that would be needed/requested).

Tests added:
* boost/group0_voter_registry_test.cc: run time on CI: ~3.5s
* topology_custom/test_raft_voters.py: parametrized with 1 or 3 nodes per DC, the run time on CI: 1: ~20s. 3: ~40s, approx 1 min total

Fixes: scylladb/scylladb#18793

No backport: This is a new feature that will not be backported.

Closes scylladb/scylladb#21969

* https://github.com/scylladb/scylladb:
  raft: distribute voters by rack inside DC
  raft/test: fix lint warnings in `test_raft_no_quorum`
  raft/test: add the upgrade test for limited voters feature
  raft topology: handle on_up/on_down to add/remove node from voters
  raft: fix the indentation after the limited voters changes
  raft: implement the limited voters feature
  raft: drop the voter removal from the decommission
  raft/test: disable the `stop_before_becoming_raft_voter` test
  raft/test: stop the server less gracefully in the voters test
2025-04-10 15:29:15 +02:00
Botond Dénes
e5afd9b5fb test/pylib/utils: wait_for_cql_and_get_hosts(): sort hosts
Such that a given index in the return hosts refers to the same
underlying Scylla instance, as the same index in the passed-in nodes
list. This is what users of this method intuitively expect, but
currently the returned hosts list is unordered (has random order).
2025-04-08 00:11:36 -04:00
Emil Maskovsky
1d06ea3a5a raft: implement the limited voters feature
Currently if raft is enabled all nodes are voters in group0. However it
is not necessary to have all nodes to be voters - it only slows down
the raft group operation (since the quorum is large) and makes
deployments with asymmetrical DCs problematic (2 DCs with 5 nodes along
1 DC with 10 nodes will lose the majority if large DC is isolated).

The topology coordinator will now maintain a state where there are only
limited number of voters, evenly distributed across the DCs and racks.

After each node addition or removal the voters are recalculated and
rebalanced if necessary. That means:
* When a new node is added, it might become a voter depending on the
  current distribution of voters - either if there are still some voter
  "slots" available, or if the new node is a better candidate than some
  existing voter (in which case the existing node voter status might be
  revoked).
* When a voter node is removed or stopped (shut down), its voter status
  is revoked and another node might become a voter instead (this can also
  depend on other circumstances, like e.g. changing the number of DCs).
* If a node addition or removal causes a change in number of datacenters
  (DCs) or racks, the rebalance action might become wider (as there are
  some special rules applying to 1 vs 2 vs more DCs, also changing the
  number of racks might cause similar effects in the voters distribution)

Special conditions for various number of DCs:
* 1 DC: Can have up to the maximum allowed number of voters (5 - see below)
* 2 DCs: The distribution of the voters will be asymmetric (if possible),
  meaning that we can tolerate a loss of the DC with the smaller number
  of voters (if both would have the same number of voters we'd lose the
  majority if any of the DCs is lost).
  For example, if we have 2 DCs with 2 nodes each, one of them will only
  have 1 voter (despite the limit of 5). Also, if one of the 2 DCs has
  more racks than the other and the node count allows it, the DC with
  the more racks will have more voters.
* 3 and more DCs: The distribution of the voters will be so that every
  DC has strictly less than half of the total voters (so a loss of any
  of the DCs cannot lead to the majority loss). Again, DCs with more
  racks are being preferred in the voter distribution.

At the moment we will be handling the zero-token nodes in the same way
as the regular nodes (i.e. the zero-token nodes will not take any
priority in the voter distribution). Technically it doesn't make much
sense to have a zero-token node that is not a voter (when there are
regular nodes in the same DC being voters), but currently the intended
purpose of zero-token nodes is to form an "arbiter DC" (in case of 2 DCs,
creating a third DC with zero-token nodes only), so for that intended
purpose no special handling is needed and will work out of the box.
If a preference of zero token nodes will eventually be needed/requested,
it will be added separately from this PR.

Currently the voter limits will not be configurable (we might introduce
configurable limits later if that would be needed/requested).

The feature is enabled by the `group0_limited_voters` feature flag
to avoid issues with cluster upgrade (the feature will be only enabled
once all nodes in the cluster are upgraded to the version supporting
the feature).

Fixes: scylladb/scylladb#18793
2025-04-07 12:31:18 +02:00
Tomasz Grabiec
fe8187e594 Merge 'repair: release erm in repair_writer_impl::create_writer when possible' from Aleksandra Martyniuk
Currently, repair_writer_impl::create_writer keeps erm to ensure that a sharder is valid. If we repair a tablet, erm blocks the state machine and no operation on any tablet of this table might be performed.

Use auto_refreshing_sharder and topology_guard to ensure that the operation is safe and that tablet operations on the whole table aren't blocked.

Fixes: #23453.

Needs backport to 2025.1 that introduces the tablet repair scheduler.

Closes scylladb/scylladb#23455

* github.com:scylladb/scylladb:
  \test: add test to check concurrent migration and repair of two different tablets
  repair: release erm in repair_writer_impl::create_writer when possible
2025-04-03 11:15:08 +02:00
Aleksandra Martyniuk
bae6711809 \test: add test to check concurrent migration and repair of two different tablets 2025-04-02 15:30:17 +02:00
Pavel Emelyanov
2ee9cec1d3 Merge 'Remove object_storage.yaml and move the endpoints to scylla.yaml' from Robert Bindar
Move `object_storage.yaml` endpoints to `scylla.yaml`

This change also removes the `object_storage.yaml` file
altogether and adds tests for fetching the endpoints
via the `v2/config/object_storage_endpoints` REST api.

Also, `object_storage_config_file` options is moved to a deprecated state as it's no longer needed.

This PR depends on #22951, the reviewers should review patch 393e1ac0ec066475ca94094265a5f88dbbdb1a1f

Refs https://github.com/scylladb/scylladb/issues/22428

Closes scylladb/scylladb#22952

* github.com:scylladb/scylladb:
  Remove db::config::object_storage_config
  Move `object_storage.yaml` endpoints to `scylla.yaml`
2025-04-01 16:01:44 +03:00
Avi Kivity
69684e16d8 Merge 'sstables: add SSTable compression with shared dictionaries ' from Michał Chojnowski
This PR extends Scylla's SSTable compression with the ability to use compression dictionaries shared across compression chunks. This involves several changes:

- We refactor `compression_parameters` and friends (`compressor`, `sstables::local_compression`, `sstables::compression`) to prepare for making the construction of `compressor`s asynchronous, to enable sharing pieces of compressors (the dictionaries) across shards.
- We introduce the notion of "hidden compression options" which are written to `CompressionInfo.db` and used to construct decompressors, like regular options, but don't appear in the schema. (We later stuff the SSTable's dictionary into `CompressionInfo.db` using a sequence of such options).
- We add a cluster feature which guards the creation of dictionary-compressed SSTables.
- We introduce a central "compressor factory" (one instance shared by all shards), which from this point onward is used to construct all `compressor` objects (one per SSTable) used to process the SSTables. When constructing a compressor for writing, it uses the "current"/"recommended" dictionary (which is passed to the factory from the actively-observed contents of the group0-managed `system.dicts`). When constructing a compressor for reading, it uses the dictionary written in the hidden compression options in CompressionInfo.db. And it keeps dictionaries deduplicated, so that each unique live dictionary blob has only one instance in memory, shared across shards.
- We teach the relevant `lz4` and `zstd` compressor wrappers about the dictionaries.
- We add a HTTP API call which samples pieces of the given table (i.e. the Data.db files) from across the cluster, trains a dictionary on it, and publishes it via `system.dicts` as the new current dictionary for that table. (And we add some RPC verbs to support that).
- We add a HTTP API call which estimates the impact of various available compression configurations on the compression ratio.
- We add an autotrainer fiber which periodically retrains dicts for dict-aware tables and publishes them if they seem to be a significant improvement.

Known imperfections:
- The factory currently keeps one dictionary instance on the entire node, but we probably want one copy per NUMA node. I didn't do that because exposing NUMA knowledge to Scylla seems to require some changes in Seastar first.

New feature, no backporting involved.

Closes scylladb/scylladb#23025

* github.com:scylladb/scylladb:
  docs: add user-facing documentation for SSTable compression with shared dicts
  docs/dev: add sstable-compression-dicts.md
  test: add test_sstable_compression_dictionaries_autotrain.py
  test: add test_sstable_compression_dictionaries_basic.py
  test/pylib/rest_client: add `keyspace_upgrade_sstables` helper
  main: run a sstable_dict_autotrainer
  api: add the estimate_compression_ratios API call
  dict_autotrainer: introduce sstable_dict_autotrainer
  db/system_keyspace: add query_dict_timestamp
  compress: add ZstdWithDictsCompressor and LZ4WithDictsCompressor
  main: clean up sstable compression dicts after table drops
  sstables/compress: discard hidden compression options after the decompressor is created
  compress: change compressor_ptr from shared_ptr to unique_ptr
  api: add the retrain_dict API call
  storage_service: add some dict-related routines
  main: in compression_dict_updated_callback, recognize and use SSTable compression dicts
  storage_service: add do_sample_sstables()
  messaging_service: add SAMPLE_SSTABLES and ESTIMATE_SSTABLE_VOLUME verbs
  db/system_keyspace: let `system.dicts` helpers be used for dicts other than the RPC compression dict
  raft/group0_state_machine: on `system.dicts` mutations, pass the affected partitition keys to the callback
  database: add sample_data_files()
  database: add take_sstable_set_snapshot()
  compress: teach `lz4_processor` about dictionaries
  compress: teach `zstd_processor` about dictionaries
  sstables: delegate compressor creation to the compressor factory
  sstables: plug an `sstable_compressor_factory` into `sstables_manager`
  sstables: introduce sstable_compressor_factory
  utils/hashers: add get_sha256()
  gms/feature_service: add the SSTABLE_COMPRESSION_DICTS cluster feature
  compress: add hidden dictionary options
  compress: remove `compression_parameters::get_compressor()`
  sstables/compress: remove get_sstable_compressor()
  sstables/compress: move ownership of `compressor` to `sstable::compression`
  compress: remove compressor::option_names()
  compress: clean up the constructor of zstd_processor
  compress: squash zstd.cc into compress.cc
  sstables/compress: break the dependency of `compression_parameters` on `compressor`
  compress.hh: switch compressor::name() from an instance member to a virtual call
  bytes: adapt fmt_hex to std::span<const std::byte>
2025-04-01 12:47:34 +03:00