On some environment such as VMware instance, /dev/disk/by-uuid/<UUID> is
not available, scylla_raid_setup will fail while mounting volume.
To avoid failing to mount /dev/disk/by-uuid/<UUID>, fetch all available
paths to mount the disk and fallback to other paths like by-partuuid,
by-id, by-path or just using real device path like /dev/md0.
To get device path, and also to dumping device status when UUID is not
available, this will introduce UdevInfo class which communicate udev
using pyudev.
Related #11359Closesscylladb/scylladb#13803
This PR contains several refactoring, related to truncation records handling in `system_keyspace`, `commitlog_replayer` and `table` clases:
* drop map_reduce from `commitlog_replayer`, it's sufficient to load truncation records from the null shard;
* add a check that `table::_truncated_at` is properly initialized before it's accessed;
* move its initialization after `init_non_system_keyspaces`
Closesscylladb/scylladb#15583
* github.com:scylladb/scylladb:
system_keyspace: drop truncation_record
system_keyspace: remove get_truncated_at method
table: get_truncation_time: check _truncated_at is initialized
database: add_column_family: initialize truncation_time for new tables
database: add_column_family: rename readonly parameter to is_new
system_keyspace: move load_truncation_times into distributed_loader::populate_keyspace
commitlog_replayer: refactor commitlog_replayer::impl::init
system_keyspace: drop redundant typedef
system_keyspace: drop redundant save_truncation_record overload
table: rename cache_truncation_record -> set_truncation_time
system_keyspace: get_truncated_position -> get_truncated_positions
The estimation assumes that size of other components are irrelevant,
when estimating the number of partitions for each output sstable.
The sstables are split according to the data file size, therefore
size of other files are irrelevant for the estimation.
With certain data models, like single-row partitions containing small
values, the index could be even larger than data.
For example, assume index is as large as data, then the estimation
would say that 2x more sstables will be generated, and as a result,
each sstable are underestimated to have 2x less keys.
Fix it by only accounting size of data file.
Fixes#15726.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#15727
If abort is requsted during bootstrap then a node should exit normally.
To achieve so, abort_requested_exception should be thrown as main
handles it gracefully.
In data_sync_repair_task_impl::run exceptions from all shards are
wrapped together into std::runtime_exception and so they aren't
handled as they are supposed to.
Throw abort_requested_exception when shutdown was requested.
Throw abort_requested_exception also if repair::task_manager_module::is_aborted,
so that force_terminate_all_repair_sessions acts the same regardless
the state of the repair.
To maintain consistency do the same for user_requested_repair_task_impl.
Fixes: #15710.
Closesscylladb/scylladb#15722
before this change, pytest does not populate its suites's
`scylla_env` down to the forked pytest child process. this works
if the test does not care about the env variables in `scylla_env`.
but object_store is an exception, as it launches scylla instances
by itself. so, without the help of `scylla_env`, `run.find_scylla()`
always find the newest file globbed by `build/*/scylla`. this is not
always what we expect. on the contrary, if we launch object_store's
pytest using `test.py`, there are good chances that object_store
ends up with testing a wrong scylla executable if we have multiple
builds under `build/*/scylla`.
so, in this change, we populate `self.suite.scylla_env` down to
the child process created by `PythonTest`, so that all pytest
based tests can have access to its suites's env variables.
in addition to 'SCYLLA' env variable, they also include the
the env variables required by LLVM code coverage instrumentation.
this is also nice to have.
Fixes#15679
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15682
The stall detector uses glibc backtrace function to
collect backtraces, this causes ASAN failures on ARM.
For now we just disable the stall detector in this
configuration, the ticket about migrating
to libunwind: scylladb/seastar#1878
We increase the value of blocked_reactor_notify_ms to
make sure the stall detector never fires.
Fixes#15389Fixes#15090Closesscylladb/scylladb#15720
Currently, when the test is executed too quickly, the timestamp
insterted into the 'my_table' table might be the same as the
timestamp used in the SELECT statement for comparison. However,
the statement only selects rows where the inserted timestamp
is strictly lower than current timestamp. As a result, when this
comparison fails, we may skip executing the following comparison,
which uses a user-defined function, due to which the statement
is supposed to fail with an error. Instead, the select statement
simply returns no rows and the test case fails.
To fix this, simply use the less or equal operator instead
of using the strictly less operator for comparing timestamps.
Fixes#15616Closesscylladb/scylladb#15699
This is in order to aid investigation of falkiness of the test, which
fails due to a timeout during scan after cluster restart in debug mode.
See #14746.
I enable trace-level logging for some scylla-side loggers and
inject logging of sent and received messages on the driver side.
Closesscylladb/scylladb#15696
When a remote view update doesn't succeed there's a log message
saying "Error applying view update...".
This message had log level ERROR, but it's not really a hard error.
View updates can fail for a multitude of reasons, even during normal operation.
A failing view update isn't fatal, it will be saved as a view hint a retried later.
Let's change the log level to WARN. It's something that shouldn't happen too much,
but it's not a disaster either.
ERROR log level causes trouble in tests which assume that an ERROR level message
means that the test has failed.
Refs: https://github.com/scylladb/scylladb/issues/15046#issuecomment-1712748784
For local view updates the log level stays at "ERROR", local view updates shouldn't fail.
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Closesscylladb/scylladb#15640
* tools/cqlsh 66ae7eac...426fa0ea (8):
> Updated Scylla Driver[Issue scylladb/scylla-cqlsh#55]
> copyutil: closing the local end of pipes after processes starts
> setup.py: specify Cython language_level explicitly
> setup.py: pass extensions as a list
> setup.py: reindent block in else branch
> setup.py: early return in get_extension()
> reloc: install build==0.10.0
> reloc: add --verbose option to build_reloc.sh
Fixes: https://github.com/scylladb/scylla-cqlsh/issues/37Closesscylladb/scylladb#15685
test_storage_service_keyspace_cleanup_with_no_owned_ranges
from test_storage_service.py creates snapshots with tags based
on current time. Thus if a test runs on the same node twice
with time distance short enough, there may be a name collision
between the snapshots from two runs. This will cause the second
run to fail on assertions.
Use new_test_snapshot fixture to drop snapshots after the test.
Delete my_snapshot_tags as it's no longer necessary.
Fixes: #15680.
Closesscylladb/scylladb#15683
In 20ff2ae5e1 mutating endpoints were
changed to use PUT. But some of them return a response, and I forgot to
provide `response_type` parameter to `put_json` (which causes
`RESTClient` to actually obtain the response). These endpoints now
return `None`.
Fix this.
Closesscylladb/scylladb#15674
The logger is not thread safe, so a multithreaded test can concurrently
write into the log, yielding unreadable XMLs.
Example:
boost/sstable_directory_test: failed to parse XML output '/scylladir/testlog/x86_64/release/xml/boost.sstable_directory_test.sstable_directory_shared_sstables_reshard_correctly.3.xunit.xml': not well-formed (invalid token): line 1, column 1351
The critical (today's unprotected) section is in boost/test/utils/xml_printer.hpp:
```
inline std::ostream&
operator<<( custom_printer<cdata> const& p, const_string value )
{
*p << BOOST_TEST_L( "<![CDATA[" );
print_escaped_cdata( *p, value );
return *p << BOOST_TEST_L( "]]>" );
}
```
The problem is not restricted to xml, but the unreadable xml file caused
the test to fail when trying to parse it, to present a summary.
New thread-safe variants of BOOST_REQUIRE and BOOST_REQUIRE_EQUAL are
introduced to help multithreaded tests. We'll start patching tests of
sstable_directory_test that will call BOOST_REQUIRE* from multiple
threads. Later, we can expand its usage to other tests.
Fixes#15654.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#15655
This PR is another step in refactoring the Hinted Handoff module. It aims at modernizing the code by moving to coroutines, using `std::ranges` instead of Boost's ones where possible, and uses other features coming with the new C++ standards.
It also tries to make the code clearer and get rid of confusing elements, e.g. using shared pointers where they shouldn't be used or marking methods as virtual even though nothing derives from the class. It also prevents `manager.hh` from giving direct access to internal structures (`hint_endpoint_manager` in this case).
Refs #15358Closesscylladb/scylladb#15631
* github.com:scylladb/scylladb:
db/hints/manager: Reword comments about state
db/hints/manager: Unfriend space_watchdog
db/hints: Remove a redundant alias
db/hints: Remove an unused namespace
db/hints: Coroutinize change_host_filter()
db/hints: Coroutinize drain_for()
db/hints: Clean up can_hint_for()
db/hints: Clean up store_hint()
db/hints: Clean up too_many_in_flight_hints_for()
db/hints: Refactor get_ep_manager()
db/hints: Coroutinize wait_for_sync_point()
db/hints: Use std::span in calculate_current_sync_point
db/hints: Clean up manager::forbid_hints_for_eps_with_pending_hints()
db/hints: Clean up manager::forbid_hints()
db/hints: Clean up manager::allow_hints()
db/hints: Coroutinize compute_hints_dir_device_id()
db/hints: Clean up manager::stop()
db/hints: Clean up manager::start()
db/hints/manager: Clean up the constructor
db/hints: Remove boilerplate drain_lock()
db/hints: Let drain_for() return a future
db/hints: Remove ep_managers_end
db/hints: Remove find_ep_manager
db/hints: Use manager as API for hint_endpoint_manager
db/hints: Don't mark have_ep_manager()'s definition as inline
db/hints: Remove make_directory_initializer()
db/hints/manager: Order constructors
db/hints: Move ~manager() and mark it as noexcept
db/hints: Use reference for storage proxy
db/hints/manager: Explicitly delete copy constructor
db/hints: Capitalize constants
db/hints/manager: Hide declarations
db/hints/manager: Move the defintions of static members to the header
db/hints: Move make_dummy() to the header
db/hints: Don't explicitly define ~directory_initializer()
db/hints: Change the order of logging in ensure_created_and_verified()
db/hints: Coroutinize ensure_rebalanced()
db/hints: Coroutinize ensure_created_and_verified()
db/hints: Improve formatting of directory_initializer::impl
db/hints: Do not rely on the values of enums
db/hints: Move the implementation of directory_initializer
db/hints: Prefer nested namespaces
db/hints: Remove an unused alias from manager.hh
db/hints: Reorder includes in manager.hh and .cc
Refactor the code to be more consistent -- we often did the same thing in multiple ways depending on the endpoint, such as how we returned errors (some endpoints would return them through exceptions, other would wrap into `aiohttp.web.Response`s). Choose the arguably least boilerplate'y way in each case.
Then reduce the boilerplate even further.
Thanks to these refactors, modifying the framework in the future will require less work and it will be more obvious which of the possible ways to modify it should be picked (i.e. consistent with the existing code.)
Closesscylladb/scylladb#15646
* github.com:scylladb/scylladb:
test/pylib: scylla_cluster: reduce `aiohttp` boilerplate
test/pylib: always return data as JSON from endpoints
test/pylib: scylla_cluster: catch `HTTPError` in topology change endpoints
test/pylib: scylla_cluster: do sanity/precondition checks through asserts
test/pylib: scylla_cluster: return errors through exceptions
test/pylib: use JSON data to pass `expected_error` in `server_start`
test/pylib: use PUT instead of GET for mutating endpoints
test/pylib: rest_client: make `data` optional in `put_json`
test/pylib: fix some type errors
The current comments should be clearer to someone
not familiar with the module. This commit also makes
them abide by the limit of 120 characters per line.
space_watchdog is a friend of shard hint manager just to
be able to execute one of its functions. This commit changes
that by unfriending the class and exposing the function.
This commit gets rid of boilerplate in the function,
leverages a range pipe and explicit types to make
the code more readable, and changes the logs to
make it clearer what happens.
fmt::to_string should be preferred to seastar::format.
It's clearer and simpler. Besides that, this commit makes
the code abide by the limit of 120 characters per line.
Currently, the function doesn't return anything.
However, if the futurue doesn't need to be awaited,
the caller can decide that. There is no reason
to make that decision in the function itself.
This commit makes with_file_update_mutex() a method of hint_endpoint_manager
and introduces db::hints::manager::with_file_update_mutex_for() for accessing
it from the outside. This way, hint_endpoint_manager is hidden and no one
needs to know about its existence.
This commit makes db::hints::manager store service::storage_proxy
as a reference instead of a seastar::shared_ptr. The manager is
owned by storage proxy, so it only lives as long as storage proxy
does. Hence, it makes little sense to store the latter as a shared
pointer; in fact, it's very confusing and may be error-prone.
The field never changes, so it's safe to keep it as a reference
(especially because copy and move constructors of db::hints::manager
are both deleted). What's more, we ensure that the hint manager
has access to storage proxy as soon as it's created.
The same changes were applied to db::hints::resource_manager.
The rationale is the same.