In shard compaction tasks per table tasks will be created all at once
and then they will wait for their turn to run.
A function that allows waking up tasks one after another and a function
that makes the task wait for its turn are added.
By default `idl-compiler.py` emits code to pass parameters by value. There was an attribute `[[ref]]`, which makes it to use `const&`, but it was not used systematically and in many cases parameters were redundantly copied. In this PR, all `verb` directives have been reviewed and the `[[ref]]` attribute has been added where it makes sense.
The parameters [are serialised synchronously](https://github.com/scylladb/seastar/blob/master/include/seastar/rpc/rpc_impl.hh#L471) so there should be no lifetime issues. This was not the case before, but the behaviour changed in [this commit](3942546d41). Now it's not a problem to get an object by reference when using `send_` methods.
Fixes: #12504Closes#14003
* github.com:scylladb/scylladb:
tracing::trace_info: pass by ref
storage_proxy: pass inet_address_vector_replica_set by ref
raft: add [[ref]] attribute
repair: add [[ref]] attribute
forward_request: add [[ref]] attribute
storage_proxy: paxos:: add [[ref]] attribute
storage_proxy: read_XXX:: make read_command [[ref]]
storage_proxy: hint_mutation:: make frozen_mutation [[ref]]
storage_proxy: mutation:: make frozen_mutation [[ref]]
occasionally, we are observing build failures like:
```
17:20:54 FAILED: build/release/dist/tar/scylla-debuginfo-5.4.0~dev-0.20230522.5b2687e11800.x86_64.tar.gz
17:20:54 dist/debuginfo/scripts/create-relocatable-package.py --mode release 'build/release/dist/tar/scylla-debuginfo-5.4.0~dev-0.20230522.5b2687e11800.x86_64.tar.gz'
17:20:54 Traceback (most recent call last):
17:20:54 File "/jenkins/workspace/scylla-master/scylla-ci/scylla/dist/debuginfo/scripts/create-relocatable-package.py", line 60, in <module>
17:20:54 os.makedirs(f'build/{SCYLLA_DIR}')
17:20:54 File "<frozen os>", line 225, in makedirs
17:20:54 FileExistsError: [Errno 17] File exists: 'build/scylla-debuginfo-package'
```
to understand the root cause better, instead of swallowing the error,
let's raise the exception it is not caused by non-existing directory.
a similar change was applied to scripts/create-relocatable-package.py
in a0b8aa9b13. which was correct per-se.
but the original intention was to understand the root cause of the
failure when packaging scylla-debuginfo-*.tar.gz, which is created
by the dist/debuginfo/scripts/create-relocatable-package.py.
so, in this change, the change is ported to this script.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14082
In compaction logs table is identified by {keyspace}.{table_id}.
Instead, table name should be used in run_on_existing_tables
logs. To do so, task manager's compaction tasks use table_info
instead of table_id.
Keyspace argument is copied to run_on_existing_tables
to ensure it's alive.
Closes#13816
* github.com:scylladb/scylladb:
compaction: use table_info in compaction tasks
api: move table_info to schema/schema_fwd.hh
CWG 2631 (https://cplusplus.github.io/CWG/issues/2631.html) reports
an issue on how the default argument is evaluated. this problem is
more obvious when it comes to how `std::source_location::current()`
is evaluated as a default argument. but not all compilers have the
same behavior, see https://godbolt.org/z/PK865KdG4.
notebaly, clang-15 evaluates the default argument at the callee
site. so we need to check the capability of compiler and fall back
to the one defined by util/source_location-compat.hh if the compiler
suffers from CWG 2631. and clang-16 implemented CWG2631 in
https://reviews.llvm.org/D136554. But unfortunately, this change
was not backported to clang-15.
before switching over to clang-16, for using std::source_location::current()
as the default parameter and expect the behavior defined by CWG2631,
we have to use the compatible layer provided by Seastar. otherwise
we always end up having the source_location at the callee side, which
is not interesting under most circumstances.
so in this change, all places using the idiom of passing
std::source_location::current() as the default parameter are changed
to use seastar::compat::source_location::current(). despite that
we have `#include "seastarx.h"` for opening the seastar namespace,
to disambiguate the "namespace compat" defined somewhere in scylladb,
the fully qualified name of
`seastar::compat::source_location::current()` is used.
see also 09a3c63345, where we used
std::source_location as an alias of std::experimental::source_location
if it was available. but this does not apply to the settings of our
current toolchain, where we have GCC-12 and Clang-15.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14086
When run like 'open-coredump.sh --help' the options parsing loop doesn't
run because $# == 1 and [ $# -gt 1 ] evaluates to false.
The simplest fix is to parse -h|--help on its own as the options parsing
loop assumes that there's core-file argument present.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#14075
read_command, partition_key and paxos::proposal
are marked with [[ref]]. partition_key contains
dynamic allocations and can be big. proposal
contains frozen_mutation, so it's also
contains dynamic allocations.
The call sites are fine, the already passed
by reference.
We had a redundant copies at the call sites of
these methods. Class read_command does not
contain dynamic allocations, but it's quite
but by itself (368 bytes).
We had a redundant copy in receive_mutation_handler
forward_fn callback. This frozen_mutation is
dynamically allocated and can be arbitrary large.
Fixes: #12504
Task manager compaction tasks need table names for logs.
Thus, compaction tasks store table infos instead of table ids.
get_table_ids function is deleted as it isn't used anywhere.
The `view_update_write_response_handler` class, which is a subclass of
`abstract_write_response_handler`, was created for a single purpose:
to make it possible to cancel a handler for a view update write,
which means we stop waiting for a response to the write, timing out
the handler immediately. This was done to solve issue with node
shutdown hanging because it was waiting for a view update to finish;
view updates were configured with 5 minute timeout. See #3966, #4028.
Now we're having a similar problem with hint updates causing shutdown
to hang in tests (#8079).
`view_update_write_response_handler` implements cancelling by adding
itself to an intrusive list which we then iterate over to timeout each
handler when we shutdown or when gossiper notifies `storage_proxy`
that a node is down.
To make it possible to reuse this algorithm for other handlers, move
the functionality into `abstract_write_response_handler`. We inherit
from `bi::list_base_hook` so it introduces small memory overhead to
each write handler (2 pointers) which was only present for view update
handlers before. But those handlers are already quite large, the
overhead is small compared to their size.
Use this new functionality to also cancel hint write handlers when we
shutdown. This fixes#8079.
Closes#14047
* github.com:scylladb/scylladb:
test: reproducer for hints manager shutdown hang
test: pylib: ScyllaCluster: generalize config type for `server_add`
test: pylib: scylla_cluster: add explicit timeout for graceful server stop
service: storage_proxy: make hint write handlers cancellable
service: storage_proxy: rename `view_update_handlers_list`
service: storage_proxy: make it possible to cancel all write handler types
This reverts commit 52e4edfd5e, reversing
changes made to d2d53fc1db. The associated test
fails with about 10% probablity, which blocks other work.
Fixes#13919Reopens#13747
When scylla starts it may go to sleep along the way before the "serving"
message appears. If SIGINT is sent at that time the whole thing unrolls
and the main code ends up catching the sleep_aborted exception, printing
the error in logs and exiting with non-zero code. However, that's not an
error, just the start was interrupted earlier than it was expected by
the stop_signal thing.
fixes: #12898
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#14034
Currently, partitioned_sstable_set::insert may erase a sstable
from the set inadvertently, if an exception is thrown while
(re-)inserting it.
To prevent that, simply return early after detecting that
insertion didn't took place, based on the unordered_set::insert
result.
This issue is theoretical, as there are no known case
of re-inserting sstables into the partitioned sstable set.
Fixes#14060
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#14061
Task manager's tasks that have parent task inherit sequence number
from their parents. Thus they do not need to have a new sequence number
generated as it will be overwritten anyway.
Closes#14045
libxcrypt is used by auth subsystem, for instance, `crypt_r()` provided
by this library is used by passwords.cc. so let's link against it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14030
The _update_timer callback calls adjust() that
depends on _current_backlog and currently, _current_backlog is
destroyed before _update_timer.
This is benign since there are no preemption points in
the destructor, but it's more correct and elegant
to destroy the timer first, before other members it depends on.
Fixes#14056
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#14057
occasionally, we are observing build failures like:
```
17:20:54 FAILED: build/release/dist/tar/scylla-debuginfo-5.4.0~dev-0.20230522.5b2687e11800.x86_64.tar.gz
17:20:54 dist/debuginfo/scripts/create-relocatable-package.py --mode release 'build/release/dist/tar/scylla-debuginfo-5.4.0~dev-0.20230522.5b2687e11800.x86_64.tar.gz'
17:20:54 Traceback (most recent call last):
17:20:54 File "/jenkins/workspace/scylla-master/scylla-ci/scylla/dist/debuginfo/scripts/create-relocatable-package.py", line 60, in <module>
17:20:54 os.makedirs(f'build/{SCYLLA_DIR}')
17:20:54 File "<frozen os>", line 225, in makedirs
17:20:54 FileExistsError: [Errno 17] File exists: 'build/scylla-debuginfo-package'
```
to understand the root cause better, instead of swallowing the error,
let's raise the exception it is not caused by non-existing directory.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13978
This set encapsulates ks/cf directories creation and deletion into keyspace and table classes methods. This is needed to facilitate making the storage initialization storage-type aware in the future. Also this makes the replica/ code less involved in formatting sstables' directory path by hand.
refs: #13020
refs: #12707Closes#14048
* github.com:scylladb/scylladb:
keyspace: Introduce init_storage()
keyspace: Remove column_family_directory()
table: Introduce destroy_storage()
table: Simplify init_storage()
table: Coroutinize init_storage()
table: Relocate ks.make_directory_for_column_family()
distributed_loader: Use cf.dir() instead of ks.column_family_directory()
test: Don't create directory for system tables in cql_test_env
if we just want to build a single test and scylla executables, we
might want to use `configure.py` like:
./configure.py --mode debug --compiler clang++ --with scylla --with test/boost/database_test
which generates `build.ninja` for us, with following rules:
build $builddir/debug/test/boost/database_test_g: link.debug ... | $builddir/debug/seastar/libseastar.so
$builddir/debug/seastar/libseastar_testing.so
libs = $seastar_libs_debug $libs -lthrift -lboost_system $seastar_testing_libs_debug
libs = $seastar_libs_debug
but the last line prevents database_test_g for linking against
the third-party libraries like libabsl, which could have been
pulled in by $libs. but the second assignment expression just
makes the value of `libs` identical to that of `seastar_libs_debug`.
but that library does not include the libraries which are only
used by scylla. so we could run into link failure with the
`build.ninja` generated with this command line. like:
```
FAILED: build/debug/test/boost/database_test_g
...
ld.lld: error: undefined symbol: seastar::testing::entry_point(int, char**)
>>> referenced by scylla_test_case.hh:22 (./test/lib/scylla_test_case.hh:22)
>>> build/debug/test/boost/database_test.o:(main)
...
ld.lld: error: undefined symbol: boost::unit_test::unit_test_log_t::set_checkpoint(boost::unit_test::basic_cstring<char const>, unsigned long, boost::unit_tes
t::basic_cstring<char const>)
>>> referenced by database_test.cc:298 (test/boost/database_test.cc:298)
>>> build/debug/test/boost/database_test.o:(require_exist(seastar::basic_sstring<char, unsigned int, 15u, true> const&, bool))
...
```
with this change, the extra assignment expression is dropped. this
should not cause any regression. as f'$seastar_libs_{mode}' as
been included as a part of `local_libs` before the grand if-the-else
block in the for loop before this `f.write()` statement.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14041
This reverts commit 3c54d5ec5e.
The reverted change fixed the FTBFS of the test in question with Clang 16,
which rightly stopped convert the LHS of `"hello" == sstring{"hello"}` to
the type of the type acceptable by the member operator even we have a
constructor for this conversion, like
class sstring {
public:
bar_t(const char*);
bool operator==(const sstring&) const;
bool operator!=(const sstring&) const;
};
because we have an operator!=, as per the draft of C++ standard
https://eel.is/c++draft/over.match.oper#4 :
> A non-template function or function template F named operator==
> is a rewrite target with first operand o unless a search for the
> name operator!= in the scope S from the instantiation context of
> the operator expression finds a function or function template
> that would correspond ([basic.scope.scope]) to F if its name were
> operator==, where S is the scope of the class type of o if F is a
> class member, and the namespace scope of which F is a member
> otherwise.
in 397f4b51c3, the seastar submodule was
updated. in which, we now have a dedicated overload for the `const char*`
case. so the compiler is now able to compile the expression like
`"hello" == sstring{"hello"}` in C++20 now.
so, in this change, the workaround is reverted.
Closes#14040
When erasing a sstable first check if its run_id
exists in _all_runs, otherwise do nothing with
that respect, and then if the run becomes empty
when erasing the last sstable (and it could have been
a single-sstable run from get go), erase the run
from `_all_runs`.
Fixes#14052
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#14054
If the executable of a matching unit or boost test is not executable,
warn to console and skip.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes#13982
If server shutdown hangs, the `manager.server_stop_gracefully` call
would eventually (after 5 minutes) timeout with a cryptic
`TimeoutError`; it's a generic timeout for performing requests by the
tests to `ScyllaClusterManager`. It was non-obvious how to find what
actually caused the timeout - you'd have to browse multiple logs.
Introduce an explicit timeout in `ScyllaServer.stop_gracefully`. Set it
to 1 minute. Whether this is a good value may be arguable, but shutdown
taking longer than that probably indicates problems. The important thing
is that this timeout is shorter than the generic request timeout.
If this times out we get a nice error in the test:
```
E test.pylib.rest_client.HTTPError: HTTP error 500, uri: http+unix://api/cluster/server/1/stop_gracefully, params: None, json: None, body:
E Stopping server ScyllaServer(1, 127.162.40.1, 826d5884-4696-4a22-80a7-cc872aa43102) gracefully took longer than 60s
```
Whether a write handler should be cancellable is now controlled by a
parameter passed to `create_write_response_handler`. We plumb it down
from `send_to_endpoint` which is called by hints manager.
This will cause hint write handlers to immediately timeout when we
shutdown or when a destination node is marked as dead.
Fixes#8079
The list will be used for non-view-update write handlers as well, so
generalize the name. Also generalize some variable names used in the
implementation.
This commit only renames things + some comments were added,
there are no logical changes.
The `view_update_write_response_handler` class, which is a subclass of
`abstract_write_response_handler`, was created for a single purpose: to
make it possible to cancel a handler for a view update write, which
means we stop waiting for a response to the write, timing out the
handler immediately. This was done to solve issue with node shutdown
hanging because it was waiting for a view update to finish; view updates
were configured with 5 minute timeout. See #3966, #4028.
Now we're having a similar problem with hint updates causing shutdown to
hang in tests (#8079).
`view_update_write_response_handler` implements cancelling by adding
itself to an intrusive list which we then iterate over to timeout each
handler when we shutdown or when gossiper notifies `storage_proxy` that
a node is down.
To make it possible to reuse this algorithm for other handlers, move the
functionality into `abstract_write_response_handler`. We inherit from
`bi::list_base_hook` so it introduces small memory overhead to each
write handler (2 pointers) which was only present for view update
handlers before. But those handlers are already quite large, the
overhead is small compared to their size.
Not all handlers are added to the cancelling list, this is controlled by
the `cancellable` parameter passed to the constructor. For now we're
only cancelling view handlers as before. In following commits we'll also
cancel hint handlers.
instead of using BOOST_REQUIRE() use, for instance
BOOST_REQUIRE_NE() and BOOST_REQUIRE_EQUAL() for better
error message when the test fails, as Boost::test would
print out the LHS and RHS of the comparison expression
if it fails.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#14050
Some assorted cleanups here: consolidation of schema agreement waiting
into a single place and removing unused code from the gossiper.
CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/1458/
Reviewed-by: Konstantin Osipov <kostja@scylladb.com>
* gleb/gossiper-cleanups of github.com:scylladb/scylla-dev:
storage_service: avoid unneeded copies in on_change
storage_service: remove check that is always true
storage_service: rename handle_state_removing to handle_state_removed
storage_service: avoid string copy
storage_service: delete code that handled REMOVING_TOKENS state
gossiper: remove code related to advertising REMOVING_TOKEN state
migration_manager: add wait_for_schema_agreement() function
There's a helper verification_error() that prints a warning and returns
excpetional future. The one is converted into void throwing one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Similarly to class table, the keyspace class also needs to create
directory for itself for some reason. It looks excessive as table
creation would call recursive_touch_directory() and would create the ks
directory too, but this call is there
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's no longer used outside of make_column_family_config(). Not to
encourage people to use it -- drop it and open-code into that single
caller
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When table is DROP-ed the directory with all its sstables is removed
(unless it contains snapshots). Wrap this into table.destroy_storage()
method, later it will need to become sstable::storage-specific
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>