scylla redistributes iotune, so let's enable the related build
options, so that we can build iotune on demand.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this series adds `--node-exporter-dir` and `--build-dir` options to `create-relocatable-package.py`. this enables us to create relocatable packages from arbitrary build directories.
Refs #15241
Closes #15299
* github.com:scylladb/scylladb:
create-relocatable-package.py: add --node-exporter-dir option
build: specify the build dir instead mode
so we can point `debian_files_gen.py` to builddir other than
'build', and can optionally use other output directory. this would
help to reduce the number of "magic numbers" in our building system.
Refs https://github.com/scylladb/scylladb/issues/15241
Closes #15282
* github.com:scylladb/scylladb:
dist/debian: specify debian/* file encodings
dist/debian: wrap lines whose length exceeds 100 chars
dist/debian: add command line option for builddir
dist/debian: modularize debian_files_gen.py
actually, we never use its output in our workflow. and the
output is distracting when building the package. so, in this
change, let's print it only on demand. this feature is preserved
just in case some of us would want to use this script for getting
the version number string.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #15327
if the user fails to set "CMAKE_BUILD_TYPE", it would be empty. and
CMake would fail with confusing error messages like
```
CMake Error at CMakeLists.txt:21 (list):
list sub-command FIND requires three arguments.
CMake Error at CMakeLists.txt:27 (include):
include could not find requested file:
mode.
```
so, in this change
* set the default CMAKE_BUILD_TYPE to "Release"
* quote the ${CMAKE_BUILD_TYPE} when searching it
in the allowed build type lists.
this should address the issues above.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #15326
The local node's dc:rack pair is cached on system keyspace on start. However, most of the other code doesn't need it, as it gets dc:rack from topology or directly from snitch. There are a few places left that still mess with the sysks cache, but they are easy to patch. So after this patch all the core code uses two sources of dc:rack -- topology / snitch -- instead of three.
Closes #15280
* github.com:scylladb/scylladb:
system_keyspace: Don't require snitch argument on start
system_keyspace: Don't cache local dc:rack pair
system_keyspace: Save local info with explicit location
storage_service: Get endpoint location from snitch, not system keyspace
snitch: Introduce and use get_location() method
repair: Local location variables instead of system keyspace's one
repair: Use full endpoint location instead of datacenter part
A reviewer noted that test_update_expression_list_append_non_list_arguments
has too much code duplication - the same long API call to run
"SET a = list_append(...)" was repeated many times.
So in this patch we add a short inner function "try_list_append" to
avoid this duplication.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes: #15298
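The refactoring described above can be sketched in a self-contained way. This is only an illustration of the inner-helper pattern: `FakeTable`, `ValidationError` and the `update_item` signature are hypothetical stand-ins, not the real test fixture or the real API call.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for the error reported by the server."""


class FakeTable:
    """Hypothetical stand-in for the table fixture used by the real test."""

    def update_item(self, key, update_expression, values):
        # Reject any list_append() operand that is not a list.
        for name, value in values.items():
            if not isinstance(value, list):
                raise ValidationError(f"{name} is not a list")
        return {"ok": True}


def check_list_append_rejects_non_lists(table):
    # Inner helper: the long call is written once and reused below.
    def try_list_append(a, b):
        return table.update_item(
            key={"p": "x"},
            update_expression="SET a = list_append(:a, :b)",
            values={":a": a, ":b": b},
        )

    rejected = []
    for a, b in [(1, [2]), ([1], "s"), ({"k": 1}, [2])]:
        try:
            try_list_append(a, b)
        except ValidationError:
            rejected.append((a, b))
    # Sanity check: two real lists are accepted.
    try_list_append([1], [2])
    return rejected
```

Each new case then costs one short line instead of a repeated multi-line call.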
- Adds type for each option.
- Filters out unused / invalid values, moves them to a separate section.
- Adds the term "liveness" to the glossary.
- Removes unused and invalid properties from the docs.
- Updates to the latest version of pyaml.
docs: rename config template directive
Closes #15164
in this series, we try to improve `unified-installer.rst`
- encourage the user to install a smaller package
- run `./install.sh` directly instead of relying on `sh` pointing to `bash`
Closes #15325
* github.com:scylladb/scylladb:
doc: run install.sh directly
doc: install headless jdk in sample command line
Find progress of repair tasks based on the number of ranges
that have been repaired.
Fixes: [#1156](https://github.com/scylladb/scylla-enterprise/issues/1156).
Closes #14698
* github.com:scylladb/scylladb:
test: repair tasks test
repair: add methods making repair progress more precise
tasks: make progress related methods virtual
repair: add get_progress method to shard_repair_task_impl
repair: add const noexcept qualifiers to shard_repair_task_impl::ranges_size()
repair: log a name of a particular table repair is working on
tasks: delete move and copy constructors from task_manager::task::impl
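As a rough illustration of range-based progress, the sketch below models progress as "ranges repaired out of ranges to repair". The `Progress` class and the `aggregate` helper are hypothetical Python names. Summing per-shard values into a parent value is an assumption made for the example, not a description of the actual C++ implementation.

```python
from dataclasses import dataclass


@dataclass
class Progress:
    completed: int  # ranges already repaired
    total: int      # ranges the task has to repair

    def fraction(self) -> float:
        # An empty task reports 0.0 rather than dividing by zero.
        return self.completed / self.total if self.total else 0.0


def aggregate(children):
    """Sum child progress values into one parent progress value (assumed)."""
    return Progress(
        completed=sum(c.completed for c in children),
        total=sum(c.total for c in children),
    )
```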
SIGSEGV was caught during tablet streaming, and the reason was
that storage_service::_group0 (via set_group0()) is only set on
shard 0, therefore when streaming ran on any other shard,
it tried to dereference garbage, which resulted in the crash.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #15307
before this change, filesystem_storage::open() reuses
`sstable::make_component_file_writer()` to create the
temporary toc, it will rename the temporary toc to the
real TOC when sealing the sstable.
but this prevents us from reusing filesystem_storage in
yet another storage backend. as the
1. create temporary
2. rename temporary to toc
dance only applies to filesystem_storage. when
filesystem_storage calls into sstable, it calls `sst.make_component_file_writer()`,
which in turn calls the `_storage->make_component_sink()`.
but at this moment, `_storage` is not necessarily `filesystem_storage`
anymore. it could be a wrapper around `filesystem_storage`,
which is not aware of the create-rename dance. and could do
a lot more than create a temporary file when asked to
"make_component_sink()".
if we really want to go this way by reusing sstable's API
in `filesystem_storage` to create a temporary toc, we will
have to rename whatever temporary toc component is created
by the wrapper backend to the toc with the seal() func. but
again, this rename op is only implemented in the
filesystem_storage backend. to mirror this operation in
the wrapper backend does not make sense at all -- it
does not have to be aware of the filesystem_storage's internals.
so in this change, instead of reusing the
`sstable::make_component_file_writer()`, we just inline
its implementation in filesystem_storage to avoid this
problem. this is also an improvement from the design
perspective, as the storage should not call into its
higher-level abstraction -- sstable.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14443
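The create-then-rename dance mentioned above can be sketched as follows. This is a minimal Python analogue, assuming hypothetical file names (`TOC.txt.tmp`, `TOC.txt`) and a hypothetical class name. The point is only that the backend creates the temporary file itself and renames it on seal, without calling back into a higher layer.

```python
import os
import tempfile


class FilesystemStorageSketch:
    """Hypothetical analogue of a storage backend that owns its TOC files."""

    def __init__(self, directory):
        self.tmp_toc = os.path.join(directory, "TOC.txt.tmp")
        self.toc = os.path.join(directory, "TOC.txt")

    def create_temporary_toc(self, components):
        # Step 1: create the temporary TOC directly in this backend.
        with open(self.tmp_toc, "w") as f:
            f.write("\n".join(components) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def seal(self):
        # Step 2: rename the temporary TOC to the real TOC.
        os.replace(self.tmp_toc, self.toc)
        return self.toc
```

A wrapper backend built on top of such a class never needs to know that a temporary file existed.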
seastar has deprecated the overload which accepts `server_name`,
let's use the one which accepts `tls::tls_options`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #15324
Currently, the topology coordinator has the
`topology::transition_state::publish_cdc_generation` state responsible
for publishing the already created CDC generations to the user-facing
description tables. This process cannot fail as it would cause some CDC
updates to be missed. On the other hand, we would like to abort the
`publish_cdc_generation` state when bootstrap aborts. Of course, we
could also wait until handling this state finishes, even in the case of
the bootstrap abort, but that would be inefficient. We don't want to
unnecessarily block topology operations by publishing CDC generations.
The solution proposed by this PR is to remove the
`publish_cdc_generation` state completely and introduce a new background
fiber of the topology coordinator -- `cdc_generation_publisher` -- that
continually publishes committed CDC generations.
Apart from introducing the CDC generation publisher, we add
`test_cdc_generation_publishing.py` that verifies its correctness and we
adapt other CDC tests to the new changes.
Fixes #15194
Closes #15281
* github.com:scylladb/scylladb:
test: test_cdc: introduce wait_for_first_cdc_generation
test: move cdc_streams_check_and_repair check
test: add test_cdc_generation_publishing
docs: remove information about publish_cdc_generation
raft topology: introduce the CDC generation publisher
system_keyspace: load unpublished_cdc_generations to topology
raft topology: mark committed CDC generations as unpublished
raft topology: add unpublished_cdc_generations to system.topology
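The idea of a background fiber that drains a list of unpublished generations can be illustrated with asyncio. This is a sketch under stated assumptions: the list `unpublished`, the events `wakeup` and `stop`, and the generation names are all hypothetical, and appending to `published` stands in for writing to the description tables.

```python
import asyncio


async def cdc_generation_publisher(unpublished, published, wakeup, stop):
    """Background loop: publish committed generations in commit order."""
    while not stop.is_set():
        while unpublished:
            gen = unpublished[0]
            published.append(gen)  # stand-in for the description-table write
            unpublished.pop(0)     # drop it only after it was published
        wakeup.clear()
        await wakeup.wait()


async def demo():
    unpublished, published = [], []
    wakeup, stop = asyncio.Event(), asyncio.Event()
    publisher = asyncio.create_task(
        cdc_generation_publisher(unpublished, published, wakeup, stop))
    # The "main fiber" commits two generations without waiting for them
    # to be published.
    for gen in ("gen-1", "gen-2"):
        unpublished.append(gen)
        wakeup.set()
    # Let the publisher catch up, then shut it down.
    while unpublished:
        await asyncio.sleep(0)
    stop.set()
    wakeup.set()
    await publisher
    return published
```

In this sketch the committing side never blocks on publication, which mirrors the motivation given above.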
Add tests for gossiper/endpoint/live and gossiper/endpoint/down
which run only in release mode.
Enable test_remove_node_with_concurrent_ddl and fix types and
variables names used by it, so that they can be reused in gossiper
test.
Fixes: #15223.
Closes #15244
* github.com:scylladb/scylladb:
test: topology: add gossiper test
test: fix types and variable names in wait_for_host_down
ClangBuildAnalyzer reports cql3/cql_statement.hh as being one of the
most expensive header files in the project - being included (mostly
indirectly) in 129 source files, and costing a total of 844 CPU seconds
of compilation.
This patch is an attempt, only *partially* successful, to reduce the
number of times that cql_statement.hh is included. It succeeds in
lowering the number 129 to 99, but not less :-( One of the biggest
difficulties in reducing it further is that query_processor.hh includes
a lot of templated code, which needs stuff from cql_statement.hh.
The solution should be to un-template the functions in
query_processor.hh and move them from the header to a source file, but
this is beyond the scope of this patch and query_processor.hh appears
problematic in other respects as well.
Unfortunately the compilation speedup by this patch is negligible
(the `du -bc build/dev/**/*.o` metric shows less than 0.01% reduction).
Beyond the fact that this patch only removes 30% of the inclusions of
this header, it appears that most of the source files that no longer
include cql_statement.hh after this patch, included anyway many of the
other headers that cql_statement.hh included, so the saving is minimal.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #15212
strictly speaking, `sh` is not necessarily bash, while `install.sh`
is written in the Bash dialect and errors out if it is not executed
with Bash. also, we don't need to add "-x" when running the script; if
we have to, we should add it in `install.sh` rather than ask the user to
add this option. finally, `install.sh` is executable with a shebang line
using bash, so we can just execute it.
so, in this change, we just launch this script in the command line
sample.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in comparison with java-11-openjdk, java-11-openjdk-headless does not
offer audio and video support, and has fewer dependencies. for instance,
java-11-openjdk depends on the X11 libraries, and it also provides
icons representing JDK. but since scylla is a server side application,
we don't expect the user to run a desktop on it. so there is no need to
support audio and video.
in this change, we just suggest a "smaller" package, which is
actually also a dependency of java-11-openjdk.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
After introducing the CDC generation publisher,
test_cdc_log_entries_use_cdc_streams could (at least in theory)
fail by accessing system_distributed.cdc_streams_descriptions_v2
before the first CDC generation has been published.
To avoid flakiness, we simply wait until the first CDC generation
is published in a new function -- wait_for_first_cdc_generation.
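A wait-until-published helper of this kind can be sketched as a polling loop with a deadline. The `wait_for` helper and the `fetch_generations` coroutine are hypothetical names introduced for this example. The real test reads the description table instead.

```python
import asyncio
import time


async def wait_for(predicate, deadline, period=0.01):
    """Poll an async predicate until it returns a non-None value."""
    while True:
        result = await predicate()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before deadline")
        await asyncio.sleep(period)


async def wait_for_first_cdc_generation(fetch_generations, deadline):
    # fetch_generations is a hypothetical coroutine returning the list of
    # generation timestamps that are currently published.
    async def first_generation_published():
        gens = await fetch_generations()
        return gens if gens else None

    return await wait_for(first_generation_published, deadline)
```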
The part of test_topology_ops that tests the
cdc_streams_check_and_repair request could (at least in theory)
fail on
`assert(len(gen_timestamps) + 1 == len(new_gen_timestamps))`
after introducing the CDC generation publisher because we can
no longer assume that all previously committed CDC generations
have been published before sending the request.
To prevent flakiness, we move this part of the test to
test_cdc_generations_are_published. This test allows for ensuring
that all previous CDC generations have been published.
Additionally, checking cdc_streams_check_and_repair there is
simpler and arguably fits the test better.
We add two test cases that test the new CDC generation publisher
to detect potential bugs like incorrect order of publications or
not publishing some generations at all.
The purpose of the second test case --
test_multiple_unpublished_cdc_generations -- is to enforce and test
a scenario when there are multiple unpublished CDC generations at
the same time. We expect that this is a rare case. The main fiber
of the topology coordinator would have to make much more progress
(like finishing two bootstraps) than the CDC generation publisher
fiber. Since multiple unpublished CDC generations might never
appear in other tests but could be handled incorrectly, having
such a test is valuable.
Currently, the topology coordinator has the
topology::transition_state::publish_cdc_generation state
responsible for publishing the already created CDC generations
to the user-facing description tables. This process cannot fail
as it would cause some CDC updates to be missed. On the other
hand, we would like to abort the publish_cdc_generation state when
bootstrap aborts. Of course, we could also wait until handling this
state finishes, even in the case of the bootstrap abort, but that
would be inefficient. We don't want to unnecessarily block topology
operations by publishing CDC generations.
The solution is to remove the publish_cdc_generation state
completely and introduce a new background fiber of the topology
coordinator -- cdc_generation_publisher -- that continually
publishes committed CDC generations.
The implementation of the CDC generation publisher is very similar
to the main fiber of the topology coordinator. One noticeable
difference is that we don't catch raft::commit_status_unknown,
which is handled by raft_group0_client::add_entry.
Note that this modification changes the Raft-based topology a bit.
Previously, the publish_cdc_generation state had to end before
entering the next state -- write_both_read_old. Now, committed
CDC generations can theoretically be published at any time.
Although it is correct because the following states don't depend on
publish_cdc_generation, it can cause problems in tests. For example,
we can't assume now that a CDC generation is published just because
the bootstrap operation has finished.
We extend service::topology with the list of unpublished CDC
generations and load its contents from system.topology. This step
is the last one in making unpublished CDC generations accessible
to the topology coordinator.
Note that when we load unpublished_cdc_generations, we don't
perform any sanity checks contrary to current_cdc_generation_uuid.
Every unpublished CDC generation was a current generation once,
and we checked it at that moment.
We add committed CDC generations to unpublished_cdc_generations
so that we can load them to topology and properly handle them
in the following commits.
In the following commits, we replace the
topology::transition_state::publish_cdc_generation state with
a background fiber that continually publishes committed CDC
generations. To make these generations accessible to the
topology coordinator, we store them in the new column of
system.topology -- unpublished_cdc_generations.
If a test isn't going to use task manager or isn't interested in
statuses of finished tasks, then keeping them in the memory
for some time (currently 10s by default) after they are finished
is a memory waste.
Set default task_ttl value to zero. It can be changed by setting
--task-ttl-in-seconds or through rest api (/task_manager/ttl).
In conf/scylla.yaml set task-ttl-in-seconds to 10.
Closes #15239
Some tests use non-threaded do_with_cql_env() and wrap the inner lambda with seastar::async(). The cql env already provides a helper for that.
Closes #15305
* github.com:scylladb/scylladb:
cql_query_test: Fix indentation after previous patch
cql_query_test: Use do_with_cql_env_thread() explicitly
This code was supposed to be moved into
`mutate_live_and_unreachable_endpoints`
in 2c27297dbd
but it looks like the original statements were left
in place outside the mutate function.
This patch just removes the stale code since the required
logic is already done inside `mutate_live_and_unreachable_endpoints`.
Fixes scylladb/scylladb#15296
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #15304
The Alternator tests can run against HTTPS - namely when using
test/alternator/run with the "--https" option (local Alternator
configured with HTTPS) or "--aws" option (DynamoDB, using HTTPS).
In some cases we make these HTTPS requests with verify=False, to avoid
checking the SSL certificates. E.g., this is necessary for Alternator
with a self-signed certificate. Unfortunately, the urllib3 library adds
an ugly warning message when SSL certificate verification is disabled.
In the past we tried to disable these warnings, using the documented
urllib3.disable_warnings() function, but it didn't help. It turns out
that pytest has its own warning handling, so to disable warnings in
pytest we must say so in a special configuration parameter in pytest.ini.
So in this patch, we drop the disable_warnings call from conftest.py
(where it didn't help), and instead put a similar declaration in
pytest.ini. The disable_warnings call in the test/alternator/run
script needs to remain - it is run outside pytest, so pytest.ini
doesn't affect it.
After this patch, running test/alternator/run with --https or --aws
finishes without warnings, as desired.
Fixes #15287
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #15292
Since 5d1f60439a we have
this node's host_id in topology config, so it can be used
to determine this node when adding it.
Prepare for extending the token_metadata interface
to provide host_id in update_topology.
We would like to compare the host_id first to be able to distinguish
this node from a node we're replacing that may have the same ip address
(but different host_id).
Closes #15297
* github.com:scylladb/scylladb:
locator: topology: is_configured_this_node: delete spurious semicolumn
locator: topology: is_configured_this_node: compare host_id first
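The host_id-first comparison can be illustrated as below. This is a hedged Python sketch rather than the C++ code: the `Node` class is hypothetical, and the fallback to comparing endpoints when a host_id is unknown is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    host_id: Optional[str]
    endpoint: str


def is_configured_this_node(this_node: Node, candidate: Node) -> bool:
    # Compare host IDs first when both are known, so that a node being
    # replaced (same IP, different host_id) is not mistaken for this node.
    if this_node.host_id and candidate.host_id:
        return this_node.host_id == candidate.host_id
    # Assumed fallback: compare IP addresses when a host_id is missing.
    return this_node.endpoint == candidate.endpoint
```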
Some tests use non-threaded do_with_cql_env() and wrap the inner lambda
with seastar::async(). The cql env already provides a helper for that.
Indentation is deliberately left broken until the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we assume that node_exporter artifacts are
always located under `build/node_exporter`. but this might not
hold anymore, if we want to have a self-contained build, in the sense
that different builds do not share the same set of node_exporter
artifacts. this could be a waste as the node_exporter artifacts
are identical across different builds, but this makes things
a lot simpler -- different builds do not have to hardwire to
a certain directory.
so, a new option is added to `create-relocatable-package.py`, this
allows us to specify the directory where node_exporter artifacts
are located.
Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
instead of specifying the build "mode", and assuming that
the build directory is always located at "build/${mode}", specify
the build directory explicitly. this allows us to use
`create-relocatable-package.py` to package artifacts built
at a build directory whose path does not comply with the
naming convention, for instance, we might want to build
scylla in `build/yet-another-super-feature/release`.
so, in this change, we trade `--mode` for an option named
`--build-dir` and update `configure.py` accordingly.
Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
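For illustration, explicit directory options of this kind could be declared with argparse roughly as follows. Only the option names `--build-dir` and `--node-exporter-dir` come from the messages above. The default values and the `artifact_path` helper are assumptions made for this sketch and do not reflect the actual script.

```python
import argparse
import os


def parse_args(argv):
    parser = argparse.ArgumentParser(
        description="sketch: package artifacts from an explicit build dir")
    # Assumed default; the real script may require the option instead.
    parser.add_argument("--build-dir", default="build/release",
                        help="directory holding the build artifacts")
    # Assumed default, taken from the previously hardwired location.
    parser.add_argument("--node-exporter-dir", default="build/node_exporter",
                        help="directory holding node_exporter artifacts")
    return parser.parse_args(argv)


def artifact_path(args, name):
    # Hypothetical helper: resolve an artifact relative to the build dir
    # instead of assuming "build/<mode>".
    return os.path.join(args.build_dir, name)
```

With this shape, a non-conventional path such as `build/yet-another-super-feature/release` is passed through unchanged.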
before this change, when running into a zero chunk_len, scylla
crashes with `assert(chunk_size != 0)`. but we can do better than
printing a backtrace like:
```
scylla: sstables/compress.cc:158: void
sstables::compression::segmented_offsets::init(uint32_t): Assertion `chunk_size != 0' failed.
```
so, in this change, a `malformed_sstable_exception` is thrown in place
of an `assert()`, which is supposed to verify programming
invariants, not to identify corrupted data files.
Fixes #15265
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #15264
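The distinction between an invariant check and a data-validation error can be sketched in Python. The exception class and both functions below are hypothetical analogues. They only illustrate reporting corrupt input as a catchable error instead of aborting the process.

```python
class MalformedSSTableError(Exception):
    """Raised when on-disk data is corrupt (hypothetical Python analogue)."""


def init_segmented_offsets(chunk_size):
    # A value read from a data file can be wrong without any programming
    # error, so report it as an exception instead of asserting.
    if chunk_size == 0:
        raise MalformedSSTableError("compression chunk length is zero")
    return {"chunk_size": chunk_size}


def try_load(chunk_size):
    # The caller can now handle the bad file and keep running.
    try:
        init_segmented_offsets(chunk_size)
        return "ok"
    except MalformedSSTableError as e:
        return f"malformed sstable: {e}"
```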
Passing the gate_closed_exception to the task promise
ends up with an abandoned exception since no one is waiting
for it.
Instead, enter the gate when the task is made
so it will fail make_task if the gate is already closed.
Fixes scylladb/scylladb#15211
In addition, this series adds a private abort_source for each task_manager module
(chained to the main task_manager::abort_source) and abort is requested on task_manager::module::stop().
Gate holding in compaction_manager is hardened, and the series
makes sure to stop compaction_manager and task_manager in the sstable_compaction_test cases.
Closes #15213
* github.com:scylladb/scylladb:
compaction_manager: stop: close compaction_state:s gates
compaction_manager: gracefully handle gate close
task_manager: task: start: fixup indentation
task_manager: module: make_task: enter gate when the task is created
task_manaer: module: stop: request abort
task_manager: task::impl: subscribe to module about_source
test: compaction_manager_stop_and_drain_race_test: stop compaction and task managers
test: simple_backlog_controller_test: stop compaction and task managers
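The "enter the gate when the task is made" idea can be shown with a toy gate. This sketch uses hypothetical Python classes (`Gate`, `Module`, `GateClosedError`) and is not the Seastar API. It only shows that entering the gate at creation time makes the failure synchronous and visible to the caller.

```python
class GateClosedError(Exception):
    """Hypothetical analogue of a gate-closed exception."""


class Gate:
    def __init__(self):
        self._count = 0
        self._closed = False

    def enter(self):
        if self._closed:
            raise GateClosedError("gate closed")
        self._count += 1

    def leave(self):
        self._count -= 1

    def close(self):
        self._closed = True


class Module:
    def __init__(self):
        self.gate = Gate()
        self.tasks = []

    def make_task(self, name):
        # Enter the gate here, so creation fails immediately if the
        # module is already stopping; nothing is left for a promise
        # that nobody waits on.
        self.gate.enter()
        self.tasks.append(name)
        return name

    def stop(self):
        self.gate.close()
```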
Since 5d1f60439a we have
this node's host_id in topology config, so it can be used
to determine this node when adding it.
Prepare for extending the token_metadata interface
to provide host_id in update_topology.
We would like to compare the host_id first to be able to distinguish
this node from a node we're replacing that may have the same ip address
(but different host_id).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Improved the coverage of the tests for the list_append() function
in UpdateExpression - test that if one of its arguments is not a list,
including a missing attribute or item, it is reported as an error as
expected.
The new tests pass on both Alternator and DynamoDB.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes #15291
so we can point debian_files_gen.py to builddir other than
'build', and can optionally use other output directory. this would
help to reduce the number of "magic numbers" in our building system.
Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
restructure the script into functions, prepare for the change which
allows us to specify the build directory when preparing the "debian"
packaging recipes.
Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
instead of flattening the logic into the script, let's structure it into functions, so that it can be reused and is more maintainable.
Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #15242
* github.com:scylladb/scylladb:
build: early return when appropriate
build: extract generate_compdb() out
Right now, the function allows for passing the path to a file as a seastar::sstring,
which is then converted to std::filesystem::path -- implicitly to the caller.
However, the function performs I/O, and there is no reason to accept any other type
than std::filesystem::path, especially because the conversion is straightforward.
Callers can perform it on their own.
This commit introduces the more constrained API.
Closes #15266