The object in question is used to facilitate creation of table objects for compaction tests. Currently, table_for_tests carries a number of auxiliary objects that are needed for table creation, such as stats of all sorts and table state. However, there is also some "infrastructure" on board, namely:
- reader concurrency semaphore
- cache tracker
- task manager
- compaction manager
And those four are excessive because all the tests in question run inside the sstables::test_env that has most of it.
This PR removes the mentioned objects from table_for_tests and re-uses those from test_env. While at it, it also removes the table::config object from table_for_tests so that it looks more like the core code that creates a table.
Closes scylladb/scylladb#15889
* github.com:scylladb/scylladb:
table_for_tests: Use test_env's compaction manager
sstables::test_env: Carry compaction manager on board
table_for_tests: Stop table on stop
table_for_tests: Get compaction manager from table
table_for_tests: Ditch on-board concurrency semaphore
table_for_tests: Require config argument to make table
table_for_tests: Create table config locally
table_for_tests: Get concurrency semaphore from table
table_for_tests: Get table directory from table itself
table_for_tests: Reuse cache tracker from sstables manager
table_for_tests: Remove unused constructor
tests: Split the compaction backlog test case
sstable_test_env: Coroutinize and move to .cc test_env::stop()
Replacing `restrict_replication_simplestrategy` config option with
2 config options: `replication_strategy_{warn,fail}_list`, which
allow us to impose soft limits (issue a warning) and hard limits (not
execute CQL) on replication strategy when creating/altering a keyspace.
The reason to replace rather than extend the `restrict_replication_simplestrategy` config
option is that it was not used and we wanted to generalize it.
Only the soft guardrail is enabled by default and it is set to SimpleStrategy,
which means that we'll generate a CQL warning whenever the replication strategy
is set to SimpleStrategy. For new cloud deployments we'll move
SimpleStrategy from the warn list to the fail list.
Guardrail violations will be tracked by metrics.
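The warn/fail semantics described above can be sketched roughly as follows. This is a Python illustration only, not the actual implementation: `Verdict` and `check_replication_strategy` are hypothetical names, and giving the fail list precedence over the warn list is an assumption of this sketch.

```python
from enum import Enum


class Verdict(Enum):
    OK = "ok"
    WARN = "warn"   # soft limit: execute the statement, issue a CQL warning
    FAIL = "fail"   # hard limit: do not execute the statement


def check_replication_strategy(strategy, warn_list, fail_list):
    """Classify a replication strategy against the two guardrail lists.

    Assumption of this sketch: the hard limit wins if a strategy is
    present on both lists.
    """
    if strategy in fail_list:
        return Verdict.FAIL
    if strategy in warn_list:
        return Verdict.WARN
    return Verdict.OK


# Defaults as described above: only the soft guardrail is set.
default_warn_list = {"SimpleStrategy"}
default_fail_list = set()
```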
Resolves #5224
Refs #8892 (the replication strategy part, not the RF part)
Closes scylladb/scylladb#15399
The handler of the STREAM_MUTATION_FRAGMENTS verb creates and starts a reader. The
resulting future is then checked for being exceptional and an error
message is printed in the logs.
However, if the reader fails because the socket was closed by the peer, the
error looks excessive. In that case the exception is just regular
handling of the socket/stream closure and can be demoted to debug
level.
fixes: #15891
Similar cherry-picking of log level exists in e.g. storage proxy, see
for example 56bd9b5d (service: storage_proxy: do not report abort
requests in handle_write)
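The log-level demotion described above can be illustrated with a small Python sketch. It is an analogy, not the handler's code: `PEER_CLOSED_ERRORS`, `log_level_for` and `report_reader_failure` are hypothetical names, and the two exception types merely stand in for "socket closed by peer".

```python
import logging

# Assumption: these exception types stand in for "socket closed by peer".
PEER_CLOSED_ERRORS = (ConnectionResetError, BrokenPipeError)


def log_level_for(exc):
    """Return DEBUG for routine peer disconnects, ERROR for anything else."""
    if isinstance(exc, PEER_CLOSED_ERRORS):
        return logging.DEBUG
    return logging.ERROR


def report_reader_failure(logger, exc):
    """Log a reader failure at a level that matches its severity."""
    logger.log(log_level_for(exc), "stream reader failed: %s", exc)
```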
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15892
before this change, we feed `build_reloc.sh` with hardwired arguments
when building python3 submodule. but this is not flexible, and hurts
the maintainability.
in this change, we mirror the behavior of `configure.py`, and collect
the arguments from the output of `install-dependencies.sh`, and feed
the collected arguments to `build_reloc.sh`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15885
We have observed do_repair_ranges() receiving tens of thousands of
ranges to repair on occasion. do_repair_ranges() repairs all ranges in
parallel, with parallel_for_each(). This is normally fine, as the lambda
inside parallel_for_each() takes a semaphore and this will result in
limited concurrency.
However, in some instances, it is possible that most of these ranges are
skipped. In this case the lambda will become synchronous, only logging a
message. This can cause stalls because there are no opportunities to
yield. Solve this by adding an explicit yield.
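The effect of the explicit yield can be shown with a simplified Python asyncio analogy. This is not Seastar code and it processes ranges sequentially rather than with parallel_for_each(); `repair_ranges`, `should_skip`, `repair_one` and `demo` are hypothetical names used only for this sketch.

```python
import asyncio


async def repair_ranges(ranges, should_skip, repair_one):
    """Process ranges, yielding to the event loop even on the skip path."""
    for r in ranges:
        if should_skip(r):
            # Without this explicit yield, a long run of skipped ranges
            # would never give other tasks a chance to run.
            await asyncio.sleep(0)
            continue
        await repair_one(r)


async def demo():
    """Count how often a background task runs while all ranges are skipped."""
    ticks = 0
    done = False

    async def background():
        nonlocal ticks
        while not done:
            ticks += 1
            await asyncio.sleep(0)

    async def repair_one(r):
        await asyncio.sleep(0)

    task = asyncio.create_task(background())
    await repair_ranges(range(1000), lambda r: True, repair_one)
    done = True
    await task
    return ticks
```

With the yield in place, the background task keeps making progress while every range is skipped; removing the `await asyncio.sleep(0)` on the skip path would starve it until the loop finishes.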
Fixes: #14330
Closes scylladb/scylladb#15879
The purpose of `maybe_fix_legacy_secondary_index_mv_schema` was to deal
with legacy materialized view schemas used for secondary indexes,
schemas which were created before the notion of "computed columns" was
introduced. Back then, secondary index schemas would use a regular
"token" column. Later it became a computed column and old schemas would
be migrated during rolling upgrade.
The migration code was introduced in 2019
(db8d4a0cc6) and then fixed in 2020
(d473bc9b06).
The fix was present in Enterprise 2022.1 and in OSS 4.5. So, assuming
that users don't try crazy things like upgrading from 2021.X to 2023.X
(which we do not support), all clusters will have already executed the
migration code once they upgrade to 2023.X, meaning we can get rid of
it.
The main motivation of this PR is to get rid of the
`db::schema_tables::merge_schema` call in `parse_schema_tables`. In Raft
mode this was the only call to `merge_schema` outside "group 0 code" and
in fact it is unsafe -- it uses locally generated mutations with locally
generated timestamp (`api::new_timestamp()`), so if we actually did it,
we would permanently diverge the group 0 state machine across nodes
(the schema pulling code is disabled in Raft mode). Fortunately, this
should be dead code by now, as explained in the previous paragraph.
The migration code is now turned into a sanity check: if users
try something crazy, they will get an error instead of silent data
corruption.
Closes scylladb/scylladb#15695
* github.com:scylladb/scylladb:
view: remove unused `_backing_secondary_index`
schema_tables: turn view schema fixing code into a sanity check
schema_tables: make comment more precise
feature_service: make COMPUTED_COLUMNS feature unconditionally true
to be compatible with `configure.py` which allows us to optionally
specify the --date-stamp option for SCYLLA-VERSION-GEN. this option
is used by our CI workflow.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15896
this change silences the following compiler warning, caused by using a
deprecated API, by using the recommended API in place of the deprecated
one:
```
/home/kefu/dev/scylladb/alternator/server.cc:569:27: warning: 'set_tls_credentials' is deprecated: use listen(socket_address addr, server_credentials_ptr credentials) [-Wdeprecated-declarations]
_https_server.set_tls_credentials(creds->build_reloadable_server_credentials([](const std::unordered_set<sstring>& files, std::exception_ptr ep) {
^
/home/kefu/dev/scylladb/seastar/include/seastar/http/httpd.hh:186:7: note: 'set_tls_credentials' has been explicitly marked deprecated here
[[deprecated("use listen(socket_address addr, server_credentials_ptr credentials)")]]
^
1 warning generated.
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15884
Currently the cache updaters aren't exception safe,
yet they are intended to be.
Instead of allowing exceptions from
`external_updater::execute` to escape `row_cache::update`,
abort using `on_fatal_internal_error`.
Future changes should harden all `execute` implementations
to effectively make them `noexcept`, then the pure virtual
definition can be made `noexcept` to cement that.
Fixes scylladb/scylladb#15576
Closes scylladb/scylladb#15577
* github.com:scylladb/scylladb:
row_cache: abort on exteral_updater::execute errors
row_cache: do_update: simplify _prev_snapshot_pos setup
Now that the sstables::test_env provides the compaction manager
instance, the table_for_tests can start using it and can remove its own
compaction manager and the sidecar task_manager.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Most of the test cases that use sstables::test_env do not mess with
table objects, they only need sstables. However, compaction test cases
do need table objects and, respectively, a compaction manager instance.
Today those test cases create a compaction manager instance for each table
they create, but that's a bit heavyweight and doesn't work the way core
code works. This patch prepares the sstables::test_env to provide a
compaction manager on demand by starting it as soon as it's asked to
create a table object.
For now this compaction manager is unused, but it will be used in the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patches will stop using the compaction manager from table_for_tests in
favor of an external one (spoiler: the one from sstables::test_env), thus
the compaction manager would outlive the table_for_tests object and
the table object wrapped by it. So in order for the table_for_tests to
stop correctly, it also needs to stop the wrapped table.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The table_for_tests::get_compaction_manager() helper is
excessive, as the compaction manager reference can be provided by the wrapped
table object itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's not used any longer and can be removed. This makes the table_for_tests
stopping code a bit shorter as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is the continuation of the previous patch. Make the caller of
table_for_tests constructor provide the table::config. This makes the
table_for_tests constructor shorter and more self-contained.
Also, the caller now needs to provide the reference to reader
concurrency semaphore, and that's good news, because the only caller for
today is the sstables::test_env that does have it. This makes the
semaphore sitting on table_for_tests itself unused and it will be
removed eventually.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The table_for_tests keeps a copy of table::config on board. That's not
"idiomatic" as table config is a temporary object that should only be
needed while creating table object. Fortunately, the copy of config on
table_for_tests is no longer needed and it can be made temporary.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Making compaction permit needs a semaphore. Current code gets it from
the table_for_tests, but the very same semaphore reference sits on the
table. So get it from table, as the core code does. This will allow
removing the dedicated semaphore from table_for_tests in the future.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Making an sstable for a table needs passing the table directory as an argument.
The current table_for_tests helper gets the directory from the table config,
but the very same path sits on the table itself. Getting it from the table
makes the testing code that constructs an sstable look closer to the core
code and is also a prerequisite for removing the table config from
table_for_tests in the future.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When making table object it needs the cache tracker reference. The
table_for_tests keeps one on board, but the very same object already
sits on the sstables manager which has public getter.
This makes the table_for_tests's cache tracker object not needed.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
To improve parallelism of embedded test sub-cases.
By coincidence, an indentation fix is not required.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's going to get larger, so better to move.
Also, when coroutinized, it's going to be easier to extend.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
before this change, we only check the existence of compile_commands.json
before creating a symlink to build/*/compile_commands.json. but there are
chances that multiple ninja tasks are calling into `configure.py` for
updating `build.ninja`: this does not break the process, as the last one
wins: we just unconditionally `mv build.ninja.new build.ninja` for
updating this file. but this could break the build of
`compile_commands.json`: we create a symlink with Python, and if it
fails the Python script errors out.
in this change, we just ignore the `FileExistsError` when creating
the symlink to `compile_commands.json`. because, if this symlink already
exists, we've achieved the goal, and should not consider it a failure.
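a minimal sketch of the idea, not the actual `configure.py` code: `ensure_symlink` is a hypothetical helper name, and the temporary directory and file names below exist only for the demonstration.

```python
import os
import tempfile


def ensure_symlink(target, link_name):
    """Create link_name -> target, tolerating a concurrent creator."""
    try:
        os.symlink(target, link_name)
    except FileExistsError:
        # Another invocation already created the link; the goal is met.
        pass


# Demonstration in a scratch directory (paths are illustrative only).
with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "build_compile_commands.json")
    link = os.path.join(d, "compile_commands.json")
    open(target, "w").close()
    ensure_symlink(target, link)
    ensure_symlink(target, link)  # second call does not raise
    link_created = os.path.islink(link)
```

catching the exception, rather than checking for existence first, avoids the window between the check and the `os.symlink()` call in which another task could create the link.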
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15870
This PR implements the following new nodetool commands:
* cleanup
* clearsnapshot
* listsnapshots
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#15843
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the listsnapshots command
tools/scylla-nodetool: implement clearsnapshot command
tools/scylla-nodetool: implement the cleanup command
test/nodetool: rest_api_mock: add more options for multiple requests
tools/scylla-nodetool: log responses with trace level
* seastar 17183ed4e4...830ce86738 (6):
> coroutine: fix use-after-free in parallel_for_each
> build: do not provide zlib as an ingredient
> http: do not use req.content_length as both input parameter
> io_tester: disable -Wuninitialized when including boost.accumulators
> scheduling: revise the doxygen comment of create_scheduling_group()
> Merge 'Added ability to configure different credentials per HTTP listeners' from Michał Maślanka
Closes scylladb/scylladb#15871
While working on https://github.com/scylladb/scylladb/issues/15588, I noticed problems with the existing documentation, when comparing it with the actual code.
This PR contains fixes for nodetool compact, stop and scrub.
Closes scylladb/scylladb#15636
* github.com:scylladb/scylladb:
docs: nodetool compact: remove common arguments
docs: nodetool stop: fix compaction types and examples
docs: nodetool compact: remove unsupported partition option
There's one that doesn't need tempdir path argument since it gets one
from the env onboard tempdir anyway
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15825
The goal is to make the available defaults safe for future use, as they
are often taken from existing config files or documentation verbatim.
Referenced issue: #14290
Closes scylladb/scylladb#15856
When task_manager is constructed without config (tests) its task_ttl is
left uninitialized (i.e. a random number gets in there). This results
in tasks staying registered for an effectively infinite amount of time,
making a long-living task manager look hung.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15859
we enable sanitizers only in Debug and Sanitize build modes. if we pass
`-fno-sanitize-address-use-after-scope` to the compiler when the sanitizer
is not enabled, Clang complains like:
```
clang-16: error: argument unused during compilation: '-fno-sanitize-address-use-after-scope' [-Werror,-Wunused-command-line-argument]
```
this breaks the build on the build modes where sanitizers are not
enabled.
so, in this change, we only disable the sanitize-address-use-after-scope
sanitizer if the sanitizers are enabled.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15868
Uses a single db::config + extensions, allowing both handling
of enterprise-only scylla.yaml keys, as well as loading sstables
utilizing extension in that universe.
This reverts commit 4b80130b0b, reversing
changes made to a5519c7c1f. It's suspected
of causing dtest failures due to a bug in coroutine::parallel_for_each.
Currently, when we calculate the number of deactivated segments
in test_commitlog_delete_when_over_disk_limit, we only count the
segments that were active during the first flush. However, during
the test, there may have been more than one flush, and a segment
could have been created between them. This segment would sometimes
get deactivated and even destroyed, and as a result, the count of
destroyed segments would appear larger than the count of deactivated
ones.
This patch fixes this behavior by accounting for all segments that
were active during any flush instead of just segments active during
the first flush.
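The counting change can be pictured with a tiny Python sketch. It is an illustration only, not the test's code: `segments_active_during_any_flush` is a hypothetical name and the segment ids are made up.

```python
def segments_active_during_any_flush(flushes):
    """Return the union of segment ids that were active in any flush."""
    active = set()
    for segments in flushes:
        active.update(segments)
    return active


# Made-up ids: segment 3 was created between the first and second flush.
flushes = [{1, 2}, {2, 3}]
first_only = set(flushes[0])
all_active = segments_active_during_any_flush(flushes)
```

Counting only `first_only` misses segment 3, so a later deactivation or destruction of that segment could not be matched against the set of tracked segments.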
Fixes #10527
Closes scylladb/scylladb#14610
The copy assignment operator of _ck can throw
after _type and _bound_weight have already been changed.
This leaves position_in_partition in an inconsistent state,
potentially leading to various weird symptoms.
The problem was witnessed by test_exception_safety_of_reads.
Specifically: in cache_flat_mutation_reader::add_to_buffer,
which requires the assignment to _lower_bound to be exception-safe.
The easy fix is to perform the only potentially-throwing step first.
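The "throwing step first" ordering can be sketched in Python as an analogy. The real code is C++; `Position`, `assign` and `ExplodingKey` are toy stand-ins invented for this sketch, with `copy.deepcopy` playing the role of the potentially-throwing copy of _ck.

```python
import copy


class Position:
    """Toy stand-in for a three-field position; not the real class."""

    def __init__(self, type_, bound_weight, ck):
        self.type_ = type_
        self.bound_weight = bound_weight
        self.ck = ck

    def assign(self, other):
        # Potentially-throwing step first: copy the key into a temporary.
        new_ck = copy.deepcopy(other.ck)
        # Nothing below can raise, so self ends up either fully updated
        # or completely untouched.
        self.ck = new_ck
        self.type_ = other.type_
        self.bound_weight = other.bound_weight


class ExplodingKey:
    """Key whose copy always fails, simulating an allocation failure."""

    def __deepcopy__(self, memo):
        raise MemoryError("simulated allocation failure")
```

If the order were reversed (type and weight first, key last), a failed key copy would leave the object with the new type and weight but the old key.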
Fixes #15822
Closes scylladb/scylladb#15864
Currently, it's possible for a test to pass even if the server crashes
during a graceful shutdown. Additionally, the server may crash in the
middle of a test, resulting in a test failure with an inaccurate
description. This commit updates the test framework to monitor the
server's return code and throw an exception in the event of an abnormal
server shutdown.
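A minimal Python sketch of return-code monitoring, under stated assumptions: it is not the test framework's actual code, `ServerCrashed`, `check_server_exit` and `run_fake_server` are hypothetical names, and a short-lived Python child process stands in for the server.

```python
import subprocess
import sys


class ServerCrashed(Exception):
    """Raised when the server process exits abnormally."""


def check_server_exit(proc, expected=(0,)):
    """Wait for the server process and raise on an abnormal exit code."""
    code = proc.wait()
    if code not in expected:
        raise ServerCrashed(f"server exited with return code {code}")
    return code


def run_fake_server(exit_code):
    """Start a stand-in "server" that exits with the given code."""
    return subprocess.Popen(
        [sys.executable, "-c", f"import sys; sys.exit({exit_code})"]
    )
```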
Fixes scylladb/scylla#15365
Closes scylladb/scylladb#15660
before this change, when running object_store tests with `pytest`
directly, an instance of MinIoServer is started as a function-scope
fixture, but the environment variables set by it stay with the
process, even after the fixture is torn down. So, when the 2nd test
in the same process checks these environment variables, it would be
under the impression that there is already an S3 server running, and
think it is driven by `test.py`, hence it tries to reuse the S3 server.
But the MinIoServer instance has already been torn down at that moment,
when the first test completed.
So the test is likely to fail when the Scylla instance tries
to read the missing conf file previously created by the MinIoServer.
after this change, the environment variables are reset, so they
won't be seen by the succeeding tests in the same pytest session.
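a minimal sketch of the set-then-restore pattern, not the fixture's actual code: `scoped_environ` is a hypothetical helper and `EXAMPLE_S3_ADDRESS` is a made-up variable name.

```python
import os
from contextlib import contextmanager


@contextmanager
def scoped_environ(overrides):
    """Set environment variables and restore the previous state on exit."""
    saved = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                # The variable did not exist before: remove it again.
                os.environ.pop(name, None)
            else:
                os.environ[name] = old
```

a fixture could wrap its lifetime in such a context manager, so that anything it exports disappears together with the server it describes.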
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15779
this series is one of the steps to remove global statements in `configure.py`.
not only is the script more structured this way, this also allows us to quickly identify the parts which should/can be reused when migrating to a CMake based build system.
Refs #15379
Closes scylladb/scylladb#15818
* github.com:scylladb/scylladb:
build: move the code with side effects into a single function
build: create outdir when outdir is explictly used
build: group the code with side effects together
build: do not rely on updating global with a dict
build: extract generate_version() out
build: extract get_release_cxxflags() out
build: extract get_extra_cxxflags() out
build: move thrift_libs to where it is used
build: move pkg closer to where it is used
build: remove unused variable
build: move variable closer to where it is used
it was a copy-pasta error.
- s/CMAKE_CXX_FLAGS_RELEASE/CMAKE_CXX_FLAGS_DEV/
- s/Seastar_OptimizationLevel_RELEASE/Seastar_OptimizationLevel_DEV/
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15849
Currently, when the topology coordinator accepts a node, it moves it to bootstrap state and assigns tokens to it (either new ones during bootstrap, or the replaced node's tokens). Only then does it contact the joining node to tell it about the decision and let it perform a read barrier.
However, this means that the tokens are inserted too early. After the tokens are inserted, the cluster is free to route write requests to the joining node, but that node might not have learned about all of the schema yet.
Fix the issue by inserting the tokens later, after completing the join node response RPC which forces the receiving node to perform a read barrier.
Refs: scylladb/scylladb#15686
Fixes: scylladb/scylladb#15738
Closes scylladb/scylladb#15724
* github.com:scylladb/scylladb:
test: test_topology_ops: continuously write during the test
raft topology: assign tokens after join node response rpc
storage_service: fix indentation after previous commit
raft topology: loosen assumptions about transition nodes having tokens
When a base write triggers an mv write and it needs to be sent to another
shard, it used the same service group and we could end up with a
deadlock.
This fix also affects alternator's secondary indexes.
Testing was done using a not yet committed framework for easy alternator
performance testing: https://github.com/scylladb/scylladb/pull/13121.
I've changed the hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and
then ran:
./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \
--developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \
--duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000
Without the patch, when scylla is overloaded (i.e. the number of scheduled futures is close to max_nonlocal_requests), after a couple of seconds
scylla hangs, cpu usage drops to zero and no progress is made. We can confirm we're hitting this issue by seeing under gdb:
p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0
With the patch I wasn't able to observe the problem, even with 2x
concurrency. I was able to make the process hang with 10x concurrency
but I think it's hitting a different limit, as there wasn't any depleted
smp service group semaphore and it was also happening on non-mv loads.
Fixes https://github.com/scylladb/scylladb/issues/15844
Closes scylladb/scylladb#15845
We need to wait until the first node becomes normal in
`join_node_request_handler` to ensure that joining nodes are not
handled as the first node in the cluster.
If we placed a join request before the first node becomes normal,
the topology coordinator would incorrectly skip the join node
handshake in `handle_node_transition` (`case node_state::none`).
It would happen because the topology coordinator decides whether
a node is the first in the cluster by checking if there are no
normal nodes. Therefore, we must ensure at least one normal node
when the topology coordinator handles a join request for a
non-first node.
We change the previous check because it can return true if there
are no normal nodes. `topology::is_empty` would also return false
if the first node was still new or in transition.
Additionally, calling `join_node_request_handler` before the first
node sets itself as normal is frequent during concurrent bootstrap,
so we remove "unlikely" from the comment.
Fixes: scylladb/scylladb#15807
Closes scylladb/scylladb#15775
The output is changed slightly, compared to the current nodetool:
* Number columns are aligned to the right
* Number columns don't have decimal places
* There is no trailing whitespace
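
The three output rules above can be sketched in Python as follows. This is an illustration only, not the tool's code: `format_table` is a hypothetical helper and the column names in the usage are made up.

```python
def format_table(header, rows):
    """Left-align text columns, right-align numeric ones, no trailing spaces."""
    columns = range(len(header))
    # Numbers are printed without decimal places.
    cells = [list(header)] + [
        [str(round(v)) if isinstance(v, (int, float)) else str(v) for v in row]
        for row in rows
    ]
    widths = [max(len(r[i]) for r in cells) for i in columns]
    # A column is numeric when every data value in it is a number.
    numeric = [all(isinstance(row[i], (int, float)) for row in rows)
               for i in columns]
    lines = []
    for r in cells:
        parts = [c.rjust(w) if num else c.ljust(w)
                 for c, w, num in zip(r, widths, numeric)]
        # rstrip() drops padding left behind by a short last text column.
        lines.append(" ".join(parts).rstrip())
    return "\n".join(lines)
```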