We refactor system_distributed_keyspace::start so that it takes at
most one group 0 guard and calls migration_manager::announce at
most once.
We remove a catch expression together with the FIXME from
get_updated_service_levels (add_new_columns_if_missing before the
patch) because we cannot treat the service_levels update
differently anymore.
After adding the keyspace_metadata parameter to
migration_listener::on_before_create_column_family,
tablet_allocator doesn't need to load it from the database.
This change is necessary before merging migration_manager::announce
calls in the following commit.
After adding the new prepare_new_column_family_announcement that
doesn't assume the existence of a keyspace, we also need to get
rid of the same assumption in all on_before_create_column_family
calls. After all, they may be initiated before creating the
keyspace. However, some listeners require keyspace_metadata, so we
pass it as a new parameter.
We can use the new prepare_new_column_family_announcement function
that doesn't assume the existence of the keyspace instead of the
previous work-around.
We need to store a new keyspace's keyspace_metadata as a local
variable in create_table_on_shard0. In the following commit, we
use it to call the new prepare_new_column_family_announcement
function.
In the following commits, we reduce the number of the
migration_manager::anounce calls by merging some of them in a way
that logically makes sense. Some of these merges are similar --
we announce a new keyspace and its tables together. However,
we cannot use the current prepare_new_column_family_announcement
there because it assumes that the keyspace has already been created
(when it loads the keyspace from the database). Luckily, this
assumption is not necessary as this function only needs
keyspace_metadata. Instead of loading it from the database, we can
pass it as a parameter.
This PR implements the following new nodetool commands:
* cleanup
* clearsnapshots
* listsnapshots
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588Closesscylladb/scylladb#15843
* github.com:scylladb/scylladb:
tools/scylla-nodetool: implement the listsnapshots command
tools/scylla-nodetool: implement clearsnapshot command
tools/scylla-nodetool: implement the cleanup command
test/nodetool: rest_api_mock: add more options for multiple requests
tools/scylla-nodetool: log responses with trace level
* seastar 17183ed4e4...830ce86738 (6):
> coroutine: fix use-after-free in parallel_for_each
> build: do not provide zlib as an ingredient
> http: do not use req.content_length as both input parameter
> io_tester: disable -Wuninitialized when including boost.accumulators
> scheduling: revise the doxygen comment of create_scheduling_group()
> Merge 'Added ability to configure different credentials per HTTP listeners' from Michał Maślanka
Closesscylladb/scylladb#15871
While working on https://github.com/scylladb/scylladb/issues/15588, I noticed problems with the existing documentation, when comparing it with the actual code.
This PR contains fixes for nodetool compact, stop and scrub.
Closesscylladb/scylladb#15636
* github.com:scylladb/scylladb:
docs: nodetool compact: remove common arguments
docs: nodetool stop: fix compaction types and examples
docs: nodetool compact: remove unsupported partition option
There's one that doesn't need tempdir path argument since it gets one
from the env onboard tempdir anyway
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#15825
The goal is to make the available defaults safe for future use, as they
are often taken from existing config files or documentation verbatim.
Referenced issue: #14290Closesscylladb/scylladb#15856
When task_manager is constructed without config (tests) its task_ttl is
left uninitialized (i.e. -- random number gets in there). This results
in tasks hanging around being registered for infinite amount of time
making long-living task manager look hanged.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#15859
we enable sanitizer only in Debug and Sanitize build modes, if we pass
`-fno-sanitize-address-use-after-scope` to compiler when the sanitizer
is not enabled when compiling, Clang complains like:
```
clang-16: error: argument unused during compilation: '-fno-sanitize-address-use-after-scope' [-Werror,-Wunused-command-line-argument]
```
this breaks the build on the build modes where sanitizers are not
enabled.
so, in this change, we only disable the sanitize-address-use-after-scope
sanitizer if the sanitizers are enabled.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15868
This reverts commit 4b80130b0b, reversing
changes made to a5519c7c1f. It's suspected
of causing dtest failures due to a bug in coroutine::parallel_for_each.
Currently, when we calculate the number of deactivated segments
in test_commitlog_delete_when_over_disk_limit, we only count the
segments that were active during the first flush. However, during
the test, there may have been more than one flush, and a segment
could have been created between them. This segment would sometimes
get deactivated and even destroyed, and as a result, the count of
destroyed segments would appear larger than the count of deactivated
ones.
This patch fixes this behavior by accounting for all segments that
were active during any flush instead of just segments active during
the first flush.
Fixes#10527Closesscylladb/scylladb#14610
The copy assignment operator of _ck can throw
after _type and _bound_weight have already been changed.
This leaves position_in_partition in an inconsistent state,
potentially leading to various weird symptoms.
The problem was witnessed by test_exception_safety_of_reads.
Specifically: in cache_flat_mutation_reader::add_to_buffer,
which requires the assignment to _lower_bound to be exception-safe.
The easy fix is to perform the only potentially-throwing step first.
Fixes#15822Closesscylladb/scylladb#15864
Currently, it's possible for a test to pass even if the server crashes
during a graceful shutdown. Additionally, the server may crash in the
middle of a test, resulting in a test failure with an inaccurate
description. This commit updates the test framework to monitor the
server's return code and throw an exception in the event of an abnormal
server shutdown.
Fixesscylladb/scylla#15365Closesscylladb/scylladb#15660
before this change, when running object_store tests with `pytest`
directly, an instance of MinIoServer is started as a function-scope
fixture, but the environmental variables set by it stay with the
process, even after the fixture is teared down. So, when the 2nd test
in the same process check these environmental variables, it would
under the impression that there is already a S3 server running, and
thinks it is drived by `test.py`, hence try to reuse the S3 server.
But the MinIoServer instance is teared down at that moment, when
the first test is completed.
So the test is likely to fail when the Scylla instance tries
to read the missing conf file previously created by the MinIoServer.
after this change, the environmental variables are reset, so they
won't be seen by the succeeding tests in the same pytest session.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15779
this series is one of the steps to remove global statements in `configure.py`.
not only the script is more structured this way, this also allows us to quickly identify the part which should/can be reused when migrating to CMake based building system.
Refs #15379Closesscylladb/scylladb#15818
* github.com:scylladb/scylladb:
build: move the code with side effects into a single function
build: create outdir when outdir is explictly used
build: group the code with side effects together
build: do not rely on updating global with a dict
build: extract generate_version() out
build: extract get_release_cxxflags() out
build: extract get_extra_cxxflags() out
build: move thrift_libs to where it is used
build: move pkg closer to where it is used
build: remove unused variable
build: move variable closer to where it is used
it was a copy-pasta error.
- s/CMAKE_CXX_FLAGS_RELEASE/CMAKE_CXX_FLAGS_DEV/
- s/Seastar_OptimizationLevel_RELEASE/Seastar_OptimizationLevel_DEV/
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15849
Currently, when the topology coordinator accepts a node, it moves it to bootstrap state and assigns tokens to it (either new ones during bootstrap, or the replaced node's tokens). Only then it contacts the joining node to tell it about the decision and let it perform a read barrier.
However, this means that the tokens are inserted too early. After inserting the tokens the cluster is free to route write requests to it, but it might not have learned about all of the schema yet.
Fix the issue by inserting the tokens later, after completing the join node response RPC which forces the receiving node to perform a read barrier.
Refs: scylladb/scylladb#15686Fixes: scylladb/scylladb#15738Closesscylladb/scylladb#15724
* github.com:scylladb/scylladb:
test: test_topology_ops: continuously write during the test
raft topology: assign tokens after join node response rpc
storage_service: fix indentation after previous commit
raft topology: loosen assumptions about transition nodes having tokens
When base write triggers mv write and it needs to be send to another
shard it used the same service group and we could end up with a
deadlock.
This fix affects also alternator's secondary indexes.
Testing was done using (yet) not committed framework for easy alternator
performance testing: https://github.com/scylladb/scylladb/pull/13121.
I've changed hardcoded max_nonlocal_requests config in scylla from 5000 to 500 and
then ran:
./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \
--developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \
--duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000
Without the patch when scylla is overloaded (i.e. number of scheduled futures being close to max_nonlocal_requests) after couple seconds
scylla hangs, cpu usage drops to zero, no progress is made. We can confirm we're hitting this issue by seeing under gdb:
p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0
With the patch I wasn't able to observe the problem, even with 2x
concurrency. I was able to make the process hang with 10x concurrency
but I think it's hitting different limit as there wasn't any depleted
smp service group semaphore and it was happening also on non mv loads.
Fixes https://github.com/scylladb/scylladb/issues/15844Closesscylladb/scylladb#15845
We need to wait until the first node becomes normal in
`join_node_request_handler` to ensure that joining nodes are not
handled as the first node in the cluster.
If we placed a join request before the first node becomes normal,
the topology coordinator would incorrectly skip the join node
handshake in `handle_node_transition` (`case node_state::none`).
It would happen because the topology coordinator decides whether
a node is the first in the cluster by checking if there are no
normal nodes. Therefore, we must ensure at least one normal node
when the topology coordinator handles a join request for a
non-first node.
We change the previous check because it can return true if there
are no normal nodes. `topology::is_empty` would also return false
if the first node was still new or in transition.
Additionally, calling `join_node_request_handler` before the first
node sets itself as normal is frequent during concurrent bootstrap,
so we remove "unlikely" from the comment.
Fixes: scylladb/scylladb#15807Closesscylladb/scylladb#15775
The output is changed slightly, compared to the current nodetool:
* Number columns are aligned to the right
* Number columns don't have decimal places
* There are no trailing whitespaces
With this, both requests and responses to/from the remote are logged
when trace-level logging is enabled. This should greatly simplify
debugging any problems.
use the captalized "ALLOW FILTERING" in the error message, because the
error message is a part of the user interface, it would be better to
keep it aligned with our document, where "ALLOW FILTERING" is used.
so, in this change, the lower-cased "allow filtering" error message is
changed to "ALLOW FILTERING", and the tests are updated accordingly.
see also a0ffbf3291
Refs #14321
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15718
actually we've created outdir when using it as the parent directory
of `tempfile.tempdir`, but there are many places where we use
`tempfile.tempdir` for, for instance, testing the compiler flags,
and these tests will be removed once we migrate to CMake, so they
do not really matter when reviewing the change which migrates to
CMake.
the point of this change is to help the review understand the major
changes performed by the migration.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
we use `globals().update(vars(args))` for updating the global variables
with a dict in `args`, this is convenient, but it hurts the readability.
let's reference the parsed options explicitly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
prepare for the change to read the SCYLLA-*-FILE in functions not
doing this in global scope.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
on top of per-mode cxxflags, we apply more of them based on settings
and building environment. to reduce the statements in global scope,
let's extract the related code into a function.
Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
`optional_packages` was introduced in 8b0a26f06d, but we don't
offer the alternative versions of libsystemd anymore, and this
variable is not used in `configure.py`, so let's drop it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This is the continuation for 19fc01be23
Registering API handlers for services need to
* use only the required service (sharded<> one if needed)
* get the service to handle requests via argument, not from http context (http context, in turn, is going not to depend on anything)
There are several endpoints scattered over storage_service and snitch that use token metadata and topology. This PR makes those endpoints work the described way and drop the api::ctx -> token_metadata dependency.
Closesscylladb/scylladb#15831
* github.com:scylladb/scylladb:
api: Remove http::context -> token_metadata dependency
api: Pass shared_token_metadata instead of storage_service
api: Move snitch endpoints that use token metadata only
api: Move storage_service endpoints that use token metadata only
Said statement keeps a reference to erm indirectly, via a topology node pointer, but doesn't keep erm alive. This can result in use-after-free. Furthermore, it allows for vnodes being pulled from under the query's feet, as it is running.
To prevent this, keep the erm alive for the duration of the query.
Also, use `host_id` instead of `node`, the node pointer is not needed really, as the statement only uses the host id from it.
Fixes: #15802Closesscylladb/scylladb#15808
* github.com:scylladb/scylladb:
cql3: mutation_fragments_select_statement: use host_id instead of node
cql3: mutation_fragments_select_statement: pin erm reference
In order to detect issues where requests are routed incorrectly during
topology changes, modify the test_topology_ops test so that it runs a
background process that continuously writes while the test performs
topology changes in the cluster.
At the end of the test check whether:
- All writes were successful (we only require CL=LOCAL_ONE)
- Whether there are any errors from the replica side logic in the nodes'
logs (which happen e.g. when node receives writes before learning
about the schema)