Commit Graph

39530 Commits

Author SHA1 Message Date
Patryk Jędrzejczak
df199eec11 db: system_distributed_keyspace: fix indentation
Broken in the previous commit.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
91ff8007b3 db: system_distributed_keyspace: announce once in start
We refactor system_distributed_keyspace::start so that it takes at
most one group 0 guard and calls migration_manager::announce at
most once.

We remove a catch expression together with the FIXME from
get_updated_service_levels (add_new_columns_if_missing before the
patch) because we cannot treat the service_levels update
differently anymore.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
5027c5f1e5 tablet_allocator: update on_before_create_column_family
After adding the keyspace_metadata parameter to
migration_listener::on_before_create_column_family,
tablet_allocator doesn't need to load it from the database.

This change is necessary before merging migration_manager::announce
calls in the following commit.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
a762179972 migration_listener: add parameter to on_before_create_column_family
After adding the new prepare_new_column_family_announcement that
doesn't assume the existence of a keyspace, we also need to get
rid of the same assumption in all on_before_create_column_family
calls. After all, they may be initiated before creating the
keyspace. However, some listeners require keyspace_metadata, so we
pass it as a new parameter.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
a2e48b1a5b alternator: executor: use new prepare_new_column_family_announcement
We can use the new prepare_new_column_family_announcement function
that doesn't assume the existence of the keyspace instead of the
previous work-around.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
4ad2d895a3 alternator: executor: introduce create_keyspace_metadata
We need to store a new keyspace's keyspace_metadata as a local
variable in create_table_on_shard0. In the following commit, we
use it to call the new prepare_new_column_family_announcement
function.
2023-10-31 12:08:03 +01:00
Patryk Jędrzejczak
fb2703de50 migration_manager: add new prepare_new_column_family_announcement
In the following commits, we reduce the number of
migration_manager::announce calls by merging some of them in a way
that logically makes sense. Some of these merges are similar --
we announce a new keyspace and its tables together. However,
we cannot use the current prepare_new_column_family_announcement
there because it assumes that the keyspace has already been created
(when it loads the keyspace from the database). Luckily, this
assumption is not necessary as this function only needs
keyspace_metadata. Instead of loading it from the database, we can
pass it as a parameter.
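The following is a minimal sketch of the idea. It uses hypothetical, heavily simplified stand-ins (`keyspace_metadata`, `schema`, `database`, and mutations modelled as strings) rather than the real ScyllaDB types. Once the table-preparation step receives `keyspace_metadata` as a parameter instead of looking the keyspace up, a new keyspace and its table can be prepared as one batch.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the real types.
struct keyspace_metadata {
    std::string name;
};

struct schema {
    std::string ks_name;
    std::string cf_name;
};

using mutation = std::string;  // a mutation is modelled as a description

struct database {
    std::map<std::string, keyspace_metadata> keyspaces;

    const keyspace_metadata& find_keyspace(const std::string& name) const {
        auto it = keyspaces.find(name);
        if (it == keyspaces.end()) {
            throw std::runtime_error("no such keyspace: " + name);
        }
        return it->second;
    }
};

// Old shape: the keyspace must already exist in the database.
std::vector<mutation> prepare_table_loading_ks(const database& db, const schema& s) {
    const keyspace_metadata& ksm = db.find_keyspace(s.ks_name);
    return {"create table " + ksm.name + "." + s.cf_name};
}

// New shape: the caller passes keyspace_metadata in, so the keyspace does
// not have to exist yet.
std::vector<mutation> prepare_table(const keyspace_metadata& ksm, const schema& s) {
    return {"create table " + ksm.name + "." + s.cf_name};
}

// One batch that creates a keyspace together with its table.
std::vector<mutation> prepare_keyspace_with_table(const keyspace_metadata& ksm, const schema& s) {
    std::vector<mutation> muts{"create keyspace " + ksm.name};
    auto table_muts = prepare_table(ksm, s);
    muts.insert(muts.end(), table_muts.begin(), table_muts.end());
    return muts;
}
```

In this sketch, `prepare_table_loading_ks` fails for a keyspace that does not exist yet. `prepare_keyspace_with_table` can emit both mutations together, and that is what allows the announcements to be merged.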
2023-10-31 12:08:03 +01:00
Avi Kivity
949e9f1205 Merge 'Nodetool additional commands 3/N' from Botond Dénes
This PR implements the following new nodetool commands:
* cleanup
* clearsnapshots
* listsnapshots

All commands come with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#15843

* github.com:scylladb/scylladb:
  tools/scylla-nodetool: implement the listsnapshots command
  tools/scylla-nodetool: implement clearsnapshot command
  tools/scylla-nodetool: implement the cleanup command
  test/nodetool: rest_api_mock: add more options for multiple requests
  tools/scylla-nodetool: log responses with trace level
2023-10-30 21:53:36 +02:00
Avi Kivity
5a7d15a666 Update seastar submodule
* seastar 17183ed4e4...830ce86738 (6):
  > coroutine: fix use-after-free in parallel_for_each
  > build: do not provide zlib as an ingredient
  > http: do not use req.content_length as both input parameter
  > io_tester: disable -Wuninitialized when including boost.accumulators
  > scheduling: revise the doxygen comment of create_scheduling_group()
  > Merge 'Added ability to configure different credentials per HTTP listeners' from Michał Maślanka

Closes scylladb/scylladb#15871
2023-10-30 21:39:12 +02:00
Avi Kivity
03a801b61b Merge 'Nodetools docs improvements 1/N' from Botond Dénes
While working on https://github.com/scylladb/scylladb/issues/15588, I noticed discrepancies between the existing documentation and the actual code.
This PR contains fixes for nodetool compact, stop and scrub.

Closes scylladb/scylladb#15636

* github.com:scylladb/scylladb:
  docs: nodetool compact: remove common arguments
  docs: nodetool stop: fix compaction types and examples
  docs: nodetool compact: remove unsupported partition option
2023-10-30 20:17:14 +02:00
Pavel Emelyanov
c88de8f91e test/compaction: Use shorter make_table_for_tests() overload
There's an overload that doesn't need the tempdir path argument, since it
takes one from the env's onboard tempdir anyway.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15825
2023-10-30 20:16:29 +02:00
Paweł Zakrzewski
384427bd02 doc: Replace instances of SimpleStrategy with NetworkTopologyStrategy
The goal is to make the available defaults safe for future use, as they
are often taken from existing config files or documentation verbatim.

Referenced issue: #14290

Closes scylladb/scylladb#15856
2023-10-30 20:15:48 +02:00
Pavel Emelyanov
7fa7a9495d task_manager: Don't leave task_ttl uninitialized
When task_manager is constructed without config (tests), its task_ttl is
left uninitialized (i.e., it holds an arbitrary value). This results
in tasks staying registered for an arbitrarily long time, making a
long-living task manager look hung.
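Below is a minimal sketch of the usual fix. It assumes hypothetical `task_manager` and `task_manager_config` classes rather than the real ones. A default member initializer gives the TTL a defined value on every constructor path. A default-initialized `std::chrono::seconds` member without an initializer would hold an indeterminate value, because `duration`'s default constructor is defaulted. The value 0 here is only a placeholder.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical configuration struct.
struct task_manager_config {
    std::chrono::seconds task_ttl{0};
};

class task_manager {
    // Default member initializer: the TTL is well defined even when the
    // constructor without a config is used, as in tests.
    std::chrono::seconds _task_ttl{0};
public:
    task_manager() = default;
    explicit task_manager(const task_manager_config& cfg) : _task_ttl(cfg.task_ttl) {}

    std::chrono::seconds task_ttl() const { return _task_ttl; }
};
```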

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15859
2023-10-30 20:15:05 +02:00
Kefu Chai
d01b9f95a0 build: cmake: disable sanitize-address-use-after-scope only when needed
We enable sanitizers only in the Debug and Sanitize build modes. If we pass
`-fno-sanitize-address-use-after-scope` to the compiler when the sanitizer
is not enabled, Clang complains:

```
clang-16: error: argument unused during compilation: '-fno-sanitize-address-use-after-scope' [-Werror,-Wunused-command-line-argument]
```

This breaks the build in the build modes where sanitizers are not
enabled.

So, in this change, we only disable the sanitize-address-use-after-scope
sanitizer if the sanitizers are enabled.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15868
2023-10-30 20:14:12 +02:00
Avi Kivity
d450a145ce Revert "Merge 'reduce announcements of the automatic schema changes ' from Patryk Jędrzejczak"
This reverts commit 4b80130b0b, reversing
changes made to a5519c7c1f. It's suspected
of causing dtest failures due to a bug in coroutine::parallel_for_each.
2023-10-29 18:32:06 +02:00
Wojciech Mitros
f08e7aad61 test: account for multiple flushes of commitlog segments
Currently, when we calculate the number of deactivated segments
in test_commitlog_delete_when_over_disk_limit, we only count the
segments that were active during the first flush. However, during
the test, there may have been more than one flush, and a segment
could have been created between them. This segment would sometimes
get deactivated and even destroyed, and as a result, the count of
destroyed segments would appear larger than the count of deactivated
ones.

This patch fixes this behavior by accounting for all segments that
were active during any flush instead of just segments active during
the first flush.
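This is a simplified model of the counting change. It assumes each flush can report the set of segments active at that moment (a hypothetical `flush_log`). The real test compares counts; this sketch compares sets so that the segment created between flushes is visible.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model: each flush reports the ids of the segments that were
// active at that moment.
using segment_id = int;
using flush_log = std::vector<std::set<segment_id>>;

// Old approach: only segments active during the first flush are tracked.
std::set<segment_id> tracked_first_flush_only(const flush_log& flushes) {
    return flushes.empty() ? std::set<segment_id>{} : flushes.front();
}

// Fixed approach: segments active during any flush are tracked.
std::set<segment_id> tracked_any_flush(const flush_log& flushes) {
    std::set<segment_id> all;
    for (const auto& active : flushes) {
        all.insert(active.begin(), active.end());
    }
    return all;
}

// The invariant the test relies on: every destroyed segment is tracked.
bool destroyed_are_tracked(const std::set<segment_id>& tracked,
                           const std::set<segment_id>& destroyed) {
    return std::includes(tracked.begin(), tracked.end(),
                         destroyed.begin(), destroyed.end());
}
```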

Fixes #10527

Closes scylladb/scylladb#14610
2023-10-29 18:30:32 +02:00
Michał Chojnowski
93ea3d41d8 position_in_partition: make operator= exception-safe
The copy assignment operator of _ck can throw
after _type and _bound_weight have already been changed.
This leaves position_in_partition in an inconsistent state,
potentially leading to various weird symptoms.

The problem was witnessed by test_exception_safety_of_reads,
specifically in cache_flat_mutation_reader::add_to_buffer,
which requires the assignment to _lower_bound to be exception-safe.

The easy fix is to perform the only potentially-throwing step first.
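Here is a minimal sketch of the fix. It uses a simplified, hypothetical `position` with `type`, `bound_weight` and an optional `clustering_key`, named after the members in the message above but not the real class. The key's copy constructor can be told to throw, which simulates an allocation failure.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical key whose copy may throw; the flag simulates that failure.
struct clustering_key {
    std::string value;
    static inline bool fail_next_copy = false;

    clustering_key() = default;
    explicit clustering_key(std::string v) : value(std::move(v)) {}

    clustering_key(const clustering_key& o) : value(o.value) {
        if (fail_next_copy) {
            fail_next_copy = false;
            throw std::runtime_error("simulated copy failure");
        }
    }
    // Copy-then-move: only the copy can throw.
    clustering_key& operator=(const clustering_key& o) {
        clustering_key tmp(o);
        value = std::move(tmp.value);
        return *this;
    }
    // Moves never throw.
    clustering_key(clustering_key&&) noexcept = default;
    clustering_key& operator=(clustering_key&&) noexcept = default;
};

// Simplified position: two plain fields plus a key.
struct position {
    int type = 0;
    int bound_weight = 0;
    std::optional<clustering_key> ck;

    position() = default;
    position(const position&) = default;

    // Exception-safe: do the only throwing step first, into a temporary,
    // then commit with operations that cannot throw.
    position& operator=(const position& o) {
        std::optional<clustering_key> new_ck = o.ck;  // may throw
        ck = std::move(new_ck);                       // noexcept
        type = o.type;
        bound_weight = o.bound_weight;
        return *this;
    }
};

// The unsafe ordering, for comparison: when the key copy throws,
// type and bound_weight have already been overwritten.
void assign_unsafe(position& dst, const position& src) {
    dst.type = src.type;
    dst.bound_weight = src.bound_weight;
    dst.ck = src.ck;  // may throw
}
```

`assign_unsafe` leaves a half-updated object when the copy throws. `operator=` leaves the target unchanged on failure, which is the strong exception guarantee.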

Fixes #15822

Closes scylladb/scylladb#15864
2023-10-29 18:30:32 +02:00
Andrii Patsula
5807ef0bb7 test: Verify server exit code during graceful process shutdown.
Currently, it's possible for a test to pass even if the server crashes
during a graceful shutdown. Additionally, the server may crash in the
middle of a test, resulting in a test failure with an inaccurate
description. This commit updates the test framework to monitor the
server's return code and throw an exception in the event of an abnormal
server shutdown.

Fixes scylladb/scylla#15365

Closes scylladb/scylladb#15660
2023-10-29 18:30:32 +02:00
Kefu Chai
2be5a86a14 test/pylib: unset the env variables set by MinIoServer
Before this change, when running object_store tests with `pytest`
directly, an instance of MinIoServer is started as a function-scope
fixture, but the environment variables it sets stay with the process
even after the fixture is torn down. So when the second test in the
same process checks these environment variables, it is under the
impression that an S3 server is already running and that it is driven
by `test.py`, hence it tries to reuse the S3 server. But that
MinIoServer instance was already torn down when the first test
completed.

So the test is likely to fail when the Scylla instance tries
to read the missing conf file previously created by the MinIoServer.

After this change, the environment variables are reset, so they
won't be seen by the subsequent tests in the same pytest session.
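The fixture itself is Python. The underlying technique, restoring the environment on teardown, can be sketched in C++ as an RAII guard around the POSIX `setenv()`/`unsetenv()` calls. This is an analog, not the test/pylib code.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>
#include <utility>
#include <stdlib.h>  // POSIX setenv()/unsetenv()

// Sets an environment variable for the guard's lifetime and restores the
// previous state (old value, or no variable at all) on destruction.
class scoped_env_var {
public:
    scoped_env_var(std::string name, const std::string& value) : _name(std::move(name)) {
        if (const char* old = std::getenv(_name.c_str())) {
            _previous = old;  // remember the value that was there before
        }
        ::setenv(_name.c_str(), value.c_str(), 1);
    }
    ~scoped_env_var() {
        if (_previous) {
            ::setenv(_name.c_str(), _previous->c_str(), 1);
        } else {
            ::unsetenv(_name.c_str());
        }
    }
    scoped_env_var(const scoped_env_var&) = delete;
    scoped_env_var& operator=(const scoped_env_var&) = delete;
private:
    std::string _name;
    std::optional<std::string> _previous;
};
```

With the guard scoped to a single test, a later test in the same process sees the original environment. It therefore does not assume that a server is still running.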

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15779
2023-10-29 18:30:32 +02:00
Botond Dénes
132ae92c75 Merge 'build: extract code fragments into functions' from Kefu Chai
This series is one of the steps toward removing global statements from `configure.py`.

Not only is the script more structured this way, this also allows us to quickly identify the parts that should/can be reused when migrating to a CMake-based build system.

Refs #15379

Closes scylladb/scylladb#15818

* github.com:scylladb/scylladb:
  build: move the code with side effects into a single function
  build: create outdir when outdir is explictly used
  build: group the code with side effects together
  build: do not rely on updating global with a dict
  build: extract generate_version() out
  build: extract get_release_cxxflags() out
  build: extract get_extra_cxxflags() out
  build: move thrift_libs to where it is used
  build: move pkg closer to where it is used
  build: remove unused variable
  build: move variable closer to where it is used
2023-10-29 18:30:32 +02:00
Avi Kivity
e349a2657c Merge 'Allow running perf-simple-query with tablets' from Tomasz Grabiec
Usage:

```
build/dev/scylla perf-simple-query --tablets
```

Closes scylladb/scylladb#15656

* github.com:scylladb/scylladb:
  perf_simple_query: Allow running with tablets
  tests: cql_test_env: Allow creating keyspace with tablets
  tests: cql_test_env: Register storage_service in migration notifier
  test: cql_test_env: Initialize node state in topology
2023-10-29 18:30:32 +02:00
Aleksandr Bykov
6b991b4791 doc: add note about run test.py with toolchain/dbuild
test.py tests can be run with toolchain/dbuild, in which case
there is no need to execute ./install-dependencies.sh.

Closes scylladb/scylladb#15837
2023-10-29 18:30:32 +02:00
Kefu Chai
3a6e359328 build: cmake: add token_metadata.cc to api
`token_metadata.cc` moved into api in e4c0a4d34d, let's update CMake
accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15857
2023-10-29 18:30:32 +02:00
Kefu Chai
8819865c8d build: cmake: correct the variable names in mode.Dev.cmake
it was a copy-pasta error.

- s/CMAKE_CXX_FLAGS_RELEASE/CMAKE_CXX_FLAGS_DEV/
- s/Seastar_OptimizationLevel_RELEASE/Seastar_OptimizationLevel_DEV/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15849
2023-10-29 18:30:32 +02:00
Kamil Braun
1c0ae2e7ef Merge 'raft topology: assign tokens after join node response rpc' from Piotr Dulikowski
Currently, when the topology coordinator accepts a node, it moves it to the bootstrap state and assigns tokens to it (either new ones during bootstrap, or the replaced node's tokens). Only then does it contact the joining node to tell it about the decision and let it perform a read barrier.

However, this means that the tokens are inserted too early. After the tokens are inserted, the cluster is free to route write requests to the node, but the node might not have learned about all of the schema yet.

Fix the issue by inserting the tokens later, after completing the join node response RPC which forces the receiving node to perform a read barrier.

Refs: scylladb/scylladb#15686
Fixes: scylladb/scylladb#15738

Closes scylladb/scylladb#15724

* github.com:scylladb/scylladb:
  test: test_topology_ops: continuously write during the test
  raft topology: assign tokens after join node response rpc
  storage_service: fix indentation after previous commit
  raft topology: loosen assumptions about transition nodes having tokens
2023-10-29 18:30:32 +02:00
Marcin Maliszkiewicz
020a9c931b db: view: run local materialized view mutations on a separate smp service group
When a base write triggers a materialized view write that needs to be
sent to another shard, the view write used the same smp service group
as the base write, and we could end up with a deadlock.

This fix also affects Alternator's secondary indexes.

Testing was done using a not-yet-committed framework for easy Alternator
performance testing: https://github.com/scylladb/scylladb/pull/13121.
I changed the hardcoded max_nonlocal_requests config in Scylla from 5000 to 500 and
then ran:

```
./build/release/scylla perf-alternator-workloads --workdir /tmp/scylla-workdir/ --smp 2 \
--developer-mode 1 --alternator-port 8000 --alternator-write-isolation forbid --workload write_gsi \
--duration 60 --ring-delay-ms 0 --skip-wait-for-gossip-to-settle 0 --continue-after-error true --concurrency 2000
```

Without the patch, when Scylla is overloaded (i.e., the number of scheduled futures is close to max_nonlocal_requests), after a couple of seconds
Scylla hangs: CPU usage drops to zero and no progress is made. We can confirm we're hitting this issue by checking under gdb:

```
p seastar::get_smp_service_groups_semaphore(2,0)._count
$1 = 0
```

With the patch, I wasn't able to observe the problem, even with 2x
concurrency. I was able to make the process hang with 10x concurrency,
but I think it's hitting a different limit, as there wasn't any depleted
smp service group semaphore and it also happened with non-MV loads.

Fixes https://github.com/scylladb/scylladb/issues/15844

Closes scylladb/scylladb#15845
2023-10-29 18:30:32 +02:00
Patryk Jędrzejczak
a6236072ee raft topology: join_node_request_handler: wait until first node becomes normal
We need to wait until the first node becomes normal in
`join_node_request_handler` to ensure that joining nodes are not
handled as the first node in the cluster.

If we placed a join request before the first node becomes normal,
the topology coordinator would incorrectly skip the join node
handshake in `handle_node_transition` (`case node_state::none`).
It would happen because the topology coordinator decides whether
a node is the first in the cluster by checking if there are no
normal nodes. Therefore, we must ensure at least one normal node
when the topology coordinator handles a join request for a
non-first node.

We change the previous check because it could pass even when there
were no normal nodes: `topology::is_empty` returns false as soon as
the first node is present, even if it is still new or in transition.
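This is a simplified sketch of the condition. It uses a hypothetical `topology` that tracks only a subset of node states. A topology can be non-empty while still having no normal node. Only an explicit check for a normal node guarantees that a joining node will not be mistaken for the first one.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical subset of node states; the real topology has more.
enum class node_state { none, bootstrapping, normal };

struct topology {
    std::vector<node_state> nodes;

    // True only when no node is registered at all, whatever its state.
    bool is_empty() const { return nodes.empty(); }

    bool has_normal_node() const {
        return std::any_of(nodes.begin(), nodes.end(),
                           [](node_state s) { return s == node_state::normal; });
    }
};

// A joining node is treated as the first one when there are no normal
// nodes. A non-first join request should therefore only be handled once
// at least one node is normal.
bool ready_for_non_first_join(const topology& t) {
    return t.has_normal_node();
}
```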

Additionally, calling `join_node_request_handler` before the first
node sets itself as normal is frequent during concurrent bootstrap,
so we remove "unlikely" from the comment.

Fixes: scylladb/scylladb#15807

Closes scylladb/scylladb#15775
2023-10-29 18:30:32 +02:00
Botond Dénes
16ce212c31 tools/scylla-nodetool: implement the listsnapshots command
The output is changed slightly, compared to the current nodetool:
* Number columns are aligned to the right
* Number columns don't have decimal places
* There are no trailing whitespaces
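Below is a minimal sketch of the formatting rules listed above. It uses a hypothetical two-column row (`name`, `size`) rather than the real listsnapshots columns. The text column is left-aligned and the number column is right-aligned and printed as an integer. Since the last column is right-aligned, no line ends in whitespace.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical row; not the real listsnapshots column set.
struct snapshot_row {
    std::string name;
    std::uint64_t size;
};

std::string format_rows(const std::vector<snapshot_row>& rows) {
    std::size_t name_w = 0;
    std::size_t size_w = 0;
    for (const auto& r : rows) {
        name_w = std::max(name_w, r.name.size());
        size_w = std::max(size_w, std::to_string(r.size).size());
    }
    std::ostringstream out;
    for (const auto& r : rows) {
        // Text left-aligned, number right-aligned without decimal places.
        out << std::left << std::setw(static_cast<int>(name_w)) << r.name << ' '
            << std::right << std::setw(static_cast<int>(size_w)) << r.size << '\n';
    }
    return out.str();
}
```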
2023-10-27 01:26:54 -04:00
Botond Dénes
27854a50be tools/scylla-nodetool: implement clearsnapshot command
2023-10-27 01:26:54 -04:00
Botond Dénes
b32ee54ba0 tools/scylla-nodetool: implement the cleanup command
The --jobs command-line argument is accepted but ignored, just like the
current nodetool does.
2023-10-27 01:26:53 -04:00
Botond Dénes
7e3a78d73d test/nodetool: rest_api_mock: add more options for multiple requests
Change the current bool multiple param to a weak enum, allowing for a
third value: ANY, which allows for 0 matches too.
2023-10-26 08:31:12 -04:00
Botond Dénes
b878dcc1c3 tools/scylla-nodetool: log responses with trace level
With this, both requests and responses to/from the remote are logged
when trace-level logging is enabled. This should greatly simplify
debugging any problems.
2023-10-26 08:28:37 -04:00
Kefu Chai
227136ddf5 main.cc: specify shortname for scheduling groups
so, for instance, the logging message looks like:
```
INFO  2023-10-24 15:19:37,290 [shard 0:strm] storage_service - entering STARTING mode
```
instead of
```
INFO  2023-10-24 15:19:37,290 [shard 0:stre] storage_service - entering STARTING mode
```
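The sketch below shows why the label changes. It assumes, as the `stre` example suggests, that without an explicit short name the first four characters of the group name are used. `log_label` is a hypothetical helper, not a Seastar API, and `streaming` is only an example group name.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical helper: the label printed after "shard N:" in a log line.
// Assumption: without a short name, the first four characters are used.
std::string log_label(const std::string& group_name,
                      const std::optional<std::string>& short_name = std::nullopt) {
    if (short_name) {
        return *short_name;
    }
    return group_name.substr(0, 4);
}
```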

Fixes #15267
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15821
2023-10-26 10:52:05 +03:00
Kefu Chai
d43afd576e cql3/restrictions/statement_restrictions: s/allow filtering/ALLOW FILTERING/
Use the capitalized "ALLOW FILTERING" in the error message: since the
error message is part of the user interface, it is better to keep it
aligned with our documentation, where "ALLOW FILTERING" is used.

So, in this change, the lower-cased "allow filtering" error message is
changed to "ALLOW FILTERING", and the tests are updated accordingly.

see also a0ffbf3291

Refs #14321
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15718
2023-10-26 10:00:37 +03:00
Kefu Chai
bfd99fad7f build: move the code with side effects into a single function
so that we can optionally utilize CMake for generating the build
system instead.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
85cc9073c9 build: create outdir when outdir is explictly used
Actually, we've already created outdir when using it as the parent
directory of `tempfile.tempdir`, but `tempfile.tempdir` is used in many
places, for instance for testing the compiler flags. These tests will be
removed once we migrate to CMake, so they do not really matter when
reviewing the change that migrates to CMake.

The point of this change is to help reviewers understand the major
changes performed by the migration.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
6c7cc927b5 build: group the code with side effects together
so we can move them into a single function

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
a375ce2ac1 build: do not rely on updating global with a dict
We use `globals().update(vars(args))` to update the global variables
with the parsed options in `args`. This is convenient, but it hurts
readability. Let's reference the parsed options explicitly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
a25a153e9f build: extract generate_version() out
so that we do fewer things with side effects in the global scope.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
cb6531b1a8 build: extract get_release_cxxflags() out
Prepare for the change to read the SCYLLA-*-FILE files in functions
instead of in the global scope.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
ec7ac3c750 build: extract get_extra_cxxflags() out
On top of the per-mode cxxflags, we apply more of them based on settings
and the build environment. To reduce the statements in the global scope,
let's extract the related code into a function.

Refs #15379
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 12:58:19 +08:00
Kefu Chai
8646e6c5d1 build: move thrift_libs to where it is used
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 11:47:38 +08:00
Kefu Chai
8b76f2a835 build: move pkg closer to where it is used
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 11:47:37 +08:00
Kefu Chai
ea6bf6b908 build: remove unused variable
`optional_packages` was introduced in 8b0a26f06d, but we don't
offer the alternative versions of libsystemd anymore, and this
variable is not used in `configure.py`, so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 11:47:37 +08:00
Kefu Chai
846218a8bc build: move variable closer to where it is used
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-10-26 11:47:37 +08:00
Yaniv Kaul
600822379d Docs: small typo in cql extensions page
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#15840
2023-10-25 17:27:04 +03:00
Botond Dénes
5d1e9d8c46 Merge 'Sanitize API -> token_metadata dependency' from Pavel Emelyanov
This is the continuation for 19fc01be23

Registering API handlers for services needs to:

* use only the required service (the sharded<> one if needed)
* get the service to handle requests via an argument, not from the http context (the http context, in turn, is not going to depend on anything)

There are several endpoints scattered over storage_service and snitch that use token metadata and topology. This PR makes those endpoints work the described way and drop the api::ctx -> token_metadata dependency.

Closes scylladb/scylladb#15831

* github.com:scylladb/scylladb:
  api: Remove http::context -> token_metadata dependency
  api: Pass shared_token_metadata instead of storage_service
  api: Move snitch endpoints that use token metadata only
  api: Move storage_service endpoints that use token metadata only
2023-10-25 17:19:39 +03:00
Anna Stuchlik
ad29ba4cad doc: add info about encrypted tables to Backup
This commit updates the introduction of the Backup Your Data page to include information about encryption.

Fixes https://github.com/scylladb/scylladb/issues/15573

Closes scylladb/scylladb#15612
2023-10-25 17:15:15 +03:00
Avi Kivity
782c6a208a Merge 'cql3: mutation_fragments_select_statement: keep erm alive for duration of the query' from Botond Dénes
This statement keeps a reference to the erm (effective replication map) indirectly, via a topology node pointer, but doesn't keep the erm alive. This can result in a use-after-free. Furthermore, it allows vnodes to be pulled from under the query's feet while it is running.
To prevent this, keep the erm alive for the duration of the query.
Also, use `host_id` instead of `node`: the node pointer is not really needed, as the statement only uses the host id from it.
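Here is a minimal sketch of the two changes. It uses hypothetical stand-ins (`effective_replication_map`, `host_id`, `query_state`), with `std::shared_ptr` in place of whatever pointer type ScyllaDB actually uses. The query state holds an owning reference to the map for its whole lifetime. It copies the host id by value instead of keeping a pointer into the map.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins; not the real classes.
struct host_id {
    std::string uuid;
    bool operator==(const host_id& o) const { return uuid == o.uuid; }
};

struct effective_replication_map {
    host_id this_host;
};

using erm_ptr = std::shared_ptr<const effective_replication_map>;

class query_state {
    erm_ptr _erm;   // pinned: keeps the map alive until the query ends
    host_id _host;  // copied by value, no pointer into the map
public:
    explicit query_state(erm_ptr erm) : _erm(std::move(erm)), _host(_erm->this_host) {}
    const host_id& host() const { return _host; }
};
```

If the caller drops its own reference mid-query, the map is not freed, because `query_state` still owns a reference.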

Fixes: #15802

Closes scylladb/scylladb#15808

* github.com:scylladb/scylladb:
  cql3: mutation_fragments_select_statement: use host_id instead of node
  cql3: mutation_fragments_select_statement: pin erm reference
2023-10-25 15:03:07 +03:00
Piotr Dulikowski
a3ba4b3109 test: test_topology_ops: continuously write during the test
In order to detect issues where requests are routed incorrectly during
topology changes, modify the test_topology_ops test so that it runs a
background process that continuously writes while the test performs
topology changes in the cluster.

At the end of the test, check:

- whether all writes were successful (we only require CL=LOCAL_ONE)
- whether there are any errors from the replica-side logic in the nodes'
  logs (which happen, e.g., when a node receives writes before learning
  about the schema)
2023-10-25 11:50:17 +02:00