Commit Graph

38768 Commits

Author SHA1 Message Date
Aleksandra Martyniuk
e90e10112f node_ops: repair: move node_ops_id to node_ops directory 2023-09-13 10:40:04 +02:00
Avi Kivity
2c810e221a Merge 'Gossiper: replace seastar threads with coroutines' from Benny Halevy
Many of the gossiper internal functions currently use seastar threads for historical reasons,
but since they are short living, the cost of spawning a seastar thread for them is excessive
and they can be simplified and made more efficient using coroutines.

Closes #15364

* github.com:scylladb/scylladb:
  gossiper: reindent do_stop_gossiping
  gossiper: coroutinize do_stop_gossiping
  gossiper: reindent assassinate_endpoint
  gossiper: coroutinize assassinate_endpoint
  gossiper: coroutinize handle_ack2_msg
  gossiper: handle_ack_msg: always log warning on exception
  gossiper: reindent handle_ack_msg
  gossiper: coroutinize handle_ack_msg
  gossiper: reindent handle_syn_msg
  gossiper: coroutinize handle_syn_msg
  gossiper: message handlers: no need to capture shared_from_this
  gossiper: add_local_application_state: throw internal error if endpoint state is not found
  gossiper: coroutinize add_local_application_state
2023-09-12 21:50:52 +03:00
Benny Halevy
47dc287efd gossiper: reindent do_stop_gossiping
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-12 19:33:09 +03:00
Benny Halevy
8fa65ed016 gossiper: coroutinize do_stop_gossiping
Simplify the function.  It does not need to spawn
a seastar thread.

While at it, declare it as private since it's called
only internally by the gossiper (and on shard 0).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-12 19:33:09 +03:00
Benny Halevy
a792babbda gossiper: reindent assassinate_endpoint
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-12 19:33:09 +03:00
Benny Halevy
5dbc168c03 gossiper: coroutinize assassinate_endpoint
It has no need to spawn a seastar thread.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-12 19:33:09 +03:00
Benny Halevy
29b9596050 gossiper: coroutinize handle_ack2_msg
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-12 19:33:09 +03:00
Benny Halevy
cc030a5040 gossiper: handle_ack_msg: always log warning on exception
Unlike handle_syn_msg, the warning is currently printed only
`if (_ack_handlers.contains(from.addr))`.
Unclear why. It is interesting in any case.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-12 19:32:40 +03:00
Benny Halevy
990ac23d19 gossiper: reindent handle_ack_msg
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-12 19:27:08 +03:00
Benny Halevy
2ca2118130 gossiper: coroutinize handle_ack_msg
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-12 19:26:03 +03:00
Benny Halevy
8c065bf023 gossiper: reindent handle_syn_msg
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-12 19:24:14 +03:00
Benny Halevy
264f4daded gossiper: coroutinize handle_syn_msg
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-12 19:23:09 +03:00
Benny Halevy
63ab5f1ab3 gossiper: message handlers: no need to capture shared_from_this
The handlers future is waited on under `background_msg`
which is closed in gossiper::stop so the instance is
already guranteed to be kept valid.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-12 19:21:07 +03:00
Benny Halevy
8bfec81985 gossiper: add_local_application_state: throw internal error if endpoint state is not found
If the function is called too early, the first get_endpoint_state_ptr
would throw an exception that is later caught and degraded
into a warning.

But that endpoint_state should never disappear after yielding,
so call on_internal_error in that case.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-12 19:21:07 +03:00
Benny Halevy
d1c67300d4 gossiper: coroutinize add_local_application_state
There is no need for it to spawn a seastar thread.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-12 19:20:41 +03:00
Avi Kivity
a1b2ca6184 Merge 'build: cmake: package cqlsh and fix the noarch postfix of python3 package' from Kefu Chai
in this series, the packaging of tools modules are improved:

- package cqlsh also. as cqlsh should be redistributed as a part of the unified package
- use ${arch} in the postfix of the python3 package. the python3 package is not architecture independent.
- set the version with tide for `Scylla_VERSION`, so it can be reused elsewhere.

Refs #15241

Closes #15369

* github.com:scylladb/scylladb:
  build: cmake: build cqlsh as a submodule
  build: cmake: always use the version with tilde
  build: cmake: build python3 dist tarball with arch postfix
  build: cmake: use the default comment message
2023-09-12 16:27:03 +03:00
David Garcia
5177ddac17 Support advanced db config scenarios
docs: skip html tags from description

Closes #15338
2023-09-12 15:29:16 +03:00
Tomasz Grabiec
6e83e54b0d Merge 'gossiper: get rid of uses_host_id' from Benny Halevy
This function practically returned true from inception.

In d38deef499
it started using messaging_service().knows_version(endpoint)
that also returns `true` unconditionally, to this day

So there's no point calling it since we can assume
that `uses_host_id` is true for all versions.

Closes #15343

* github.com:scylladb/scylladb:
  storage_service: fixup indentation after last patch
  gossiper: get rid of uses_host_id
2023-09-12 12:44:56 +02:00
Kefu Chai
571fab4179 build: cmake: build cqlsh as a submodule
since we also redistribute cqlsh, let's package it as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-12 18:18:31 +08:00
Kefu Chai
4ff5ce9933 build: cmake: always use the version with tilde
since we always use tilde ("~") in the verson number,
let's just cache it as an internal variable in CMake.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-12 18:18:31 +08:00
Kefu Chai
111d20958e build: cmake: build python3 dist tarball with arch postfix
now that `configure.py` always generate python3 dist tarball with
${arch} postfix, let's mirror this behavior. as `build_unified.sh`
uses this naming convention.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-12 18:18:31 +08:00
Kefu Chai
760b7c8772 build: cmake: use the default comment message
it turns out "Generating submodule python3 in python3" is not
as informative as default one:
"/home/kefu/dev/scylladb/tools/python3/build/scylla-python3-5.4.0~dev-0.20230908.1668d434e458.noarch.tar.gz"
so let's drop the "COMMENT" argument.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-12 18:18:31 +08:00
Botond Dénes
bc4b3e4fa3 Merge 'build: cmake: add packaging support ' from Kefu Chai
this change allows CMake to build the dist tarball for a certain build.

Refs https://github.com/scylladb/scylladb/issues/15241

Closes #15352

* github.com:scylladb/scylladb:
  build: cmake: add packaging support
  build: cmake: enable build of seastar/apps/iotune
2023-09-12 09:59:53 +03:00
Avi Kivity
89ba4e4a5e Merge 'Stop using anonymous minio bucket for tests' from Pavel Emelyanov
Currently minio starts with a bucket that has public anonymous access. Respectively, all tests use unsigned S3 requests. That was done for simplicity, and its better to apply some policy to the bucket and, consequentially, make tests sign their requests.

Other than the obvious benefit that we test requests signing in unit tests, another goal of this PR is to make it possible to simulate and test various error paths locally, e.g. #13745 and #13022

Closes #14525

* github.com:scylladb/scylladb:
  test/s3: Remove AWS_S3_EXTRA usage
  test/s3: Run tests over non-anonymous bucket
  test/minio: Create random temp user on start
  code: Rename S3_PUBLIC_BUCKET_FOR_TEST
2023-09-11 23:12:56 +03:00
Tomasz Grabiec
f77e90a0f0 tests: test_tablets: Reconnect the driver after server restart
This is a workaround for the flakiness of the test where INSERT
statements following the rolling restart fail with "No host available"
exception. The hypothesis is that those INSERTS race with driver
reconnecting to the cluster and if INSERTs are attempted before
reconnection is finished, the driver will refuse to execute the
statements.

The real fix should be in the driver to join with reconnections but
before that is ready we want to fix CI flakiness.

Refs #14746

Closes #15355
2023-09-11 21:58:46 +03:00
Kefu Chai
34e3302c01 dbuild: use --userns option when using podman
instead of fabricating a `/etc/password` manually, we can just
leave it to podman to add an entry in `/etc/password` in container.
as podman allows us to map user's account to the same UID in the
container. see
https://docs.podman.io/en/stable/markdown/options/userns.container.html.

this is not only a cosmetic change, it also avoid the permission denied
failure when accessing `/etc/passwd` in the container when selinux is
enabled. without this change, we would otherwise need to either add the
selinux lable to the bind volume with ':Z' option address the failure
like:

```
type=AVC msg=audit(1693449115.261:2599): avc:  denied  { open } for  pid=2298247 comm="bash" path="/etc/passwd" dev="tmpfs" ino=5931 scontext=system_u:system_r:container_t:s0:c252,c259 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
type=AVC msg=audit(1693449115.263:2600): avc:  denied  { open } for  pid=2298249 comm="id" path="/etc/passwd" dev="tmpfs" ino=5931 scontext=system_u:system_r:container_t:s0:c252,c259 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
```

found in `/var/log/audit/audit.log`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15230
2023-09-11 21:41:48 +03:00
Avi Kivity
b8a655f55e Update tools/python3 submodule
* tools/python3 45fbd05...3e833f1 (1):
  > install.sh: replace <tab> with spaces
2023-09-11 21:38:02 +03:00
Avi Kivity
628e6ffd33 Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes
Currently, mutation query on replica side will not respond with a result which doesn't have at least one live row. This causes problems if there is a lot of dead rows or partitions before we reach a live row, which stem from the fact that resulting reconcilable_result will be large:

1. Large allocations.  Serialization of reconcilable_result causes large allocations for storing result rows in std::deque
2. Reactor stalls. Serialization of reconcilable_result on the replica side and on the coordinator side causes reactor stalls. This impacts not only the query at hand. For 1M dead rows, freezing takes 130ms, unfreezing takes 500ms. Coordinator  does multiple freezes and unfreezes. The reactor stall on the coordinator side is >5s
3. Too large repair mutations. If reconciliation works on large pages, repair may fail due to too large mutation size. 1M dead rows is already too much: Refs https://github.com/scylladb/scylladb/issues/9111.

This patch fixes all of the above by making mutation reads respect the memory accounter's limit for the page size, even for dead rows.

This patch also addresses the problem of client-side timeouts during paging. Reconciling queries processing long strings of tombstones will now properly page tombstones,like regular queries do.

My testing shows that this solution even increases efficiency. I tested with a cluster of 2 nodes, and a table of RF=2. The data layout was as follows (1 partition):
* Node1: 1 live row, 1M dead rows
* Node2: 1M dead rows, 1 live row

This was designed to trigger reconciliation right from the very start of the query.

Before:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 140.0633503ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 66.7195275ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 873.5400742ms, pages: 2, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```

After:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 136.9035122ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 69.5286021ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 162.6239498ms, pages: 100, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```

Non-reconciling queries have almost identical duration (1 few ms changes can be observed between runs). Note how in the after case, the reconciling read also produces 100 pages, vs. just 2 pages in the before case, leading to a much lower duration (less than 1/4 of the before).

Refs https://github.com/scylladb/scylladb/issues/7929
Refs https://github.com/scylladb/scylladb/issues/3672
Refs https://github.com/scylladb/scylladb/issues/7933
Fixes https://github.com/scylladb/scylladb/issues/9111

Closes #14923

* github.com:scylladb/scylladb:
  test/topology_custom: add test_read_repair.py
  replica/mutation_dump: detect end-of-page in range-scans
  tools/scylla-sstable: write: abort parser thread if writing fails
  test/pylib: add REST methods to get node exe and workdir paths
  test/pylib/rest_client: add load_new_sstables, keyspace_{flush,compaction}
  service/storage_proxy: add trace points for the actual read executor type
  service/storage_proxy: add trace points for read-repair
  storage_proxy: Add more trace-level logging to read-repair
  database: Fix accounting of small partitions in mutation query
  database, storage_proxy: Reconcile pages with no live rows incrementally
2023-09-11 19:20:19 +03:00
Kefu Chai
a0dcbb09c3 build: cmake: add packaging support
this change allows CMake to build the dist tarball for a certain build.

Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-11 23:05:30 +08:00
Kefu Chai
649c8f248d build: cmake: enable build of seastar/apps/iotune
scylla redistribute iotune, so let's enable the related building
options, so that we can built iotune on demand.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-11 23:02:52 +08:00
Nadav Har'El
45ec76cfbf Merge 'Enlighten native-transport shutdown' from Pavel Emelyanov
When `nodetool disablebinary` command executes its handler aborts listening sockets, shuts down all client connections _and_ (!) then waits for the connections to stop existing. Effectively the command tries to make sure that no activity initiated by a CQL query continues, even though client would never see its result (client sockets are closed)

This makes the disablebinary command hang for long sometimes, which is not really nice. The proposal is to wait for the connections to terminate in the background. So once disablebinary command exists what's guaranteed is that all client connections are aborted and new connections are not admitted, but some activity started by them may still be running (e.g. up until `nodetool drain` is issued). Driver-side sockets won't get the queries' results anyway.

The behavior of `disablebinary` is not documented wrt whether it should wait for CQL processing to stop or not, so technically we're not breaking anything. However, it can happen that it's a disruptive change and some setups may behave differently after it.

refs: #14031
refs: #14711

Closes #14743

* github.com:scylladb/scylladb:
  test/cql-pytest: Add enable|disable-binary test case
  test.py: Add suite option to auto-dirty cluster after test
  test/pylib: Add nodetool enable|disable-binary commands
  transport: Shutdown server on disablebinary
  generic_server: Introduce shutdown()
  generic_server: Decouple server stopped from connection stopped
  transport/controller: Coroutinize do_stop_server()
  transport/controller: Coroutinize stop_server()
2023-09-11 17:54:52 +03:00
Pavel Emelyanov
821a9c1fd4 test/cql-pytest: Add enable|disable-binary test case
The test checks that `nodetool disablebinary` makes subsequent queries
fail and `nodetool enablebinary` lets client to establish new
connections.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:38:49 +03:00
Pavel Emelyanov
375b8c6213 test.py: Add suite option to auto-dirty cluster after test
ScyllaCluster can be marked as 'dirty' which means that the cluster is
in unusable state (after test) and shouldn't be re-used by other tests
launched by test.py. For now this is only implemented via the cluster
manager class which is only available for topology tests.

Add a less flexible short-cut for cql-pytest-s via suite.yaml marking.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:37:48 +03:00
Pavel Emelyanov
2c3b30b395 test/pylib: Add nodetool enable|disable-binary commands
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:37:48 +03:00
Pavel Emelyanov
b42391bfbe transport: Shutdown server on disablebinary
... and do the real "sharded::stop" in the background. On node shutdown
it needs to pick up all dangling background stopping.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:37:48 +03:00
Pavel Emelyanov
4682c7f9a5 generic_server: Introduce shutdown()
The method waits for listening sockets to stop listening and aborts the
connected sockets, but doesn't wait for the established connections to
finish processing.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:37:48 +03:00
Pavel Emelyanov
6dcf653995 generic_server: Decouple server stopped from connection stopped
The _stopped future resolves when all "sockets" stop -- listening and
connected ones. Furure patching will need to wait for listening sockets
to stop separately from connected ones.

Rename the `_stopped` to reflect what it is now while at it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:32:07 +03:00
Pavel Emelyanov
bc2d44994a transport/controller: Coroutinize do_stop_server()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:32:07 +03:00
Pavel Emelyanov
7701aa0789 transport/controller: Coroutinize stop_server()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:32:07 +03:00
Benny Halevy
7119c1d8cc token_metadata: update_topology: make endpoint_dc_rack arg optional
It's better to pass a disengaged optional when
the caller doesn't have the information rather than
passing the default dc_rack location so the latter
will never implicitly override a known endpoint dc/rack location.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #15300
2023-09-11 16:16:19 +02:00
Benny Halevy
08f8fd30ea gossiper: get rid of comment about advertise_removing
It was deleted in 66ff072540.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20230911140349.1809014-1-bhalevy@scylladb.com>
2023-09-11 16:14:26 +02:00
Benny Halevy
ed32ba7431 storage_service: fixup indentation after last patch
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-11 16:51:28 +03:00
Benny Halevy
f855479c9d gossiper: get rid of uses_host_id
This function practically returned true from inception.

In d38deef499
It started using messaging_service().knows_version(endpoint)
that also returns `true` unconditionally, to this day

So there's no point calling it since we can assume
that `uses_host_id` is true for all versions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-09-11 16:48:07 +03:00
Kefu Chai
87088b65b6 util: replace <tab> with spaces
to be aligned with seastar's coding-style.md: scylladb uses seastar's
coding-style.md. so let's adhere to it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15345
2023-09-11 14:38:46 +03:00
Botond Dénes
e5f724d5bd Merge 'create-relocatable-package.py: add --node-exporter-dir and --build-dir options' from Kefu Chai
this series adds `--node-exporter-dir` and `--build-dir` options to `create-relocatable-package.py`. this enables us to use create relocatable package from arbitrary build directories.

Refs #15241

Closes #15299

* github.com:scylladb/scylladb:
  create-relocatable-package.py: add --node-exporter-dir option
  build: specify the build dir instead mode
2023-09-11 14:32:53 +03:00
Botond Dénes
0c107c2076 Merge 'dist/debian: add command line option for builddir ' from Kefu Chai
so we can point `debian_files_gen.py` to builddir other than
'build', and can optionally use other output directory. this would
help to reduce the number of "magic numbers" in our building system.

Refs https://github.com/scylladb/scylladb/issues/15241

Closes #15282

* github.com:scylladb/scylladb:
  dist/debian: specify debian/* file encodings
  dist/debian: wrap lines whose length exceeds 100 chars
  dist/debian: add command line option for builddir
  dist/debian: modularize debian_files_gen.py
2023-09-11 14:31:33 +03:00
Botond Dénes
f770ff7a2b test/topology_custom: add test_read_repair.py 2023-09-11 07:07:12 -04:00
Botond Dénes
b55cead5cd replica/mutation_dump: detect end-of-page in range-scans
The current read-loop fails to detect end-of-page and if the query
result buider cuts the page, it will just proceed to the next
partition. This will result in distorted query results, as the result
builder will request for the consumption to stop after each clustering
row.
To fix, check if the page was cut before moving on to the next
partition.
A unit test reproducing the bug was also added.
2023-09-11 07:02:14 -04:00
Botond Dénes
82f4563757 tools/scylla-sstable: write: abort parser thread if writing fails
Currently if writing the sstable fails, e.g. because the input data is
out-of-order, the json parser thread hangs because its output is no
longer consumed. This results in the entire application just freezing.
Fix this by aborting the parsing thread explicitely in the
json_mutation_stream_parser destructor. If the parser thread existed
successfully, this will be a no-op, but on the error-path, this will
ensure that the parser thread doesn't hang.
2023-09-11 07:02:14 -04:00
Botond Dénes
46e37436d0 test/pylib: add REST methods to get node exe and workdir paths 2023-09-11 07:02:14 -04:00