Commit Graph

38745 Commits

Author SHA1 Message Date
Pavel Emelyanov
38d0ea0916 batchlog_manager: Fix drain() reentrability
Currently drain() is called twise -- first time from
storage_service::drain() (on shutdown), second via
batchlog_manager::stop(). The routine is unintentinally re-entrable,
because:
- explicit check for not aborting the abort source twise
- breaking semaphore can be done multiple times
- co-await-ing of the _started future works because the future is shared

That's not extremely elegant, better to make the drain() bail out early
if it was already called.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-12 16:30:07 +03:00
Botond Dénes
bc4b3e4fa3 Merge 'build: cmake: add packaging support ' from Kefu Chai
this change allows CMake to build the dist tarball for a certain build.

Refs https://github.com/scylladb/scylladb/issues/15241

Closes #15352

* github.com:scylladb/scylladb:
  build: cmake: add packaging support
  build: cmake: enable build of seastar/apps/iotune
2023-09-12 09:59:53 +03:00
Avi Kivity
89ba4e4a5e Merge 'Stop using anonymous minio bucket for tests' from Pavel Emelyanov
Currently minio starts with a bucket that has public anonymous access. Respectively, all tests use unsigned S3 requests. That was done for simplicity, and its better to apply some policy to the bucket and, consequentially, make tests sign their requests.

Other than the obvious benefit that we test requests signing in unit tests, another goal of this PR is to make it possible to simulate and test various error paths locally, e.g. #13745 and #13022

Closes #14525

* github.com:scylladb/scylladb:
  test/s3: Remove AWS_S3_EXTRA usage
  test/s3: Run tests over non-anonymous bucket
  test/minio: Create random temp user on start
  code: Rename S3_PUBLIC_BUCKET_FOR_TEST
2023-09-11 23:12:56 +03:00
Tomasz Grabiec
f77e90a0f0 tests: test_tablets: Reconnect the driver after server restart
This is a workaround for the flakiness of the test where INSERT
statements following the rolling restart fail with "No host available"
exception. The hypothesis is that those INSERTS race with driver
reconnecting to the cluster and if INSERTs are attempted before
reconnection is finished, the driver will refuse to execute the
statements.

The real fix should be in the driver to join with reconnections but
before that is ready we want to fix CI flakiness.

Refs #14746

Closes #15355
2023-09-11 21:58:46 +03:00
Kefu Chai
34e3302c01 dbuild: use --userns option when using podman
instead of fabricating a `/etc/password` manually, we can just
leave it to podman to add an entry in `/etc/password` in container.
as podman allows us to map user's account to the same UID in the
container. see
https://docs.podman.io/en/stable/markdown/options/userns.container.html.

this is not only a cosmetic change, it also avoid the permission denied
failure when accessing `/etc/passwd` in the container when selinux is
enabled. without this change, we would otherwise need to either add the
selinux lable to the bind volume with ':Z' option address the failure
like:

```
type=AVC msg=audit(1693449115.261:2599): avc:  denied  { open } for  pid=2298247 comm="bash" path="/etc/passwd" dev="tmpfs" ino=5931 scontext=system_u:system_r:container_t:s0:c252,c259 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
type=AVC msg=audit(1693449115.263:2600): avc:  denied  { open } for  pid=2298249 comm="id" path="/etc/passwd" dev="tmpfs" ino=5931 scontext=system_u:system_r:container_t:s0:c252,c259 tcontext=unconfined_u:object_r:user_tmp_t:s0 tclass=file permissive=0
```

found in `/var/log/audit/audit.log`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15230
2023-09-11 21:41:48 +03:00
Avi Kivity
b8a655f55e Update tools/python3 submodule
* tools/python3 45fbd05...3e833f1 (1):
  > install.sh: replace <tab> with spaces
2023-09-11 21:38:02 +03:00
Avi Kivity
628e6ffd33 Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes
Currently, mutation query on replica side will not respond with a result which doesn't have at least one live row. This causes problems if there is a lot of dead rows or partitions before we reach a live row, which stem from the fact that resulting reconcilable_result will be large:

1. Large allocations.  Serialization of reconcilable_result causes large allocations for storing result rows in std::deque
2. Reactor stalls. Serialization of reconcilable_result on the replica side and on the coordinator side causes reactor stalls. This impacts not only the query at hand. For 1M dead rows, freezing takes 130ms, unfreezing takes 500ms. Coordinator  does multiple freezes and unfreezes. The reactor stall on the coordinator side is >5s
3. Too large repair mutations. If reconciliation works on large pages, repair may fail due to too large mutation size. 1M dead rows is already too much: Refs https://github.com/scylladb/scylladb/issues/9111.

This patch fixes all of the above by making mutation reads respect the memory accounter's limit for the page size, even for dead rows.

This patch also addresses the problem of client-side timeouts during paging. Reconciling queries processing long strings of tombstones will now properly page tombstones,like regular queries do.

My testing shows that this solution even increases efficiency. I tested with a cluster of 2 nodes, and a table of RF=2. The data layout was as follows (1 partition):
* Node1: 1 live row, 1M dead rows
* Node2: 1M dead rows, 1 live row

This was designed to trigger reconciliation right from the very start of the query.

Before:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 140.0633503ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 66.7195275ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 873.5400742ms, pages: 2, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```

After:
```
Running query (node2, CL=ONE, cold cache)
Query done, duration: 136.9035122ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 69.5286021ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 162.6239498ms, pages: 100, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]
```

Non-reconciling queries have almost identical duration (1 few ms changes can be observed between runs). Note how in the after case, the reconciling read also produces 100 pages, vs. just 2 pages in the before case, leading to a much lower duration (less than 1/4 of the before).

Refs https://github.com/scylladb/scylladb/issues/7929
Refs https://github.com/scylladb/scylladb/issues/3672
Refs https://github.com/scylladb/scylladb/issues/7933
Fixes https://github.com/scylladb/scylladb/issues/9111

Closes #14923

* github.com:scylladb/scylladb:
  test/topology_custom: add test_read_repair.py
  replica/mutation_dump: detect end-of-page in range-scans
  tools/scylla-sstable: write: abort parser thread if writing fails
  test/pylib: add REST methods to get node exe and workdir paths
  test/pylib/rest_client: add load_new_sstables, keyspace_{flush,compaction}
  service/storage_proxy: add trace points for the actual read executor type
  service/storage_proxy: add trace points for read-repair
  storage_proxy: Add more trace-level logging to read-repair
  database: Fix accounting of small partitions in mutation query
  database, storage_proxy: Reconcile pages with no live rows incrementally
2023-09-11 19:20:19 +03:00
Kefu Chai
a0dcbb09c3 build: cmake: add packaging support
this change allows CMake to build the dist tarball for a certain build.

Refs #15241
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-11 23:05:30 +08:00
Kefu Chai
649c8f248d build: cmake: enable build of seastar/apps/iotune
scylla redistribute iotune, so let's enable the related building
options, so that we can built iotune on demand.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-11 23:02:52 +08:00
Nadav Har'El
45ec76cfbf Merge 'Enlighten native-transport shutdown' from Pavel Emelyanov
When `nodetool disablebinary` command executes its handler aborts listening sockets, shuts down all client connections _and_ (!) then waits for the connections to stop existing. Effectively the command tries to make sure that no activity initiated by a CQL query continues, even though client would never see its result (client sockets are closed)

This makes the disablebinary command hang for long sometimes, which is not really nice. The proposal is to wait for the connections to terminate in the background. So once disablebinary command exists what's guaranteed is that all client connections are aborted and new connections are not admitted, but some activity started by them may still be running (e.g. up until `nodetool drain` is issued). Driver-side sockets won't get the queries' results anyway.

The behavior of `disablebinary` is not documented wrt whether it should wait for CQL processing to stop or not, so technically we're not breaking anything. However, it can happen that it's a disruptive change and some setups may behave differently after it.

refs: #14031
refs: #14711

Closes #14743

* github.com:scylladb/scylladb:
  test/cql-pytest: Add enable|disable-binary test case
  test.py: Add suite option to auto-dirty cluster after test
  test/pylib: Add nodetool enable|disable-binary commands
  transport: Shutdown server on disablebinary
  generic_server: Introduce shutdown()
  generic_server: Decouple server stopped from connection stopped
  transport/controller: Coroutinize do_stop_server()
  transport/controller: Coroutinize stop_server()
2023-09-11 17:54:52 +03:00
Pavel Emelyanov
821a9c1fd4 test/cql-pytest: Add enable|disable-binary test case
The test checks that `nodetool disablebinary` makes subsequent queries
fail and `nodetool enablebinary` lets client to establish new
connections.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:38:49 +03:00
Pavel Emelyanov
375b8c6213 test.py: Add suite option to auto-dirty cluster after test
ScyllaCluster can be marked as 'dirty' which means that the cluster is
in unusable state (after test) and shouldn't be re-used by other tests
launched by test.py. For now this is only implemented via the cluster
manager class which is only available for topology tests.

Add a less flexible short-cut for cql-pytest-s via suite.yaml marking.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:37:48 +03:00
Pavel Emelyanov
2c3b30b395 test/pylib: Add nodetool enable|disable-binary commands
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:37:48 +03:00
Pavel Emelyanov
b42391bfbe transport: Shutdown server on disablebinary
... and do the real "sharded::stop" in the background. On node shutdown
it needs to pick up all dangling background stopping.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:37:48 +03:00
Pavel Emelyanov
4682c7f9a5 generic_server: Introduce shutdown()
The method waits for listening sockets to stop listening and aborts the
connected sockets, but doesn't wait for the established connections to
finish processing.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:37:48 +03:00
Pavel Emelyanov
6dcf653995 generic_server: Decouple server stopped from connection stopped
The _stopped future resolves when all "sockets" stop -- listening and
connected ones. Furure patching will need to wait for listening sockets
to stop separately from connected ones.

Rename the `_stopped` to reflect what it is now while at it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:32:07 +03:00
Pavel Emelyanov
bc2d44994a transport/controller: Coroutinize do_stop_server()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:32:07 +03:00
Pavel Emelyanov
7701aa0789 transport/controller: Coroutinize stop_server()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:32:07 +03:00
Benny Halevy
7119c1d8cc token_metadata: update_topology: make endpoint_dc_rack arg optional
It's better to pass a disengaged optional when
the caller doesn't have the information rather than
passing the default dc_rack location so the latter
will never implicitly override a known endpoint dc/rack location.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #15300
2023-09-11 16:16:19 +02:00
Benny Halevy
08f8fd30ea gossiper: get rid of comment about advertise_removing
It was deleted in 66ff072540.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20230911140349.1809014-1-bhalevy@scylladb.com>
2023-09-11 16:14:26 +02:00
Kefu Chai
87088b65b6 util: replace <tab> with spaces
to be aligned with seastar's coding-style.md: scylladb uses seastar's
coding-style.md. so let's adhere to it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15345
2023-09-11 14:38:46 +03:00
Botond Dénes
e5f724d5bd Merge 'create-relocatable-package.py: add --node-exporter-dir and --build-dir options' from Kefu Chai
this series adds `--node-exporter-dir` and `--build-dir` options to `create-relocatable-package.py`. this enables us to use create relocatable package from arbitrary build directories.

Refs #15241

Closes #15299

* github.com:scylladb/scylladb:
  create-relocatable-package.py: add --node-exporter-dir option
  build: specify the build dir instead mode
2023-09-11 14:32:53 +03:00
Botond Dénes
0c107c2076 Merge 'dist/debian: add command line option for builddir ' from Kefu Chai
so we can point `debian_files_gen.py` to builddir other than
'build', and can optionally use other output directory. this would
help to reduce the number of "magic numbers" in our building system.

Refs https://github.com/scylladb/scylladb/issues/15241

Closes #15282

* github.com:scylladb/scylladb:
  dist/debian: specify debian/* file encodings
  dist/debian: wrap lines whose length exceeds 100 chars
  dist/debian: add command line option for builddir
  dist/debian: modularize debian_files_gen.py
2023-09-11 14:31:33 +03:00
Botond Dénes
f770ff7a2b test/topology_custom: add test_read_repair.py 2023-09-11 07:07:12 -04:00
Botond Dénes
b55cead5cd replica/mutation_dump: detect end-of-page in range-scans
The current read-loop fails to detect end-of-page and if the query
result buider cuts the page, it will just proceed to the next
partition. This will result in distorted query results, as the result
builder will request for the consumption to stop after each clustering
row.
To fix, check if the page was cut before moving on to the next
partition.
A unit test reproducing the bug was also added.
2023-09-11 07:02:14 -04:00
Botond Dénes
82f4563757 tools/scylla-sstable: write: abort parser thread if writing fails
Currently if writing the sstable fails, e.g. because the input data is
out-of-order, the json parser thread hangs because its output is no
longer consumed. This results in the entire application just freezing.
Fix this by aborting the parsing thread explicitely in the
json_mutation_stream_parser destructor. If the parser thread existed
successfully, this will be a no-op, but on the error-path, this will
ensure that the parser thread doesn't hang.
2023-09-11 07:02:14 -04:00
Botond Dénes
46e37436d0 test/pylib: add REST methods to get node exe and workdir paths 2023-09-11 07:02:14 -04:00
Botond Dénes
dc269cb6bd test/pylib/rest_client: add load_new_sstables, keyspace_{flush,compaction}
To support the equivalent (roughly) of the following nodetool commands:
* nodetool refresh
* nodetool flush
* nodetool compact
2023-09-11 07:01:20 -04:00
Botond Dénes
ff29f43060 service/storage_proxy: add trace points for the actual read executor type
There is currently a trace point for when the read executor is created,
but this only contains the initial replica set and doesn't mention which
read executor is created in the end. This patch adds trace points for
each different return path, so it is clear from the trace whether
speculative read can happen or not.
2023-09-11 06:56:13 -04:00
Botond Dénes
727e519c3a service/storage_proxy: add trace points for read-repair
Currently the fact that read-repair was triggered can only be inferred
from seeing mutation reads in the trace. This patch adds an explicit
trace point for when read repair is triggered and also when it is
finished or retried.
2023-09-11 06:56:13 -04:00
Tomasz Grabiec
f76f5f6bfe storage_proxy: Add more trace-level logging to read-repair
Extremely helpful in debugging.
2023-09-11 06:56:13 -04:00
Tomasz Grabiec
0d773c9f9f database: Fix accounting of small partitions in mutation query
The partition key size was ignored by the accounter, as well as the
partition tombstone. As a result, a sequence of partitions with just
tombstones would be accounted as taking no memory and page size
limitter to not kick in.

Fix by accounting the real size of accumulated frozen_mutation.

Also, break pages across partitions even if there are no live rows.
The coordinator can handle it now.

Refs #7933
2023-09-11 06:56:13 -04:00
Tomasz Grabiec
2c8a0e4175 database, storage_proxy: Reconcile pages with no live rows incrementally
Currently, mutation query on replica side will not respond with a result
which doesn't have at least one live row. This causes problems if there
is a lot of dead rows or partitions before we reach a live row, which
stems from the fact that resulting reconcilable_result will be large:

* Large allocations. Serialization of reconcilable_result causes large
  allocations for storing result rows in std::deque
* Reactor stalls. Serialization of reconcilable_result on the replica
  side and on the coordinator side causes reactor stalls. This impacts
  not only the query at hand. For 1M dead rows, freezing takes 130ms,
  unfreezing takes 500ms. Coordinator does multiple freezes and
  unfreezes. The reactor stall on the coordinator side is >5s.
* Large repair mutations. If reconciliation works on large pages, repair
  may fail due to too large mutation size. 1M dead rows is already too
  much: Refs #9111.

This patch fixes all of the above by making mutation reads respect the
memory accounter's limit for the page size, even for dead rows.

This patch also addresses the problem of client-side timeouts during
paging. Reconciling queries processing long strings of tombstones will
now properly page tombstones,like regular queries do.

My testing shows that this solution even increases efficiency. I tested
with a cluster of 2 nodes, and a table of RF=2. The data layout was as
follows (1 partition):

    Node1: 1 live row, 1M dead rows
    Node2: 1M dead rows, 1 live row

This was designed to trigger reconciliation right from the very start of
the query.

Before:

Running query (node2, CL=ONE, cold cache)
Query done, duration: 140.0633503ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 66.7195275ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 873.5400742ms, pages: 2, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]

After:

Running query (node2, CL=ONE, cold cache)
Query done, duration: 136.9035122ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (node2, CL=ONE, hot cache)
Query done, duration: 69.5286021ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)]
Running query (all-nodes, CL=ALL, reconcile, cold-cache)
Query done, duration: 162.6239498ms, pages: 100, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)]

Non-reconciling queries have almost identical duration (1 few ms changes
can be observed between runs). Note how in the after case, the
reconciling read also produces 100 pages, vs. just 2 pages in the before
case, leading to a much lower duration (less than 1/4 of the before).

Refs #7929
Refs #3672
Refs #7933
Fixes #9111
2023-09-11 06:56:13 -04:00
Botond Dénes
685486a20d Update tools/python3 submodule
* tools/python3 30b8fc21...45fbd056 (1):
  > build_reloc: do not run SCYLLA-VERSION-GEN twice
2023-09-11 10:59:56 +03:00
Kefu Chai
2bbffccaca SCYLLA-VERSION-GEN: do not print version by default
actually, we never use the its output in our workflow. and the
output is distracting when building the package. so, in this
change, let's print it only on demand. this feature is preserved
just in case some of us would want to use this script for getting
the version number string.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15327
2023-09-11 10:50:50 +03:00
Kefu Chai
bcc05305ae build: cmake: set the default CMAKE_BUILD_TYPE to Release
if user fails to set "CMAKE_BUILD_TYPE", it would be empty. and
CMake would fail with confusing error messages like

```
CMake Error at CMakeLists.txt:21 (list):
  list sub-command FIND requires three arguments.

CMake Error at CMakeLists.txt:27 (include):
  include could not find requested file:

    mode.
```

so, in this change

* the the default CMAKE_BUILD_TYPE to "Release"
* quote the ${CMAKE_BUILD_TYPE} when searching it
  in the allowed build type lists.

this should address the issues above.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15326
2023-09-11 10:49:28 +03:00
Botond Dénes
b062b245ad Merge 'Don't cache dc:rack on system keyspace local cache' from Pavel Emelyanov
The local node's dc:rack pair is cached on system keyspace on start. However, most of other code don't need it as they get dc:rack from topology or directly from snitch. There are few places left that still mess with sysks cache, but they are easy to patch. So after this patch all the core code uses two sources of dc:rack -- topology / snitch -- instead of three.

Closes #15280

* github.com:scylladb/scylladb:
  system_keyspace: Don't require snitch argument on start
  system_keyspace: Don't cache local dc:rack pair
  system_keyspace: Save local info with explicit location
  storage_service: Get endpoint location from snitch, not system keyspace
  snitch: Introduce and use get_location() method
  repair: Local location variables instead of system keyspace's one
  repair: Use full endpoint location instead of datacenter part
2023-09-11 10:26:26 +03:00
Nadav Har'El
ea56c8efcd test/alternator: reduce code duplication in test for list_append()
A reviewer noted that test_update_expression_list_append_non_list_arguments
has too much code duplication - the same long API call to run
"SET a = list_append(...)" was repeated many times.

So in this patch we add a short inner function "try_list_append" to
avoid this duplication.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes: #15298
2023-09-11 10:09:35 +03:00
David Garcia
a14bcf7c6a docs: improve configuration properties reference
- Adds type for each option.
- Filters out unused / invalid values, moves them to a separate section.
- Adds the term "liveness" to the glossary.
- Removes unused and invalid properties from the docs.
- Updates to the latest version of pyaml.

docs: rename config template directive

Closes #15164
2023-09-11 09:47:16 +03:00
Botond Dénes
d92620868d Merge 'docs: improve command line samples in unified-installer.rst' from Kefu Chai
in this series, we try to improve `unified-installer.rst`

- encourage user to install smaller package
- run `./install.sh` directly instead relying on that `sh` points to `bash`

Closes #15325

* github.com:scylladb/scylladb:
  doc: run install.sh directly
  doc: install headless jdk in sample command line
2023-09-11 09:34:14 +03:00
Botond Dénes
7385f93816 Merge 'Task manager repair tasks progress' from Aleksandra Martyniuk
Find progress of repair tasks based on the number of ranges
that have been repaired.

Fixes: [#1156](https://github.com/scylladb/scylla-enterprise/issues/1156).

Closes #14698

* github.com:scylladb/scylladb:
  test: repair tasks test
  repair: add methods making repair progress more precise
  tasks: make progress related methods virtual
  repair: add get_progress method to shard_repair_task_impl
  repair: add const noexcept qualifiers to shard_repair_task_impl::ranges_size()
  repair: log a name of a particular table repair is working on
  tasks: delete move and copy constructors from task_manager::task::impl
2023-09-11 09:32:23 +03:00
Raphael S. Carvalho
c7e02a1077 storage_service: Enforce tablet streaming runs on shard 0
SIGSEGV was caught during tablet streaming, and the reason was
that storage_service::_group0 (via set_group0()) is only set on
shard 0, therefore when streaming ran on any other shard,
it tried to dereference garbage, which resulted in the crash.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #15307
2023-09-08 20:45:13 +03:00
Kefu Chai
ce6464b649 sstable: do not call into sstable in filesystem_storage::open()
before this change, filesystem_storage::open() reuses
`sstable::make_component_file_writer()` to create the
temporary toc, it will rename the temporary toc to the
real TOC when sealing the sstable.

but this prevents us from reusing filesystem_storage in
yet another storage backend. as the

1. create temporary
2. rename temporary to toc

dance only applies to filesystem_storage. when
filesystem_storage calls into sstable, it calls `sst.make_component_file_writer()`,
which in turn calls the `_storage->make_component_sink()`.
but at this moment, `_storage` is not necessarily `filesystem_storage`
anymore. it could be a wrapper around `filesystem_storage`,
which is not aware of the create-rename dance. and could do
a lot more than create a temporary file when asked to
"make_component_sink()".

if we really want to go this way by reusing sstable's API
in `filesystem_storage` to create a temporary toc, we will
have to rename the whatever temporary toc component created
by the wrapper backend to the toc with the seal() func. but
again, this rename op is only implemented in the
filesystem_storage backend. to mirror this operation in
the wrapper backend does not make sense at all -- it
does not have to be aware of the filesystem_storage's internals.

so in this change, instead of reusing the
`sstable::make_component_file_writer()`, we just inline
its implementation in filesystem_storage to avoid this
problem. this is also an improvement from the design
perspective, as the storage should not call into its
the higher abstraction -- sstable.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14443
2023-09-08 19:57:39 +03:00
Kefu Chai
ce291f4385 s3/client: do not use deprecated tls::connect() overload
seastar has deprecated the overload which accepts `server_name`,
let's use the one which accepts `tls::tls_options`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #15324
2023-09-08 18:44:45 +03:00
Avi Kivity
0656810c28 Update tools/java submodule
* tools/java 585b30fda6...9dddad27bf (1):
  > install-dependencies.sh: do not install weak dependencies

Frozen toolchain regenerated.

Closes #15322
2023-09-08 17:22:07 +03:00
Kamil Braun
26d9a82636 Merge 'raft topology: replace publish_cdc_generation with a bg fiber' from Patryk Jędrzejczak
Currently, the topology coordinator has the
`topology::transition_state::publish_cdc_generation` state responsible
for publishing the already created CDC generations to the user-facing
description tables. This process cannot fail as it would cause some CDC
updates to be missed. On the other hand, we would like to abort the
`publish_cdc_generation` state when bootstrap aborts. Of course, we
could also wait until handling this state finishes, even in the case of
the bootstrap abort, but that would be inefficient. We don't want to
unnecessarily block topology operations by publishing CDC generations.

The solution proposed by this PR is to remove the
`publish_cdc_generation` state completely and introduce a new background
fiber of the topology coordinator -- `cdc_generation_publisher` -- that
continually publishes committed CDC generations.

Apart from introducing the CDC generation publisher, we add
`test_cdc_generation_publishing.py` that verifies its correctness and we
adapt other CDC tests to the new changes.

Fixes #15194

Closes #15281

* github.com:scylladb/scylladb:
  test: test_cdc: introduce wait_for_first_cdc_generation
  test: move cdc_streams_check_and_repair check
  test: add test_cdc_generation_publishing
  docs: remove information about publish_cdc_generation
  raft topology: introduce the CDC generation publisher
  system_keyspace: load unpublished_cdc_generations to topology
  raft topology: mark committed CDC generations as unpublished
  raft topology: add unpublished_cdc_generations to system.topology
2023-09-08 15:08:41 +02:00
Kamil Braun
8bff5843b5 Merge 'test: topology: add tests for gossiper/endpoint/live and gossiper/endpoint/down' from Aleksandra Martyniuk
Add tests for gossiper/endpoint/live and gossiper/endpoint/down
which run only in release mode.

Enable test_remove_node_with_concurrent_ddl and fix types and
variables names used by it, so that they can be reused in gossiper
test.

Fixes: #15223.

Closes #15244

* github.com:scylladb/scylladb:
  test: topology: add gossiper test
  test: fix types and variable names in wait_for_host_down
2023-09-08 12:43:11 +02:00
Nadav Har'El
548386a0bb treewide: reduce include of cql_statement.hh
ClangBuildAnalyzer reports cql3/cql_statement.hh as being one of the
most expensive header files in the project - being included (mostly
indirectly) in 129 source files, and costing a total of 844 CPU seconds
of compilation.

This patch is an attempt, only *partially* successful, to reduce the
number of times that cql_statement.hh is included. It succeeds in
lowering the number 129 to 99, but not less :-( One of the biggest
difficulties in reducing it further is that query_processor.hh includes
a lot of templated code, which needs stuff from cql_statement.hh.
The solution should be to un-template the functions in
query_processor.hh and move them from the header to a source file, but
this is beyond the scope of this patch and query_processor.hh appears
problematic in other respects as well.

Unfortunately the compilation speedup by this patch is negligible
(the `du -bc build/dev/**/*.o` metric shows less than 0.01% reduction).
Beyond the fact that this patch only removes 30% of the inclusions of
this header, it appears that most of the source files that no longer
include cql_statement.hh after this patch, included anyway many of the
other headers that cql_statement.hh included, so the saving is minimal.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #15212
2023-09-08 13:23:50 +03:00
Kefu Chai
7591b1b384 doc: run install.sh directly
strictly speaking, `sh` is not necessarily bash. while `install.sh`
is written in the Bash dialect. and it errors out if it is not executed
with Bash. and we don't need to add "-x" when running the script, if
we have to, we should add it in `install.sh` not ask user to add this
option. also, `install.sh` is executable with a shebang line using
bash, so we can just execute it.

so, in this change, we just launch this script in the command line
sample.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-08 17:21:30 +08:00
Kefu Chai
1e0c7d14aa doc: install headless jdk in sample command line
in comparison with java-11-openjdk, java-11-openjdk-headless does not
offer audio and video support, and has less dependencies. for instance,
java-11-openjdk depends on the X11 libraries, and it also provides
icons representing JDK. but since scylla is a server side application,
we don't expect user to run a desktop on it. so there is no need to
support audio and video.

in this change, we just suggest the a "smaller" package, which is
actually also a dependency of java-11-open-jdk.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-08 17:21:30 +08:00